Traffic congestion is one of the main problems in cities around the world. Motorized vehicles are becoming easier to obtain, while roads are increasingly difficult to widen. Drivers also do not know which route is best because traffic is difficult to predict. One solution is to monitor roads with CCTV, but this requires considerable human resources and money. For this reason, Dwi Aji Kurniawan, a UGM student, together with his two supervisors, tried to tackle the congestion problem using a social media platform, namely Twitter. The research began as the topic of the student's thesis and also made it into ICITEE (International Conference on Information Technology and Electrical Engineering) 2016. But why Twitter?
Twitter has more than 320 million monthly active users, who post 500 million tweets every day. Twitter also detects events faster than traditional media. Several earlier studies have used Twitter to monitor traffic conditions. However, that research focused on events such as accidents and road repairs and did not work in real time. This research was therefore carried out to develop a traffic monitoring system that works in real time. How?
Figure 1: Flow Chart for the Process 
Figure 1 shows the flow of the process. In general, the tweet data is processed first. Classification is then carried out to pick out the tweets related to road conditions. Classification requires a classifier program, which is built through the process shown in the left-hand flow chart. Once classification is done, the selected tweets are processed for display on a website. The following sections explain the process in more detail.
Processing before Classification
The first step is to collect tweets using Twitter's Streaming API. An API (Application Programming Interface) is a set of instructions that lets us access and use data from a program such as Twitter. Twitter's Streaming API, in turn, lets us receive tweet data in real time, so we can follow the latest tweets with certain features or from certain accounts.
After the desired tweets have been collected, the second step is preprocessing. This step cleans the tweets of things the next steps do not need, such as usernames, web addresses, and characters other than letters and numbers.
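The cleaning described above can be sketched with a few regular expressions. This is only an illustration, not the researchers' code: the function name `preprocess`, the sample tweet, and the choice to lowercase the text are assumptions made for the example.

```python
import re

def preprocess(tweet: str) -> str:
    """Strip usernames, web addresses, and non-alphanumeric characters."""
    text = tweet.lower()                                 # assumption: case is ignored
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # web addresses
    text = re.sub(r"@\w+", " ", text)                    # usernames
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # anything but letters/digits
    return " ".join(text.split())                        # collapse repeated spaces

# Made-up example tweet
print(preprocess("@Budi Macet parah di Jl. Kaliurang!! http://t.co/abc123 #jogja"))
# prints: macet parah di jl kaliurang jogja
```

The URL pattern runs before the username pattern so that an address containing `@` is removed as a whole.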
The next step is feature selection. One task in this step is to find the words about road conditions that appear most often in the tweets. These most frequent words are then used to train the classifier program so that it can recognize words related to road conditions. How does the classifier program work? How can it distinguish tweets related to road conditions from other tweets?
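The word-counting part of feature selection can be sketched as follows. The function name `top_words` and the sample tweets are hypothetical, and the tweets are assumed to be already preprocessed; the article does not describe how many words the researchers kept or how they ranked them.

```python
from collections import Counter

def top_words(tweets, k):
    """Return the k words that appear most often across the given tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.split())   # count every word in every tweet
    return [word for word, _ in counts.most_common(k)]

# Made-up, already-cleaned tweets about road conditions
road_tweets = [
    "macet di jalan solo",
    "jalan kaliurang macet parah",
    "macet total",
]
print(top_words(road_tweets, 2))
# prints: ['macet', 'jalan']
```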
Algorithm for Classification
The trick is to use machine learning algorithms. An algorithm is basically a set of rules or steps. A machine learning algorithm is a set of steps that lets the machine, here a program, learn on its own. We do not need to write complicated rules by hand because the program learns them itself. Even so, we need to give the program some data to learn from. Three algorithms are used for classification: Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT).
Figure 2: Naïve Bayes Algorithm 
The NB algorithm calculates the probability that a piece of data belongs to a particular group, as illustrated in Figure 2. Initially, there are two categories of data: red and blue. This existing data comes from the training data we provide. When new data comes in (the green dot), the NB algorithm calculates the probability that it belongs to the red group and the probability that it belongs to the blue group. The two probabilities are then compared and the group with the larger one is chosen.
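For tweets, this comparison of probabilities can be sketched as below. It assumes a multinomial Naive Bayes over words with add-one (Laplace) smoothing; the article does not say which variant the researchers used, and the class name, training tweets, and labels are made up for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal word-count Naive Bayes with add-one smoothing (an assumption)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # tweets per group
        self.word_counts = defaultdict(Counter)  # word counts per group
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def log_scores(self, doc):
        """Log of (prior * word likelihoods) for each group."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in doc.split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)       # group with the largest score

# Made-up training tweets
model = NaiveBayes().fit(
    ["macet parah di jalan solo", "lalu lintas lancar di ringroad",
     "selamat pagi semua", "makan siang enak"],
    ["traffic", "traffic", "other", "other"],
)
print(model.predict("jalan macet"))
# prints: traffic
```

Working with logarithms avoids multiplying many small probabilities together, which would otherwise underflow for long texts.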
Figure 3: Support Vector Machine Algorithm 
The SVM algorithm basically aims to maximize the margin of the hyperplane. A hyperplane in this context is the boundary that separates the groups, and it can exist in any number of dimensions: a line in two dimensions, a plane in three, and so on. Figure 3a shows that there are several possible hyperplanes for separating the data. The existing data comes from the training data we provide. The SVM algorithm looks for the hyperplane with the maximum margin, as in Figure 3b. With the right hyperplane, the program can classify new data with better accuracy. In terms of Figure 3b, when new data comes in, it falls into one of the two groups: the group above the hyperplane or the group below it.
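A rough sketch of the idea on two-dimensional points is given below. It assumes a linear SVM trained by subgradient descent on the hinge loss, which is one common way to approximate the maximum-margin hyperplane; the article does not describe the researchers' implementation, and the function names, data points, and parameter values here are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b for the hyperplane w.x + b = 0; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: a smaller w means a wider margin
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Point is inside the margin or misclassified: move the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 for one side of the hyperplane and -1 for the other."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Made-up training data: two well-separated groups
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(predict(w, b, (4, 4)), predict(w, b, (-4, -4)))
# prints: 1 -1
```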
Figure 4: Decision Tree Algorithm [http://www.lewisgavin.co.uk/Machine-Learning-Decision-Tree/]
The DT algorithm, as the name suggests, uses a decision tree to classify. The top point is the root, followed by the branches, and the end points are the leaves. Each branch describes a decision rule, while each leaf describes the final decision. When new data enters at the root, there are various possible paths it can follow to reach one of the leaves. The path taken depends on the characteristics of the incoming data and on the existing if-then rules. In the end, the data reaches a leaf, which tells us which group it belongs to.
Many other algorithms could also be used for the classification. The three algorithms above were chosen because they tend to be easy to use or because they classify tweets well. To be used, the algorithms need to be implemented in a programming language or run in a dedicated machine learning program.
Having discussed the algorithms, we return to the tweets. Once the tweets have been classified, the Twitter-based traffic monitoring system is essentially complete. Tweets related to road conditions are displayed on a website so users can see the latest updates on road conditions.
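The walk from root to leaf can be sketched with a small hand-written tree. In a real decision tree the if-then rules are learned from the training data; the rules, words, and labels below are invented purely to show how a tweet travels through the tree.

```python
# Each inner node asks whether a word is present; each leaf holds a label.
# This tree is written by hand for illustration, not learned from data.
tree = {
    "word": "macet",                      # root
    "yes": {"label": "traffic"},          # leaf
    "no": {
        "word": "lancar",                 # branch
        "yes": {"label": "traffic"},      # leaf
        "no": {"label": "other"},         # leaf
    },
}

def classify(node, tweet_words):
    """Follow the if-then rules from the root until a leaf is reached."""
    while "label" not in node:
        branch = "yes" if node["word"] in tweet_words else "no"
        node = node[branch]
    return node["label"]

print(classify(tree, {"jalan", "macet"}))
# prints: traffic
print(classify(tree, {"selamat", "pagi"}))
# prints: other
```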
The test used 110,449 tweets, 17,532 of which were related to road conditions. The accuracy of all three algorithms was above 99%. The SVM and DT algorithms had the highest accuracy, while the NB algorithm had the fastest training time. That is because the NB algorithm is simpler than the other two.
This research shows that social media like Twitter can be put to unexpectedly useful purposes. By combining Twitter data with machine learning algorithms, traffic can be monitored more easily and very cheaply.
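To put the accuracy figure in context, the short sketch below shows how accuracy is usually computed and what the tweet counts above imply. Only the two counts come from the article; the function name and the "always answer not road-related" baseline are illustrative assumptions, not part of the reported experiment.

```python
def accuracy(predicted, actual):
    """Fraction of tweets whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

TOTAL_TWEETS = 110_449   # from the article
ROAD_TWEETS = 17_532     # from the article

road_share = ROAD_TWEETS / TOTAL_TWEETS   # about 0.159
# Hypothetical baseline: a program that labels every tweet "not road-related"
baseline = 1 - road_share                 # about 0.841

print(f"road-related share: {road_share:.3f}, baseline accuracy: {baseline:.3f}")
```

Because only about 16% of the test tweets concern road conditions, a program that never flags any tweet would already be right about 84% of the time, so accuracy above 99% is a clear improvement over that trivial baseline.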
Kurniawan, DA, 2016. Analysis of Twitter Social Network Data for Mapping Road Congestion Conditions in the Province of DIY with the Text Mining Method. Thesis. Unpublished. Gadjah Mada University: Yogyakarta.
Kurniawan, DA, Wibirama, S. & Setiawan, NA, 2016. Real-time traffic classification with Twitter data mining. Yogyakarta, 2016 8th International Conference on Information Technology and Electrical Engineering (ICITEE). doi: 10.1109/ICITEED.2016.7863251
Anon., 2013. Bright Planet. [Online] Available at: https://brightplanet.com/2013/06/25/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/ [Accessed 1 August 2019].
Tiwari, S., 2019. Codershood. [Online] Available at: https://www.codershood.info/2019/01/14/naive-bayes-classifier-using-python-with-example/ [Accessed 1 August 2019].
Pamungkas, A. Matlab Programming. [Online] Available at: https://programmingmatlab.com/data-mining-using-matlab/support-vector-machine-svm-using-matlab/ [Accessed 1 August 2019].
Navlani, A., 2018. DataCamp. [Online] Available at: https://www.datacamp.com/community/tutorials/decision-tree-classification-python [Accessed 3 August 2019].