Development of a Bot Detection Application for Twitter Social Media Using Machine Learning with the Random Forest Classifier Algorithm

AQILAH AINI ZAHRA

AQILAH AINI ZAHRA, Widyawan, S.T., M.Sc., Ph.D; Dr.Eng. Silmi Fauziati, S.T., M.T

2020 | Undergraduate Thesis | Undergraduate Program (S1) in Information Technology

Abstract

A Twitter bot is a Twitter account programmed to carry out social activity automatically, such as posting tweets through a scheduling program. Some bots aim to spread useful information, such as earthquake and weather alerts. However, many bots have a negative influence: they spread fake news, send spam, or act as paid followers to inflate an account's popularity. This can shift public sentiment on an issue, erode user trust, or even alter the social order. An application that distinguishes bot accounts from non-bot accounts is therefore needed. To address this problem, this study develops a bot detection system that uses machine learning for multiclass classification. The classes are human, informative, spammer, and fake followers. The model is trained with supervised learning on labeled training data. First, a dataset of 2,333 accounts is preprocessed to obtain a set of 28 features for classification. These features come from user profile analysis, temporal analysis, and tweet analysis, all expressed as numeric values. The data is then partitioned, normalized by scaling, and used to train a Random Forest Classifier. Afterwards, feature selection reduces the set to 17 features, which gives the highest accuracy the model achieves. In the evaluation stage, the bot detection model achieves an accuracy of 96.79%, a precision of 97%, a recall of 96%, and an F1-score of 96%, so the detection model can be classified as highly accurate. The finished bot detection model is then integrated into a website and deployed to the cloud. As a result, this machine-learning-based web application can be accessed and used by the public to detect Twitter bots.
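The partitioning and scaling steps described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the thesis's code: the shuffled 80/20 holdout split, the fixed seed, the helper names `holdout_split` and `SimpleMinMaxScaler`, the toy feature names, and min-max scaling to [0, 1] are all assumptions, since the abstract only states that the data is partitioned and normalized by scaling.

```python
"""Hypothetical sketch: holdout split plus min-max scaling of numeric features.

Assumptions (not from the thesis): shuffled 80/20 split, fixed seed, and
min-max scaling; the abstract only says the data is partitioned and scaled.
"""
import random
from typing import List, Tuple

Matrix = List[List[float]]


def holdout_split(rows: Matrix, labels: List[str], test_ratio: float = 0.2,
                  seed: int = 42) -> Tuple[Matrix, Matrix, List[str], List[str]]:
    """Shuffle row indices and hold out roughly `test_ratio` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


class SimpleMinMaxScaler:
    """Maps each column's training range onto [0, 1] using training data only."""

    def fit(self, rows: Matrix) -> "SimpleMinMaxScaler":
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows: Matrix) -> Matrix:
        scaled = []
        for row in rows:
            new_row = []
            for value, low, high in zip(row, self.mins, self.maxs):
                span = high - low
                # A constant column carries no information; map it to 0.0.
                new_row.append((value - low) / span if span else 0.0)
            scaled.append(new_row)
        return scaled


if __name__ == "__main__":
    # Hypothetical toy features per account: [followers_count, tweets_per_day].
    X = [[10.0, 1.0], [500.0, 40.0], [20.0, 3.0], [9000.0, 200.0], [15.0, 2.0]]
    y = ["human", "spammer", "human", "informative", "fake followers"]
    X_train, X_test, y_train, y_test = holdout_split(X, y, test_ratio=0.2)
    scaler = SimpleMinMaxScaler().fit(X_train)
    print(scaler.transform(X_train))
    print(scaler.transform(X_test))
```

Fitting the scaler on the training partition alone and reusing its minimums and maximums for the test partition keeps test-set statistics out of training, and mirrors how a newly submitted account would be scaled at prediction time.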

A Twitter bot is a programmed Twitter account that performs social activity automatically, such as posting tweets through a scheduling system. Some bot accounts have a positive influence, such as spreading weather news or earthquake alerts. However, many bots have a negative influence: they spread fake news or spam, or act as fake followers to inflate the popularity of certain accounts. This can change public sentiment toward an issue, degrade user trust, and even alter the social order. Therefore, an application that can discriminate between bot and non-bot accounts is needed. Based on this problem statement, this research develops a bot detection system that uses machine learning for multiclass classification. The classes are human, informative, spammer, and fake followers. The model is trained with supervised learning on labeled training data. First, a dataset of 2,333 accounts goes through preprocessing, which produces a set of 28 features for classification. These features come from user profile analysis, temporal analysis, and tweet analysis and take numeric values. The data is then split, normalized using a scaling technique, and used to train a Random Forest Classifier. Only the 20 features with the largest feature importance scores are needed to reach the highest accuracy. In the evaluation phase, the bot detection model achieves 96.79% accuracy, 97% precision, 96% recall, and a 96% F1-score, so the model is considered highly accurate. The developed model is then integrated into a website and deployed to a cloud platform. As a result, this machine-learning-based web application is publicly accessible.
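The feature-selection and evaluation steps can likewise be sketched in plain Python. This is a hypothetical illustration under stated assumptions, not the thesis's implementation: the importance scores are taken as a given list (in practice they would come from the trained random forest), `k` is a free parameter (this abstract reports 20), the helper names are invented for the example, and precision, recall, and F1 are assumed to be macro-averaged over the classes, which the abstract does not specify.

```python
"""Hypothetical sketch: top-k feature selection and macro-averaged multiclass metrics.

Assumptions (not from the thesis): importances are supplied as a list, and
precision/recall/F1 are macro-averaged (each class weighted equally).
"""
from typing import Dict, List, Sequence


def top_k_feature_indices(importances: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest importance scores, in original column order."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def select_columns(rows: List[List[float]], keep: List[int]) -> List[List[float]]:
    """Keep only the chosen feature columns of every row."""
    return [[row[i] for i in keep] for row in rows]


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus per-class precision, recall, and F1 averaged over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


if __name__ == "__main__":
    importances = [0.05, 0.30, 0.10, 0.40, 0.15]  # hypothetical scores for 5 features
    keep = top_k_feature_indices(importances, k=3)  # -> [1, 3, 4]
    print(select_columns([[1.0, 2.0, 3.0, 4.0, 5.0]], keep))  # [[2.0, 4.0, 5.0]]
    y_true = ["human", "spammer", "informative", "fake followers"]
    y_pred = ["human", "spammer", "human", "fake followers"]
    print(macro_scores(y_true, y_pred))
```

Macro averaging weights the four classes equally, so weak performance on a small class is not hidden by a large one; a weighted average would instead reflect how often each class occurs in the test data.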

Keywords: Bot Detection, Multiclass Classification, Machine Learning, Supervised Learning, Twitter

