Comparison of Information Gain and Query Expansion Ranking Feature Selection for Sentiment Analysis of Twitter Data Using the Naive Bayes Method (Case Study: Pembelajaran Tatap Muka, In-Person Learning)
ALDI TRI MARGIYONO, Aina Musdholifah, S.Kom., M.Kom., Ph.D; Diyah Utami Kusumaning Putri, S.Kom., M.Sc., M.Cs.
2022 | Undergraduate thesis (Skripsi) | S1 Ilmu Komputer (Bachelor of Computer Science)
In-person learning (Pembelajaran Tatap Muka, PTM) was reinstated after more than a year of online distance learning (Pembelajaran Jarak Jauh, PJJ), which had been adopted to prevent the spread of COVID-19 (Coronavirus Disease 2019). Because PTM resumed before the pandemic had fully ended, the policy drew public comment on various social media platforms, including Twitter. These comments (tweets) can be used to gauge public opinion of the PTM policy through sentiment analysis. Sentiment analysis is often carried out with machine learning approaches such as naive Bayes, but these suffer from high feature dimensionality, so feature selection is needed. Examples of feature selection methods include information gain and query expansion ranking. This study compares the information gain and query expansion ranking feature selection methods at ratios of 25%, 50%, and 75%, based on the performance they yield in naive Bayes sentiment analysis of Indonesian-language Twitter data. The study concludes that, overall, the naive Bayes model with query expansion ranking performs better than the one with information gain, and that the best performance is achieved at a ratio of 75%. At that ratio, the model with query expansion ranking achieved an average accuracy of 81.70%, precision of 87.10%, recall of 75.73%, and F1-measure of 80.38%, while the model with information gain achieved an average accuracy of 81.64%, precision of 87.93%, recall of 73.94%, and F1-measure of 79.84%.
Keywords: sentiment analysis, information gain, naive Bayes, Pembelajaran Tatap Muka (in-person learning), query expansion ranking, Twitter
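To make the idea of ratio-based feature selection concrete, the following is a minimal sketch of information gain scoring over tokenized tweets, followed by keeping a top fraction of the terms. It is not the thesis implementation. The function names (`entropy`, `information_gain`, `select_top_ratio`), the toy tweets and labels, the treatment of each term as a binary present/absent feature, and the reading of "ratio" as the fraction of terms kept (rounded down) are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Return {term: IG}, treating each term as a binary present/absent feature."""
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    n = len(docs)
    base = entropy([class_counts[c] for c in classes])

    # For each term, count how many documents of each class contain it.
    present = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            present.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, with_term in present.items():
        n_with = sum(with_term.values())
        n_without = n - n_with
        h_with = entropy([with_term[c] for c in classes])
        h_without = entropy([class_counts[c] - with_term[c] for c in classes])
        # IG = H(class) - H(class | term present/absent)
        scores[term] = base - (n_with / n) * h_with - (n_without / n) * h_without
    return scores


def select_top_ratio(scores, ratio):
    """Keep the highest-scoring fraction `ratio` of terms (assumed: round down, keep >= 1)."""
    k = max(1, int(len(scores) * ratio))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy data: four tokenized tweets with made-up sentiment labels.
docs = [
    ["sekolah", "senang", "ptm"],
    ["ptm", "senang"],
    ["ptm", "takut", "covid"],
    ["takut", "covid", "sekolah"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = information_gain(docs, labels)
for ratio in (0.25, 0.50, 0.75):
    print(ratio, select_top_ratio(scores, ratio))
```

In a pipeline like the one the abstract describes, the selected terms would then define the vocabulary passed to the naive Bayes classifier. Query expansion ranking would replace only the scoring function, while the top-ratio step would stay the same.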