Development of a BERT Model with Back Translation for Code-Mixed Sentiment Analysis on Twitter Data
Nisrina Hanifa Setiono, Yunita Sari, S.Kom., M.Sc., Ph.D
2025 | Thesis | Master's Program in Computer Science
The low performance of the BERT model in sentiment analysis on informal code-mixed Twitter data stems from the use of slang, abbreviations, non-standard words, and inconsistent spelling and grammar, which do not match the characteristics of the formal pretraining data. This mismatch makes it difficult for the model to understand sentence context accurately, resulting in low sentiment analysis accuracy. Addressing this challenge is important in natural language processing (NLP), especially for the Indonesian-English code-mixing commonly used on social media.
This research aims to improve the performance of the BERT model in sentiment analysis on code-mixed datasets by combining the BERT model with back translation. The approach is designed specifically to address the linguistic challenges of informal Indonesian-English code-mixed data and thereby improve sentiment analysis accuracy. The proposed method was applied to the INDONGLISH dataset, which consists of 5,067 tweets labeled with positive, negative, or neutral sentiment.
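As a rough illustration of this idea (a minimal sketch, not the thesis's exact implementation), the code below performs one back-translation round trip on a code-mixed tweet and attaches a three-class sentiment head to BERTweet. The MarianMT checkpoints Helsinki-NLP/opus-mt-id-en and Helsinki-NLP/opus-mt-en-id, the vinai/bertweet-base checkpoint, the example tweet, and the label encoding are all assumptions for illustration only.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          MarianMTModel, MarianTokenizer)

ID_EN = "Helsinki-NLP/opus-mt-id-en"  # assumed MT checkpoints, not named in the thesis
EN_ID = "Helsinki-NLP/opus-mt-en-id"

def _translate(texts, model_name):
    # Translate a batch of sentences with a MarianMT checkpoint.
    tok = MarianTokenizer.from_pretrained(model_name)
    mt = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        generated = mt.generate(**batch)
    return tok.batch_decode(generated, skip_special_tokens=True)

def back_translate(tweets):
    """Indonesian -> English -> Indonesian round trip used as augmentation."""
    return _translate(_translate(tweets, ID_EN), EN_ID)

# Hypothetical example: back-translated copies keep the labels of the original tweets.
tweets = ["pelayanannya lambat banget, so disappointing"]
labels = [0]                      # assumed encoding: 0 = negative, 1 = neutral, 2 = positive
train_texts = tweets + back_translate(tweets)
train_labels = labels * 2

# Three-class sentiment head on top of BERTweet.
bertweet_tok = AutoTokenizer.from_pretrained("vinai/bertweet-base")
bertweet = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=3)
enc = bertweet_tok(train_texts, return_tensors="pt", padding=True, truncation=True)
out = bertweet(**enc, labels=torch.tensor(train_labels))
out.loss.backward()               # one step of an ordinary fine-tuning loop
```

In the thesis, this augmentation is applied to the full INDONGLISH training split rather than to a single example sentence.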
The results show that applying back translation directly to the tweet data preserved the original meaning of the sentences and increased the performance of the BERTweet model from 0.7270 to 0.7457. Conversely, applying back translation after monolingual translation reduced the accuracy of the BERTweet model from 0.7270 to 0.7161: the repeated translation steps significantly altered sentence structure and context, producing mismatched sentiment labels. These findings indicate that additional translation steps can harm sentiment analysis accuracy, particularly on code-mixed datasets that are highly sensitive to linguistic variation.
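The difference between the two pipelines can be sketched as follows. This is only an illustration under assumptions: the Helsinki-NLP checkpoints are stand-ins for whatever MT system was actually used, and the monolingual step is approximated here as a single English-to-Indonesian pass over the raw code-mixed tweet.

```python
from transformers import pipeline

# Assumed MT checkpoints; the thesis does not name the translation system here.
id_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-id-en")
en_to_id = pipeline("translation", model="Helsinki-NLP/opus-mt-en-id")

def back_translate(texts):
    english = [o["translation_text"] for o in id_to_en(texts)]
    return [o["translation_text"] for o in en_to_id(english)]

def augment_direct(tweets):
    # Variant 1: back-translate the raw code-mixed tweet as-is
    # (the variant that raised BERTweet from 0.7270 to 0.7457).
    return back_translate(tweets)

def augment_after_monolingual(tweets):
    # Variant 2: first map the code-mixed tweet to monolingual Indonesian
    # (approximated here with an extra en->id pass), then back-translate;
    # the extra hop is what distorted structure and context in the thesis.
    monolingual = [o["translation_text"] for o in en_to_id(tweets)]
    return back_translate(monolingual)

print(augment_direct(["filmnya bagus banget, totally worth it"]))
print(augment_after_monolingual(["filmnya bagus banget, totally worth it"]))
```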
Keywords: Sentiment Analysis, Code-Mixed, BERT, Pretrained Model, Back Translation, Text Style Transfer