Evaluation of IndoBERT as a Context-Based Embedding for Emotion Classification in Indonesian Text

Alfa Natasya Limbong

Alfa Natasya Limbong, Dr. Sigit Priyanta, S.Si., M.Kom.

2025 | Thesis | Master of Artificial Intelligence

Abstract

Social media platforms such as Twitter have become an important space for people to express emotions. Emotion analysis of Indonesian text nevertheless remains challenging, because static embeddings such as Word2Vec and FastText assign each word a single vector regardless of its context and therefore capture emotional context poorly. This study proposes IndoBERT as a contextual-embedding-based emotion classification model for Indonesian-language Twitter text. The dataset is the Twitter Emotion Dataset, consisting of 4,401 tweets across five emotion categories. The tweets pass through preprocessing (lowercasing, emoticon conversion, slang normalization, abbreviation expansion, and stopword removal) and are augmented with backtranslation. The model is evaluated using accuracy, precision, recall, and F1-score, and is compared with IndoBERT Base and with static embeddings. The results show that IndoBERT Large consistently outperforms IndoBERT Base. The best performance is obtained in the scenario of preprocessing without stopwords combined with hyperparameter tuning: 79% accuracy, 81% precision, 79% recall, and 80% F1-score. This exceeds the untuned IndoBERT Large baseline (77% accuracy, 78% F1-score) and is far above the static FastText embeddings (65.36% to 69.23% F1-score). These findings indicate that IndoBERT contextual embeddings, particularly the Large variant, capture emotional nuance in Indonesian text more effectively than static embedding approaches, and that preprocessing strategy and hyperparameter tuning both contribute to emotion classification accuracy.
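The preprocessing stages listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name `preprocess`, the `remove_stopwords` flag, the order of the steps, and the four tiny lookup tables are all assumptions made for the example, since the abstract does not specify the dictionaries or tools used.

```python
import re

# Hypothetical lookup tables for illustration only; the actual emoticon,
# slang, abbreviation, and stopword lists used in the thesis are not
# given in the abstract.
EMOTICONS = {":)": "senang", ":(": "sedih"}
SLANG = {"gak": "tidak", "bgt": "banget"}
ABBREVIATIONS = {"yg": "yang", "dgn": "dengan"}
STOPWORDS = {"yang", "dan", "di"}


def preprocess(tweet: str, remove_stopwords: bool = True) -> str:
    """Apply the abstract's preprocessing stages in an assumed order."""
    # 1. Lowercasing
    text = tweet.lower()
    # 2. Emoticon conversion: replace each emoticon with a word
    #    before punctuation is dropped by tokenization
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    # Simple word tokenization (assumption: punctuation is discarded)
    tokens = re.findall(r"\w+", text)
    # 3. Slang normalization
    tokens = [SLANG.get(token, token) for token in tokens]
    # 4. Abbreviation expansion
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    # 5. Optional stopword removal, so both scenarios can be compared
    if remove_stopwords:
        tokens = [token for token in tokens if token not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("Aku GAK suka yg ini :("))
print(preprocess("Aku GAK suka yg ini :(", remove_stopwords=False))
```

Making stopword removal a flag reflects that the abstract treats stopword handling as one of the compared scenarios; the cleaned strings would then be passed to the IndoBERT tokenizer or to a static embedding model.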

Keywords: IndoBERT, Emotion Classification, Natural Language Processing (NLP), Contextual Embedding, Backtranslation, Preprocessing, Twitter.

