Optimal Combination of Base-Level and High-Level Learners in the Stacking Ensemble Learning Method for Stance Detection on Fake News Challenge Stage-1

MUKTA HIKAM, Dr. Yohanes Suyanto, M.I.Kom.

2022 | Thesis | Master of Computer Science

Stance detection is the automated process of determining the position of one subject relative to another based on its content. This research attempts to address the limitations of an earlier study, which performed stance detection on the Fake News Challenge Stage-1 (FNC-1) dataset using the Stacking Ensemble Learning method with five base-level learners and a Gradient Boosted Decision Tree as the high-level learner. The resulting model performed poorly because the high-level learner overfitted as a result of the imbalanced dataset and was vulnerable to noise in the predictions of the base-level learners. The solution proposed in this study is to use combinations of base-level and high-level learners that differ from those of the earlier study. The learners combined at the base and high levels are an MLP with one hidden layer, Random Forest, and Logistic Regression. Each resulting Stacking model was tested and evaluated using three scoring metrics (accuracy, macro-F1, and FNC-1 score). Among the 12 combination scenarios tested, the model with Logistic Regression and Random Forest as base-level learners and Logistic Regression as the high-level learner performed best, achieving an accuracy of 87.88%, a macro-F1 of 55.35%, and an FNC-1 score of 80.15%. This model performed best because the combination of Logistic Regression and Random Forest as base-level learners strikes a balance between the amount of noise produced and the maximum score potentially attainable by a Stacking model built on this combination of base-level learners.
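The abstract names three evaluation metrics without defining them. The sketch below is a minimal, standard-library illustration of how they can differ on an imbalanced label set. It assumes the four FNC-1 stance labels (agree, disagree, discuss, unrelated) and the weighting used by the publicly released challenge scorer (0.25 for a correct related/unrelated decision, a further 0.75 for the exact stance on related pairs). The function names and the toy labels are invented for illustration and are not taken from the thesis.

```python
# Assumed FNC-1 label set; "related" means any stance other than "unrelated".
LABELS = ["agree", "disagree", "discuss", "unrelated"]
RELATED = {"agree", "disagree", "discuss"}


def accuracy(gold, pred):
    """Fraction of predictions that exactly match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def fnc1_relative_score(gold, pred):
    """Weighted FNC-1 score divided by the best score achievable on `gold`."""
    score = 0.0
    best = 0.0
    for g, p in zip(gold, pred):
        best += 1.0 if g in RELATED else 0.25
        if g == p:
            score += 0.25          # exact label match
            if g in RELATED:
                score += 0.50      # bonus for the exact stance of a related pair
        if g in RELATED and p in RELATED:
            score += 0.25          # related/unrelated decision was correct
    return score / best


# Toy, invented labels: most pairs are "unrelated", and the two rarest
# stances are both misclassified as "discuss".
gold = ["unrelated"] * 8 + ["discuss"] * 2 + ["agree", "disagree"]
pred = ["unrelated"] * 8 + ["discuss"] * 2 + ["discuss", "discuss"]

print(f"accuracy = {accuracy(gold, pred):.3f}")             # 0.833
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")             # 0.417
print(f"FNC-1    = {fnc1_relative_score(gold, pred):.3f}")  # 0.750
```

On this toy data, accuracy stays high because the majority class is predicted correctly, while macro-F1 drops sharply because two of the four classes are never recovered. The same pattern is consistent with the gap between the reported accuracy and macro-F1 on an imbalanced dataset.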

Keywords: Stance Detection, Stance, FNC-1, Stacking, Ensemble Learning, MLP, Random Forest, Logistic Regression, Overfitting, Noise