Single Center Imputation from Multiple Chained Equations (SICE) Method for Handling Missing Values in Nominal Data

Afrizal Meka Mulyana, Aina Musdholifah, S.Kom., M.Kom., Ph.D.

2025 | Thesis | MASTER OF ARTIFICIAL INTELLIGENCE

Missing values in nominal attributes are a major problem in data analysis because they can degrade information quality, introduce bias, and reduce the accuracy of predictive models. Unlike numerical data, nominal data has no inherent order, so conventional imputation methods such as K-Nearest Neighbors (KNN) and Multivariate Imputation by Chained Equations (MICE) are often suboptimal. This study evaluates the effectiveness of the Single Center Imputation from Multiple Chained Equations (SICE) method for handling missing values in nominal attributes.

Experiments were conducted on two benchmark datasets, Adult Income and Hypothyroid. Each dataset went through missing value identification, one-hot encoding, and imputation with KNN, MICE, and SICE. The imputed data were then evaluated through classification with Logistic Regression and Random Forest. Model performance was measured using accuracy, precision, recall, and F1-score, and the differences were tested for statistical significance with paired t-tests.

The results show that SICE significantly outperformed the other methods on the Hypothyroid dataset, achieving an F1-score of 74% and a precision of 73% with Logistic Regression. Combined with Random Forest, it also recorded the highest accuracy, 85.01%. Paired t-tests showed a significant performance difference (p < 0[...]) between SICE and the other methods on the Hypothyroid dataset. On the Adult Income dataset, by contrast, no significant difference was found among the imputation methods. SICE even performed worse than KNN and MICE there, which suggests that simpler methods are better suited to that dataset. Missing values in Adult Income appear to be missing completely at random (MCAR), whereas in Hypothyroid they are related to other features (MAR), which allows a predictive approach such as SICE to produce more accurate imputations. SICE is therefore a suitable choice for nominal data with a MAR pattern, where it is stable and efficient.
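The paired t-test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which paired observations were compared, so the sketch assumes per-fold F1-scores from the same cross-validation folds. The score lists are hypothetical illustration values, not results from the thesis, and `paired_t_statistic` is a name introduced here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical per-fold F1-scores (assumption: 5-fold cross-validation).
f1_sice = [0.74, 0.76, 0.73, 0.75, 0.72]
f1_knn = [0.70, 0.71, 0.70, 0.69, 0.70]

t, df = paired_t_statistic(f1_sice, f1_knn)

# Two-sided 5% critical value of Student's t distribution for df = 4.
T_CRIT_DF4 = 2.776
significant = abs(t) > T_CRIT_DF4
print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

Comparing the statistic to a tabulated critical value keeps the sketch within the standard library. An exact p-value would require the t distribution's CDF, for example from a statistics package.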

Keywords: Missing Values, Data Imputation, SICE, Nominal Data, Logistic Regression, Random Forest.

  1. S2-2025-511773-abstract.pdf  
  2. S2-2025-511773-bibliography.pdf  
  3. S2-2025-511773-tableofcontent.pdf  
  4. S2-2025-511773-title.pdf