Thalassemia Screening Based on Hematological Data Using Random Forest and Support Vector Machine Models
Mifta Mardiah, Prof. Dr. Niken Satuti Nur Handayani, M.Sc.; Prof. Dr. Drs. Azhari, M.T.
2026 | Thesis | Master's (S2) in Biology
Thalassemia is a genetic disorder caused by impaired production of the globin chains that form hemoglobin, and it remains a global public health challenge because of its impact on morbidity and mortality. Thalassemia screening consists of three levels, and examination up to Level III is required when hematological data give no clear indication of thalassemia status. Artificial Intelligence (AI), particularly Random Forest and Support Vector Machine (SVM), offers an alternative approach to accelerating thalassemia screening using hematological data (Level I). This study aims to apply AI models to the analysis of hematological data for thalassemia screening. Data preprocessing was performed before clustering and classification. K-means and Gaussian Mixture Model (GMM) were used as clustering methods to find patterns and detect data anomalies. Random Forest and SVM models were then trained and tested, and each model was evaluated using its confusion matrix. K-means clustering produced six clusters, matching the number of categories, with normal and ?-thalassemia data clearly separated. Several subsets of ?-thalassemia, HbE, and other categories overlapped, indicating similar hematological characteristics. GMM clustering achieved an accuracy of 83.72%. Among the classification models, Random Forest achieved the highest accuracy of 97%, while among the SVM models the RBF kernel produced the highest accuracy of 87%. SVM-Linear, by contrast, scored considerably lower (55.81%). Furthermore, only Random Forest and SVM-Linear were able to recognize the ?-thalassemia category, indicating differences in sensitivity across models. These results indicate that integrating machine learning can improve screening efficiency by accelerating the thalassemia examination process, which otherwise generally requires further examination.
Keywords: Thalassemia, Random Forest, Support Vector Machine, Hematological Data, Screening
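
The workflow summarized in the abstract (preprocessing, clustering with k-means and GMM, then classification with Random Forest and SVM evaluated by confusion matrix) could be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the real hematological measurements, and every hyperparameter, the feature count, and the 75/25 split are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_CLASSES = 6  # the abstract reports six categories

# Synthetic stand-in for hematological features (assumption: NOT the thesis data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=6, n_redundant=0,
    n_classes=N_CLASSES, n_clusters_per_class=1, class_sep=2.0,
    random_state=0,
)

# Clustering step: standardize, then group samples without using the labels.
X_scaled = StandardScaler().fit_transform(X)
kmeans_labels = KMeans(
    n_clusters=N_CLASSES, n_init=10, random_state=0
).fit_predict(X_scaled)
gmm_labels = GaussianMixture(
    n_components=N_CLASSES, random_state=0
).fit_predict(X_scaled)

# Classification step: hold out a stratified test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each SVM is wrapped in a pipeline so scaling is fitted on training data only.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM-RBF": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SVM-Linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}

# Fit each model and keep its accuracy and confusion matrix.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred, labels=np.arange(N_CLASSES))
    results[name] = (acc, cm)

for name, (acc, cm) in results.items():
    print(f"{name}: accuracy = {acc:.3f}")
    print(cm)
```

Rows of each confusion matrix are true categories and columns are predicted categories, so an all-zero column would correspond to a category that a model never predicts, which is the kind of per-category difference the abstract describes. The numbers printed on synthetic data have no bearing on the results reported above.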