PREDICTION OF LUNG CANCER RELAPSE BASED ON MICROARRAY GENE EXPRESSION DATA USING HIDDEN NAIVE BAYES

AMIRAH FARAH FAWZIYYAH, Aina Musdholifah, Ph.D.

2016 | Undergraduate Thesis (Skripsi) | S1 COMPUTER SCIENCE

Relapse, the reappearance of cancer after treatment, is a phenomenon that cancer patients must watch for. If a relapse can be predicted before it occurs, doctors can plan the patient's subsequent treatment in advance. One way to predict relapse is to analyze microarray gene expression data. This research consisted of three main stages: discretization, feature selection, and classification. Discretization converted the numeric data into categorical data, using Weighted Proportioned k-Interval Discretization. Feature selection reduced the massive number of features in the original data to a smaller set that could still be used for classification. Three feature selection methods were used: Forward Selection, Correlation-based Feature Selection, and Information Gain-based Feature Selection. For classification, Hidden Naive Bayes was the main algorithm and Naive Bayes served as a comparison. Model performance was evaluated by accuracy, sensitivity, and specificity using Stratified Cross Validation with k=5. The best Hidden Naive Bayes result was obtained using 70 features selected by Information Gain-based Feature Selection, with an accuracy of 0.952, sensitivity of 0.947, and specificity of 0.966, while the best Naive Bayes result was obtained using 70 features selected by Correlation-based Feature Selection, with an accuracy of 0.955, sensitivity of 0.958, and specificity of 0.947.
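To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of three of the building blocks it names: information gain on an already discretized feature, stratified assignment of samples to k folds, and the accuracy, sensitivity, and specificity measures. It is an illustration under stated assumptions, not the thesis implementation: all function names are hypothetical, the label "relapse" is assumed to be the positive class, and neither Weighted Proportioned k-Interval Discretization nor Hidden Naive Bayes is implemented here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on one discretized feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def stratified_folds(labels, k=5):
    """Deal sample indices into k folds class by class, so each fold
    keeps roughly the class ratio of the full data set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def ratio(numerator, denominator):
    """Division that returns NaN instead of raising when a class is absent."""
    return numerator / denominator if denominator else float("nan")


def evaluate(y_true, y_pred, positive="relapse"):
    """Accuracy, sensitivity, and specificity; `positive` is an assumed label name."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # share of true relapses detected
        "specificity": ratio(tn, tn + fp),  # share of non-relapses detected
    }
```

In a pipeline like the one described, features would be ranked by `information_gain`, the top-ranked ones kept (70 in the reported best results), and a classifier trained and scored with `evaluate` on each of the folds from `stratified_folds` in turn.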

Keywords: gene expression, cancer relapse, classification, discretization, feature selection, hidden naïve Bayes