Implementation and Analysis of Hadoop Distributed File System Anomaly Detection Based on FP16 Transformers
Ridwan Akmal, Ir. Yuris Mulya Saputra, S.T., M.Sc., Ph.D., IPM.
2025 | Final Project (Tugas Akhir) | D4 Teknologi Jaringan (Network Technology)
In the era of big data, large-scale data management systems such as the Hadoop Distributed File System (HDFS) play a crucial role in processing and storing vast amounts of data. A major challenge in these systems, however, is detecting anomalies effectively in order to prevent operational disruptions and security threats. This study proposes a transformer-based approach to anomaly detection in HDFS logs and compares the performance of three pretrained models, ELECTRA, MiniLM, and DistilBERT, each optimized with FP16 mixed precision training. Experiments were conducted on the Loghub dataset in several stages: mapping Event Ids to log messages, sampling, tokenization, model training, model evaluation, and integration of the model with a web application. The results show that ELECTRA is the best model across metrics including loss, accuracy, precision, and recall, and that it attains the highest F1-score, 0.99860. ELECTRA also has the fastest load time (0.2286 seconds) and the fastest inference time (0.0241 seconds). These results indicate that transformer-based models, ELECTRA in particular, can be implemented effectively for anomaly detection in HDFS logs.
Keywords: Anomaly detection, Hadoop Distributed File System, ELECTRA, MiniLM, DistilBERT
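Two of the stages named in the abstract, mapping Event Ids to log messages and scoring predictions with precision, recall, and F1, can be illustrated with a minimal sketch. It is not the thesis implementation: the `EVENT_TEMPLATES` entries, the `[UNK]` placeholder, and both function names are hypothetical and chosen only for illustration, and the real Loghub HDFS templates are not reproduced here.

```python
# Hypothetical Event Id -> message template table (illustrative only;
# these are NOT the actual Loghub HDFS templates).
EVENT_TEMPLATES = {
    "E1": "Receiving block",
    "E2": "Received block",
    "E3": "Exception while serving block",
}


def events_to_text(event_ids, templates=EVENT_TEMPLATES):
    """Replace each Event Id with its message template and join the result
    into one string, the kind of input a tokenizer would then consume.
    Unknown ids map to the assumed placeholder "[UNK]"."""
    return " ".join(templates.get(event_id, "[UNK]") for event_id in event_ids)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (anomaly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, `events_to_text(["E1", "E3"])` yields a single string built from two templates, and `precision_recall_f1([1, 0, 1, 1], [1, 0, 0, 1])` scores a toy prediction in which one anomaly is missed.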