Analysis of the Use of Generative Adversarial Networks (GAN) in Flood Classification with the K-Nearest Neighbor and Random Forest Algorithms
WAHYU AFRIZA, Dr. Mardhani Riasetiawan, SE Ak, MT; Dr. Dyah Aruming Tyas, S.Si.
2023 | Undergraduate Thesis | Electronics and Instrumentation
Indonesia is a tropical country with high rainfall and uncertain weather and climate conditions. Given this uncertainty, the recurrence of floods, the scarcity of flood prediction information, and the limited availability of data on the causes of flooding, this study compares synthetic data generated from the limited data available from BMKG (temperature, humidity, rainfall, and wind speed) with synthetic data generated from annual rainfall data obtained from the Kaggle online platform. The research aims to compare synthetic data generated from these different datasets, using as a benchmark the results of classification with K-Nearest Neighbor (KNN) and Random Forest, with accuracy evaluated using a confusion matrix.
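As a minimal sketch of the accuracy evaluation mentioned above, the snippet below builds a two-class confusion matrix and derives accuracy from it. The labels `YES` and `NO` follow the class names used in this abstract; the function names and the sample predictions are hypothetical and are not taken from the thesis.

```python
def confusion_matrix(y_true, y_pred, labels=("NO", "YES")):
    """Return nested counts indexed as counts[actual][predicted]."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts


def accuracy(counts):
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row.values()) for row in counts.values())
    correct = sum(counts[label][label] for label in counts)
    return correct / total if total else 0.0


# Hypothetical flood labels, for illustration only.
y_true = ["YES", "NO", "NO", "YES", "NO", "YES"]
y_pred = ["YES", "NO", "YES", "YES", "NO", "NO"]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
```

Reading the off-diagonal cells separately (for example `cm["YES"]["NO"]`, floods predicted as no flood) shows which class a classifier struggles with, which overall accuracy alone does not reveal.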
The research uses climate data from the BMKG DI Yogyakarta Climatology Station covering 20 months, from the Geophysical Station covering 12 months, and Kerala data spanning 1901–2018. Synthetic data are generated with the Conditional Tabular Generative Adversarial Network (CTGAN) model. CTGAN produces reasonably good synthetic data, in terms of distribution and deviation from the original data, when the original dataset is large and the amount of synthetic data generated is small. On the BMKG data, the KNN classifier overfitted, with evaluation accuracy of 85–94% while validation accuracy declined from 89% to 65%. Random Forest performed better on these data, with evaluation accuracy of 68–98% and validation accuracy of 65–98%, although both declined. This is attributed to the lack of distinctive patterns in the data and to too little original data being used as the basis for synthesis, which makes it harder for the classifiers to identify CTGAN-generated records that differ considerably in distance and value. On the Kerala data, by contrast, KNN classified very well, with evaluation accuracy of 92–95% and validation accuracy of 72–83%, whereas Random Forest was less able to identify records of the YES class, apparently because much of the CTGAN-generated data is concentrated in a single class.
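Because KNN assigns a class by distance to the nearest training records, synthetic records that lie far from the original data can change its predictions. The following is a minimal pure-Python sketch of KNN voting, not the implementation used in the thesis; the function name `knn_predict`, the choice of `k=3`, and the pre-scaled feature values are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training rows."""
    # Rank training rows by Euclidean distance to the query point.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled features: (rainfall, humidity).
train_X = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.30),  # dry conditions
    (0.80, 0.90), (0.90, 0.80), (0.85, 0.95),  # wet conditions
]
train_y = ["NO", "NO", "NO", "YES", "YES", "YES"]

near_flood = knn_predict(train_X, train_y, (0.82, 0.88))
near_dry = knn_predict(train_X, train_y, (0.12, 0.18))
```

In practice the features would need to be scaled to comparable ranges before computing distances, since an unscaled variable such as rainfall in millimetres would otherwise dominate the vote.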
Keywords: Classification, Rain, Flood, K-Nearest Neighbor (KNN), Synthetic Data, Conditional Tabular Generative Adversarial Network (CTGAN)