COMPARATIVE ANALYSIS OF PROBABILISTIC, NON-PROBABILISTIC, AND NEURAL EMBEDDING-BASED TOPIC MODELING AND THE INTEGRATION OF GENERATIVE AI FOR TOPIC LABELING

Rahma Nur Annisa

The increasing volume of unstructured textual data necessitates analytical methods capable of systematically extracting latent thematic structures. Topic modeling paradigms (probabilistic, non-probabilistic, and neural embedding-based) differ in the coherence, diversity, and interpretability of the topics they produce. This study conducts a comparative analysis of topic modeling methods across these paradigms and evaluates the integration of Generative AI at the topic labeling stage. The dataset consists of 10,000 Indonesian-language user reviews of the Shopee application, collected from the Google Play Store up to 20 October 2025 and preprocessed through standard text cleaning procedures. The implemented methods are Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Contextualized Topic Model (CTM), and Bidirectional Encoder Representations from Transformers Topic Modeling (BERTopic). Model performance is evaluated using topic coherence and topic diversity metrics. Topic labeling is then performed with the Gemini Flash Generative AI model to generate semantically consistent, context-aware topic labels. The results indicate that neural embedding-based models, particularly BERTopic, achieve the best balance between topic coherence and topic diversity, whereas LDA and NMF, despite their strong structural interpretability, are limited in capturing complex semantic relationships. Integrating Generative AI improves the readability of topic interpretation without modifying the underlying topic structures produced by the base models.
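The abstract does not state how topic diversity is computed. As a minimal sketch, assuming the commonly used definition (the fraction of unique words among the top-k words of all topics), the metric could be calculated as follows. The function name `topic_diversity`, the parameter `top_k`, and the example word lists are hypothetical and are not taken from the study.

```python
def topic_diversity(topics, top_k=10):
    """Return the fraction of unique words among the top_k words of all topics.

    Assumed definition: 1.0 means no word is shared between topics;
    lower values mean the topics reuse the same words.
    """
    # Collect the first top_k words of every topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical top words for three topics (illustrative only, not study results).
example_topics = [
    ["pengiriman", "kurir", "paket", "lambat"],
    ["aplikasi", "update", "error", "lambat"],
    ["voucher", "gratis", "ongkir", "diskon"],
]

# "lambat" appears in two topics, so 11 of the 12 top words are unique.
score = topic_diversity(example_topics, top_k=4)
```

Under this definition the score depends only on the top-word lists, so it can be applied uniformly to the output of LDA, NMF, CTM, and BERTopic once each model's top words per topic have been extracted.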
