Fine Tuning and Comparing Language Models for Meeting Summarization
Katarina Keishanti Joanne Kartakusuma, Drs. Edi Winarko, M.Sc., Ph.D.
2025 | Undergraduate Thesis | COMPUTER SCIENCE
The increasing reliance on virtual meeting platforms, accelerated by the COVID-19 pandemic, has transformed communication and collaboration across professional and academic domains. However, the proliferation of online meetings has led to challenges such as information overload and difficulty in retaining key discussion points. Automated meeting summarization offers a practical solution to these challenges by condensing lengthy conversations into concise and informative summaries. This study focuses on optimizing large language models (LongT5, BigBird, and LED) for meeting summarization tasks. These models, based on transformer architectures, are fine-tuned on the MeetingBank dataset, a domain-specific corpus tailored to capture the structure and nuances of meeting transcripts. The research explores both standard fine-tuning and parameter-efficient fine-tuning (PEFT) techniques to enhance model performance while improving computational efficiency. The methodology involves data preprocessing steps such as text cleaning, tokenization, and filtering to ensure quality inputs, followed by rigorous model training and evaluation using ROUGE and BERTScore metrics. By comparing the performance of different fine-tuning strategies across models, this study provides insights into the most effective approaches for adapting large language models to meeting summarization. The findings contribute to the field of natural language processing (NLP) by advancing the use of transformer-based models for structured summarization and proposing efficient methodologies for handling long-text summarization in real-world applications.
Keywords: Meeting Summarization, Large Language Models, Fine Tuning, Abstractive Summarization, MeetingBank Dataset
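To make the described pipeline concrete, below is a minimal sketch of parameter-efficient fine-tuning (LoRA) and ROUGE/BERTScore evaluation for one of the three models. It is not the thesis's exact implementation: the Hugging Face transformers/peft/datasets/evaluate stack, the dataset identifier "huuuyeah/meetingbank" with "transcript"/"summary" columns, the "allenai/led-base-16384" checkpoint, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
import evaluate
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

MODEL_NAME = "allenai/led-base-16384"   # assumed backbone; LongT5/BigBird follow the same pattern
DATASET_ID = "huuuyeah/meetingbank"     # assumed Hub id for the MeetingBank dataset

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Wrap the backbone with LoRA adapters so only a small fraction of parameters
# is trained (the PEFT strategy contrasted with standard full fine-tuning).
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    # Attention projection names cover LED's decoder (q_proj/v_proj) and its
    # Longformer-style encoder (query/value); other backbones use other names.
    target_modules=["q_proj", "v_proj", "query", "value"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

dataset = load_dataset(DATASET_ID)

def preprocess(batch):
    # Text cleaning and filtering would happen before this step; here we only
    # tokenize and truncate transcripts and reference summaries.
    inputs = tokenizer(batch["transcript"], max_length=4096, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=512, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="led-meetingbank-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-4,
    num_train_epochs=3,
    predict_with_generate=True,
    generation_max_length=512,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Evaluation of generated summaries with ROUGE and BERTScore.
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
preds = trainer.predict(tokenized["test"])
label_ids = np.where(preds.label_ids != -100, preds.label_ids, tokenizer.pad_token_id)
decoded_preds = tokenizer.batch_decode(preds.predictions, skip_special_tokens=True)
decoded_refs = tokenizer.batch_decode(label_ids, skip_special_tokens=True)
print(rouge.compute(predictions=decoded_preds, references=decoded_refs))
print(bertscore.compute(predictions=decoded_preds, references=decoded_refs, lang="en"))
```

Under these assumptions, the standard fine-tuning baseline is obtained by simply omitting the LoRA wrapping, and LongT5 or BigBird-Pegasus can be substituted by changing the checkpoint name and the LoRA target module names to match their attention layers. For LED specifically, adding a global attention mask on the first input token is commonly recommended but is omitted here for brevity.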