Preserving Functional Fidelity in Compressed Large Language Models Through Corrective and Adaptive Decomposition

Muchammad Daniyal Kautsar

Muchammad Daniyal Kautsar, Syukron Abu Ishaq Alfarozi, S.T., Ph.D.; Widyawan, S.T., M.Sc., Ph.D.

2025 | Undergraduate Thesis | Information Technology

Abstract

Large Language Models (LLMs) are a class of artificial intelligence models in the field of Natural Language Processing (NLP), known for their strong performance in understanding, generating, and processing human language. These models are trained on vast datasets and contain an extremely large number of parameters, which creates significant challenges in computational complexity, energy consumption, memory requirements, and deployability on resource-constrained devices. These limitations have driven the development of compression techniques that reduce model size while maintaining performance. One promising approach is low-rank factorization using Singular Value Decomposition (SVD), which offers an efficient way to reduce the number of parameters and the computational cost. However, truncating the rank often discards important functional information. To address this issue, we propose CALR (Corrective Adaptive Low-Rank Decomposition), a novel compression framework that extends conventional low-rank methods with a parallel, learnable corrective module. CALR employs a two-stage training strategy: it first initializes the primary pathway using SVD, then trains the corrective module to recover the functional output degraded by rank truncation. This approach consistently achieves a better trade-off between compression and performance than leading methods such as LaCo, ShortGPT, and LoSparse. Experimental evaluations on three LLM architectures, SmolLM2-135M, Qwen3-0.6B, and Llama-3.2-1B, show that CALR can reduce the number of parameters by 27% to 51% while retaining 77% to 89% of the original model's performance. These findings confirm the effectiveness of CALR in preserving functional capabilities even under aggressive compression.
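
The two-stage idea described above can be illustrated on a single linear layer. The following is a minimal sketch under stated assumptions, not the thesis implementation: the abstract does not specify the form of the corrective module, so a rank-k linear pair `C @ D` is assumed here, trained by plain gradient descent on an output mean-squared error over synthetic calibration inputs. The function names (`truncated_svd`, `train_corrector`, `output_error`), the shapes, and the hyperparameters are all hypothetical.

```python
import numpy as np

def truncated_svd(W, rank):
    """Stage 1: return A (d_out x rank) and B (rank x d_in) with A @ B approximating W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def output_error(W, W_hat, X):
    """Relative error between the outputs of the original and compressed layer on X."""
    return np.linalg.norm(X @ (W - W_hat).T) / np.linalg.norm(X @ W.T)

def train_corrector(W, A, B, X, k, steps=1000, lr=1e-2, seed=0):
    """Stage 2 (assumed form): fit a rank-k parallel path C @ D while A and B stay frozen."""
    rng = np.random.default_rng(seed)
    d_out, d_in = W.shape
    C = np.zeros((d_out, k))                   # zero init: the corrector starts as a no-op
    D = rng.normal(scale=0.1, size=(k, d_in))
    target = X @ W.T                           # output of the original layer
    base = X @ (A @ B).T                       # output of the frozen primary pathway
    n = X.shape[0]
    for _ in range(steps):
        H = X @ D.T                            # hidden activations of the corrector, (n, k)
        err = base + H @ C.T - target          # output mismatch, (n, d_out)
        grad_C = err.T @ H / n                 # gradient of 0.5 * mean squared error w.r.t. C
        grad_D = C.T @ err.T @ X / n           # gradient w.r.t. D
        C -= lr * grad_C
        D -= lr * grad_D
    return C, D

# Synthetic layer and anisotropic calibration inputs (illustrative data only).
rng = np.random.default_rng(1)
W = rng.normal(size=(32, 48)) / np.sqrt(48)
X = rng.normal(size=(256, 48)) * np.linspace(0.5, 2.0, 48)

A, B = truncated_svd(W, rank=8)
C, D = train_corrector(W, A, B, X, k=4)

print("error, SVD only:      ", output_error(W, A @ B, X))
print("error, SVD + corrector:", output_error(W, A @ B + C @ D, X))
print("parameters:", W.size, "->", A.size + B.size + C.size + D.size)
```

In this sketch the corrector is fitted to the layer's outputs on calibration data rather than to the weight matrix itself, which is one way to read "recover the functional output degraded by rank truncation". The parameter count of the compressed layer is (r + k)(d_out + d_in), so the compression ratio depends on how the ranks r and k are chosen.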

Keywords: LLM Compression, Low-Rank Factorization, Singular Value Decomposition (SVD), Corrective Module, Model Efficiency.
