Version 1: Received: 6 September 2024 / Approved: 9 September 2024 / Online: 9 September 2024 (14:01:46 CEST)
Version 2: Received: 10 September 2024 / Approved: 10 September 2024 / Online: 10 September 2024 (14:16:55 CEST)
How to cite:
Li, X.; Ma, Y.; Huang, Y.; Wang, X.; Lin, Y.; Zhang, C. Integrated Optimization of Large Language Models: Synergizing Data Utilization and Compression Techniques. Preprints 2024, 2024090662. https://doi.org/10.20944/preprints202409.0662.v2
APA Style
Li, X., Ma, Y., Huang, Y., Wang, X., Lin, Y., & Zhang, C. (2024). Integrated Optimization of Large Language Models: Synergizing Data Utilization and Compression Techniques. Preprints. https://doi.org/10.20944/preprints202409.0662.v2
Chicago/Turabian Style
Li, X., Y. Ma, Y. Huang, X. Wang, Y. Lin, and C. Zhang. 2024. "Integrated Optimization of Large Language Models: Synergizing Data Utilization and Compression Techniques." Preprints. https://doi.org/10.20944/preprints202409.0662.v2
Abstract
In this paper, we propose "Synergized Efficiency Optimization for Large Language Models" (SEO-LLM), a groundbreaking approach that integrates advanced data utilization and model compression techniques to significantly enhance the performance, efficiency, and scalability of large language models (LLMs). Our method synergistically combines Adaptive Data Augmentation (ADA), Transfer-Active Learning (TAL), Adaptive Iterative Pruning (AIP), and Synergistic Quantization and Distillation (SQD). Together, these components reduce the training data requirement by 30%, compress model size by 67.6%, and improve inference speed by up to 50%, while preserving or even enhancing model accuracy across various NLP tasks. ADA dynamically adjusts augmentation strategies to optimize model generalization, while TAL leverages pre-trained models to focus learning on the most informative data samples. AIP prunes less significant weights, and SQD harmonizes quantization with knowledge distillation to achieve high compression rates without significant performance loss. The synergy between these techniques makes SEO-LLM a robust solution for deploying LLMs in resource-constrained environments, maintaining state-of-the-art performance with a fraction of the computational and data resources.
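The abstract does not specify how AIP selects weights to remove. As a rough illustration of iterative pruning in general, the sketch below gradually zeroes out the smallest-magnitude weights over several steps until a target sparsity is reached; the function name, the linear sparsity schedule, and the reuse of the abstract's 67.6% compression figure as the sparsity target are all assumptions, not details from the paper.

```python
import numpy as np

def iterative_magnitude_prune(weights, target_sparsity, steps):
    """Zero out the smallest-magnitude weights over several steps,
    ramping sparsity linearly toward target_sparsity."""
    w = weights.copy()
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps   # sparsity quota this step
        k = int(round(sparsity * w.size))           # weights to zero so far
        if k == 0:
            continue
        # k-th smallest absolute value serves as the pruning threshold
        threshold = np.sort(np.abs(w), axis=None)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned = iterative_magnitude_prune(w, target_sparsity=0.676, steps=4)
print(f"sparsity: {np.mean(pruned == 0):.3f}")
```

Pruning in stages rather than all at once is what makes the procedure "iterative": in a real training loop, each pruning step would be followed by fine-tuning so the remaining weights can compensate before the next cut.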
Keywords
Natural Language Processing (NLP); Large Language Models (LLMs); Data Utilization; Model Compression; Knowledge Distillation
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.