Apr 17, 2024: For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work, and numerous ...
In this work, we perform a detailed coarse-to-fine analysis of the inference performance of various code libraries. To evaluate the overall effectiveness, we ...
The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for ...
This repository contains scripts of coarse-to-fine evaluation for large language models, as detailed in the paper Towards Coarse-to-Fine Evaluation of Inference ...
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models ... coarse-to-fine analysis of the inference performance ...
This paper offers a detailed study on inference efficiency in LLMs, highlighting the challenges and proposing solutions through a coarse-to-fine analytic ...
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models. Yushuo Chen, Tianyi Tang, Erge Xiang, Linjiang Li, Wayne Xin Zhao ...
Oct 12, 2024: LLMLingua's performance has been thoroughly evaluated using a variety of small language models as well as different closed Large Language Models ...
This paper presents a truly holistic evaluation for large language models (a "first of its kind study in terms of scope" as one reviewer put it), proposing ...
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ... Self-Play Fine ... DiJiang: Efficient Large Language Models ...