Computer Science > Machine Learning

arXiv:2209.12127 (cs)

[Submitted on 25 Sep 2022 (v1), last revised 13 Oct 2023 (this version, v3)]

Title: SpeedLimit: Neural Architecture Search for Quantized Transformer Models

Authors: Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an upper-bound latency constraint. Our method incorporates 8-bit integer quantization in the search process to outperform the current state-of-the-art technique. Our results underline the feasibility and efficacy of seeking an optimal balance between performance and latency, providing new avenues for deploying state-of-the-art transformer models in latency-sensitive environments.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2209.12127 [cs.LG]
	(or arXiv:2209.12127v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.12127

Submission history

From: Yuji Chai
[v1] Sun, 25 Sep 2022 02:56:01 UTC (1,135 KB)
[v2] Fri, 23 Jun 2023 00:49:36 UTC (1,907 KB)
[v3] Fri, 13 Oct 2023 17:21:46 UTC (1,822 KB)
