doi 10.1002/spe.3036

Transcoding Billions of Unicode Characters per Second with SIMD Instructions

Abstract: In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art disks and networks. These transcoding functions make little use of the single-instruction-multiple-data (SIMD) instructions available on commodity processors. By designing transcoding algorithms for SIMD instructions, we multiply the speed of transcoding on current systems (x64 and ARM). To ensure reproducibility, we make our software freely available as an open source library.

Submitted 14 November, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

Comments: Software: https://github.com/simdutf/simdutf

Journal ref: Software: Practice and Experience, Volume 52, Issue 2 February 2022 Pages 555-575
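
As a rough illustration of what transcoding means in the abstract above, the following Python sketch converts UTF-8 bytes to UTF-16 (little-endian) using only the standard codecs. It is a scalar reference for the operation itself, not the SIMD algorithm from the paper and not the simdutf API; the function name `utf8_to_utf16le` is a hypothetical helper chosen for this example.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to UTF-16LE bytes (hypothetical helper).

    Decoding validates the input: malformed UTF-8 raises UnicodeDecodeError.
    """
    return data.decode("utf-8").encode("utf-16-le")


# Sample text mixing 1-byte, 2-byte, and 4-byte UTF-8 sequences.
sample_text = "h\u00e9llo \U0001F600"
utf8_bytes = sample_text.encode("utf-8")
utf16_bytes = utf8_to_utf16le(utf8_bytes)

print(len(utf8_bytes), "UTF-8 bytes ->", len(utf16_bytes), "UTF-16 bytes")
```

The same text occupies a different number of bytes in each format, because UTF-8 uses one to four bytes per character while UTF-16 uses one or two 16-bit units.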

arXiv:1911.02696

doi 10.1002/cpe.6304

Efficient Computation of Positional Population Counts Using SIMD Instructions

Authors: Marcus D. R. Klarqvist, Wojciech Muła, Daniel Lemire

Abstract: In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given 8 distinct values, we map each value to a byte where only a single bit has been set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums corresponding to bit values at indexes 0, 1, 2, ..., k-1. If the k-bit words are one-hot encoded then the sums correspond to a frequency histogram. This multiple-sum problem is a generalization of the population-count problem where we seek the sum of all bit values. Accordingly, we refer to the multiple-sum problem as a positional population-count. Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit position population count using less than half of a CPU cycle per 16-bit word. Our best approach uses up to 400 times fewer instructions and is up to 50 times faster than baseline code using only regular (non-SIMD) instructions, for sufficiently large inputs.

Submitted 11 May, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

Journal ref: Concurrency and Computation: Practice and Experience 33 (17), 2021
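
The positional population count defined in the abstract above (k sums, one per bit index) can be sketched with a plain scalar loop. This is only an illustrative baseline in Python, under the assumption that words are given as non-negative integers; it is not the SIMD algorithm from the paper, and `positional_popcount` is a hypothetical name.

```python
def positional_popcount(words, k=16):
    """Return a list of k sums; counts[i] is the number of words with bit i set."""
    counts = [0] * k
    for word in words:
        for i in range(k):
            counts[i] += (word >> i) & 1
    return counts


# One-hot example with 8 distinct values, as in the abstract:
# each value v is mapped to a byte with only bit v set.
values = [0, 3, 3, 7, 0, 3]
one_hot = [1 << v for v in values]

# With one-hot inputs, the positional counts form a frequency histogram.
histogram = positional_popcount(one_hot, k=8)
print(histogram)  # [2, 0, 0, 3, 0, 0, 0, 1]
```

Summing the k positional counts gives the ordinary population count of the whole stream, which is why the abstract calls this a generalization of that problem.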

arXiv:1910.05109

doi 10.1002/spe.2777

Base64 encoding and decoding at almost the speed of a memory copy

Authors: Wojciech Muła, Daniel Lemire

Abstract: Many common document formats on the Internet are text-only such as email (MIME) and the Web (HTML, JavaScript, JSON and XML). To include images or executable code in these documents, we first encode them as text using base64. Standard base64 encoding uses 64 ASCII characters: both lower and upper case Latin letters, digits and two other symbols. We show how we can encode and decode base64 data at nearly the speed of a memory copy (memcpy) on recent Intel processors, as long as the data does not fit in the first-level (L1) cache. We use the SIMD (Single Instruction Multiple Data) instruction set AVX-512 available on commodity processors. Our implementation generates several times fewer instructions than previous SIMD-accelerated base64 codecs. It is also more versatile, as it can be adapted, even at runtime, to any base64 variant by only changing constants.

Submitted 2 October, 2019; originally announced October 2019.

Journal ref: Software: Practice and Experience 50 (2), 2020
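
The abstract above describes base64 as a mapping onto 64 ASCII characters and notes that variants differ only in constants. The scalar Python sketch below illustrates that idea by taking the alphabet as a parameter. It is not the AVX-512 codec from the paper; `b64encode_with`, `STANDARD`, and `URL_SAFE` are hypothetical names introduced for this example, and the standard library's `base64` module is used only to cross-check the output.

```python
import base64
import string

# 62 shared symbols plus two variant-specific ones.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URL_SAFE = STANDARD[:62] + "-_"


def b64encode_with(data: bytes, alphabet: str = STANDARD) -> str:
    """Encode bytes as base64 text using the given 64-character alphabet."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-padded on the right.
        packed = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit indexes into the alphabet.
        symbols = [alphabet[(packed >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # n input bytes yield n + 1 meaningful symbols; the rest is '=' padding.
        keep = len(chunk) + 1
        out.append("".join(symbols[:keep]) + "=" * (4 - keep))
    return "".join(out)


payload = b"\xfb\xff"
print(b64encode_with(payload, STANDARD))
print(b64encode_with(payload, URL_SAFE))
print(base64.b64encode(payload).decode("ascii"))  # cross-check against the stdlib
```

Switching variants changes only the lookup table passed in, which mirrors the "only changing constants" remark in the abstract at a much smaller scale.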

arXiv:1704.00605

doi 10.1145/3132709

Faster Base64 Encoding and Decoding Using AVX2 Instructions

Authors: Wojciech Muła, Daniel Lemire

Abstract: Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the efficiency of base64 encoding and decoding. Compared to state-of-the-art implementations, we multiply the speeds of both the encoding (~10x) and the decoding (~7x). We achieve these good results by using the single-instruction-multiple-data (SIMD) instructions available on recent Intel processors (AVX2). Our accelerated software abides by the specification and reports errors when encountering characters outside of the base64 set. It is available online as free software under a liberal license.

Submitted 14 June, 2018; v1 submitted 30 March, 2017; originally announced April 2017.

Comments: software at https://github.com/lemire/fastbase64

Journal ref: ACM Transactions on the Web 12 (3), 2018
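
The abstract above states that the decoder reports errors for characters outside the base64 set. The behavior can be illustrated with Python's standard `base64` module, whose `b64decode` accepts a `validate=True` flag. This sketch shows strict decoding in general and is not the fastbase64 library's interface; `strict_decode` is a hypothetical wrapper name.

```python
import base64
import binascii


def strict_decode(text: str) -> bytes:
    """Decode standard base64, rejecting any character outside the alphabet."""
    return base64.b64decode(text, validate=True)


print(strict_decode("aGVsbG8="))

try:
    strict_decode("aGVs*G8=")  # '*' is not a base64 character
except binascii.Error as err:
    print("rejected:", err)
```

Without `validate=True`, Python's decoder silently discards characters outside the alphabet, so strict validation has to be requested explicitly.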

arXiv:1611.07612

doi 10.1093/comjnl/bxx046

Faster Population Counts Using AVX2 Instructions

Authors: Wojciech Muła, Nathan Kurz, Daniel Lemire

Abstract: Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g., the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (clang).

Submitted 5 September, 2018; v1 submitted 22 November, 2016; originally announced November 2016.

Comments: Software is at https://github.com/CountOnes/hamming_weight

Journal ref: Computer Journal, Volume 61, Issue 1, 1 January 2018
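
To make the population count and the Jaccard index mentioned in the abstract above concrete, here is a small scalar Python sketch. It counts ones with a 16-entry nibble lookup table and combines the counts with Boolean AND and OR. It is an illustration only, not the vectorized AVX2 method from the paper; `popcount_bytes` and `jaccard` are hypothetical names, and returning 1.0 for two empty sets is a convention assumed here.

```python
# Number of set bits for each 4-bit value 0..15.
NIBBLE_ONES = [bin(n).count("1") for n in range(16)]


def popcount_bytes(data: bytes) -> int:
    """Count the ones in a byte string, one nibble lookup at a time."""
    total = 0
    for byte in data:
        total += NIBBLE_ONES[byte & 0x0F] + NIBBLE_ONES[byte >> 4]
    return total


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard index of two equal-length bitsets: |A AND B| / |A OR B|."""
    if len(a) != len(b):
        raise ValueError("bitsets must have the same length")
    intersection = popcount_bytes(bytes(x & y for x, y in zip(a, b)))
    union = popcount_bytes(bytes(x | y for x, y in zip(a, b)))
    # Assumed convention: two empty sets are treated as identical.
    return intersection / union if union else 1.0


print(popcount_bytes(b"\xff\x01"))
print(jaccard(b"\x0f", b"\x3c"))
```

The Jaccard computation needs two Boolean operations per pair of words in addition to the counting, which is the kind of extra work the abstract refers to.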

Showing 1–5 of 5 results for author: Muła, W