Showing 1–5 of 5 results for author: Muła, W

  1. Transcoding Billions of Unicode Characters per Second with SIMD Instructions

    Authors: Daniel Lemire, Wojciech Muła

    Abstract: In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art disks and networks. These transcoding functions make little use of the single-instruction-multiple-data (SIMD) instructions available on commodity processors. By…

    Submitted 14 November, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

    Comments: Software: https://github.com/simdutf/simdutf

    Journal ref: Software: Practice and Experience, Volume 52, Issue 2, February 2022, Pages 555-575

  2. Efficient Computation of Positional Population Counts Using SIMD Instructions

    Authors: Marcus D. R. Klarqvist, Wojciech Muła, Daniel Lemire

    Abstract: In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given 8 distinct values, we map each value to a byte where only a single bit has been set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums correspon…

    Submitted 11 May, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

    Journal ref: Concurrency and Computation: Practice and Experience 33 (17), 2021

  3. Base64 encoding and decoding at almost the speed of a memory copy

    Authors: Wojciech Muła, Daniel Lemire

    Abstract: Many common document formats on the Internet are text-only such as email (MIME) and the Web (HTML, JavaScript, JSON and XML). To include images or executable code in these documents, we first encode them as text using base64. Standard base64 encoding uses 64 ASCII characters: both lower and upper case Latin letters, digits and two other symbols. We show how we can encode and decode base64 data at…

    Submitted 2 October, 2019; originally announced October 2019.

    Journal ref: Software: Practice and Experience 50 (2), 2020

  4. Faster Base64 Encoding and Decoding Using AVX2 Instructions

    arXiv:1704.00605 (cs.MS, cs.PF)

    Authors: Wojciech Muła, Daniel Lemire

    Abstract: Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the efficiency of base64 encoding and decoding. Compared to state-of-the-art implementations, we multiply the speeds of both the encoding (~10x) and the decoding (…

    Submitted 14 June, 2018; v1 submitted 30 March, 2017; originally announced April 2017.

    Comments: software at https://github.com/lemire/fastbase64

    Journal ref: ACM Transactions on the Web 12 (3), 2018

  5. Faster Population Counts Using AVX2 Instructions

    Authors: Wojciech Muła, Nathan Kurz, Daniel Lemire

    Abstract: Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instruc…

    Submitted 5 September, 2018; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: Software is at https://github.com/CountOnes/hamming_weight

    Journal ref: Computer Journal, Volume 61, Issue 1, 1 January 2018