Search results

2024/1439 (PDF) Last updated: 2024-09-14

Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms

Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede

Implementation

Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...

2024/1288 (PDF) Last updated: 2024-08-16

KpqClean Ver2: Comprehensive Benchmarking and Analysis of KpqC Algorithm Round 2 Submissions

Minjoo Sim, Siwoo Eum, Gyeongju Song, Minwoo Lee, Sangwon Kim, Minho Song, Hwajeong Seo

Implementation

From 2022, Korean Post-Quantum Cryptography (KpqC) Competition has been held. Among the Round 1 algorithms of KpqC, eight algorithms were selected in December 2023. To evaluate the algorithms, the performance is critical factor. However, the performance of the algorithms submitted to KpqC was evaluated in different development environments. Consequently, it is difficult to compare the performance of each algorithm fairly, because the measurements were not conducted in the identical...

2024/1282 (PDF) Last updated: 2024-09-02

NTRU+PKE: Efficient Public-Key Encryption Schemes from the NTRU Problem

Jonghyun Kim, Jong Hwan Park

Public-key cryptography

We propose a new NTRU-based Public-Key Encryption (PKE) scheme called $\mathsf{NTRU+}\mathsf{PKE}$, which effectively incorporates the Fujisaki-Okamoto transformation for PKE (denoted as $\mathsf{FO}_{\mathsf{PKE}}$) to achieve chosen-ciphertext security in the Quantum Random Oracle Model (QROM). While $\mathsf{NTRUEncrypt}$, a first-round candidate in the NIST PQC standardization process, was proven to be chosen-ciphertext secure in the Random Oracle Model (ROM), it lacked corresponding...

2024/815 (PDF) Last updated: 2024-05-26

Faster verifications and smaller signatures: Trade-offs for ALTEQ using rejections

Arnaud Sipasseuth

Public-key cryptography

In this paper, we introduce a new probability function parameter in the instantiations of the Goldreich-Micali-Wigderson with Fiat-Shamir and unbalanced challenges used in ALTEQ, a recent NIST PQC candidate in the call for additional signatures. This probability set at 100% does not bring any changes in the scheme, but modifies the public challenge generation process when below 100%, by injecting potential rejections in otherwise completely valid inputs. From a theoretical point of view,...

2024/748 (PDF) Last updated: 2024-05-16

PERK: Compact Signature Scheme Based on a New Variant of the Permuted Kernel Problem

Slim Bettaieb, Loïc Bidoux, Victor Dyseryn, Andre Esser, Philippe Gaborit, Mukul Kulkarni, Marco Palumbi

Public-key cryptography

In this work we introduce PERK a compact digital signature scheme based on the hardness of a new variant of the Permuted Kernel Problem (PKP). PERK achieves the smallest signature sizes for any PKP-based scheme for NIST category I security with 6 kB, while obtaining competitive signing and verification timings. PERK also compares well with the general state-of-the-art. To substantiate those claims we provide an optimized constant-time AVX2 implementation, a detailed performance analysis and...

2024/070 (PDF) Last updated: 2024-06-10

Hints from Hertz: Dynamic Frequency Scaling Side-Channel Analysis of Number Theoretic Transform in Lattice-Based KEMs

Tianrun Yu, Chi Cheng, Zilong Yang, Yingchen Wang, Yanbin Pan, Jian Weng

Attacks and cryptanalysis

Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting NTT because it is usually the most time-consuming part of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that...

2024/069 (PDF) Last updated: 2024-01-16

SDitH in Hardware

Sanjay Deshpande, James Howe, Jakub Szefer, Dongze Yue

Implementation

This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...

2023/1962 (PDF) Last updated: 2024-06-19

A Survey of Polynomial Multiplications for Lattice-Based Cryptosystems

Vincent Hwang

Implementation

We survey various mathematical tools used in software works multiplying polynomials in \[ \frac{\mathbb{Z}_q[x]}{\left\langle {x^n - \alpha x - \beta} \right\rangle}. \] In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2. There are three emphases in this paper: (i) modular arithmetic, (ii)...

2023/1760 (PDF) Last updated: 2024-02-11

Biscuit: New MPCitH Signature Scheme from Structured Multivariate Polynomials

Luk Bettale, Delaram Kahrobaei, Ludovic Perret, Javier Verbel

Cryptographic protocols

This paper describes Biscuit, a new multivariate-based signature scheme derived using the MPC-in-the-Head (MPCitH) approach. The security of Biscuit is related to the problem of solving a set of structured quadratic algebraic equations. These equations are highly compact and can be evaluated using very few multiplications (one multiplication per equation). The core of Biscuit is a rather simple MPC protocol for secure multiplications using standard optimized multiplicative triples. This...

2023/1683 (PDF) Last updated: 2024-01-15

Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4

Ward Beullens, Fabio Campos, Sofía Celi, Basil Hess, Matthias J. Kannwischer

Implementation

MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4. Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced...

2023/1666 (PDF) Last updated: 2024-01-31

MiRitH: Efficient Post-Quantum Signatures from MinRank in the Head

Gora Adj, Stefano Barbero, Emanuele Bellini, Andre Esser, Luis Rivera-Zamarripa, Carlo Sanna, Javier Verbel, Floyd Zweydinger

Public-key cryptography

Since 2016’s NIST call for standardization of post-quantum cryptographic primitives, developing efficient post-quantum secure digital signature schemes has become a highly active area of research. The difficulty in constructing such schemes is evidenced by NIST reopening the call in 2022 for digital signature schemes, because of missing diversity in existing proposals. In this work, we introduce the new post-quantum digital signature scheme MiRitH. As direct successor of a scheme recently...

2023/1637 (PDF) Last updated: 2024-01-30

Algorithmic Views of Vectorized Polynomial Multipliers – NTRU

Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, Bo-Yin Yang

Implementation

The lattice-based post-quantum cryptosystem NTRU is used by Google for protecting Google’s internal communication. In NTRU, polynomial multiplication is one of bottleneck. In this paper, we explore the interactions between polynomial multiplication, Toeplitz matrix–vector product, and vectorization with architectural insights. For a unital commutative ring $R$, a positive integer $n$, and an element $\zeta \in R$, we reveal the benefit of vector-by-scalar multiplication instructions while...

2023/721 (PDF) Last updated: 2023-05-22

A Fast RLWE-Based IPFE Library and its Application to Privacy-Preserving Biometric Authentication

Supriya Adhikary, Angshuman Karmakar

Public-key cryptography

With the increased use of data and communication through the internet and the abundant misuse of personal data by many organizations, people are more sensitive about their privacy. Privacy-preserving computation is becoming increasingly important in this era. Functional encryption allows a user to evaluate a function on encrypted data without revealing sensitive information. Most implementations of functional encryption schemes are too time-consuming for practical use. Mera et al. first...

2023/624 (PDF) Last updated: 2024-07-04

HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures

Jung Hee Cheon, Hyeongmin Choe, Julien Devevey, Tim Güneysu, Dongyeon Hong, Markus Krausz, Georg Land, Marc Möller, Damien Stehlé, MinJune Yi

Public-key cryptography

We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while...

2023/604 (PDF) Last updated: 2024-04-18

Pushing the Limit of Vectorized Polynomial Multiplication for NTRU Prime

Vincent Hwang

Implementation

We conduct a systematic examination of vector arithmetic for polynomial multiplications in software. Vector instruction sets and extensions typically specify a fixed number of registers, each holding a power-of-two number of bits, and support a wide variety of vector arithmetic on registers. Programmers then try to align mathematical computations with the vector arithmetic supported by the designated instruction set or extension. We delve into the intricacies of this process for polynomial...

2023/541 (PDF) Last updated: 2023-08-08

Algorithmic Views of Vectorized Polynomial Multipliers for NTRU and NTRU Prime (Long Paper)

Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, Chi-Ting Liu, Bo-Yin Yang

Implementation

This paper explores the design space of vector-optimized polynomial multiplications in the lattice-based key-encapsulation mechanisms NTRU and NTRU Prime. Since NTRU and NTRU Prime do not support straightforward applications of number– theoretic transforms, the state-of-the-art vector code either resorted to Toom–Cook, or introduced various techniques for coefficient ring extensions. All these techniques lead to a large number of small-degree polynomial multiplications, which is the...

2023/399 (PDF) Last updated: 2023-03-21

High Throughput Lattice-based Signatures on GPUs: Comparing Falcon and Mitaka

Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang

Implementation

The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was...

2023/059 (PDF) Last updated: 2023-04-10

Oil and Vinegar: Modern Parameters and Implementations

Ward Beullens, Ming-Shing Chen, Shih-Hao Hung, Matthias J. Kannwischer, Bo-Yuan Peng, Cheng-Jhih Shih, Bo-Yin Yang

Implementation

Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, either made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We...

2022/1726 (PDF) Last updated: 2022-12-14

Optimization for SPHINCS+ using Intel Secure Hash Algorithm Extensions

Thomas Hanson, Qian Wang, Santosh Ghosh, Fernando Virdia, Anne Reinders, Manoj R. Sastry

Implementation

SPHINCS+ was selected as a candidate digital signature scheme for standardization by the NIST Post-Quantum Cryptography Standardization Process. It offers security capabilities relying only on the security of cryptographic hash functions. However, it is less efficient than the lattice-based schemes. In this paper, we present an optimized software library for the SPHINCS+ signature scheme, which combines the Intel® Secure Hash Algorithm Extensions (SHA-NI) and AVX2 vector instructions. We...

2022/1154 (PDF) Last updated: 2022-09-07

Efficient Constant-Time Implementation of SM4 with Intel GFNI instruction set extension and Arm NEON coprocessor

Weiji Guo

Implementation

The efficiency of constant-time SM4 implementation has been lagging behind that of AES for most internet traffic and applicable data encryption scenarios. The best performance before our works was 3.77 cpb for x86 platform (AESNI + AVX2), and 8.62 cpb for Arm platform (NEON). Meanwhile the state of art constant-time AES implementation could reach 0.63 cpb. Dedicated SM4 instruction set extensions like those optionally available in Armv8.2, could achieve comparable cpb to AES. But they are...

2022/881 (PDF) Last updated: 2022-08-16

A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator

Lipeng Wan, Fangyu Zheng, Guang Fan, Rong Wei, Lili Gao, Jiankuo Dong, Jingqiang Lin, Yuewu Wang

Implementation

Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. With noticing the extraordinary computing power of AI accelerators, in this paper, we further explore the feasibility to introduce AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning or neural networks, the biggest challenge is how to transform cryptographic workloads into their operations,...

2022/579 (PDF) Last updated: 2022-11-09

Compact and Efficient KEMs over NTRU Lattices

Zhichuang Liang, Boyue Fang, Jieyu Zheng, Yunlei Zhao

Public-key cryptography

The NTRU lattice is a promising candidate to construct practical cryptosystems, in particular key encapsulation mechanism (KEM), resistant to quantum computing attacks. Nevertheless, there are still some inherent obstacles to NTRU-based KEM schemes in having integrated performance, taking security, bandwidth, error probability, and computational efficiency {as a whole}, that is as good as and even better than their \{R,M\}LWE-based counterparts. In this work, we solve this problem by...

2022/482 (PDF) Last updated: 2022-04-23

cuFE: High Performance Privacy Preserving Support Vector Machine with Inner-Product Functional Encryption

KyungHyun Han, Wai-Kong Lee, Angshuman Karmakar, Jose Maria Bermudo Mera, Seong Oun Hwang

Public-key cryptography

Privacy preservation is a sensitive issue in our modern society. It is becoming increasingly important in many applications in this ever-growing and highly connected digital era. Functional encryption is a computation on encrypted data paradigm that allows users to retrieve the evaluation of a function on encrypted data without revealing the data, thus effectively protecting users' privacy. However, existing functional encryption implementations are still very time-consuming for practical...

2022/152 (PDF) Last updated: 2022-02-12

K-XMSS and K-SPHINCS$^+$:Hash based Signatures with\\Korean Cryptography Algorithms

Minjoo Sim, Siwoo Eum, Gyeongju Song, HyeokDong Kwon, Kyungbae Jang, HyunJun Kim, HyunJi Kim, Yujin Yang, Wonwoong Kim, Wai-Kong Lee, Hwajeong Seo

Implementation

Hash-Based Signature (HBS) uses a hash function to construct a digital signature scheme, where its security is guaranteed by the collision resistance of the hash function used. To provide sufficient security in the post-quantum environment, the length of hash should be satisfied. Modern HBS can be classified into stateful and stateless schemes. Two representative stateful and stateless HBS are XMSS and SPHINCS$^+$, respectively. In this paper, we propose two HBS schemes: K-XMSS...

2021/954 (PDF) Last updated: 2021-07-22

Scabbard: a suite of efficient learning with rounding key-encapsulation mechanisms

Jose Maria Bermudo Mera, Angshuman Karmakar, Suparna Kundu, Ingrid Verbauwhede

In this paper, we introduce Scabbard, a suite of post-quantum key-encapsulation mechanisms. Our suite contains three different schemes Florete, Espada, and Sable based on the hardness of module- or ring-learning with rounding problem. In this work, we first show how the latest advancements on lattice-based cryptography can be utilized to create new better schemes and even improve the state-of-the-art on post-quantum cryptography. We put particular focus on designing schemes that can...

2021/493 (PDF) Last updated: 2021-04-19

Optimizing BIKE for the Intel Haswell and ARM Cortex-M4

Ming-Shing Chen, Tung Chou, Markus Krausz

Implementation

BIKE is a key encapsulation mechanism that entered the third round of the NIST post-quantum cryptography standardization process. This paper presents two constant-time implementations for BIKE, one tailored for the Intel Haswell and one tailored for the ARM Cortex-M4. Our Haswell implementation is much faster than the avx2 implementation written by the BIKE team: for bikel1, the level-1 parameter set, we achieve a 1.39x speedup for decapsulation (which is the slowest operation) and a 1.33x...

2021/446 (PDF) Last updated: 2021-04-08

Towards practical GGM-based PRF from (Module-)Learning-with-Rounding

Chitchanok Chuengsatiansup, Damien Stehle

Foundations

We investigate the efficiency of a (module-)LWR-based PRF built using the GGM design. Our construction enjoys the security proof of the GGM construction and the (module-)LWR hardness assumption which is believed to be post-quantum secure. We propose GGM-based PRFs from PRGs with larger ratio of output to input. This reduces the number of PRG invocations which improves the PRF performance and reduces the security loss in the GGM security reduction. Our construction bridges the gap between...

2021/173 (PDF) Last updated: 2022-02-17

TensorCrypto

Wai-Kong Lee, Hwajeong Seo, Zhenfei Zhang, Seongoun Hwang

Implementation

Tensor core is a specially designed hardware included in new NVIDIA GPU chips, aimed at accelerating deep learning applications. With the introduction of tensor core, the matrix multiplication at low precision can be computed much faster than using conventional integer and floating point units in NVIDIA GPU. In the past, applications of tensor core were mainly restricted to machine learning and mixed precision scientific computing. In this paper, we show that for the first time, tensor core...

2020/1397 (PDF) Last updated: 2021-01-14

NTT Multiplication for NTT-unfriendly Rings

Chi-Ming Marvin Chung, Vincent Hwang, Matthias J. Kannwischer, Gregor Seiler, Cheng-Jhih Shih, Bo-Yin Yang

Implementation

In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial...

2020/1158 (PDF) Last updated: 2023-05-24

Don't throw your nonces out with the bathwater: Speeding up Dilithium by reusing the tail of y

Amber Sprenkels, Bas Westerbaan

Public-key cryptography

We suggest a small change to the Dilithium signature scheme, that allows one to reuse computations between rejected nonces, for a speed-up in signing time. The modification is based on the idea that, after rejecting on a too large $\|\mathbf{r}_0\|_\infty$, not all elements of the nonce $\mathbf{y}$ are spent. We swap the order of the checks; and if this $\mathbf{r}_0$-check fails, we only need to resample $y_1$. We provide a proof that shows that the modification does not affect the...

2020/572 (PDF) Last updated: 2020-10-26

HACL×N: Verified Generic SIMD Crypto (for all your favorite platforms)

Marina Polubelova, Karthikeyan Bhargavan, Jonathan Protzenko, Benjamin Beurdouche, Aymeric Fromherz, Natalia Kulatova, Santiago Zanella-Béguelin

Implementation

We present a new methodology for building formally verified cryptographic libraries that are optimized for multiple architectures. In particular, we show how to write and verify generic crypto code in the F* programming language that exploits single-instruction multiple data (SIMD) parallelism. We show how this code can be compiled to platforms that supports vector instructions, including ARM Neon and Intel AVX, AVX2, and AVX512. We apply our methodology to obtain verified vectorized...

2020/454 (PDF) Last updated: 2020-04-20

Optimized Lattice Basis Reduction In Dimension 2, and Fast Schnorr and EdDSA Signature Verification

Thomas Pornin

Implementation

We present an optimization of Lagrange's algorithm for lattice basis reduction in dimension 2. The optimized algorithm is proven to be correct and to always terminate with quadratic complexity; it uses more iterations on average than Lagrange's algorithm, but each iteration is much simpler to implement, and faster. The achieved speed is such that it makes application of the speed-up on ECDSA and EC Schnorr signatures described by Antipa et al worthwhile, even for very fast curves such...

2020/408 (PDF) Last updated: 2020-04-13

Speed up over the Rainbow

Nir Drucker, Shay Gueron

Implementation

Rainbow is a signature scheme that is based on multivariate polynomials. It is one of the Round-2 candidates of the NIST’s Post-Quantum Cryptography Standardization project. Its computations rely heavily on GF(2^8) arithmetic and the Rainbow submission optimizes the code by using AVX2 shuffle and permute instructions. In this paper, we show a new optimization that leverages: a) AVX512 architecture; b) the latest processor capabilities Galois Field New Instructions(GF-NI), available on Intel...

2019/1309 (PDF) Last updated: 2019-11-13

SaberX4: High-throughput Software Implementationof Saber Key Encapsulation Mechanism

Sujoy Sinha Roy

Implementation

Saber is a module lattice-based CCA-secure key encapsulation mechanism (KEM) which has been shortlisted for the second round of NIST's Post Quantum Cryptography Standardization project. To attain simplicity and efficiency on constrained devices, the Saber algorithm is serial by construction. However, on high-end platforms, such as modern Intel processors with AVX2 instructions, Saber achieves limited speedup using vector processing instructions due to its serial nature. In this paper we...

2019/085 (PDF) Last updated: 2020-04-24

The Lattice-Based Digital Signature Scheme qTESLA

Erdem Alkim, Paulo S. L. M. Barreto, Nina Bindel, Juliane Kramer, Patrick Longa, Jefferson E. Ricardini

Public-key cryptography

We present qTESLA, a family of post-quantum digital signature schemes that exhibits several attractive features such as simplicity and strong security guarantees against quantum adversaries, and built-in protection against certain side-channel and fault attacks. qTESLA---selected for round 2 of NIST's post-quantum cryptography standardization project---consolidates a series of recent schemes originating in works by Lyubashevsky, and Bai and Galbraith. We provide full-fledged, constant-time...

2019/040 (PDF) Last updated: 2020-02-09

NTTRU: Truly Fast NTRU Using NTT

Vadim Lyubashevsky, Gregor Seiler

Public-key cryptography

We present NTTRU -- an IND-CCA2 secure NTRU-based key encapsulation scheme that uses the number theoretic transform (NTT) over the cyclotomic ring $Z_{7681}[X]/(X^{768}-X^{384}+1)$ and produces public keys and ciphertexts of approximately $1.25$ KB at the $128$-bit security level. The number of cycles on a Skylake CPU of our constant-time AVX2 implementation of the scheme for key generation, encapsulation and decapsulation is approximately $6.4$K, $6.1$K, and $7.9$K, which is more than...

2018/230 (PDF) Last updated: 2019-03-18

Saber: Module-LWR based key exchange, CPA-secure encryption and CCA-secure KEM

Jan-Pieter D’Anvers, Angshuman Karmakar, Sujoy Sinha Roy, Frederik Vercauteren

Public-key cryptography

In this paper, we introduce Saber, a package of cryptographic primitives whose security relies on the hardness of the Module Learning With Rounding problem (Mod-LWR). We first describe a secure Diffie-Hellman type key exchange protocol, which is then transformed into an IND-CPA encryption scheme and finally into an IND-CCA secure key encapsulation mechanism using a post-quantum version of the Fujisaki-Okamoto transform. The design goals of this package were simplicity, efficiency and...

2018/229 (PDF) Last updated: 2018-03-01

Optimizing polynomial convolution for NTRUEncrypt

Wei Dai, William Whyte, Zhenfei Zhang

Implementation

NTRUEncrypt is one of the most promising candidates for quantum-safe cryptography. In this paper, we focus on the NTRU743 paramter set. We give a report on all known attacks against this parameter set and show that it delivers 256 bits of security against classical attackers and 128 bits of security against quantum attackers. We then present a parameter-dependent optimization using a tailored hierarchy of multipli- cation algorithms as well as the Intel AVX2 instructions, and show that this...

2018/039 (PDF) Last updated: 2018-01-09

Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography

Gregor Seiler

Implementation

Constant-time polynomial multiplication is one of the most time-consuming operations in many lattice-based cryptographic constructions. For schemes based on the hardness of Ring-LWE in power-of-two cyclotomic fields with completely splitting primes, the AVX2 optimized implementation of the Number-Theoretic Transform (NTT) from the NewHope key-exchange scheme is the state of the art for fast multiplication. It uses floating point vector instructions. We show that by using a modification of...

2017/1005 (PDF) Last updated: 2021-08-25

Tightly-Secure Key-Encapsulation Mechanism in the Quantum Random Oracle Model

Tsunekazu Saito, Keita Xagawa, Takashi Yamakawa

Public-key cryptography

Key-encapsulation mechanisms secure against chosen ciphertext attacks (IND-CCA-secure KEMs) in the quantum random oracle model have been proposed by Boneh, Dagdelen, Fischlin, Lehmann, Schafner, and Zhandry (CRYPTO 2012), Targhi and Unruh (TCC 2016-B), and Hofheinz, Hövelmanns, and Kiltz (TCC 2017). However, all are non-tight and, in particular, security levels of the schemes obtained by these constructions are less than half of original security levels of their building blocks. In this...

2017/680 (PDF) Last updated: 2017-07-18

SOFIA: MQ-based signatures in the QROM

Ming-Shing Chen, Andreas Hülsing, Joost Rijneveld, Simona Samardjiska, Peter Schwabe

We propose SOFIA, the first MQ-based signature scheme provably secure in the quantum-accessible random oracle model (QROM). Our construction relies on an extended version of Unruh's transform for 5-pass identification schemes that we describe and prove secure both in the ROM and QROM. Based on a detailed security analysis, we provide concrete parameters for SOFIA that achieve 128 bit post-quantum security. The result is SOFIA-4-128 with parameters that are carefully optimized to minimize...

2017/667 (PDF) Last updated: 2017-08-29

High-speed key encapsulation from NTRU

Andreas Hülsing, Joost Rijneveld, John M. Schanck, Peter Schwabe

Implementation

This paper presents software demonstrating that the 20-year-old NTRU cryptosystem is competitive with more recent lattice-based cryptosystems in terms of speed, key size, and ciphertext size. We present a slightly simplified version of textbook NTRU, select parameters for this encryption scheme that target the 128-bit post-quantum security level, construct a KEM that is CCA2-secure in the quantum random oracle model, and present highly optimized software targeting Intel CPUs with the AVX2...

2016/504 (PDF) Last updated: 2016-11-10

Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography

Patrick Longa, Michael Naehrig

Implementation

The Number Theoretic Transform (NTT) provides efficient algorithms for cyclic and nega-cyclic convolutions, which have many applications in computer arithmetic, e.g., for multiplying large integers and large degree polynomials. It is commonly used in cryptographic schemes that are based on the hardness of the Ring Learning With Errors (R-LWE) problem to efficiently implement modular polynomial multiplication. We present a new modular reduction technique that is tailored for the special...

2016/467 (PDF) Last updated: 2016-05-17

Speeding up R-LWE post-quantum key exchange

Shay Gueron, Fabian Schlieker

Implementation

Post-quantum cryptography has attracted increased attention in the last couple of years, due to the threat of quantum computers breaking current cryptosystems. In particular, the key size and performance of post-quantum algorithms became a significant target for optimization. In this spirit, Alkim \etal have recently proposed a significant optimization for a key exchange scheme that is based on the R-LWE problem. In this paper, we build on the implementation of Alkim \etal, and focus on...

2014/170 (PDF) Last updated: 2014-03-04

Parallelized hashing via j-lanes and j-pointers tree modes, with applications to SHA-256

Shay Gueron

Implementation

The j-lanes tree hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. The j-pointers tree hashing is a similar tree mode that receives, as input, j pointers to j messages (or slices of a single message), computes their digests and outputs the hash value of their concatenation. Such modes have parallelization capabilities on a hashing process that is serial by nature. As a result, they...

2013/773 (PDF) Last updated: 2013-12-12

CBEAM: Efficient Authenticated Encryption from Feebly One-Way $\phi$ Functions

Markku-Juhani O. Saarinen

Secret-key cryptography

We show how efficient and secure cryptographic mixing functions can be constructed from low-degree rotation-invariant $\phi$ functions rather than conventional S-Boxes. These novel functions have surprising properties; many exhibit inherent feeble (Boolean circuit) one-wayness and offer speed/area tradeoffs unobtainable with traditional constructs. Recent theoretical results indicate that even if the inverse is not explicitly computed in an implementation, its degree plays a fundamental...

2013/759 (PDF) Last updated: 2013-11-22

Vectorization of ChaCha Stream Cipher

Martin Goll, Shay Gueron

Implementation

This paper describes software optimization for the stream Cipher ChaCha. We leverage the wide vectorization capabilities of the new AVX2 architecture, to speed up ChaCha encryption (and decryption) on the latest x86_64 processors. In addition, we show how to apply vectorization for the future AVX512 architecture, and get further speedup. This leads to significant performance gains. For example, on the latest Intel Haswell microarchitecture, our AVX2 implementation performs at 1.43 cycles...

2012/476 (PDF) Last updated: 2012-08-21

A j-lanes tree hashing mode and j-lanes SHA-256

Shay Gueron

Implementation

j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel® Core™. For message ranging 2KB to 132KB in length, the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly...

2012/371 (PDF) Last updated: 2012-07-05

Simultaneous hashing of multiple messages

Shay Gueron, Vlad Krasnov

Implementation

We describe a method for efficiently hashing multiple messages of different lengths. Such computations occur in various scenarios, and one of them is when an operating system checks the integrity of its components during boot time. These tasks can gain performance by parallelizing the computations and using SIMD architectures. For such scenarios, we compare the performance of a new 4-buffers SHA-256 S-HASH implementation, to that of the standard serial hashing. Our results are measured on...

2012/275 (PDF) Last updated: 2012-05-29

Implementing BLAKE with AVX, AVX2, and XOP

Samuel Neves, Jean-Philippe Aumasson

Implementation

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit single-instruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-any permute and vectorized table-lookup. In this paper, we explore the potential of AVX2 to...

2012/067 (PDF) Last updated: 2012-06-05

Parallelizing message schedules to accelerate the computations of hash functions

Shay Gueron, Vlad Krasnov

This paper describes an algorithm for accelerating the computations of Davies-Meyer based hash functions. It is based on parallelizing the computation of several message schedules for several message blocks of a given message. This parallelization, together with the proper use of vector processor instructions (SIMD) improves the overall algorithm’s performance. Using this method, we obtain a new software implementation of SHA-256 that performs at 12.11 Cycles/Byte on the 2nd and 10.84...

Search Help

51 results sorted by ID