51 results sorted by ID
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
From 2022, Korean Post-Quantum Cryptography (KpqC) Competition has been held. Among the Round 1 algorithms of KpqC, eight algorithms were selected in December 2023. To evaluate the algorithms, the performance is critical factor. However, the performance of the algorithms submitted to KpqC was evaluated in different development environments. Consequently, it is difficult to compare the performance of each algorithm fairly, because the measurements were not conducted in the identical...
We propose a new NTRU-based Public-Key Encryption (PKE) scheme called $\mathsf{NTRU+}\mathsf{PKE}$, which effectively incorporates the Fujisaki-Okamoto transformation for PKE (denoted as $\mathsf{FO}_{\mathsf{PKE}}$) to achieve chosen-ciphertext security in the Quantum Random Oracle Model (QROM). While $\mathsf{NTRUEncrypt}$, a first-round candidate in the NIST PQC standardization process, was proven to be chosen-ciphertext secure in the Random Oracle Model (ROM), it lacked corresponding...
In this paper, we introduce a new probability function parameter in the instantiations of the Goldreich-Micali-Wigderson with Fiat-Shamir and unbalanced challenges used in ALTEQ, a recent NIST PQC candidate in the call for additional signatures. This probability set at 100% does not bring any changes in the scheme, but modifies the public challenge generation process when below 100%, by injecting potential rejections in otherwise completely valid inputs. From a theoretical point of view,...
In this work we introduce PERK a compact digital signature scheme based on the hardness of a new variant of the Permuted Kernel Problem (PKP). PERK achieves the smallest signature sizes for any PKP-based scheme for NIST category I security with 6 kB, while obtaining competitive signing and verification timings. PERK also compares well with the general state-of-the-art. To substantiate those claims we provide an optimized constant-time AVX2 implementation, a detailed performance analysis and...
Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting NTT because it is usually the most time-consuming part of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that...
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
We survey various mathematical tools used in software works multiplying polynomials in \[ \frac{\mathbb{Z}_q[x]}{\left\langle {x^n - \alpha x - \beta} \right\rangle}. \] In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2. There are three emphases in this paper: (i) modular arithmetic, (ii)...
This paper describes Biscuit, a new multivariate-based signature scheme derived using the MPC-in-the-Head (MPCitH) approach. The security of Biscuit is related to the problem of solving a set of structured quadratic algebraic equations. These equations are highly compact and can be evaluated using very few multiplications (one multiplication per equation). The core of Biscuit is a rather simple MPC protocol for secure multiplications using standard optimized multiplicative triples. This...
MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4. Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced...
Since 2016’s NIST call for standardization of post-quantum cryptographic primitives, developing efficient post-quantum secure digital signature schemes has become a highly active area of research. The difficulty in constructing such schemes is evidenced by NIST reopening the call in 2022 for digital signature schemes, because of missing diversity in existing proposals. In this work, we introduce the new post-quantum digital signature scheme MiRitH. As direct successor of a scheme recently...
The lattice-based post-quantum cryptosystem NTRU is used by Google for protecting Google’s internal communication. In NTRU, polynomial multiplication is one of bottleneck. In this paper, we explore the interactions between polynomial multiplication, Toeplitz matrix–vector product, and vectorization with architectural insights. For a unital commutative ring $R$, a positive integer $n$, and an element $\zeta \in R$, we reveal the benefit of vector-by-scalar multiplication instructions while...
With the increased use of data and communication through the internet and the abundant misuse of personal data by many organizations, people are more sensitive about their privacy. Privacy-preserving computation is becoming increasingly important in this era. Functional encryption allows a user to evaluate a function on encrypted data without revealing sensitive information. Most implementations of functional encryption schemes are too time-consuming for practical use. Mera et al. first...
We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while...
We conduct a systematic examination of vector arithmetic for polynomial multiplications in software. Vector instruction sets and extensions typically specify a fixed number of registers, each holding a power-of-two number of bits, and support a wide variety of vector arithmetic on registers. Programmers then try to align mathematical computations with the vector arithmetic supported by the designated instruction set or extension. We delve into the intricacies of this process for polynomial...
This paper explores the design space of vector-optimized polynomial multiplications in the lattice-based key-encapsulation mechanisms NTRU and NTRU Prime. Since NTRU and NTRU Prime do not support straightforward applications of number– theoretic transforms, the state-of-the-art vector code either resorted to Toom–Cook, or introduced various techniques for coefficient ring extensions. All these techniques lead to a large number of small-degree polynomial multiplications, which is the...
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was...
Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, either made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We...
SPHINCS+ was selected as a candidate digital signature scheme for standardization by the NIST Post-Quantum Cryptography Standardization Process. It offers security capabilities relying only on the security of cryptographic hash functions. However, it is less efficient than the lattice-based schemes. In this paper, we present an optimized software library for the SPHINCS+ signature scheme, which combines the Intel® Secure Hash Algorithm Extensions (SHA-NI) and AVX2 vector instructions. We...
The efficiency of constant-time SM4 implementation has been lagging behind that of AES for most internet traffic and applicable data encryption scenarios. The best performance before our works was 3.77 cpb for x86 platform (AESNI + AVX2), and 8.62 cpb for Arm platform (NEON). Meanwhile the state of art constant-time AES implementation could reach 0.63 cpb. Dedicated SM4 instruction set extensions like those optionally available in Armv8.2, could achieve comparable cpb to AES. But they are...
Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. With noticing the extraordinary computing power of AI accelerators, in this paper, we further explore the feasibility to introduce AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning or neural networks, the biggest challenge is how to transform cryptographic workloads into their operations,...
The NTRU lattice is a promising candidate to construct practical cryptosystems, in particular key encapsulation mechanism (KEM), resistant to quantum computing attacks. Nevertheless, there are still some inherent obstacles to NTRU-based KEM schemes in having integrated performance, taking security, bandwidth, error probability, and computational efficiency {as a whole}, that is as good as and even better than their \{R,M\}LWE-based counterparts. In this work, we solve this problem by...
Privacy preservation is a sensitive issue in our modern society. It is becoming increasingly important in many applications in this ever-growing and highly connected digital era. Functional encryption is a computation on encrypted data paradigm that allows users to retrieve the evaluation of a function on encrypted data without revealing the data, thus effectively protecting users' privacy. However, existing functional encryption implementations are still very time-consuming for practical...
Hash-Based Signature (HBS) uses a hash function to construct a digital signature scheme, where its security is guaranteed by the collision resistance of the hash function used. To provide sufficient security in the post-quantum environment, the length of hash should be satisfied. Modern HBS can be classified into stateful and stateless schemes. Two representative stateful and stateless HBS are XMSS and SPHINCS$^+$, respectively. In this paper, we propose two HBS schemes: K-XMSS...
In this paper, we introduce Scabbard, a suite of post-quantum key-encapsulation mechanisms. Our suite contains three different schemes Florete, Espada, and Sable based on the hardness of module- or ring-learning with rounding problem. In this work, we first show how the latest advancements on lattice-based cryptography can be utilized to create new better schemes and even improve the state-of-the-art on post-quantum cryptography. We put particular focus on designing schemes that can...
BIKE is a key encapsulation mechanism that entered the third round of the NIST post-quantum cryptography standardization process. This paper presents two constant-time implementations for BIKE, one tailored for the Intel Haswell and one tailored for the ARM Cortex-M4. Our Haswell implementation is much faster than the avx2 implementation written by the BIKE team: for bikel1, the level-1 parameter set, we achieve a 1.39x speedup for decapsulation (which is the slowest operation) and a 1.33x...
We investigate the efficiency of a (module-)LWR-based PRF built using the GGM design. Our construction enjoys the security proof of the GGM construction and the (module-)LWR hardness assumption which is believed to be post-quantum secure. We propose GGM-based PRFs from PRGs with larger ratio of output to input. This reduces the number of PRG invocations which improves the PRF performance and reduces the security loss in the GGM security reduction. Our construction bridges the gap between...
Tensor core is a specially designed hardware included in new NVIDIA GPU chips, aimed at accelerating deep learning applications. With the introduction of tensor core, the matrix multiplication at low precision can be computed much faster than using conventional integer and floating point units in NVIDIA GPU. In the past, applications of tensor core were mainly restricted to machine learning and mixed precision scientific computing. In this paper, we show that for the first time, tensor core...
In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial...
We suggest a small change to the Dilithium signature scheme, that allows one to reuse computations between rejected nonces, for a speed-up in signing time. The modification is based on the idea that, after rejecting on a too large $\|\mathbf{r}_0\|_\infty$, not all elements of the nonce $\mathbf{y}$ are spent. We swap the order of the checks; and if this $\mathbf{r}_0$-check fails, we only need to resample $y_1$. We provide a proof that shows that the modification does not affect the...
We present a new methodology for building formally verified cryptographic libraries that are optimized for multiple architectures. In particular, we show how to write and verify generic crypto code in the F* programming language that exploits single-instruction multiple data (SIMD) parallelism. We show how this code can be compiled to platforms that supports vector instructions, including ARM Neon and Intel AVX, AVX2, and AVX512. We apply our methodology to obtain verified vectorized...
We present an optimization of Lagrange's algorithm for lattice basis reduction in dimension 2. The optimized algorithm is proven to be correct and to always terminate with quadratic complexity; it uses more iterations on average than Lagrange's algorithm, but each iteration is much simpler to implement, and faster. The achieved speed is such that it makes application of the speed-up on ECDSA and EC Schnorr signatures described by Antipa et al worthwhile, even for very fast curves such...
Rainbow is a signature scheme that is based on multivariate polynomials. It is one of the Round-2 candidates of the NIST’s Post-Quantum Cryptography Standardization project. Its computations rely heavily on GF(2^8) arithmetic and the Rainbow submission optimizes the code by using AVX2 shuffle and permute instructions. In this paper, we show a new optimization that leverages: a) AVX512 architecture; b) the latest processor capabilities Galois Field New Instructions(GF-NI), available on Intel...
Saber is a module lattice-based CCA-secure key encapsulation mechanism (KEM) which has been shortlisted for the second round of NIST's Post Quantum Cryptography Standardization project. To attain simplicity and efficiency on constrained devices, the Saber algorithm is serial by construction. However, on high-end platforms, such as modern Intel processors with AVX2 instructions, Saber achieves limited speedup using vector processing instructions due to its serial nature. In this paper we...
We present qTESLA, a family of post-quantum digital signature schemes that exhibits several attractive features such as simplicity and strong security guarantees against quantum adversaries, and built-in protection against certain side-channel and fault attacks. qTESLA---selected for round 2 of NIST's post-quantum cryptography standardization project---consolidates a series of recent schemes originating in works by Lyubashevsky, and Bai and Galbraith. We provide full-fledged, constant-time...
We present NTTRU -- an IND-CCA2 secure NTRU-based key encapsulation scheme that uses the number theoretic transform (NTT) over the cyclotomic ring $Z_{7681}[X]/(X^{768}-X^{384}+1)$ and produces public keys and ciphertexts of approximately $1.25$ KB at the $128$-bit security level. The number of cycles on a Skylake CPU of our constant-time AVX2 implementation of the scheme for key generation, encapsulation and decapsulation is approximately $6.4$K, $6.1$K, and $7.9$K, which is more than...
In this paper, we introduce Saber, a package of cryptographic primitives whose security relies on the hardness of the Module Learning With Rounding problem (Mod-LWR). We first describe a secure Diffie-Hellman type key exchange protocol, which is then transformed into an IND-CPA encryption scheme and finally into an IND-CCA secure key encapsulation mechanism using a post-quantum version of the Fujisaki-Okamoto transform. The design goals of this package were simplicity, efficiency and...
NTRUEncrypt is one of the most promising candidates for quantum-safe cryptography. In this paper, we focus on the NTRU743 paramter set. We give a report on all known attacks against this parameter set and show that it delivers 256 bits of security against classical attackers and 128 bits of security against quantum attackers. We then present a parameter-dependent optimization using a tailored hierarchy of multipli- cation algorithms as well as the Intel AVX2 instructions, and show that this...
Constant-time polynomial multiplication is one of the most time-consuming operations in many lattice-based cryptographic constructions. For schemes based on the hardness of Ring-LWE in power-of-two cyclotomic fields with completely splitting primes, the AVX2 optimized implementation of the Number-Theoretic Transform (NTT) from the NewHope key-exchange scheme is the state of the art for fast multiplication. It uses floating point vector instructions. We show that by using a modification of...
Key-encapsulation mechanisms secure against chosen ciphertext attacks (IND-CCA-secure KEMs) in the quantum random oracle model have been proposed by Boneh, Dagdelen, Fischlin, Lehmann, Schafner, and Zhandry (CRYPTO 2012), Targhi and Unruh (TCC 2016-B), and Hofheinz, Hövelmanns, and Kiltz (TCC 2017). However, all are non-tight and, in particular, security levels of the schemes obtained by these constructions are less than half of original security levels of their building blocks. In this...
We propose SOFIA, the first MQ-based signature scheme provably secure in the quantum-accessible random oracle model (QROM). Our construction relies on an extended version of Unruh's transform for 5-pass identification schemes that we describe and prove secure both in the ROM and QROM. Based on a detailed security analysis, we provide concrete parameters for SOFIA that achieve 128 bit post-quantum security. The result is SOFIA-4-128 with parameters that are carefully optimized to minimize...
This paper presents software demonstrating that the 20-year-old NTRU cryptosystem is competitive with more recent lattice-based cryptosystems in terms of speed, key size, and ciphertext size. We present a slightly simplified version of textbook NTRU, select parameters for this encryption scheme that target the 128-bit post-quantum security level, construct a KEM that is CCA2-secure in the quantum random oracle model, and present highly optimized software targeting Intel CPUs with the AVX2...
The Number Theoretic Transform (NTT) provides efficient algorithms for cyclic and nega-cyclic convolutions, which have many applications in computer arithmetic, e.g., for multiplying large integers and large degree polynomials. It is commonly used in cryptographic schemes that are based on the hardness of the Ring Learning With Errors (R-LWE) problem to efficiently implement modular polynomial multiplication. We present a new modular reduction technique that is tailored for the special...
Post-quantum cryptography has attracted increased attention in the last couple of years, due to the threat of quantum computers breaking current cryptosystems. In particular, the key size and performance of post-quantum algorithms became a significant target for optimization. In this spirit, Alkim \etal have recently proposed a significant optimization for a key exchange scheme that is based on the R-LWE problem. In this paper, we build on the implementation of Alkim \etal, and focus on...
The j-lanes tree hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. The j-pointers tree hashing is a similar tree mode that receives, as input, j pointers to j messages (or slices of a single message), computes their digests and outputs the hash value of their concatenation. Such modes have parallelization capabilities on a hashing process that is serial by nature. As a result, they...
We show how efficient and secure cryptographic mixing functions can be constructed from low-degree rotation-invariant $\phi$ functions rather than conventional S-Boxes. These novel functions have surprising properties; many exhibit inherent feeble (Boolean circuit) one-wayness and offer speed/area tradeoffs unobtainable with traditional constructs. Recent theoretical results indicate that even if the inverse is not explicitly computed in an implementation, its degree plays a fundamental...
This paper describes software optimization for the stream Cipher ChaCha. We leverage the wide vectorization capabilities of the new AVX2 architecture, to speed up ChaCha encryption (and decryption) on the latest x86_64 processors. In addition, we show how to apply vectorization for the future AVX512 architecture, and get further speedup. This leads to significant performance gains. For example, on the latest Intel Haswell microarchitecture, our AVX2 implementation performs at 1.43 cycles...
j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel® Core™. For message ranging 2KB to 132KB in length, the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly...
We describe a method for efficiently hashing multiple messages of different lengths. Such computations occur in various scenarios, and one of them is when an operating system checks the integrity of its components during boot time. These tasks can gain performance by parallelizing the computations and using SIMD architectures. For such scenarios, we compare the performance of a new 4-buffers SHA-256 S-HASH implementation, to that of the standard serial hashing. Our results are measured on...
In 2013 Intel will release the AVX2 instructions, which introduce 256-bit single-instruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-any permute and vectorized table-lookup. In this paper, we explore the potential of AVX2 to...
This paper describes an algorithm for accelerating the computations of Davies-Meyer based hash functions. It is based on parallelizing the computation of several message schedules for several message blocks of a given message. This parallelization, together with the proper use of vector processor instructions (SIMD) improves the overall algorithm’s performance. Using this method, we obtain a new software implementation of SHA-256 that performs at 12.11 Cycles/Byte on the 2nd and 10.84...