459 results sorted by ID
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
Kyber was selected by NIST as a Post-Quantum Cryptography Key Encapsulation Mechanism standard. This means that the industry now needs to transition and adopt these new standards. One of the most demanding operations in Kyber is the modular arithmetic, making it a suitable target for optimization. This work offers a novel modular reduction design with the lowest area on Xilinx FPGA platforms. This novel design, through K-reduction and LUT-based reduction, utilizes 49 LUTs and 1 DSP...
A fixed length tweakable enciphering scheme (TES) is the appropriate cryptographic functionality for low level disk encryption. Research on TES over the last two decades have led to a number of proposals many of which have already been implemented using FPGAs. This paper considers the FPGA implementations of two more recent and promising TESs, namely AEZ and FAST. The relevant architectures are described and simulation results on the Xilinx Virtex 5 and Virtex 7 FPGAs are presented. For...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers. We investigate popular open-source GC frameworks and discover that the AES...
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
The expansion of flip-chip technologies and a lack of backside protection make the integrated circuit (IC) vulnerable to certain classes of physical attacks mounted from the IC's backside. Laser-assisted probing, electromagnetic, and body-biasing injection attacks are examples of such attacks. Unfortunately, there are few countermeasures proposed in the literature, and none are available commercially. Those that do exist are not only expensive but also incompatible with current IC...
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes. The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may...
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
As the message recovery-based attack poses a serious threat to lattice-based schemes, we conducted a study on the side-channel secu- rity of parallel implementations of lattice-based key encapsulation mech- anisms. Initially, we developed a power model to describe the power leakage during message encoding. Utilizing this power model, we pro- pose a multi-ciphertext message recovery attack, which can retrieve the required messages for a chosen ciphertext attack through a suitable mes- sage...
During the standardisation process of post-quantum cryptography, NIST encourages research on side-channel analysis for candidate schemes. As the recommended lattice signature scheme, CRYSTALS-Dilithium, when implemented on hardware, has only been subjected to the side-channel attack presented by Steffen et al. in IACR ePrint 2022. This attack is not complete and requires excessive traces. Therefore, we investigate the leakage of an FPGA (Kintex7) implementation of CRYSTALS-Dilithium using...
Falcon is one out of three post-quantum signature schemes which have been selected for standardization by NIST in July 2022. To the best of our knowledge, Falcon is the only selected algorithm that does not yet have a publicly reported hardware description that performs signing or key generation. The reason might be that the Falcon signature and key generation algorithms do not fit well in hardware due to the use of floating-point numbers and recursive functions. This publication describes...
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of post quantum cryptography (PQC). Polynomial multiplication is the core of Ring Learning with Error (RLWE) lattice based cryptography (LBC) which is one of the most promising PQC candidates. In this work, we present the design of fast and energy-efficient pipelined Number...
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...
Over the last decades, fault injection attacks have been demonstrated to be an effective method for breaking the security of electronic devices. Some types of fault injection attacks, like clock and voltage glitching, require very few resources by the attacker and are practical and simple to execute. A cost-effective countermeasure against these attacks is the use of a detector circuit which detects timing violations - the underlying effect that glitch attacks rely on. In this paper, we take...
The Number Theoretic Transform (NTT) is a powerful mathematical tool with a wide range of applications in various fields, including signal processing, cryptography, and error correction codes. In recent years, there has been a growing interest in efficiently implementing the NTT on hardware platforms for lattice-based cryptography within the context of NIST's Post-Quantum Cryptography (PQC) competition. The implementation of NTT in cryptography stands as a pivotal advancement,...
The emergence of hardware trojans as significant threats in various aspects of hardware design, including Firmware, open-source IP, and PCB design, has raised serious concerns. Simultaneously, AI technologies have been employed to simplify the complexity of Side Channel Analysis (SCA) attacks. Due to the increasing risk posed by these threats, it becomes essential to test hardware by considering all possible attack vectors. This paper aims to propose a black box attack using...
Fault attacks impose a serious threat against the practical implementations of cryptographic algorithms. Statistical Ineffective Fault Attacks (SIFA), exploiting the dependency between the secret data and the fault propagation overcame many of the known countermeasures. Later, several countermeasures have been proposed to tackle this attack using error detection methods. However, the efficiency of the countermeasures, in part governed by the number of error checks, still remains a...
Modern block ciphers designed for hardware and masked with Threshold Implementations (TIs) provide provable security against first-order attacks. However, the application of TIs leaves designers to deal with a trade-off between its security and its cost, for example, the process to generate its required random bits. This generation cost comes with an increased overhead in terms of area and latency. Decreasing the number of random bits for the masking allows to reduce the aforementioned...
The recently announced National Institute of Standards and Technology (NIST) Post-quantum cryptography (PQC) third-round standardization process has released its candidates to be standardized and Falcon is one of them. On the other hand, however, very few hardware implementation works for Falcon have been released due to its very complicated computation procedure and intensive complexity. With this background, in this paper, we propose an efficient hardware structure to implement residue...
NIST issued a new call in 2023 to diversify the portfolio of quantum-resistant digital signature schemes since the current portfolio relies on lattice problems. The MAYO scheme, which builds on the Unbalanced Oil and Vinegar (UOV) problem, is a promising candidate for this new call. MAYO introduces emulsifier maps and a novel 'whipping' technique to significantly reduce the key sizes compared to previous UOV schemes. This paper provides a comprehensive analysis of the implementation...
We show that the key agreement scheme [IEEE Trans. Veh. Technol. 71(4): 3470-3479, 2022] fails to keep user anonymity, not as claimed.
Electromagnetic Fault Injection (EMFI) has been demonstrated to be useful for both academic and industrial research. Due to the dangerous voltages involved, most work is done with commercial tools. This paper introduces a safety-focused low-cost and open-source design that can be built for less than \$50 using only off-the-shelf parts. The paper also introduces an iCE40 based Time-to-Digital Converter (TDC), which is used to visualize the glitch inserted by the EMFI tool. This...
MAYO is a topical modification of the established multivariate signature scheme UOV. Signer and Verifier locally enlarge the public key map, such that the dimension of the oil space and therefore, the parameter sizes in general, can be reduced. This significantly reduces the public key size while maintaining the appealing properties of UOV, like short signatures and fast verification. Therefore, MAYO is considered as an attractive candidate in the NIST call for additional digital signatures...
Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are...
Deploying cryptography on embedded systems requires security against physical attacks. At CHES 2019, M&M was proposed as a combined countermeasure applying masking against SCAs and information-theoretic MAC tags against FAs. In this paper, we show that one of the protected AES implementations in the M&M paper is vulnerable to a zero-value SIFA2-like attack. A practical attack is demonstrated on an ASIC board. We propose two versions of the attack: the first follows the SIFA approach to...
Masking is a well-studied method for achieving provable security against side-channel attacks. In masking, each sensitive variable is split into $d$ randomized shares, and computations are performed with those shares. In addition to the computational overhead of masked arithmetic, masking also has a storage cost, increasing the requirements for working memory and secret key storage proportionally with $d$. In this work, we introduce mask compression. This conceptually simple technique is...
NIST has recently selected CRYSTALS-Kyber as a new public key encryption and key establishment algorithm to be standardized. This makes it important to evaluate the resistance of CRYSTALS-Kyber implementations to side-channel attacks. Software implementations of CRYSTALS-Kyber have already been thoroughly analysed. The discovered vulnerabilities helped improve the subsequently released versions and promoted stronger countermeasures against side-channel attacks. In this paper, we present the...
Persistent fault analysis is a novel and efficient cryptanalysis method. The persistent fault attacks take advantage of a persistent fault injected in a non-volatile memory, then present on the device until the reboot of the device. Contrary to classical physical fault injection, where differential analysis can be performed, persistent fault analysis requires new analyses and dedicated countermeasures. Persistent fault analysis requires a persistent fault injected in the S-box such that the...
FPGAs have been used in the cloud since several years, as accelerators for various workloads such as machine learning, database processes and security tasks. As for other cloud services, a highly desired feature is virtualization in which multiple tenants can share a single FPGA to increase utilization and by that efficiency. By solely using standard FPGA logic in the untrusted tenant, on-chip logic sensors allow remote power analysis side-channel and covert channel attacks on the victim...
Sampling random values from a discrete Gaussian distribution with high precision is a major and computationally intensive operation of upcoming or existing cryptographic standards. FALCON is one such algorithm that the National Institute of Standards and Technology chose to standardize as a next-generation, quantum-secure digital signature algorithm. The discrete Gaussian sampling of FALCON has both flexibility and efficiency needs—it constitutes 72% of total signature generation in...
Masking is considered to be an essential defense mechanism against side-channel attacks, but it is challenging to be adopted for hardware cryptographic implementations, especially for high security orders. Recently, Knichel et al. proposed an automated tool called AGEMA that enables the generation of masked implementations in hardware for arbitrary security orders using composable gadgets. This accelerates the construction and practical application of masking schemes. This article proposes a...
The rise of low-power, cost-efficient internet-connected devices has led to a need for lightweight cryptography. The lightweight block cipher PRIDE, designed by Martin R. Albrecht, is one of the most efficient ciphers designed for IoT-constrained environments. It is useful for connected devices, requires fewer resources to implement, and has high performance. PRIDE is a software-oriented lightweight cipher optimized for microcontrollers. This paper focuses on the FPGA implementation of the...
The magic of Fully Homomorphic Encryption (FHE) is that it allows operations on encrypted data without decryption. Unfortunately, the slow computation time limits their adoption. The slow computation time results from the vast memory requirements (64Kbits per ciphertext), a bootstrapping key of 1.3 GB, and sizeable computational overhead (10240 NTTs, each NTT requiring 5120 32-bit multiplications). We accelerate the FHEW bootstrapping in hardware on a high-end U280 FPGA. To reduce the...
Implementation of Fully Homomorphic Encryption (FHE) is challenging. Especially when considering hardware acceleration, the major performance bottleneck is data transfer. Here we propose an algebraic framework called Heterogenous Lattice Graph (HLG) to build and process computing graphs in Residue Number System (RNS), which is the basis of high performance implementation of mainstream FHE algorithms. There are three main design goals for HLG framework: • Design a dedicated IR (HLG...
FPGA-SoCs are a popular platform for accelerating a wide range of applications due to their performance and flexibility. From a security point of view, these systems have been shown to be vulnerable to various attacks, especially side-channel attacks where an attacker can obtain the secret key of a cryptographic algorithm via laboratory mea- surement equipment or even remotely with sensors implemented inside the FPGA logic itself. Fortunately, a variety of countermeasures on...
During the last decade, there has been a stunning progress in the domain of AI with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, majority of the attacks reported in literature are either on simple embedded...
The development of Artificial Intelligence (AI) based systems to automatically generate hardware systems has gained an impulse that aims to accelerate the hardware design cycle with no human intervention. Recently, the striking AI-based system ChatGPT from OpenAI has achieved a momentous headline and has gone viral within a short span of time since its launch. This chatbot has the capability to interactively communicate with the designers through a prompt to generate software and hardware...
Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, currently the default algorithm in OpenSSH 9. Being based on lattice assumptions, it is assumed to be secure also against attackers with access to large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in the recent years, challenges remain with respect to protection mechanisms against attackers that have additional...
Masking has become one of the most effective approaches for securing hardware designs against side-channel attacks. Regardless of the effort put into correctly implementing masking schemes on a field-programmable gate array (FPGA), leakage can be unexpectedly observed. This is due to the fact that the assumption underlying all masked designs, i.e., the leakages of different shares are independent of each other, may no longer hold in practice. In this regard, extreme temperatures have been...
Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, either made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We...
The goals of cryptography are achieved using mathematically strong crypto-algorithms, which are adopted for securing data and communication. Even though the algorithms are mathematically secure, the implementation of these algorithms may be vulnerable to side-channel attacks such as timing and power analysis attacks. One of the effective countermeasures against such attacks is Threshold Implementation(TI). However, TI realization in crypto-device introduces hardware complexity, so it...
State-of-the-art sensors for measuring FPGA voltage fluctuations are time-to-digital converters (TDCs). They allow detecting voltage fluctuations in the order of a few nanoseconds. The key building component of a TDC is a delay line, typically implemented as a chain of fast carry propagation multiplexers. In FPGAs, the fast carry chains are constrained to dedicated logic and routing, and need to be routed strictly vertically. In this work, we present an alternative approach to designing...
TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually results in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golić (TC’06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs is noticed in the paper (Markus Dichtl, ePrint’15). We verify...
The sampling of polynomials with fixed weight is a procedure required by round-4 Key Encapsulation Mechanisms (KEMs) for Post-Quantum Cryptography (PQC) standardization (BIKE, HQC, McEliece) as well as NTRU, Streamlined NTRU Prime, and NTRU LPRrime. Recent attacks have shown in this context that side-channel leakage of sampling methods can be exploited for key recoveries. While countermeasures regarding such timing attacks have already been presented, still, there is no comprehensive work...
To mitigate cybersecurity breaches, secure communication is crucial for the Internet of Things (IoT) environment. Data integrity is one of the most significant characteristics of security, which can be achieved by employing cryptographic hash functions. In view of the demand from IoT applications, the National Institute of Standards and Technology (NIST) initiated a standardization process for lightweight hash functions. This work presents field-programmable gate array (FPGA) implementations...
Number-Theoretic-Transform (NTT) is a variation of Fast-Fourier-Transform (FFT) on finite fields. NTT is being increasingly used in blockchain and zero-knowledge proof applications. Although FFT and NTT are widely studied for FPGA implementation, we believe CycloneNTT is the first to solve this problem for large data sets ($\ge2^{24}$, 64-bit numbers) that would not fit in the on-chip RAM. CycloneNTT uses a state-of-the-art butterfly network and maps the dataflow to hybrid FIFOs composed of...
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation. We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
Security against side-channel assisted attacks remains a focus and concern in the ongoing standardization process of quantum-computer-resistant cryptography algorithms. Hiding and masking techniques are currently under investigation to protect the Post-Quantum Cryptography (PQC) algorithms in the NIST PQC standardization process against sophisticated side-channel attacks. Between hiding and masking, masking is emerging as a popular option due to its simplicity and minimized cost of...
CRYSTALS-Kyber has been recently selected by the NIST as a new public-key encryption and key-establishment algorithm to be standardized. This makes it important to assess how well CRYSTALS-Kyber implementations withstand side-channel attacks. Software implementations of CRYSTALS-Kyber have been already analyzed and the discovered vulnerabilities were patched in the subsequently released versions. In this paper, we present a profiling side-channel attack on a hardware implementation of...
While hardware implementations allow the production of highly efficient and performance oriented designs, exploiting features such as parallelization, their longer time to code and implement often bottlenecks rapid prototyping. On the other hand, high-level synthesis (HLS) tools allow for faster experimentation of software code to a hardware platform while demonstrating a reasonable extrapolation of the expected hardware behavior. In this work, we attempt to show a rapid, fast prototyping of...
Clock randomization is one of the oldest countermeasures against side-channel attacks. Various implementations have been presented in the past, along with positive security evaluations. However, in this paper we show that it is possible to break countermeasures based on a randomized clock by sampling side-channel measurements at a frequency much higher than the encryption clock, synchronizing the traces with pre-processing, and targeting the beginning of the encryption. We demonstrate a...
The lattice-based CRYSTALS-Dilithium signature scheme has been selected for standardization by the NIST. As part of the selection process, a large number of implementations for platforms like x86, ARM Cortex-M4, or – on the hardware side – Xilinx Artix-7 have been presented and discussed by experts. While software implementations have been subject to side-channel analysis with several attacks being published, an analysis of Dilithium hardware implementations and their peculiarities has not...
Multi-Scalar Multiplication (MSM) on elliptic curves is one of the primitives and bottlenecks at the core of many zero-knowledge proof systems. Speeding up MSM typically results in faster proof generation, which in turn makes ZK-based applications practical. We focus on accelerating large MSM on FPGA, and we present speed records for $\texttt{BLS12-377}$ on FPGA: 5.66s for $N=2^{26}$, sub-second for $N=2^{22}$. We developed a fully-pipelined curve adder in extended Twisted Edwards...
Post-quantum cryptography represents a category of cryptosystems resistant to quantum algorithms. Recently, NIST launched a process to standardize one or more of such algorithms in the key encapsulation mechanism and signature categories. Such schemes are under the scrutiny of their mathematical security, but they are not side-channel secure at the algorithm level. That is why their side-channel vulnerabilities must be assessed by the research community. In this paper, we present a...
Isogeny-based cryptography suffers from a long-running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method...
In this paper, we introduce a second-order masking of the AES using the minimal number of shares and a total of 1268 bits of randomness including the sharing of the plaintext and key. The masking of the S-box is based on the tower field decomposition of the inversion over bytes where the changing of the guards technique is used in order to re-mask the middle branch of the decomposition. The sharing of the S-box is carefully crafted such that it achieves first-order probing security without...
This work presents a hardware design for constant-time implementation of the HQC (Hamming Quasi-Cyclic) code-based key encapsulation mechanism. HQC has been selected for the fourth round of NIST's Post-Quantum Cryptography standardization process and this work presents the first, hand-optimized design of HQC key generation, encapsulation, and decapsulation written in Verilog targeting implementation on FPGAs. The three modules further share a common SHAKE256 hash module to reduce area...
Quantum computers pose a threat to the security of communications over the internet. This imminent risk has led to the standardization of cryptographic schemes for protection in a post-quantum scenario. We present a design methodology for future implementations of such algorithms. This is manifested using the NIST selected digital signature scheme CRYSTALS-Dilithium and key encapsulation scheme CRYSTALS-Kyber. A unified architecture, \crystal, is proposed that can perform key generation,...
Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck. In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest. We implemented...
Physically related functions~(PReFs) are hardware primitives proposed to establish key-exchange between resource-constrained devices with no pre-established secrets. In this paper, we introduce XOR composition of PReFs to eliminate the requirement of revealing the complete functionality of the hardware primitive during the setup phase, which is a prerequisite to setup PReFs. We evaluate the quality of XOR\_PReF design by implementing them on Artix-7 FPGAs.
Due to the complexity and the cost of producing integrated circuits, most hardware circuit designers outsource the manufacturing of their circuits to a third-party foundry. However, a dishonest foundry may abuse its access to the circuit's design in a variety of ways that undermine the designer's investment or potentially introduce vulnerabilities. To combat these issues, the hardware community has developed the notion of logic locking, which allows the designer to send the foundry a...
In this paper, we propose the first key-recovery side-channel attack on Classic McEliece, a KEM finalist in the NIST Post-quantum Cryptography Standardization Project. Our novel idea is to design an attack algorithm where we submit special ciphertexts to the decryption oracle that correspond to cases of single errors. Decoding of such cipher-texts involves only a single entry in a large secret permutation, which is part of the secret key. Through an identified leakage in the additive...
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature. By invoking multiple...
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data. First, we propose a divide-and-conquer technique that...
In this work, we study and formalize security notions for algorithm substitution attacks (ASAs) on em cryptographic puzzles. Puzzles are difficult problems that require an investment of computation, memory, or some other related resource. They are heavily used as a building block for the consensus networks used by cryptocurrencies. These include primitives such as proof-of-work, proof-of-space, and verifiable delay functions (VDFs). Due to economies of scale, these networks increasingly rely...
We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST's Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming...
In this paper, we present a high-performance architecture for elliptic curve cryptography (ECC) over Curve448, which to the best of our knowledge, is the fastest implementation of ECC point multiplication over Curve448 to date. Firstly, we introduce a novel variant of the Karatsuba formula for asymmetric digit multiplier, suitable for typical DSP primitive with asymmetric input. It reduces the number of required DSPs compared to previous work and preserves the performance via full...
Hardware obfuscation through redundancy addition is a well-known countermeasure against reverse engineering. For FPGA designs, such a technique can be implemented with a small overhead, however, its effectiveness is heavily dependent on the stealthiness of the redundant elements. Hardware opaque predicates can provide adequately stealthy constant values that can be used for obfuscation. However, in this report, we show that such obfuscation schemes can be defeated by ensuring the full...
We describe FPGA implementations of the Spook candidate to the NIST lightweight cryptography competition in two flavors. First, unprotected implementations that exhibit the excellent throughput and energy consumption for the area target specified by the NIST benchmarking initiative. Second, protected implementations leveraging the leveled implementation concept that the Spook design enables and confirming the significant performance gains that it enables.
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
This paper demonstrates the first side-channel attack on homomorphic encryption (HE), which allows computing on encrypted data. We reveal a power-based side-channel leakage of Microsoft SEAL prior to v3.6 that implements the Brakerski/Fan-Vercauteren (BFV) protocol. Our proposed attack targets the Gaussian sampling in the SEAL’s encryption phase and can extract the entire message with a single power measurement. Our attack works by (1) identifying each coefficient index being sampled, (2)...
Firekite is a synchronous stream cipher using a pseudo-random number generator (PRNG) whose security relies on the hardness of the \textit{Learning Parity with Noise} (LPN) problem. It is one of a few LPN-based symmetric encryption schemes and it can be very efficiently implemented on a low-end SoC FPGA. The designers, Bogos, Korolija, Locher, and Vaudenay, demonstrated appealing properties of Firekite such as requiring only one source of cryptographically strong bits, small key size, high...
In this paper, we present a high-speed, unified elliptic curve cryptography (ECC) processor for arbitrary Weierstrass curves over GF(p), which to the best of our knowledge, outperforms other similar works in terms of execution time. Our approach employs the combination of the schoolbook long and Karatsuba multiplication algorithm for the elliptic curve point multiplication (ECPM) to achieve better parallelization while retaining low complexity. In the hardware implementation, the substantial...
Secure communication often require both encryption and digital signatures to guarantee the confidentiality of the message and the authenticity of the parties. However, post-quantum cryptographic protocols are often studied independently. In this work, we identify a powerful synergy between two finalist protocols in the NIST standardization process. In particular, we propose a technique that enables SABER and Dilithium to share the exact same polynomial multiplier. Since polynomial...
FPGA bitstream encryption and authentication can be defeated by various techniques and it is critical to understand how these vulnerabilities enable extraction and tampering of commercial FPGA bitstreams. We exploit the physical vulnerability of bitstream encryption keys to readout using failure analysis equipment and conduct an end-to-end bitstream tamper attack. Our work underscores the feasibility of supply chain bitstream tampering and the necessity of guarding against such attacks in...
This paper proposes a lightweight authenticated encryption (AE) scheme, called Light-OCB, which can be viewed as a lighter variant of the CAESAR winner OCB as well as a faster variant of the high profi le NIST LWC competition submission LOCUS-AEAD. Light-OCB is structurally similar to LOCUS-AEAD and uses a nonce-based derived key that provides optimal security, and short-tweak tweakable blockcipher (tBC) for efficient domain separation. Light-OCB improves over LOCUS-AEAD by reducing the...
Power analysis attacks on ECC have been presented since almost the very beginning of DPA itself, even before the standardization of AES. Given that power analysis attacks against AES are well known and have a large body of practical artifacts to demonstrate attacks on both software and hardware implementations, it is surprising that these artifacts are generally lacking for ECC. In this work we begin to remedy this by providing a complete open-source ECDSA attack artifact, based on a...
Performance in hardware has typically played a significant role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major...
We propose design methodologies for building a compact, unified and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signature. Synergies in the two types of cryptographic primitives are used to make the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized targeting the signature scheme 'CRYSTALS-Dilithium' and the key encapsulation mechanism (KEM) 'Saber', both finalists in the NIST’s post-quantum...
The field of post-quantum cryptography aims to develop and analyze algorithms that can withstand classical and quantum cryptanalysis. The NIST PQC standardization process, now in its third round, specifies ease of protection against side-channel analysis as an important selection criterion. In this work, we develop and validate a masked hardware implementation of Saber key encapsulation mechanism, a third-round NIST PQC finalist. We first design a baseline lightweight hardware architecture...
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. One such algorithm is CRYSTALS-Dilithium,...
We present a novel full hardware implementation of Streamlined NTRU Prime, with two variants: A high-speed, high-area implementation, and a slower, low-area implementation. We introduce several new techniques that improve performance, including a batch inversion for key generation, a high-speed schoolbook polynomial multiplier, an NTT polynomial multiplier combined with a CRT map, a new DSP-free modular reduction method, a high-speed radix sorting module, and new en- and decoders. With the...