The Unlikely Hero: Nonideality in Analog Photonic Neural Networks as Built-in Defender Against Adversarial Attacks

Haotian Lu, Ziang Yin, Partho Bhoumik, Sanmitra Banerjee, Krishnendu Chakrabarty, Jiaqi Gu
Arizona State University
jiaqigu@asu.edu
(2022)
Abstract.

Electronic-photonic computing systems have emerged as a promising platform for accelerating deep neural network (DNN) workloads. Major efforts have focused on countering hardware non-idealities and boosting efficiency with various hardware/algorithm co-design methods. However, the adversarial robustness of such photonic analog mixed-signal AI hardware remains unexplored. Though hardware variations can be mitigated with robustness-driven optimization methods, malicious attacks on the hardware behave differently from noise and thus require a customized protection method tailored to optical analog hardware. In this work, we rethink the role of conventionally undesired non-idealities in photonic analog accelerators and highlight their surprising effect in defending against adversarial weight attacks. Inspired by the protection effects of DNN quantization and pruning, we propose a synergistic defense framework tailored for optical analog hardware that proactively protects sensitive weights via pre-attack unary weight encoding and post-attack vulnerability-aware weight locking. Efficiency-reliability trade-offs are formulated as constrained optimization problems and efficiently solved offline without model re-training costs. Extensive evaluation of various DNN benchmarks with a multi-core photonic accelerator shows that our framework maintains near-ideal on-chip inference accuracy under adversarial bit-flip attacks with merely <3% memory overhead. Our code is open-sourced at link.

journalyear: 2022; copyright: acmcopyright; conference: Proceedings of the 30th Asia and South Pacific Design Automation Conference, Jan. 20–23, 2025, Tokyo Odaiba Miraikan, Japan; booktitle: Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC) (DAC ’22), July 10–14, 2022, San Francisco, CA, USA; price: 15.00; doi: 10.1145/xxxxxxx.xxxxxxx; isbn: 978-1-4503-9142-9/22/07

1. Introduction

In recent years, analog optical neural networks (ONNs) have stood out for their ability to deliver unparalleled speed and efficiency, presenting a promising avenue for artificial intelligence (AI) applications (NP_NATURE2017_Shen; NP_PIEEE2020_Cheng; NP_NaturePhotonics2021_Shastri; NP_ACS2022_Feng; NP_Science2024_Xu; NP_SciRep2017_Tait; NP_Nature2021_Xu; NP_Nature2021_Feldmann; NP_NatureComm2022_Zhu). However, deploying photonic accelerators is impeded by various non-idealities, e.g., low-precision control, hardware noise, and crosstalk, which increase the design complexity of ensuring robust deployment. Extensive prior work has focused on suppressing physical non-idealities and improving system robustness via cross-layer hardware/algorithm co-design (NP_DATE2020_Gu; NP_ICCAD2019_Zhao; NP_ICCAD2020_Zhu; NP_TCAD2022_Mirza). Besides built-in variations and noise, photonic accelerators are exposed to adversarial attacks (BFADefense_ICCAD2017_Liu; BFA_ICCV2019_Rakin; TBFA_TPAMI22_Rakin) in real-world deployment, raising hardware security concerns. As with digital AI accelerators, we envision that malicious attacks, e.g., bit-flip attacks on stored NN weights, will quickly become another potential roadblock for emerging optical analog neural accelerators. Only tens of bit-flips on the most significant bits (MSBs) of critical weights can severely degrade accuracy. Effective pre-attack protection and post-attack accuracy recovery schemes that leverage the unique properties of analog optical hardware remain unexplored.

Prior work in neural network defense has explored various training-based and training-free defense methods (NP_DATE2020_Gu; DefendBFA_CVPR2020_He; BFADefense_DAC2020_Li; RADAR_DATE21_Li). For example, noise-aware training (NAT) (NP_DATE2020_Gu) and adversarial training have been proposed to smooth the NN loss landscape and increase attack tolerance. Among various defense methods, a class that exploits model compression techniques is particularly interesting in the analog NN context. Quantization, a common model compression method, has been applied for defense. Binarization-aware training (BAT) (DefendBFA_CVPR2020_He), a training-based method, provides pre-attack protection by reducing weight sensitivity via 1-bit weights. However, training-based methods usually suffer from huge model re-training costs and raise practical concerns about data access, privacy, etc. As a pre-attack protection conducted offline, training-based defense maximizes the average performance across arbitrary attacks; it therefore usually lacks precise protection and comes at the cost of task performance degradation. Training-free methods usually act post-attack as a complementary protection mechanism, detecting/localizing the victim weights (BFADetection_ICCAD2020_Liu) and restoring accuracy by error mitigation/correction. A representative training-free defense method is pruning-based accuracy recovery (RADAR_DATE21_Li). It detects victim weight groups via MSB checksum verification and prunes the detected weights to 0 to partially reduce the bit-flip-induced error. Since low-bit precision and sparsity naturally exist as built-in primitives in optical AI hardware, mainly for efficiency-accuracy trade-offs, we are inspired to explore their novel use in defense.

ONNs’ non-idealities have been treated as undesired hardware restrictions compared to digital computers. In this work, we revisit their role as intrinsic low-cost defenders, adding reliability as a new dimension in the hardware/software co-design space. For the first time, we propose a synergistic defense framework for photonic AI hardware that provides pre-attack protection via an optics-inspired unary weight representation and post-attack accuracy recovery via a sensitivity-aware on-chip weight locking technique. Memory efficiency and adversarial robustness are co-optimized to provide near-ideal accuracy protection at marginal memory overhead.

The major contributions of this paper are as follows:

  • We investigate the adversarial robustness of optical analog neural networks under malicious weight attacks and explore the built-in protection of the photonic accelerator non-idealities.

  • We propose a quantization-inspired truncated complementary unary weight encoding to minimize the ONN weight sensitivity with optimized efficiency-robustness trade-offs.

  • We propose a pruning-inspired clustering-based weight locking technique that co-optimizes detection precision, accuracy recovery, and memory efficiency.

  • Our synergistic framework with integrated pre-attack unary protection and post-attack weight locking has shown near-ideal resumed accuracy with a marginal 3% memory overhead.

2. Preliminaries

2.1. Photonic AI Accelerators and Optical DAC

Various photonic AI accelerators have been demonstrated (NP_NATURE2017_Shen; NP_PIEEE2020_Cheng; NP_NaturePhotonics2021_Shastri; NP_ACS2022_Feng; NP_DATE2019_Liu; NP_HPCA2024_Zhu). As a case study, we focus on a multi-core photonic AI accelerator architecture based on dynamic photonic tensor cores (PTCs) (NP_HPCA2024_Zhu). Each PTC takes two optically-encoded matrices and performs speed-of-light matrix-matrix multiplication. The input signals are quantized to reduce the digital-to-analog conversion (DAC) cost. The inputs $X$ are quantized to 8-bit fixed-point numbers, while the weights are quantized to $b$ bits, e.g., ranging from 4-bit to 8-bit. A recent trend to reduce the electrical DAC (eDAC) power bottleneck is to employ optical DAC (oDAC) modules, which encode discretized values to light magnitude with segmented modulators (NP_OE2017_Samani; NP_JSSC2017_Moazeni), as shown in Fig. 1.

Figure 1. (Left) Example optical DACs with segmented modulators (NP_OE2017_Samani; NP_JSSC2017_Moazeni). (Right) Signed BCD to unary representation conversion.

Figure 2. Proposed built-in defense flow for photonic AI accelerators against malicious weight attack.

The controller in segmented oDAC is partitioned into $2^{b}-1$ equal-length segments, each contributing to 1 least-significant bit (LSB) of the encoded value. In this setting, the binary weight value needs to be converted to a unary representation where a ’1’ applies a voltage to that bit without the need for a high-power eDAC. Thus, the number of leading 1’s can represent the original binary-coded digit (BCD), i.e.,

\[
(w)_B=\{1\}^{w}\{0\}^{2^{b}-1-w}=\big(\underbrace{1,\cdots,1}_{w},\underbrace{0,\cdots,0}_{2^{b}-1-w}\big)_U, \tag{1}
\]

where $w$ is a signed integer value. For example, $(11110000)_U=(4)_B$. Compared to PTCs with high-speed eDACs, oDAC-enhanced designs show significant power reduction. This unique hardware architecture and property inspire us to explore intrinsic unary encoding as an effective protection mechanism that brings minimum weight sensitivity without extra BCD-to-Unary conversion cost, as unary coding is the built-in primitive.
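As a minimal sketch of Eq. (1), assuming a non-negative level $w\in[0,2^{b}-1]$ (sign handling is omitted), the following Python snippet encodes a level in unary and compares the worst-case error of a single bit flip in a binary code and in a unary code. The helper names `to_unary`, `from_unary`, `worst_flip_error_binary`, and `worst_flip_error_unary` are illustrative and do not come from the original work.

```python
from typing import List


def to_unary(w: int, b: int) -> List[int]:
    """Encode level w as w leading ones followed by zeros (2^b - 1 bits in total)."""
    n_segments = 2 ** b - 1
    if not 0 <= w <= n_segments:
        raise ValueError("w must lie in [0, 2^b - 1]")
    return [1] * w + [0] * (n_segments - w)


def from_unary(bits: List[int]) -> int:
    """Decode by counting ones, since every segment contributes one LSB."""
    return sum(bits)


def worst_flip_error_binary(w: int, b: int) -> int:
    """Largest value change caused by flipping one bit of an unsigned b-bit binary code."""
    return max(abs((w ^ (1 << k)) - w) for k in range(b))


def worst_flip_error_unary(w: int, b: int) -> int:
    """Largest value change caused by flipping one bit of the unary code."""
    bits = to_unary(w, b)
    worst = 0
    for k in range(len(bits)):
        flipped = list(bits)
        flipped[k] ^= 1
        worst = max(worst, abs(from_unary(flipped) - w))
    return worst


print(to_unary(4, 3))                 # [1, 1, 1, 1, 0, 0, 0]
print(worst_flip_error_binary(4, 3))  # 4 (an MSB flip)
print(worst_flip_error_unary(4, 3))   # 1 (always one LSB)
```

In this toy setting, a single flipped unary bit changes the decoded level by exactly one LSB, whereas a flipped binary MSB changes it by $2^{b-1}$.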

3. Proposed Defense Framework

We first introduce the threat model and then investigate built-in defense mechanisms in non-ideal analog photonic accelerators. As shown in the overview in Fig. 2, two key techniques provide both pre-attack weight protection and post-attack accuracy recovery with optimized memory-robustness trade-offs.

3.1. Threat Model and Attacker Settings

As a case study, we assume a widely employed attacker model: the gradient-based attacker BFA (BFA_ICCV2019_Rakin). Important assumptions on the threat model are given in Table 1. We follow the same standard white-box threat model assumptions as previous work (BFA_ICCV2019_Rakin; TBFA_TPAMI22_Rakin; DefendBFA_CVPR2020_He; BFADefense_TC2022_Liu).

Table 1. Threat model assumed in this work.
Access Required                 | Access NOT Required
DNN model and parameters        | Training configurations
A mini-batch of attack dataset  | Modify scaling factors in quantization & Norm.
On-chip forward/backward prop.  | Modify address mapping/look-up tables

Eq. (2) describes the target of an on-chip adversarial attacker under Hamming distance ($HD$) and inference budget ($T_{inf}$) constraints.

\[
\begin{aligned}
\min_{\mathcal{I}_A}~Acc(\widehat{W}_{\mathcal{I}_A},\mathcal{D}^{test}) &\approx \max_{\mathcal{I}_A}~\mathcal{L}(\widehat{W}_{\mathcal{I}_A},\mathcal{D}^{att})\\
\text{s.t.}~~\|\widehat{W}_{\mathcal{I}_A}-W\|_1\leq HD; &\quad \#\text{ of model inferences}\leq T_{inf},
\end{aligned}
\tag{2}
\]

where $\mathcal{I}_A$ denotes the bits selected for attack, $\widehat{W}_{\mathcal{I}_A}$ is the attacked weights, and $\mathcal{D}^{att}$ is the attack dataset, usually a small batch of $BS$ examples. The minimization of post-attack test accuracy is often estimated by maximizing the loss function on a small attack dataset.

Gradient-based Attacker (BFA).  The gradient-based attacker has access to the gradient information of a mini-batch of data via on-chip backpropagation and attacks the most sensitive bits. Specifically, we adopt the BFA attacker algorithm (BFA_ICCV2019_Rakin), which progressively searches for the most sensitive bits, i.e., those with the largest absolute gradient whose flip direction aligns with the gradient. The gradients of all weights are re-evaluated every time one bit is flipped. Each bit-flip requires forward and backward propagation, which together consume the equivalent of three inferences from the budget. If the attacker exhausts its inference budget but still has unused Hamming distance budget, it directly selects the most sensitive unattacked weights and flips their MSBs, so that it always uses up the Hamming distance budget. We conduct all experiments under the $HD=100$ condition.
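To make the notion of a flip direction that aligns with the gradient concrete, the following is a simplified sketch of one selection step. It assumes $b$-bit two's-complement integer weights and uses a first-order estimate $g\cdot\Delta w$ of the loss change; it is not the BFA implementation itself, and the function names `flip_delta` and `pick_bit_to_flip` are hypothetical.

```python
from typing import Sequence, Tuple


def flip_delta(q: int, k: int, b: int) -> int:
    """Change in a b-bit two's-complement integer q when bit k is flipped."""
    bit_is_set = ((q & ((1 << b) - 1)) >> k) & 1
    # The sign bit (k == b - 1) carries a negative weight in two's complement.
    weight = -(1 << k) if k == b - 1 else (1 << k)
    return -weight if bit_is_set else weight


def pick_bit_to_flip(q_weights: Sequence[int], grads: Sequence[float], b: int) -> Tuple[int, int, float]:
    """Return (weight index, bit index, estimated loss increase) of the best single flip."""
    best = (-1, -1, 0.0)
    for i, (q, g) in enumerate(zip(q_weights, grads)):
        for k in range(b):
            gain = g * flip_delta(q, k, b)  # first-order estimate of the loss change
            if gain > best[2]:
                best = (i, k, gain)
    return best


# The weight -2 has a negative gradient, so decreasing it raises the loss;
# flipping bit 2 (-2 -> -6) is aligned, while flipping its sign bit (-2 -> 6) is not.
print(pick_bit_to_flip([3, -2, 1], [0.1, -0.5, 0.05], b=4))  # (1, 2, 2.0)
```

A full attacker would re-evaluate gradients after every flip and stop once the Hamming distance or inference budget is exhausted, as described above.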

3.2. Efficient Built-in Pre-Attack Defense via Unary Weight Representation

For efficiency and control complexity considerations, the weights of photonic analog AI hardware are often quantized to low-bitwidth fixed-point numbers (NP_DATE2020_Gu; NP_HPCA2024_Zhu), usually represented in binary-coded decimal (BCD) format with 2’s complement encoding. The weights are fetched from electrical memory, converted to voltage signals via DACs, and encoded in the optical domain for computing. To investigate the role of quantized weight encoding in the adversarial robustness of analog hardware, we first raise several critical questions: ➊ How does quantization impact the adversarial robustness against bit-flip attacks? ➋ How can we leverage the natural unary representation inspired by optical DAC as an effective defense? ➌ What is the robustness-memory trade-off of unary representation, and how can we avoid the exponential memory cost?

Figure 3. (a) Lower bitwidth reduces weight sensitivity. (b) Low-bit quantization helps improve bit-flip attack robustness.

3.2.1. Protection Effects of Quantization

To answer question ➊, we investigate how sensitivity changes with the quantization bitwidth. In Fig. 3(a), we observe that low-bit quantization reduces the overall weight sensitivity, defined later in Eq. (5). Accordingly, in Fig. 3(b), we observe a clear protection effect from low-bit quantization against bit-flip attacks, which lays the foundation for our further study of memory-efficient unary representation.

3.2.2. Unary Representation as Built-in Protection

The BCD format is compact in storage but sensitive to bit-flip attacks, since an MSB flip can cause a significant deviation of half the weight range, which poses a serious reliability threat to the hardware. An intuitive solution is to leverage the built-in unary representation to minimize the bit-flip sensitivity, as all bits in a unary-coded weight are LSBs. With a predefined protection rate $\alpha$, i.e., the percentage of weights protected by unary representation, the protected weights can be searched by maximizing the post-attack accuracy, which reflects the protection effectiveness,

\[
\mathcal{I}_U^{*}=\operatorname*{argmax}_{\mathcal{I}_U}~Acc(\widehat{W}_{\mathcal{I}_U},\mathcal{D}^{val}), \tag{3}
\]

where $\mathcal{I}_U^{*}$ is the selected indices for unary protection to maximize the validation accuracy after attack, and $\widehat{W}_{\mathcal{I}_U}$ represents the attacked weights. We denote the number of weights protected as $N_U=\lceil\alpha|W|\rceil=|\mathcal{I}_U^{*}|$.

For the original unary representation, the memory overhead is exponential, which limits protection efficiency, as shown in Eq. (4).

\[
m_U=\Big((2^{b}-1)|\mathcal{I}_U|+\sum_{l=1}^{L}\lceil\log_2 N_U^{l}\rceil\times|\mathcal{I}_{U,l}|\Big)/(b|W|). \tag{4}
\]
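As a small numerical sketch of Eq. (4), the snippet below evaluates the overhead for a given per-layer protection count. It assumes $N_U^{l}=|\mathcal{I}_{U,l}|$ for every layer and reads the second term as per-weight index storage; both are our reading of the formula, and the function name `unary_memory_overhead` is hypothetical.

```python
import math
from typing import Sequence


def unary_memory_overhead(b: int, total_weights: int, protected_per_layer: Sequence[int]) -> float:
    """Evaluate Eq. (4): unary-coded bits plus index bits, relative to b * |W|."""
    # First term: every protected weight is stored with 2^b - 1 unary bits.
    unary_bits = (2 ** b - 1) * sum(protected_per_layer)
    # Second term: ceil(log2 N_U^l) bits per protected weight in layer l
    # (assumption: N_U^l equals the number of protected weights in that layer).
    index_bits = sum(math.ceil(math.log2(n)) * n for n in protected_per_layer if n > 0)
    return (unary_bits + index_bits) / (b * total_weights)


# 4-bit weights, 10,000 weights in total, 8 and 16 protected weights in two layers.
print(unary_memory_overhead(4, 10_000, [8, 16]))  # about 0.0112, i.e., roughly 1.1%
```

Because the first term grows as $2^{b}-1$ per protected weight, the overhead rises quickly with the bitwidth $b$ for a fixed number of protected weights.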

Given the memory overhead budget, we can roughly derive the maximum number of weights we can protect. Then, the next phase is to determine the weights to protect. The overall pre-attack protection algorithm with unary representation is detailed in Alg. 1. To maximize the protection effectiveness, we prefer to protect vulnerable weights that show the largest bit-flip sensitivity $S$.

Bit-flip-Aware Weight Sensitivity Evaluation.  A widely used weight sensitivity metric in the literature is the magnitude of the first-order gradient $|\nabla_W\mathcal{L}|$ or second-order gradient $|\nabla^{2}_W\mathcal{L}|$. Those metrics are designed for small random perturbations in a local neighborhood where the gradient and curvature information can capture the sensitivity. However, a bit-flip attack is not a random perturbation: it has a determined direction, i.e., from 0/1 to 1/0; meanwhile, the large deviation from an MSB flip breaks the assumption of small local perturbation. Therefore, we employ a bit-flip-aware sensitivity score based on the Taylor expansion of the loss on the validation dataset,

\[
S=\mathcal{L}-\mathcal{L}_0\approx\nabla_W\mathcal{L}\cdot\Delta W_{MSB}+\frac{1}{2}\cdot\nabla_W^{2}\mathcal{L}\cdot\Delta W_{MSB}^{2}, \tag{5}
\]

where the Hessian matrix is approximated by its diagonal entries $\nabla_W^{2}\mathcal{L}$, and $\Delta W_{MSB}$ is the perturbation caused by MSB-1 flip. A larger sensitivity $S$ represents a higher vulnerability to bit-flip. This score is aware of the alignment of the bit-flip direction with the gradients. Only bit-flips leading to larger $S$ will be considered for protection.
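Eq. (5) can be evaluated element-wise once the gradient, the diagonal Hessian, and the flip-induced perturbation are available. The sketch below assumes these three quantities are already given as flat Python sequences; the names `bitflip_sensitivity` and `rank_by_sensitivity` are illustrative only.

```python
from typing import List, Sequence


def bitflip_sensitivity(grad: Sequence[float], hess_diag: Sequence[float],
                        delta_msb: Sequence[float]) -> List[float]:
    """Second-order Taylor estimate of the loss change per weight, as in Eq. (5)."""
    return [g * d + 0.5 * h * d * d for g, h, d in zip(grad, hess_diag, delta_msb)]


def rank_by_sensitivity(scores: Sequence[float]) -> List[int]:
    """Weight indices sorted from most to least sensitive."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


grad = [0.2, -0.1, 0.05]
hess_diag = [0.01, 0.02, 0.0]
delta_msb = [-8.0, -8.0, 8.0]  # signed perturbation caused by the flip
scores = bitflip_sensitivity(grad, hess_diag, delta_msb)
print(rank_by_sensitivity(scores))  # [1, 2, 0]
```

In this example the first weight has the largest gradient magnitude, yet its flip direction opposes the gradient, so its score is negative and it ranks last. This is the direction awareness that a plain gradient-magnitude metric lacks.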

Sensitivity-Guided Memory Overhead Assignment.  Once we obtain the sensitivity scores for all weights, we need to further determine how to leverage the scores as guidance to distribute the memory overhead budget to all neural network layers. We propose Top-Sensitive-Layer Assignment that ranks layers based on their overall sensitivity and allocates all memory budgets to the most sensitive layers. As shown in Fig. 5(a), layer sensitivity is estimated by the averaged 50%-quantile and 75%-quantile of sensitivity of all weights, i.e., $\bar{S}^{l}=(Q_{50\%}(S)+Q_{75\%}(S))/2$. The sorted layer indices from most sensitive to least sensitive are $(l_1,l_2,\cdots,l_L)$.
Formally, the layer-wise overhead budgets are $\big(\frac{|W^{l_1}|}{\sum_{i=1}^{L}|W^{l_i}|},\frac{|W^{l_2}|}{\sum_{i=1}^{L}|W^{l_i}|},\cdots,m-\sum_{j=1}^{L'}\frac{|W^{l_j}|}{\sum_{i=1}^{L}|W^{l_i}|},0,\cdots,0\big)$, where only the top-$L'$ sensitive layers can have weight protection.
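One way to read the assignment above is as a greedy fill: layers are visited from most to least sensitive, each receives at most its own share of the total weight count, and the last served layer receives whatever budget remains. The sketch below follows that reading under the assumption that the budget $m$ is expressed as a fraction of all weights. The linear-interpolation quantile and the names `quantile`, `layer_sensitivity`, and `assign_budget` are our own choices, not taken from the original work.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between sorted neighbors."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def layer_sensitivity(scores: Sequence[float]) -> float:
    """Average of the 50% and 75% quantiles of the per-weight sensitivity."""
    return (quantile(scores, 0.5) + quantile(scores, 0.75)) / 2


def assign_budget(layer_sizes: Sequence[int], layer_scores: Sequence[float], m: float) -> List[float]:
    """Greedily give the budget m to the most sensitive layers first."""
    total = sum(layer_sizes)
    order = sorted(range(len(layer_sizes)), key=lambda l: layer_scores[l], reverse=True)
    budgets = [0.0] * len(layer_sizes)
    remaining = m
    for l in order:
        share = min(layer_sizes[l] / total, remaining)
        budgets[l] = share
        remaining -= share
        if remaining <= 0:
            break
    return budgets


print(layer_sensitivity([0, 1, 2, 3, 4]))  # 2.5
# Layer 1 is most sensitive and is fully covered; layer 2 gets the remainder.
print(assign_budget([100, 300, 600], [0.5, 2.0, 1.0], m=0.4))
```

With these toy numbers the budgets come out as roughly `[0.0, 0.3, 0.1]`, so the least sensitive layer receives no protection.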

Attack-injected Search for Weight Selection.  Based on the sensitivity-weighted guidance, we can select a certain number of vulnerable weights to protect for each layer. However, solely relying on weight sensitivity ranking to select weights to protect does not directly optimize toward the true objective, i.e., maximization of post-attack accuracy $Acc(\widehat{W})$. Since the selection procedure is pre-deployment (offline), we can afford to search by sampling weights with sensitivity-weighted probability and select the group that brings the best protection based on bit-flip attack emulation and validation accuracy.

Algorithm 1 Pre-attack unary weight protection algorithm
Input: Loss function $\mathcal{L}(W)$, protection rate $\alpha$, Hamming distance for attacker $HD$, # of attacks $T_a$, max search steps $T$, and validation set $\mathcal{D}^{val}$.
Calculate per-weight sensitivity $\{S^{l}\}_{l=1}^{L}$ and layer sensitivity $\{\bar{S}^{l}\}_{l=1}^{L}$
$\{N_U^{l}\}_{l=1}^{L'}\leftarrow\texttt{mem\_assignment}(\alpha,\{\bar{S}^{l}\}_{l=1}^{L})$
for $l\leftarrow 1\cdots L'$ do
    Best accuracy $Acc^{*}\leftarrow 0$
    for $t\leftarrow 1\cdots T$ do
        Sample $N_U^{l}$ indices with probability $P^{l}=\texttt{softmax}(S^{l})$ as $\mathcal{I}_{U,t}^{l}$
        Protect weights with TCU: $W^{l}_{\mathcal{I}_{U,t}^{l}}\leftarrow\texttt{BCD-to-TCU}(W^{l},\mathcal{I}_{U,t}^{l})$
        for $j\leftarrow 1\cdots T_a$ do
            $\widehat{W}^{l}_{\mathcal{I}_{U,t}^{l}}\leftarrow\texttt{Attack}(W^{l}_{\mathcal{I}_{U,t}^{l}},HD);\quad Acc_j\leftarrow Acc(\widehat{W}^{l}_{\mathcal{I}_{U,t}^{l}},\mathcal{D}^{val})$
        Get worst post-attack accuracy: $Acc_t\leftarrow\min\{Acc_j\mid\forall j\in[T_a]\}$
        if $Acc_t>Acc^{*}$ then
            Record most protective weights: $Acc^{*}\leftarrow Acc_t;~~\mathcal{I}_U^{l*}\leftarrow\mathcal{I}_{U,t}^{l}$
$\mathcal{I}_U^{*}\leftarrow\{\mathcal{I}_U^{l*}\mid l\in[L']\}$
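The search loop of Alg. 1 can be sketched in Python as follows. This is an illustrative skeleton only: the attack emulation and validation step are abstracted into a hypothetical callback `post_attack_accuracy(layer, indices)`, sampling without replacement is implemented by repeated weighted draws, and the best accuracy starts at -1 instead of 0 so that a candidate is recorded even if every trial scores 0.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_without_replacement(probs: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Draw k distinct indices, each draw weighted by the remaining probabilities."""
    pool = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(k):
        pos = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(pos))
        weights.pop(pos)
    return chosen


def search_protected_indices(sens: Sequence[Sequence[float]], n_protect: Sequence[int],
                             post_attack_accuracy: Callable[[int, List[int]], float],
                             steps: int, n_attacks: int, seed: int = 0) -> Dict[int, List[int]]:
    """For each layer, keep the sampled index set with the best worst-case accuracy."""
    rng = random.Random(seed)
    selected: Dict[int, List[int]] = {}
    for layer, scores in enumerate(sens):
        if n_protect[layer] == 0:
            continue  # this layer received no memory budget
        probs = softmax(scores)
        best_acc = -1.0
        for _ in range(steps):
            candidate = sample_without_replacement(probs, n_protect[layer], rng)
            # Worst case over several emulated attacks on the protected model.
            worst = min(post_attack_accuracy(layer, candidate) for _ in range(n_attacks))
            if worst > best_acc:
                best_acc, selected[layer] = worst, candidate
    return selected
```

A caller would supply a `post_attack_accuracy` that converts the candidate weights to the protected format, runs the emulated bit-flip attack with the chosen Hamming distance, and returns the validation accuracy.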

3.2.3. Memory-Efficient Truncated Complementary Unary Representation

Figure 4. Different coding formats. Our truncated complementary unary representation shows superior memory efficiency.

To answer question ➌, we propose a memory-efficient truncated complementary unary (TCU) representation. An important observation is that, in the original unary representation, the many trailing zeros of small values serve only to align the bitwidth and carry no information. Hence, an intuitive compression method is to truncate the trailing zeros, reducing the bitwidth from $(2^b-1)$ to $\hat{b}$. For example, if $b=3$, numbers smaller than 4 can be stored in only 4 bits in truncated unary format instead of $2^3-1=7$ bits, e.g., $(7b'1100000)_U \xrightarrow{truncate} (4b'1100)_U$, without changing the encoded value $(2)_B$.
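
The truncation example above can be reproduced with a minimal Python sketch. The helper names (`unary_encode`, `truncate_unary`, `unary_decode`) are hypothetical and only illustrate the idea that a unary code's value is its number of 1's, so trailing zeros can be dropped without changing it.

```python
def unary_encode(value: int, b: int) -> str:
    """Encode an unsigned b-bit value as a (2^b - 1)-bit unary string."""
    width = 2 ** b - 1
    if not 0 <= value <= width:
        raise ValueError("value out of range for b bits")
    return "1" * value + "0" * (width - value)


def truncate_unary(code: str, b_hat: int) -> str:
    """Keep only the first b_hat bits; everything dropped must be a trailing 0."""
    if "1" in code[b_hat:]:
        raise ValueError("b_hat is too small to hold this value")
    return code[:b_hat]


def unary_decode(code: str) -> int:
    """The encoded value is the number of 1's in the code."""
    return code.count("1")


full = unary_encode(2, b=3)             # 7-bit unary code of the value 2
short = truncate_unary(full, b_hat=4)   # 4-bit truncated code of the same value
```

Here `full` is the 7-bit code and `short` is its 4-bit truncation; both decode to 2.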

How do we select the optimal truncation bitwidth $\hat{b}$? If $\hat{b}$ is too large, few zeros can be truncated. If $\hat{b}$ is too small, only a few small-valued weights can be expressed within the truncated bitwidth. Both cases yield unsatisfactory memory savings.

We first illustrate a truncated version of unary format (TU) by clustering weights into exponentially-spaced bins and assigning a truncation bitwidth to cover the largest value in each bin, as shown in Fig. 4. For small positive values, such a method can significantly trim the redundant trailing zeros for memory reduction. But it is not very efficient for the largest bin. Moreover, since most sensitive weights have small absolute values (RADAR_DATE21_Li, ), a large proportion of negative weights, unfortunately, fall into the largest bin.

Aware of the Gaussian-like weight distribution in real neural networks and the important property of unary representation, i.e., counting 0’s is equivalent to counting 1’s, we propose a complementary unary format (TCU) that stores trailing 0’s and trims leading 1’s for negative values. For instance, the required bitwidth for -3 can be reduced from 15-bit in TU-format to 3-bit in TCU-format, as illustrated in Fig. 4. Similar to logarithmic quantization, the exponentially-sized bins in TCU-format reduce the bin count while minimizing memory overhead by holding a large number of small-value yet sensitive weights in the lowest-bitwidth bins, shown in Fig. 5(b).
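
The complementary idea can be sketched in Python under two assumptions that this section does not spell out: negative weights are mapped to unsigned codes by two's complement, and the truncated code keeps the last $\hat{b}$ bits of the full unary string. The function names are hypothetical. Under this reading, the 4-bit weight -3 (unsigned code 13) fits in 3 bits, matching the example above.

```python
def to_unsigned(value: int, b: int) -> int:
    """Two's-complement code of a signed b-bit integer (assumed mapping)."""
    return value & ((1 << b) - 1)


def complementary_encode(value: int, b: int, b_hat: int) -> str:
    """Trim leading 1's and keep the last b_hat bits of the full unary code."""
    width = (1 << b) - 1
    u = to_unsigned(value, b)
    full = "1" * u + "0" * (width - u)
    if "0" in full[: width - b_hat]:
        raise ValueError("b_hat is too small: a 0 would be trimmed")
    return full[width - b_hat:]


def complementary_decode(code: str, b: int) -> int:
    """Recover the unsigned code by counting 0's instead of 1's."""
    return ((1 << b) - 1) - code.count("0")


code = complementary_encode(-3, b=4, b_hat=3)   # 3 stored bits instead of 15
```

Decoding `code` by counting its 0's returns 13, the unsigned code of -3, so no information is lost by trimming the leading 1's.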

Figure 5. (a) The 6 layers in 8-bit VGG-8 show distinct layer sensitivity statistics. (b) Distribution of absolute values of weights protected by Unary Protection for 8-bit VGG-8 on CIFAR10. Vulnerable weights that deserve to be protected have small magnitudes.

The memory overhead ratio $m_{TCU}$ of the TCU format, including the indexing overhead, is formulated in Eq. (6).

$$
m_{TCU}=\sum_{l}^{L'}\Big(\sum_{i\in \mathcal{I}_{U}^{l}} 2^{\lceil \log_2 \min(2^{b}-|W_i|,\,|W_i|)\rceil}+\lceil \log_2 N_l\rceil \times |\mathcal{I}_{U}^{l}|\Big)\Big/\Big(b\sum_{l}^{L}|W^{l}|\Big). \tag{6}
$$
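
As a worked illustration, Eq. (6) can be evaluated directly on a toy layer. This is a sketch under stated assumptions: $W_i$ is treated as the unsigned integer code of the weight, $N_l$ as the number of weights in layer $l$, and a code whose `min(...)` term is 0 is charged 1 bit because its logarithm is undefined. The function names are hypothetical.

```python
def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for integers x >= 1."""
    return (x - 1).bit_length()


def tcu_bits(w: int, b: int) -> int:
    """Per-weight payload term of Eq. (6): 2^ceil(log2(min(2^b - |w|, |w|)))."""
    m = min((1 << b) - abs(w), abs(w))
    # Assumption: when m is 0 the logarithm is undefined, so charge 1 bit.
    return 1 << ceil_log2(max(m, 1))


def tcu_overhead(protected, layer_sizes, b):
    """protected[l]: protected weight codes in layer l (empty if unprotected);
    layer_sizes[l]: number of weights N_l in layer l."""
    extra_bits = 0
    for codes, n_l in zip(protected, layer_sizes):
        extra_bits += sum(tcu_bits(w, b) for w in codes)   # TCU payload bits
        extra_bits += ceil_log2(n_l) * len(codes)          # index bits
    return extra_bits / (b * sum(layer_sizes))


# Toy case: one 16-weight, 4-bit layer with two protected codes (13 and 2).
m_tcu = tcu_overhead(protected=[[13, 2]], layer_sizes=[16], b=4)
```

In this toy case the payload costs 4 + 2 bits, the two indices cost 4 bits each, and the baseline storage is 64 bits, so `m_tcu` is 14/64.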

3.3. Post-attack Accuracy Recovery via Sensitivity-aware Weight Locking

Pre-attack unary protection is unaware of the actual attacked bits because it is performed offline before deployment. Without precise targets, the coverage of pre-attack protection is insufficient if only a small percentage ($\alpha$) of weights can be converted to TCU format. Post-attack detection and recovery mechanisms are therefore necessary to compensate for these inevitable protection misses.

Pruning is widely used in analog ONNs to improve energy efficiency (NP_TCAD2020_Gu, ; NP_JSTPE2023_Banerjee, ; NP_ICCAD2024_Yin, ; NP_isvlsi2022_banerjee2022pruning, ; NP_OFC2022_banerjee2022champ, ). We ask two critical questions: ➊ how to leverage the natural hardware sparsity for defense? and ➋ how to trade off pruning-induced accuracy loss and protection effects?

In previous work (RADAR_DATE21_Li, ), pruning is used to force detected under-attack weight groups to zero, partially canceling out the bit-flip errors. This method is intuitive for two reasons. (1) An MSB flip creates a deviation of half of the weight range, which always changes the sign of the weight, e.g., $(-3)_B \rightarrow (5)_B$ for a 4-bit weight. Hence, forcing it to 0 always reduces the error, $|(-3)-0| < |(-3)-5|$, at least on the weight itself. (2) The weight distribution in Fig. 5(b) shows that many sensitive weights under attack have small magnitudes (RADAR_DATE21_Li, ). Together, these facts justify pruning as a promising built-in mechanism for accuracy recovery, which answers question ➊.
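
The MSB-flip arithmetic in fact (1) can be checked with a short Python sketch, assuming two's-complement weight codes. The helper `flip_bit` is a hypothetical name used only for illustration.

```python
def flip_bit(value: int, bit: int, b: int) -> int:
    """Flip one bit of a signed b-bit two's-complement integer."""
    mask = (1 << b) - 1
    code = (value & mask) ^ (1 << bit)          # flip on the unsigned code
    # Map the unsigned code back to a signed integer.
    return code - (1 << b) if code >= 1 << (b - 1) else code


w = -3
attacked = flip_bit(w, bit=3, b=4)   # MSB flip of a 4-bit weight
err_attacked = abs(w - attacked)     # error if the flipped value is kept
err_pruned = abs(w - 0)              # error if the weight is forced to 0
```

The flipped weight is 5, so keeping it gives an error of 8 while pruning to 0 gives an error of 3.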

However, simple weight pruning fails to resume accuracy in practice because many of the pruned weights are either still far away from 0 or not real victim weights due to inevitable false alarms in group-wise detection. In other words, pruning fake victims to 0 turns out to be a self-attack.

To answer question ➋, we propose sensitivity-aware weight locking, which generalizes the prior pruning-based method and significantly boosts protection effectiveness with optimal clustering and locking.

Figure 6. Comparison between pruning-based protection and our proposed weight-locking method.

Figure 7. The layer-wise offline search procedure to find optimal detection group size and locking solutions given accuracy constraints.

Post-attack accuracy recovery generally follows two steps: detection (localization) and resume. We assume the same detection technique based on group-wise MSB checksum verification (RADAR_DATE21_Li, ). A checksum mismatch marks the entire group of size $G$ as a victim weight group. All weights in victim groups are resumed in the second step. As shown in Fig. 6, unlike the prior pruning method that forces victim weights to 0, we propose sensitivity-aware weight locking, which finds $K$ centroids before deployment and locks detected victim weights to their centroids to maximize recovery effectiveness. The key trade-offs are the detection group size $G$, which impacts detection accuracy and memory, and the cluster number $K$, which impacts both the centroid storage cost and the resumed accuracy.
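
The detect-then-lock flow can be sketched in Python as follows. This is a simplified stand-in, not the checksum of (RADAR_DATE21_Li, ): it assumes a single parity bit over the MSBs of each group as the golden signature and one pre-assigned centroid per group. All function names are hypothetical.

```python
def msb(value: int, b: int) -> int:
    """Most significant bit of a b-bit two's-complement code."""
    return ((value & ((1 << b) - 1)) >> (b - 1)) & 1


def group_signatures(weights, G, b):
    """One parity bit over the MSBs of each group of G weights (assumed signature)."""
    return [sum(msb(w, b) for w in weights[i:i + G]) % 2
            for i in range(0, len(weights), G)]


def lock_victims(weights, golden, group_centroids, G, b):
    """Lock every weight in a mismatching group to that group's centroid."""
    current = group_signatures(weights, G, b)
    out = list(weights)
    for n, (cur, ref) in enumerate(zip(current, golden)):
        if cur != ref:                                   # group n is a victim
            for i in range(n * G, min((n + 1) * G, len(out))):
                out[i] = group_centroids[n]
    return out


clean = [-3, -2, 1, 2]                       # toy 4-bit weights, G = 2
golden = group_signatures(clean, G=2, b=4)   # stored before deployment
attacked = [5, -2, 1, 2]                     # MSB of the first weight flipped
locked = lock_victims(attacked, golden, group_centroids=[-2, 1], G=2, b=4)
pruned = lock_victims(attacked, golden, group_centroids=[0, 0], G=2, b=4)
```

Pruning corresponds to the special case where every centroid is 0; in this toy case locking leaves a smaller residual error on the victim group than pruning does. A parity bit only detects an odd number of MSB flips per group, which is one reason it is labeled a simplification here.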

For the $l$-th layer, we formulate an accuracy-constrained memory overhead minimization problem in Eq. (7),

$$
\begin{aligned}
\min_{\{W_k\}_{k=1}^{K},\,\mathcal{I}_L,\,G,\,K}\quad & m_L(G,K)=\begin{cases}\dfrac{|W|(\log_2 K+2)}{G\cdot b|W|}, & G>1\\[4pt] \dfrac{|W|(\log_2 K+1)}{b|W|}, & G=1\end{cases}\\
\text{s.t.}\quad & Acc_0-Acc\big(\widehat{W}_{\mathcal{I}_L,\{W_k\}_{k=1}^{K}}\big)<\eta,
\end{aligned}
\tag{7}
$$

where $W_k$ is the $b$-bit centroid for the $k$-th cluster, $\mathcal{I}_L \in \{1,2,\cdots,K\}^{|W|}$ is the assigned cluster IDs for weights, and $\eta$ is the threshold of the gap between ideal and resumed accuracy. The above optimization is performed independently for each layer. The memory overhead $m_L$ has two parts: 1) $(\log_2 K)/G$ denotes the number of bits required to store the cluster ID for each weight; 2) $1$ or $2/G$ denotes the memory required to store the golden signature used in checksum-based detection (RADAR_DATE21_Li, ).
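
The overhead in Eq. (7) is simple enough to evaluate directly. The sketch below computes $m_L(G,K)$ and sorts a candidate grid by overhead, which is one way a lowest-memory-first search could order its trials. The candidate ranges (powers of two with $G \le 512$ and $K \le 16$) and the function name are assumptions made for illustration.

```python
import math


def locking_overhead(G: int, K: int, b: int) -> float:
    """Memory overhead ratio m_L(G, K) of Eq. (7) for b-bit weights."""
    if G > 1:
        return (math.log2(K) + 2) / (G * b)
    return (math.log2(K) + 1) / b


# Hypothetical candidate grid, visited from the cheapest setting upward.
candidates = sorted(
    ((G, K) for G in (512, 256, 128, 64, 32, 16, 8, 4, 2, 1)
            for K in (1, 2, 4, 8, 16)),
    key=lambda gk: locking_overhead(*gk, b=8),
)

pruning_4bit = locking_overhead(G=16, K=1, b=4)
pruning_6bit = locking_overhead(G=8, K=1, b=6)
```

With $K=1$ the formula gives 3.13% for $G=16$ at 4-bit and 4.17% for $G=8$ at 6-bit, which agrees with the pruning-baseline overheads listed in Table 6.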

The algorithm to solve this optimization problem for layer $l$ is illustrated in Fig. 7. To prioritize the lowest-memory solution, we initialize the search with the largest group size $G=512$ and the minimum cluster count $K=1$, and gradually search for feasible solutions by halving $G$ and doubling $K$. To find the cluster centroids that minimize locking-induced accuracy loss, we augment conventional K-Means clustering into a locking-aware variant. We first perform single-cluster ($K=1$) K-Means within each group. The distance $d_{in}$ from weight $W_i$ to the centroid $\widetilde{W_n}$ of the $n$-th detection group is redefined as in Eq. (8).

$$
d_{in}=\nabla_{W_i}\mathcal{L}\cdot(W_i-\widetilde{W_n})+\frac{1}{2}\cdot\nabla_{W_i}^{2}\mathcal{L}\cdot(W_i-\widetilde{W_n})^{2}. \tag{8}
$$

We then obtain $N=\lceil |W|/G \rceil$ group centroids that are aware of the locking error $Acc(W)-Acc(\widetilde{W})$. Since $N \gg K$, we further apply standard K-Means clustering to the obtained $N$ centroids and get $\{W_k\}_{k=1}^{K}$.
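
The two-stage clustering can be sketched in plain Python. Two assumptions are made here that the text does not confirm: the single-cluster step picks the centroid that minimizes the sum of $d_{in}$ from Eq. (8) over the group, which has the closed form $(\sum_i h_i W_i + \sum_i g_i)/\sum_i h_i$ when the summed second-order terms are positive, and the second stage is ordinary 1-D Lloyd iteration with a deterministic initialization. Function and variable names are hypothetical.

```python
def locking_aware_centroid(weights, grads, hessians):
    """Centroid minimizing sum_i d_in, with d_in taken as written in Eq. (8)."""
    h_sum = sum(hessians)
    if h_sum <= 0:
        # Assumption: fall back to the plain mean when curvature is not positive.
        return sum(weights) / len(weights)
    return (sum(h * w for h, w in zip(hessians, weights)) + sum(grads)) / h_sum


def kmeans_1d(points, K, iters=50):
    """Standard 1-D K-Means (Lloyd iterations) over the N group centroids."""
    pts = sorted(points)
    # Deterministic init: evenly spaced picks from the sorted points.
    centers = [pts[(2 * k + 1) * len(pts) // (2 * K)] for k in range(K)]
    for _ in range(iters):
        buckets = [[] for _ in range(K)]
        for p in pts:
            nearest = min(range(K), key=lambda k: abs(p - centers[k]))
            buckets[nearest].append(p)
        new = [sum(bk) / len(bk) if bk else centers[k]
               for k, bk in enumerate(buckets)]
        if new == centers:
            break
        centers = new
    return centers


c = locking_aware_centroid([1.0, 3.0], grads=[0.0, 0.0], hessians=[1.0, 3.0])
final = kmeans_1d([0.0, 0.1, 5.0, 5.1], K=2)
```

With zero gradients the group centroid reduces to a curvature-weighted mean, so the weight with the larger second-order term pulls the centroid toward itself (2.5 in this toy case rather than the plain mean 2.0).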

3.4. Synergistic Protection with Integrated TCU Encoding and Weight Locking

To provide double protection against bit-flip attacks, we leverage both pre-attack unary protection and post-attack weight locking with co-optimized memory overhead. Before deployment, given a protection rate $\alpha$, we first perform pre-attack unary protection in Alg. 1 to find the weight indices $\mathcal{I}_U$ and protect them by converting them to TCU format, effectively reducing the sensitivity of vulnerable weights from MSB to LSB. Meanwhile, we prepare the locking solutions $(\{W_k\}_{k=1}^{K}, \mathcal{I}_L)$ for each layer using the procedure in Fig. 7 given a target accuracy drop threshold $\eta$. After an attack happens, checksum-based detection is applied to pinpoint potential victim weight groups. Then, all weights in the detected groups are locked to their pre-assigned centroids for post-attack accuracy recovery. A carefully selected $(\alpha, \eta)$ setting gives the best post-attack accuracy and the lowest memory overhead. Since the memory overhead tends to become very large in the two extreme cases, i.e., pure unary protection or pure locking, the optimal solution in the middle range can be found by a simple greedy search. We gradually reduce the unary protection rate $\alpha$, e.g., from 2% to 0.25%, and for each $\alpha$, we evaluate the overall memory cost and resumed accuracy for all $\eta$ candidates, e.g., $\eta \in \{1\%, 1.5\%, 2\%\}$. The search stops when the memory overhead increases, and the most efficient solution is selected.
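
The greedy $(\alpha, \eta)$ search can be sketched as below. The selection rule is an assumption: for each $\alpha$, the sketch keeps the lowest-memory $\eta$ that meets a minimum resumed accuracy and stops as soon as that memory cost rises. The `evaluate` callback, `min_acc`, and the toy numbers are all hypothetical placeholders, not measured results.

```python
def greedy_search(alphas, etas, evaluate, min_acc):
    """alphas: protection rates in decreasing order; evaluate(alpha, eta)
    returns (total memory overhead, resumed accuracy)."""
    best, prev_mem = None, float("inf")
    for alpha in alphas:
        feasible = []
        for eta in etas:
            mem, acc = evaluate(alpha, eta)
            if acc >= min_acc:
                feasible.append((mem, -acc, eta))   # low memory, then high accuracy
        if not feasible:
            continue
        mem, neg_acc, eta = min(feasible)
        if mem > prev_mem:      # memory overhead started to increase: stop
            break
        best, prev_mem = (alpha, eta, mem, -neg_acc), mem
    return best


# Made-up cost model purely for demonstration.
TOY_MEM = {2.0: 6.0, 1.0: 3.4, 0.5: 2.4, 0.25: 2.9}


def toy_evaluate(alpha, eta):
    return TOY_MEM[alpha], 88.0 - eta


result = greedy_search([2.0, 1.0, 0.5, 0.25], [1.0, 1.5, 2.0],
                       toy_evaluate, min_acc=86.5)
```

In a real flow, `evaluate` would run the pre-attack protection and the layer-wise locking search for the given $(\alpha, \eta)$ and report the combined overhead and post-attack accuracy.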

4. Experimental Results

4.1. Experiment Setup

Table 2. Impact of batch size $BS$ (size of $\mathcal{D}^{att}$) on post-attack accuracy (8-bit VGG8-CIFAR10) with different inference budgets $T_{inf}$. Entries with the same hardware cost ($BS \times T_{inf}$) lie on the same anti-diagonal. Bold text marks the lowest accuracy among the compared entries of equal cost (costs 640 to 40960).

| Batch Size $BS$ | $T_{inf}$=20 | 40 | 80 | 160 | 320 | 640 | 1280 |
|---|---|---|---|---|---|---|---|
| 8 | 61.28 | 60.47 | **38.15** | 38.81 | **39.39** | 32.84 | 35.75 |
| 16 | 53.62 | 52.45 | **35.64** | 40.30 | **20.49** | **14.68** | 16.72 |
| 32 | 62.91 | 50.76 | 46.01 | 28.12 | 17.50 | **15.28** | **13.06** |
| 64 | 64.43 | 49.26 | 36.24 | 22.27 | 16.84 | 13.39 | 28.04 |
| 128 | 65.35 | 50.16 | 42.74 | 26.87 | 17.49 | 12.42 | 11.08 |

Table 3. Defense efficiency of unary protection with two different methods across different protection rates $\alpha$ and attacker inference budgets $T_{inf}$. Accuracy is for 8-bit VGG8-CIFAR10. $BS$=16 for the BFA attacker.

| Method | Memory Overhead $m_U$ | Protected Weight Percentage $\alpha$ | Worst Acc. | Mean Acc. |
|---|---|---|---|---|
| Even Assignment | 3.34% | 0.10% | 19.65 | 39.21 |
| Even Assignment | 8.51% | 0.25% | 53.48 | 69.69 |
| Even Assignment | 33.90% | 1.00% | 79.59 | 83.12 |
| Even Assignment | 309.08% | 9.00% | 87.16 | 87.31 |
| Top-Sensitive-Layer Assignment | 3.34% | 0.10% | 75.43 | 80.06 |
| Top-Sensitive-Layer Assignment | 6.70% | 0.20% | 78.86 | 83.27 |
| Top-Sensitive-Layer Assignment | 16.96% | 0.50% | 80.02 | 84.03 |
| Top-Sensitive-Layer Assignment | 67.90% | 2.00% | 87.14 | 87.27 |

Dataset and NN Models.  We evaluate our method on VGG-8 CIFAR-10 and ResNet-18 CIFAR-100 for image classification. We choose 4-bit, 6-bit, and 8-bit for weight quantization. Input activations are 8-bit.

Training Settings.  We pre-train all models for 200 epochs with an Adam optimizer with a 2E-3 learning rate, a cosine decay scheduler, 1E-4 weight decay, and data augmentation (random crop and flip). BatchNorm layers are all frozen after pretraining.

Benchmarks and Metrics.  We adopt the strongest attacker setting shown in Section 4.2.1 with $HD=100$. To cover different attack budget scenarios, we sweep over budgets $T_{inf}$ such that the number of flipped bits ranges from 2 to 100. We report the average/worst/best inference accuracy on 5 attack datasets across all budgets. Gaussian weight noise (std.=0.005) is injected into all on-chip computations for attackers. We compare our method with two types of defense baselines.

(1) Training-based Defense – Binarization-aware Training (BAT) (DefendBFA_CVPR2020_He, ) quantizes all weights to 1-bit to reduce bit-flip sensitivity. We also compare our method with noise-aware training (NAT) (NP_DATE2020_Gu, ), which is commonly used to boost ONN robustness.

(2) Other Training-free Defense – We use a previous pruning-based protection method (RADAR_DATE21_Li, ) as another training-free baseline representing a special case in our weight locking, i.e., centroids are fixed to 0.

4.2. Ablation Study

4.2.1. Batch Size $BS$ and Inference Budget $T_{inf}$

To make sure we evaluate our method against the strongest attacker model, we first identify the most efficient batch size across different inference budgets in Table 2. We conclude that 16 images are enough for the attacker to obtain informative sensitivity scores via stochastic gradient calculation and perform an effective bit-flip attack, which generally leads to the lowest post-attack accuracy at the same hardware cost ($BS \times T_{inf}$). An overly small $BS$ gives inaccurate gradients, while too many images consume the inference budget rapidly and help the attacker only marginally.
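
The equal-cost comparison can be reproduced from the Table 2 numbers with a short Python sketch. It groups entries by $BS \times T_{inf}$ and, for every cost reached by at least two settings, records which batch size yields the lowest post-attack accuracy. The variable names are hypothetical; the data are copied from Table 2.

```python
from collections import Counter, defaultdict

T_INF = [20, 40, 80, 160, 320, 640, 1280]
TABLE2 = {  # batch size -> post-attack accuracy per inference budget
    8:   [61.28, 60.47, 38.15, 38.81, 39.39, 32.84, 35.75],
    16:  [53.62, 52.45, 35.64, 40.30, 20.49, 14.68, 16.72],
    32:  [62.91, 50.76, 46.01, 28.12, 17.50, 15.28, 13.06],
    64:  [64.43, 49.26, 36.24, 22.27, 16.84, 13.39, 28.04],
    128: [65.35, 50.16, 42.74, 26.87, 17.49, 12.42, 11.08],
}

# Group (accuracy, batch size) pairs by hardware cost BS x T_inf.
by_cost = defaultdict(list)
for bs, row in TABLE2.items():
    for t, acc in zip(T_INF, row):
        by_cost[bs * t].append((acc, bs))

# Strongest attacker (lowest accuracy) for each cost shared by >= 2 settings.
winners = {cost: min(entries)[1]
           for cost, entries in by_cost.items() if len(entries) > 1}
wins = Counter(winners.values())
```

Counting the winners shows that $BS$=16 gives the lowest accuracy at more cost levels than any other batch size, although not at every one.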

Table 4. Memory overhead required by TCU-format and unary representation on 8-bit VGG8-CIFAR10.

| Protected Weight Percent $\alpha$ | 0.05% | 0.10% | 0.20% | 1.00% | 2.00% | 4.00% |
|---|---|---|---|---|---|---|
| Unary Encoding: $m_U$ | 1.66% | 3.34% | 6.70% | 33.90% | 67.90% | 136.39% |
| Proposed TCU: $m_{TCU}$ | 0.17% | 0.51% | 1.07% | 3.39% | 6.03% | 12.70% |
| Reduction ($m_U/m_{TCU}$) | 9.86$\times$ ↓ | 6.51$\times$ ↓ | 6.24$\times$ ↓ | 9.99$\times$ ↓ | 11.26$\times$ ↓ | 10.74$\times$ ↓ |

Table 5. Results of post-attack accuracy recovery by proposed weight locking. 8-bit VGG-8 on CIFAR-10 is evaluated. $G$ and $K$ show solutions for all 6 convolutional and linear layers.

| $\eta$ (%) | Layer-wise Weight Locking Solutions | Mem. OV $m_L$ | $T_{inf}$=20 | 80 | 160 | 400 | 900 | Mean Acc. |
|---|---|---|---|---|---|---|---|---|
| - | w/o locking | 0.00% | 53.62 | 35.64 | 40.30 | 18.59 | 13.52 | 32.33 |
| 1 | G=[1, 1, 4, 16, 128, 2]; K=[8, 2, 1, 1, 2, 16] | 1.29% | 83.49 | 80.77 | 78.36 | 84.60 | 86.59 | 82.76 |
| 1.5 | G=[1, 1, 8, 16, 128, 2]; K=[4, 1, 1, 1, 2, 1] | 1.15% | 79.82 | 80.96 | 76.67 | 72.01 | 86.59 | 79.21 |
| 2 | G=[1, 2, 16, 32, 128, 2]; K=[4, 4, 1, 1, 1, 1] | 0.97% | 78.84 | 80.50 | 76.16 | 71.78 | 86.65 | 78.79 |

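Table 6. Main comparison of our method with prior defense methods against BFA attackers.

| Model + dataset | Category | Quant. Bit | Defense Method | Prior-attack Accuracy | Best Acc. | Worst Acc. | Mean Acc. | Training/Searching Runtime | Mem. Overhead: Pre ($m_{TCU}$) | Mem. Overhead: Post ($m_L$) | Mem. Overhead: Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG-8 + CIFAR10 | w/o Def | 4-bit | - | 87.73 | 82.24 | 59.87 | 75.85 | - | - | - | - |
| VGG-8 + CIFAR10 | w/o Def | 6-bit | - | 88.00 | 75.66 | 33.58 | 61.74 | - | - | - | - |
| VGG-8 + CIFAR10 | w/o Def | 8-bit | - | 88.00 | 53.62 | 13.52 | 32.89 | - | - | - | - |
| VGG-8 + CIFAR10 | Training-based | 1-bit | BAT (DefendBFA_CVPR2020_He, ) | 87.09 | 86.12 | 74.62 | 80.39 | 0.33 hrs | - | - | - |
| VGG-8 + CIFAR10 | Training-based | 4-bit | NAT (NP_DATE2020_Gu, ) | 87.96 | 83.32 | 66.71 | 77.06 | 2.8 hrs | - | - | - |
| VGG-8 + CIFAR10 | Training-based | 6-bit | NAT (NP_DATE2020_Gu, ) | 87.19 | 77.68 | 33.39 | 64.78 | 2.8 hrs | - | - | - |
| VGG-8 + CIFAR10 | Training-based | 8-bit | NAT (NP_DATE2020_Gu, ) | 85.91 | 67.74 | 26.14 | 55.88 | 2.8 hrs | - | - | - |
| VGG-8 + CIFAR10 | Training-free | 4-bit | Pruning (RADAR_DATE21_Li, ) | 87.73 | 80.88 | 57.23 | 70.68 | - | - | 3.13% (G=16) | 3.13% |
| VGG-8 + CIFAR10 | Training-free | 4-bit | Ours | 87.73 | 86.96 | 83.08 | 84.74 | 0.03 hrs + 0.33 hrs | 0.84% | 0.000% | 0.84% |
| VGG-8 + CIFAR10 | Training-free | 6-bit | Pruning (RADAR_DATE21_Li, ) | 88.00 | 79.91 | 40.74 | 66.14 | - | - | 4.17% (G=8) | 4.17% |
| VGG-8 + CIFAR10 | Training-free | 6-bit | Ours | 88.00 | 86.90 | 86.25 | 86.48 | 0.03 hrs + 0.50 hrs | 0.93% | 1.11% | 2.04% |
| VGG-8 + CIFAR10 | Training-free | 8-bit | Pruning (RADAR_DATE21_Li, ) | 88.00 | 70.11 | 18.68 | 48.59 | - | - | 3.13% (G=8) | 3.13% |
| VGG-8 + CIFAR10 | Training-free | 8-bit | Ours | 88.00 | 87.21 | 86.08 | 86.73 | 0.03 hrs + 0.75 hrs | 1.07% | 1.29% | 2.36% |
| ResNet-18 + CIFAR100 | w/o Def | 4-bit | - | 61.28 | 56.49 | 49.06 | 54.76 | - | - | - | - |
| ResNet-18 + CIFAR100 | w/o Def | 6-bit | - | 60.61 | 41.45 | 3.69 | 22.89 | - | - | - | - |
| ResNet-18 + CIFAR100 | w/o Def | 8-bit | - | 60.22 | 38.01 | 2.19 | 9.93 | - | - | - | - |
| ResNet-18 + CIFAR100 | Training-based | 1-bit | BAT (DefendBFA_CVPR2020_He, ) | 60.03 | 59.69 | 54.01 | 56.84 | 0.4 hrs | - | - | - |
| ResNet-18 + CIFAR100 | Training-based | 4-bit | NAT (NP_DATE2020_Gu, ) | 61.58 | 58.27 | 46.72 | 51.81 | 8.9 hrs | - | - | - |
| ResNet-18 + CIFAR100 | Training-based | 6-bit | NAT (NP_DATE2020_Gu, ) | 60.61 | 39.24 | 7.20 | 22.89 | 8.9 hrs | - | - | - |
| ResNet-18 + CIFAR100 | Training-based | 8-bit | NAT (NP_DATE2020_Gu, ) | 60.22 | 38.98 | 2.19 | 12.99 | 8.9 hrs | - | - | - |
| ResNet-18 + CIFAR100 | Training-free | 4-bit | Pruning (RADAR_DATE21_Li, ) | 61.28 | 57.24 | 42.02 | 47.68 | - | - | 3.13% (G=16) | 3.13% |
| ResNet-18 + CIFAR100 | Training-free | 4-bit | Ours | 61.28 | 60.20 | 53.78 | 57.40 | 0.04 hrs + 1.17 hrs | 1.12% | 1.05% | 2.17% |
| ResNet-18 + CIFAR100 | Training-free | 6-bit | Pruning (RADAR_DATE21_Li, ) | 60.61 | 54.96 | 10.99 | 44.76 | - | - | 4.17% (G=8) | 4.17% |
| ResNet-18 + CIFAR100 | Training-free | 6-bit | Ours | 60.61 | 59.46 | 58.21 | 58.88 | 0.04 hrs + 1.62 hrs | 1.06% | 1.20% | 2.26% |
| ResNet-18 + CIFAR100 | Training-free | 8-bit | Pruning (RADAR_DATE21_Li, ) | 60.22 | 53.62 | 16.33 | 39.72 | - | - | 3.13% (G=8) | 3.13% |
| ResNet-18 + CIFAR100 | Training-free | 8-bit | Ours | 60.22 | 58.82 | 57.21 | 58.10 | 0.04 hrs + 3.01 hrs | 1.26% | 1.77% | 3.03% |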

4.2.2. Memory Overhead Assignment in Pre-Attack Unary Protection

We compare Top-Sensitive-Layer Assignment to an Even Assignment baseline, i.e., $m_U^l = m_U/L$, in Table 3. With unary-coded pre-attack protection, the even assignment method requires significantly higher memory overhead than our sensitivity-aware method to reach the same level of post-attack accuracy.

4.2.3. Truncated Complementary Unary Protection

Table 4 compares the memory overhead required by Unary Protection and Truncated Complementary Unary Protection. TCU reduces the memory overhead by 6-11$\times$ compared with the original unary encoding.

4.2.4. Post-Attack Weight Locking

For weight locking, the accuracy drop threshold $\eta$ is the key parameter that balances resumed accuracy and memory overhead. Table 5 shows the performance of the proposed Weight Locking on 8-bit VGG-8 and CIFAR-10. Within the layer-wise acceptable accuracy drop $\eta$, Weight Locking provides significant post-attack accuracy recovery with less than 2% extra memory overhead. However, pushing for lower memory overhead can largely compromise accuracy recovery, from 87% to 70%. Weight Locking thus exposes an accuracy-storage trade-off.

We also compare our method with Weight Pruning proposed in (RADAR_DATE21_Li, ). Fig. 8(a) shows the effectiveness and memory overhead of protection by Weight Pruning under different detection group sizes $G$. Weight Locking achieves higher accuracy recovery with more than 10$\times$ reduction in memory consumption.

4.2.5. Optimal Combination of TCU and Locking

Fig. 8(b) presents the search process with different combinations of $(\alpha, \eta)$. Pure unary protection consumes high memory overhead to achieve effective protection, while pure locking cannot offer comparable accuracy recovery. $(\alpha, \eta) = (0.2\%, 1)$ gives the optimal solution considering both accuracy recovery and memory overhead.

Figure 8. (a) Weight locking outperforms pruning (RADAR_DATE21_Li, ) with higher resumed accuracy and lower memory overhead. (b) Average resumed accuracy and memory overhead with various protection rates $\alpha$ in TCU and accuracy drop thresholds $\eta$ in locking with 8-bit VGG8-CIFAR10.

4.3. Main Results

In Table 6, we compare our TCU+Locking scheme with NAT (NP_DATE2020_Gu, ) (std.=0.005 weight noise injection), BAT (DefendBFA_CVPR2020_He, ), and pruning (RADAR_DATE21_Li, ). Our method resumes accuracy with only about a 2% drop after bit-flip attacks at a marginal 3% memory overhead, significantly outperforming all prior art. Our method is training-free, which also saves significant runtime compared to training-based methods.

4.4. Discussion: Can Noises Become Defender?

Besides low-bit quantization and sparsity, on-chip hardware noise is a main source of non-idealities. While noise often degrades inference accuracy, it also hinders the attack process by adding uncertainty to loss functions or gradients. To reduce the uncertainty, attackers might need to average over multiple ($N_S$) samples, which equivalently reduces the attack efficiency.
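
The cost of averaging can be made concrete with a rough model that is not stated in the paper: assume each noisy gradient sample is $\hat{g}_s = g + \epsilon_s$ with i.i.d. zero-mean noise of variance $\sigma^2$, and that every sample consumes one unit of the inference budget. Under these assumptions, averaging lowers the gradient variance only linearly in $N_S$ while dividing the number of search steps the attacker can afford by the same factor.

```latex
\mathrm{Var}\!\left[\frac{1}{N_S}\sum_{s=1}^{N_S}\hat{g}_s\right]
  = \frac{\sigma^{2}}{N_S},
\qquad
\#\,\text{search steps} \approx \frac{T_{inf}}{N_S}.
```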

Table 7. BFA attack performance with different sample times $N_S$ on 8-bit VGG8-CIFAR10. $BS$=16.

| Sample Times $N_S$ | $T_{inf}$=60 | 120 | 240 | 400 | 480 | Mean Acc |
|---|---|---|---|---|---|---|
| 1 | 44.94 | 34.83 | 20.88 | 16.14 | 13.61 | 26.08 |
| 2 | 56.24 | 47.32 | 29.79 | 25.19 | 20.14 | 35.74 |
| 3 | 60.47 | 50.86 | 39.58 | 35.46 | 25.84 | 42.44 |

However, we find that sampling the noisy gradients (noise std.=0.005) once without averaging gives the best attack performance.

Table 8. Performance of the adversarial attacker with different on-chip Gaussian weight noise levels (std.) on 8-bit VGG8-CIFAR10.

| Noise std. (rows) vs. $T_{inf}$ (columns) | 0 | 40 | 180 | 400 | 600 | 800 | 900 |
|---|---|---|---|---|---|---|---|
| 0.005 | 87.58 | 52.45 | 35.18 | 18.59 | 17.02 | 15.67 | 13.52 |
| 0.01 | 86.82 | 59.68 | 34.64 | 18.42 | 16.15 | 31.88 | 21.77 |
| 0.02 | 83.81 | 54.87 | 32.12 | 21.97 | 17.21 | 34.19 | 14.48 |
| 0.03 | 74.69 | 41.06 | 25.82 | 20.74 | 17.29 | 20.45 | 18.10 |

We increase the noise intensity in Table 8 to create higher uncertainty for the BFA attacker. Unfortunately, adding larger Gaussian noises to the weights during on-chip computations does not provide clear protection effects but leads to severe accuracy drops. It is worth investigating potential protective noise injection and the trade-offs between noise-induced error and protection effects in the future.

5. Conclusion

In this work, for the first time, we investigate the security of analog optical neural networks and present a novel nonideality-enabled built-in defender against adversarial bit-flip attacks. We introduce quantization-inspired pre-attack protection based on a truncated complementary unary weight representation to minimize weight sensitivity with optimized memory overhead. A complementary pruning-inspired weight-locking method is introduced to resume accuracy with precise error correction. Our method outperforms prior defense approaches with near-ideal accuracy recovery under bit-flip attacks at marginal (<3%) memory overhead. Our work makes significant strides toward reliable ONNs against adversarial weight attacks and unlocks future applications in security-critical scenarios.

References

  • (1) Yichen Shen, Nicholas C. Harris, Scott Skirlo, et al. Deep learning with coherent nanophotonic circuits. Nature Photonics, 2017.
  • (2) Q. Cheng, J. Kwon, M. Glick, M. Bahadori, L. P. Carloni, and K. Bergman. Silicon Photonics Codesign for Deep Learning. Proceedings of the IEEE, 2020.
  • (3) Bhavin J. Shastri, Alexander N. Tait, et al. Photonics for Artificial Intelligence and Neuromorphic Computing. Nature Photonics, 2021.
  • (4) Chenghao Feng, Jiaqi Gu, Hanqing Zhu, Zhoufeng Ying, Zheng Zhao, et al. A compact butterfly-style silicon photonic–electronic neural chip for hardware-efficient deep learning. ACS Photonics, 9(12):3906–3916, 2022.
  • (5) Zhihao Xu, Tiankuang Zhou, Muzhou Ma, ChenChen Deng, Qionghai Dai, and Lu Fang. Large-scale photonic chiplet taichi empowers 160-tops/w artificial general intelligence. Science, 384(6692):202–209, 2024.
  • (6) Alexander N. Tait, Thomas Ferreira de Lima, Ellen Zhou, et al. Neuromorphic photonic networks using silicon photonic weight banks. Sci. Rep., 2017.
  • (7) Xingyuan Xu, Mengxi Tan, Bill Corcoran, Jiayang Wu, Andreas Boes, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Damien G. Hicks, Roberto Morandotti, Arnan Mitchell, and David J. Moss. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature, 2021.
  • (8) Johannes Feldmann, Nathan Youngblood, Maxim Karpov, Helge Gehring, Xuan Li, Maik Stappers, Manuel Le Gallo, Xin Fu, Anton Lukashchuk, Arslan Raja, Junqiu Liu, David Wright, Abu Sebastian, Tobias Kippenberg, Wolfram Pernice, and Harish Bhaskaran. Parallel convolutional processing using an integrated photonic tensor core. Nature, 2021.
  • (9) H.H. Zhu, J. Zou, H. Zhang, et al. Space-efficient optical computing with an integrated chip diffractive neural network. Nature Commun., 2022.
  • (10) Jiaqi Gu, Zheng Zhao, Chenghao Feng, Hanqing Zhu, Ray T. Chen, and David Z. Pan. ROQ: A noise-aware quantization scheme towards robust optical neural networks with low-bit controls. In Proc. DATE, 2020.
  • (11) Zheng Zhao, Jiaqi Gu, Zhoufeng Ying, et al. Design technology for scalable and robust photonic integrated circuits. In Proc. ICCAD, 2019.
  • (12) Ying Zhu, Grace Li Zhang, Bing Li, et al. Countering Variations and Thermal Effects for Accurate Optical Neural Networks. In Proc. ICCAD, 2020.
  • (13) Asif Mirza, Febin Sunny, et al. Silicon photonic microring resonators: A comprehensive design-space exploration and optimization under fabrication-process variations. IEEE TCAD, 41(10):3359–3372, 2022.
  • (14) Yannan Liu, Lingxiao Wei, Bo Luo, and Qiang Xu. Fault injection attack on deep neural network. In Proc. ICCAD, 2017.
  • (15) A. Rakin, Z. He, and D. Fan. Bit-flip attack: Crushing neural network with progressive bit search. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1211–1220, 2019.
  • (16) Adnan Siraj Rakin, Zhezhi He, Jingtao Li, Fan Yao, Chaitali Chakrabarti, and Deliang Fan. T-BFA: Targeted bit-flip adversarial weight attack. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7928–7939, 2022.
  • (17) Zhezhi He, Adnan Siraj Rakin, Jingtao Li, Chaitali Chakrabarti, and Deliang Fan. Defending and harnessing the bit-flip based adversarial weight attack. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14083–14091, 2020.
  • (18) Jingtao Li, Adnan Siraj Rakin, Yan Xiong, Liangliang Chang, Zhezhi He, Deliang Fan, and Chaitali Chakrabarti. Defending bit-flip attack through DNN weight reconstruction. In Proc. DAC, 2020.
  • (19) Jingtao Li, Adnan Siraj Rakin, Zhezhi He, Deliang Fan, and Chaitali Chakrabarti. RADAR: Run-time adversarial weight attack detection and accuracy recovery. In 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 790–795, 2021.
  • (20) Qi Liu, Wujie Wen, and Yanzhi Wang. Concurrent weight encoding-based detection for bit-flip attack on neural network accelerators. In Proc. ICCAD, 2020.
  • (21) W. Liu, W. Liu, Y. Ye, Q. Lou, Y. Xie, and L. Jiang. HolyLight: A nanophotonic accelerator for deep learning in data centers. In Proc. DATE, 2019.
  • (22) Hanqing Zhu, Jiaqi Gu, Hanrui Wang, et al. Lightening-Transformer: A dynamically-operated optically-interconnected photonic transformer accelerator. In Proc. HPCA, pages 686–703, 2024.
  • (23) Alireza Samani, David Patel, Mathieu Chagnon, Eslam El-Fiky, Rui Li, Maxime Jacques, Nicolás Abadía, Venkat Veerasubramanian, and David V. Plant. Experimental parametric study of 128 Gb/s PAM-4 transmission system using a multi-electrode silicon photonic Mach Zehnder modulator. Opt. Express, 25(12):13252, June 2017.
  • (24) Sajjad Moazeni, Sen Lin, Mark Wade, Luca Alloatti, Rajeev J. Ram, Milos Popovic, and Vladimir Stojanovic. A 40-Gb/s PAM-4 Transmitter Based on a Ring-Resonator Optical DAC in 45-nm SOI CMOS. IEEE J. Solid-State Circuits, 52(12):3503–3516, December 2017.
  • (25) Liang Liu, Yanan Guo, Yueqiang Cheng, Youtao Zhang, and Jun Yang. Generating robust DNN with resistance to bit-flip based adversarial weight attack. IEEE Transactions on Computers, 72(2):401–413, 2023.
  • (26) Jiaqi Gu, Zheng Zhao, Chenghao Feng, et al. Towards Hardware-Efficient Optical Neural Networks: Beyond FFT Architecture via Joint Learnability. IEEE TCAD, 2020.
  • (27) Sanmitra Banerjee, Mahdi Nikdast, Sudeep Pasricha, and Krishnendu Chakrabarty. Pruning coherent integrated photonic neural networks. IEEE Journal of Selected Topics in Quantum Electronics, 29:1–13, 2023.
  • (28) Ziang Yin, Nicholas Gangi, Meng Zhang, Jeff Zhang, Rena Huang, and Jiaqi Gu. SCATTER: Algorithm-circuit co-sparse photonic accelerator with thermal-tolerant, power-efficient in-situ light redistribution. In Proc. ICCAD, 2024.
  • (29) Sanmitra Banerjee, Mahdi Nikdast, Sudeep Pasricha, and Krishnendu Chakrabarty. Pruning coherent integrated photonic neural networks using the lottery ticket hypothesis. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2022.
  • (30) Sanmitra Banerjee, Mahdi Nikdast, Sudeep Pasricha, and Krishnendu Chakrabarty. CHAMP: Coherent hardware-aware magnitude pruning of integrated photonic neural networks. In Optical Fiber Communication Conference (OFC), 2022.