
Heracles: A HfO$\mathrm{_2}$ Ferroelectric Capacitor Compact Model for Efficient Circuit Simulations

Authors: Luca Fehlings, Md Hanif Ali, Paolo Gibertini, Egidio A. Gallicchio, Udayan Ganguly, Veeresh Deshpande, Erika Covi

Abstract: This paper presents a physics-based compact model for circuit simulations in a SPICE environment for HfO2-based ferroelectric capacitors (FeCaps). The model has been calibrated based on experimental data obtained from HfO2-based FeCaps. A thermal model with an accurate description of the device parasitics is included to derive precise device characteristics based on first principles. The model incorporates statistical data that enables Monte Carlo analysis based on realistic distributions, thereby making it particularly well-suited for design-technology co-optimization (DTCO). Furthermore, the model is demonstrated in circuit simulations using an integrated circuit with current programming, wherein partial switching of the ferroelectric polarization is observed. Finally, the model was benchmarked in an array simulation, reaching convergence in 1.8 s with an array size of 100 kb.

Submitted 10 October, 2024; originally announced October 2024.

Comments: 6 pages, 7 figures

arXiv:2402.09792 [pdf]

System-level Impact of Non-Ideal Program-Time of Charge Trap Flash (CTF) on Deep Neural Network

Authors: S. Shrivastava, A. Biswas, S. Chakrabarty, G. Dash, V. Saraswat, U. Ganguly

Abstract: Training deep neural networks (DNNs) with the Resistive Processing Unit (RPU) architecture is energy-efficient because it uses dedicated neuromorphic hardware and stochastic computation of weight updates for in-memory computing. Charge Trap Flash (CTF) devices can implement RPU-based weight updates in DNNs. However, prior work has shown that the weight updates (V_T) in CTF-based RPUs are impacted by the non-ideal program time of CTF. The non-ideal program time depends on two factors: first, the number of input pulses (N) or the pulse width (pw), and second, the gap between successive update pulses (t_gap) used for the stochastic computation of weight updates. Therefore, the impact of this non-ideal program time must be studied in neural network training simulations. In this study, we first propose a pulse-train design compensation technique to reduce the total error caused by the non-ideal program time of CTF and the stochastic variance of the network. Second, we simulate RPU-based DNNs with non-ideal CTF program time on the MNIST and Fashion-MNIST datasets. We find that for larger N (~1000), learning performance approaches the ideal (software-level) training level and is therefore largely insensitive to the choice of t_gap used to implement RPU-based weight updates. However, for lower N (<500), learning performance depends on the t_gap of the pulses. Finally, we perform an ablation study to isolate the causal factor behind the improved learning performance.
We conclude that the lower noise level in the weight updates is most likely the dominant factor in improving the learning performance of DNNs. Thus, our study attempts to compensate for the error caused by non-ideal program time and to standardize the pulse length (N) and pulse gap (t_gap) specifications for CTF-based RPUs for accurate system-level on-chip training.
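For readers unfamiliar with RPU-style updates, the sketch below models the standard stochastic pulse-coincidence scheme under an ideal-device assumption: every overlap of a row pulse and a column pulse shifts the weight by the same step `dw_min`. That fixed step per coincidence is exactly what a non-ideal CTF program time violates. The function name, the [0, 1] scaling of the inputs, and `dw_min` are illustrative assumptions; neither the pulse gap t_gap nor the authors' compensation technique is modeled.

```python
import random


def stochastic_update(x: float, delta: float, n_pulses: int,
                      dw_min: float, rng: random.Random) -> float:
    """Approximate a weight update proportional to x * delta.

    Hypothetical ideal-device model: x and delta are assumed pre-scaled to
    [0, 1] (sign handling omitted), and every pulse coincidence shifts the
    weight by exactly dw_min, independent of pulse width or pulse gap.
    """
    coincidences = 0
    for _ in range(n_pulses):
        row_pulse = rng.random() < x      # row line fires with probability x
        col_pulse = rng.random() < delta  # column line fires with probability delta
        if row_pulse and col_pulse:       # the cell is programmed only on overlap
            coincidences += 1
    return coincidences * dw_min          # expected value: n_pulses * x * delta * dw_min


rng = random.Random(0)
N = 1000
dw = stochastic_update(0.5, 0.4, N, dw_min=1.0 / N, rng=rng)
```

With `dw_min = 1/N` the expected update is x·delta = 0.2, and the binomial spread around that value shrinks roughly as 1/sqrt(N). This gives one intuition for why larger pulse counts bring training closer to the software baseline.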

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2311.18577 [pdf]

Design Space and Variability Analysis of SOI MOSFET for Ultra-Low Power Band-to-Band Tunneling Neurons

Authors: Jay Sonawane, Shubham Patil, Abhishek Kadam, Ajay Kumar Singh, Sandip Lashkare, Veeresh Deshpande, Udayan Ganguly

Abstract: Large spiking neural networks (SNNs) require ultra-low-power and low-variability hardware for neuromorphic computing applications. Recently, a band-to-band tunneling-based (BTBT) integrator, enabling sub-kHz operation of neurons with area and energy efficiency, was proposed. For an ultra-low-power implementation of such neurons, a very low BTBT current is needed, so minimizing the current without degrading neuronal properties is essential. Low variability is needed in the ultra-low-current integrator to avoid network performance degradation in a large BTBT neuron-based SNN. To address this, we conducted a design-space and variability analysis in TCAD, using a TCAD deck well calibrated against experimental data from a GlobalFoundries 32 nm PD-SOI MOSFET. First, we give a physics-based explanation of the tunneling mechanism. Second, we explore the impact of device design parameters on SOI MOSFET performance, highlighting the sensitivity of the tunneling current to each parameter. By optimizing the device parameters, we demonstrate a ~20x reduction in BTBT current compared with the experimental data. Finally, we present a variability analysis that includes the effects of random dopant fluctuations (RDF), oxide thickness variability (OTV), and channel-oxide interface traps (DIT) in the BTBT, SS, and ON regimes of operation. The BTBT regime is highly sensitive to RDF and OTV, as any variation in them directly modulates the tunnel length or the electric field at the drain-channel junction, whereas it shows minimal sensitivity to DIT.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2307.06088 [pdf]

doi 10.1088/1361-6641/aceea6

Non-Ideal Program-Time Conservation in Charge Trap Flash for Deep Learning

Authors: Shalini Shrivastava, Vivek Saraswat, Gayatri Dash, Samyak Chakrabarty, Udayan Ganguly

Abstract: Training deep neural networks (DNNs) is computationally intensive, but arrays of non-volatile memories such as Charge Trap Flash (CTF) can accelerate DNN operations using in-memory computing. Specifically, the Resistive Processing Unit (RPU) architecture uses threshold-voltage programming with stochastically encoded pulse trains and analog memory features to accelerate the vector-vector outer product and weight update of gradient descent algorithms. Although CTF, offering high precision, has been regarded as an excellent choice for implementing RPUs, the accumulation of charge due to the applied stochastic pulse trains is ultimately critical in determining the final weight update. In this paper, we report non-ideal program-time conservation in CTF through pulsed-input measurements. We experimentally measure the effect of pulse width and pulse gap while keeping the total ON-time of the input pulse train constant, and report three non-idealities: (1) the cumulative V_T shift decreases when the total ON-time is fragmented into a larger number of shorter pulses, (2) the cumulative V_T shift drops abruptly for pulse widths < 2 μs, and (3) the cumulative V_T shift depends on the gap between consecutive pulses, and the V_T-shift reduction is recovered for smaller gaps. We explain these non-idealities with a transient tunneling-field enhancement due to blocking-oxide trap-charge dynamics. Identifying and modeling the responsible mechanisms and predicting their system-level effects during learning is critical.
This non-ideal accumulation is expected to affect algorithms and architectures that rely on devices to implement mathematically equivalent functions for in-memory-computing-based acceleration.

Submitted 12 July, 2023; originally announced July 2023.

Report number: Semiconductor Science and Technology 38, 105008

Journal ref: Published 6 September 2023 IOP Publishing Ltd

arXiv:2307.04705 [pdf]

Ferroelectric MirrorBit-Integrated Field-Programmable Memory Array for TCAM, Storage, and In-Memory Computing Applications

Authors: Paritosh Meihar, Rowtu Srinu, Sandip Lashkare, Ajay Kumar Singh, Halid Mulaosmanovic, Veeresh Deshpande, Stefan Dünkel, Sven Beyer, Udayan Ganguly

Abstract: In-memory computing on a reconfigurable architecture is an emerging field that performs application-based resource allocation for computational efficiency and energy optimization. In this work, we propose a Ferroelectric MirrorBit-integrated field-programmable reconfigurable memory. We show the conventional 1-bit FeFET, the MirrorBit, and the MirrorBit-based ternary content-addressable memory (MCAM or MirrorBit-based TCAM) within the same field-programmable array. Apart from the conventional uniform Up and Down polarization states, the additional states in the MirrorBit are programmed by applying a non-uniform electric field along the transverse direction, which produces a gradient in the polarization and in the conduction-band energy. This creates two additional states, giving a total of four states, or 2 bits of information. The gradient in the conduction band resembles a Schottky barrier (Schottky diode), whose orientation can be configured by applying an appropriate field. The TCAM operation is demonstrated using the MirrorBit-based diode on the reconfigurable array. The reconfigurable array architecture can switch from AND-type to NOR-type and vice versa. The AND-type array is appropriate for programming the conventional bit and the MirrorBit. The MirrorBit-based Schottky diode in the NOR array resembles a crossbar structure, which is appropriate for diode-based CAM operation. Our proposed memory system can enable fast writes via the 1-bit FeFET, dense data storage via MirrorBit technology, and fast search via the MCAM.
Further, the dual configurability enables power, area, and speed optimization, making the reconfigurable Fe-MirrorBit memory a compelling solution for in-memory and associative computing.
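The search that a TCAM performs can be stated compactly in software. The sketch below models generic ternary matching semantics only (stored words of '0', '1', and 'X' for "don't care"); it is not the MirrorBit diode circuit, and the function name and encoding are illustrative assumptions. A hardware array compares all rows in parallel, whereas this loop is sequential.

```python
def tcam_search(stored_words: list[str], key: str) -> list[int]:
    """Return indices of stored ternary words that match a binary search key.

    Stored words use '0', '1', or 'X' (don't care); the key uses '0'/'1'.
    """
    matches = []
    for idx, word in enumerate(stored_words):
        if len(word) != len(key):
            continue  # width mismatch: treat as no match
        # A position matches if the stored bit is a don't-care or equals the key bit.
        if all(w == "X" or w == k for w, k in zip(word, key)):
            matches.append(idx)
    return matches


table = ["10X1", "0XX0", "1011"]
print(tcam_search(table, "1011"))  # -> [0, 2]
```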

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2306.11640 [pdf]

Process Voltage Temperature Variability Estimation of Tunneling Current for Band-to-Band-Tunneling based Neuron

Authors: Shubham Patil, Anand Sharma, Gaurav R, Abhishek Kadam, Ajay Kumar Singh, Sandip Lashkare, Nihar Ranjan Mohapatra, Udayan Ganguly

Abstract: Compact and energy-efficient synapses and neurons are essential to realize the full potential of neuromorphic computing. In addition, low variability is needed for neurons in deep neural networks to achieve higher accuracy. Further, process (P), voltage (V), and temperature (T) variations (PVT) are essential considerations for low-power circuits, as the performance impact and compensation complexity add cost. Recently, a band-to-band tunneling (BTBT) neuron has been demonstrated to operate successfully in a network enabling a Liquid State Machine. Comparing PVT variability against competing modes of operation (e.g., BTBT vs. sub-threshold and above-threshold) of the same transistor is critical for assessing performance. In this work, we demonstrate the impact of PVT variation in the BTBT regime and benchmark it against the subthreshold-slope (SS) and ON-state (ION) regimes of a partially depleted silicon-on-insulator MOSFET. The ON-state regime is shown to offer the lowest variability but dissipates more power; hence, it is not usable for low-power sources. Of the BTBT and SS regimes, both of which can enable low-power neurons, the BTBT regime shows ~3x lower variability (σ_ID/μ_ID) than the SS regime when cumulative PVT variability is considered. The improvement is due to the well-known weaker P, V, and T dependence of BTBT compared with SS. We show that BTBT variation is uncorrelated with the mutually correlated SS and ION variations, indicating a different origin in terms of both mechanism and location.
Hence, the BTBT regime is promising for low-current, low-power, and low device-to-device-variability neuron operation.
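The variability metric quoted above, σ_ID/μ_ID, is the coefficient of variation of the drain current. The sketch below computes it from samples and, as one simple way to form a cumulative figure, combines per-source values in quadrature. The quadrature rule assumes small, mutually independent P, V, and T variations; that is our assumption for illustration and not necessarily how the cumulative PVT variability was computed in this work.

```python
import math
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Relative variability sigma/mu of drain-current samples (sample std dev)."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return sigma / mu


def combined_cv(*component_cvs: float) -> float:
    """Combine per-source sigma/mu values in quadrature.

    Assumes small, mutually independent process, voltage, and temperature
    variations; correlated sources would need covariance terms.
    """
    return math.sqrt(sum(cv * cv for cv in component_cvs))


total = combined_cv(0.3, 0.4, 0.0)  # approximately 0.5
```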

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2304.12924 [pdf]

Evolution of ferroelectricity with annealing temperature and thickness in sputter deposited undoped HfO$_2$ on silicon

Authors: Md Hanif Ali, Adityanarayan Pandey, Rowtu Srinu, Paritosh Meihar, Shubham Patil, Sandip Lashkare, Udayan Ganguly

Abstract: Ferroelectricity in sputtered undoped HfO$_2$ is attractive for composition control in low-power, non-volatile memory and logic applications. Unlike for doped HfO$_2$, the evolution of ferroelectricity with annealing and film thickness in sputter-deposited undoped HfO$_2$ on Si has not yet been reported. In the present study, we demonstrate the impact of post-metallization annealing temperature and film thickness on the ferroelectric properties of dopant-free sputtered HfO$_2$ on Si substrates. A rich correlation of polarization with phase, lattice constant, crystallite size, and interface reaction is observed. First, the annealing-temperature study shows o-phase saturation beyond 600 °C, followed by interface reaction beyond 700 °C, giving an optimal temperature window of 600-700 °C. Second, the thickness study within the optimal temperature window shows o-phase crystallite size scaling with thickness up to a critical thickness of 20 nm, indicating that the films are completely o-phase. However, the lattice constants (volume) are high in the 15-20 nm thickness range, which correlates with the enhanced value of 2Pr. Beyond 20 nm, crystallite scaling with thickness saturates, correlated with the appearance of the m-phase and a reduction in 2Pr. Films in the optimal window (15-20 nm thick, annealed at 600-700 °C) show a 2Pr of ~35.5 μC/cm$^2$, comparable to the state of the art. Robust wake-up-free endurance of ~$10^8$ cycles is shown in this systematically identified temperature-thickness window, which is promising for non-volatile memory applications.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 7 pages, 7 figures, 2 tables, IEEE TED journal

arXiv:2304.08504 [pdf]

Schottky Barrier MOSFET Enabled Ultra-Low Power Real-Time Neuron for Neuromorphic Computing

Authors: Shubham Patil, Jayatika Sakhuja, Ajay Kumar Singh, Anmol Biswas, Vivek Saraswat, Sandeep Kumar, Sandip Lashkare, Udayan Ganguly

Abstract: Energy-efficient real-time synapses and neurons are essential to enable large-scale neuromorphic computing. In this paper, we propose and demonstrate a Schottky-barrier MOSFET-based ultra-low-power voltage-controlled current source to enable real-time neurons for neuromorphic computing. The Schottky-barrier MOSFET is fabricated on a silicon-on-insulator platform with polycrystalline silicon as the channel and nickel/platinum as the source/drain. The poly-Si and nickel form back-to-back Schottky junctions, enabling the ultra-low ON current required for energy-efficient neurons.

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.03124 [pdf]

FeFET-based MirrorBit cell for High-density NVM storage

Authors: Paritosh Meihar, Rowtu Srinu, Vivek Saraswat, Sandip Lashkare, Halid Mulaosmanovic, Ajay Kumar Singh, Stefan Dünkel, Sven Beyer, Udayan Ganguly

Abstract: HfO2-based ferroelectric field-effect transistors (FeFETs) have become a center of attention for non-volatile memory applications because of their low power, fast switching speed, high scalability, and CMOS compatibility. In this work, we show an n-channel FeFET-based multibit memory, termed MirrorBit, which effectively doubles the chip density by programming gradient ferroelectric polarizations in the gate using an appropriate biasing scheme. We have experimentally demonstrated MirrorBit on GlobalFoundries HfO2-based FeFET devices fabricated in 28 nm bulk HKMG CMOS technology. Retention of MirrorBit states has been shown up to $10^5$ s at different temperatures, and the endurance is more than $10^3$ cycles. A TCAD simulation, based on a FeFET model calibrated against the GlobalFoundries FeFET device, is also presented to explain the origin and operation of the MirrorBit states. We also propose an array-level implementation and sensing methodology for the MirrorBit memory. Thus, we have converted a 1-bit FeFET into a 2-bit FeFET using a particular programming scheme on existing FeFETs, without any notable alteration of the fabrication process, doubling the chip density for high-density non-volatile memory storage.

Submitted 14 September, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 6 pages, 9 figures

arXiv:2212.05026 [pdf]

doi 10.1109/TED.2023.3277804

Interlayer-engineered local epitaxial templating induced enhancement in polarization (2P$_r$ > 70$μ$C/cm$^2$) in Hf$_{0.5}$Zr$_{0.5}$O$_2$ thin films

Authors: Srinu Rowtu, Paritosh Meihar, Adityanarayan Pandey, Md. Hanif Ali, Sandip Lashkare, Udayan Ganguly

Abstract: In this work, we report a high remnant polarization, 2Pr > 70 $μ$C/cm$^2$, in thermally processed atomic-layer-deposited Hf0.5Zr0.5O2 (HZO) films on silicon with an NH3-plasma-exposed thin TiN interlayer and tungsten (W) as the top electrode. The effect of the interlayer on the ferroelectric properties of HZO is compared with standard Metal-Ferroelectric-Metal and Metal-Ferroelectric-Semiconductor structures. X-ray diffraction shows that the orthorhombic (o) phase increases as the TiN is thinned. However, the strain in the o-phase is highest at 2 nm TiN and relaxes significantly for the no-TiN case. HRTEM images reveal that the ultra-thin TiN acts as a seed layer for local epitaxy in HZO, potentially increasing the strain to produce a 2x improvement in the remnant polarization. Finally, the HZO devices are shown to be wake-up-free and to exhibit endurance > $10^6$ cycles. This study opens a pathway to epitaxial ferroelectric HZO films on Si with improved memory performance.

Submitted 1 June, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 6 pages, 6 figures

arXiv:2207.09755 [pdf]

doi 10.1088/2634-4386/acf1c5

A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware

Authors: Anmol Biswas, Vivek Saraswat, Udayan Ganguly

Abstract: Spiking Neural Networks (SNNs) have emerged as a hardware-efficient architecture for classification tasks. The challenge of spike-based encoding has been the lack of a universal training mechanism performed entirely with spikes. There have been several attempts to adopt the powerful backpropagation (BP) technique used in non-spiking artificial neural networks (ANNs): (1) SNNs can be trained with externally computed numerical gradients. (2) A major advance towards native spike-based learning has been approximate backpropagation using spike-timing-dependent plasticity (STDP) with phased forward/backward passes. However, transferring information between such phases for gradient and weight-update calculation requires external memory and computational access, which is a challenge for standard neuromorphic hardware implementations. In this paper, we propose a stochastic SNN-based backprop (SSNN-BP) algorithm that uses a composite neuron to simultaneously compute the forward-pass activations and backward-pass gradients explicitly with spikes. Although signed gradient values are a challenge for spike-based representation, we tackle this by splitting the gradient signal into positive and negative streams. We show that our method approaches the BP ANN baseline with sufficiently long spike trains. Finally, we show that the well-performing softmax cross-entropy loss function can be implemented through inhibitory lateral connections enforcing a Winner-Take-All (WTA) rule.
Our 2-layer SNN shows excellent generalization, with performance comparable to ANNs of equivalent architecture and regularization parameters on static image datasets such as MNIST, Fashion-MNIST, and Extended MNIST, and on temporally encoded image datasets such as Neuromorphic MNIST. Thus, SSNN-BP enables BP compatible with purely spike-based neuromorphic hardware.
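The idea of carrying a signed quantity on two non-negative spike streams can be illustrated with Bernoulli rate coding. In the sketch below, a value in [-1, 1] drives only one of two streams with spike probability |value| per step, and the value is recovered as the difference of the two rates. Rate coding and the function names are illustrative assumptions; the composite neuron of SSNN-BP is not reproduced here.

```python
import random


def encode_signed(value: float, n_steps: int,
                  rng: random.Random) -> tuple[list[int], list[int]]:
    """Split a signed value in [-1, 1] into positive and negative spike trains.

    Only one stream is active: its per-step spike probability is |value|.
    """
    p_pos = max(value, 0.0)
    p_neg = max(-value, 0.0)
    pos = [1 if rng.random() < p_pos else 0 for _ in range(n_steps)]
    neg = [1 if rng.random() < p_neg else 0 for _ in range(n_steps)]
    return pos, neg


def decode_signed(pos: list[int], neg: list[int]) -> float:
    """Recover the signed value as the difference of the two spike rates."""
    return (sum(pos) - sum(neg)) / len(pos)


pos, neg = encode_signed(-0.3, 5000, random.Random(42))
estimate = decode_signed(pos, neg)  # close to -0.3; error shrinks with longer trains
```

As with any stochastic rate code, the decoding error falls as the train length grows, which mirrors the abstract's statement that the method approaches the BP baseline with sufficiently long spike trains.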

Submitted 24 August, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

arXiv:2111.02885 [pdf]

Stochasticity Invariance Control in Pr$_{1-x}$Ca$_x$MnO$_3$ RRAM to enable Large-Scale Stochastic Recurrent Neural Networks

Authors: Vivek Saraswat, Udayan Ganguly

Abstract: Emerging non-volatile memories have been proposed for a wide range of applications, from easing the von Neumann bottleneck to neuromorphic computing. Specifically, scalable RRAMs based on Pr$_{1-x}$Ca$_x$MnO$_3$ (PCMO) exhibiting analog switching have been demonstrated as an integrating neuron, an analog synapse, and a voltage-controlled oscillator. More recently, the inherent stochasticity of memristors has been proposed for efficient hardware implementations of Boltzmann Machines. However, as the problem size scales, the number of neurons increases, and tight control of the stochastic distribution over many iterations becomes necessary. This requires parametric control over stochasticity. Here, we characterize the stochastic Set in PCMO RRAMs. We find that the Set-time distribution depends on the internal state of the device (i.e., resistance) in addition to the external input (i.e., voltage pulse). This requires the confluence of contradictory properties, stochastic switching and deterministic state control, in the same device. Unlike "stochastic-everywhere" filamentary memristors, PCMO RRAMs allow us to leverage (i) stochastic Set in negative polarity and (ii) deterministic analog Reset in positive polarity to demonstrate a 100x reduction in Set-time distribution drift. The impact on Boltzmann Machine performance is analyzed: compared with "fixed external input stochasticity", "state-monitored stochasticity" can solve problems 20x larger in size.
State monitoring also tunes out the effect of device-to-device variability on the distributions, providing 10x better performance. In addition to the physical insights, this study establishes the reliable use of experimental stochasticity in PCMO RRAMs in stochastic recurrent neural networks over many iterations.
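The stochastic behaviour that the device has to deliver comes from the Boltzmann Machine update itself: a binary unit turns on with probability sigmoid(net input / T), and keeping that probability on target over many iterations is what the state-monitored scheme addresses. The sketch below shows the standard Gibbs update with a numerically stable logistic function. Mapping the `rng.random()` comparison onto a PCMO Set event is an illustrative assumption, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gibbs_update(state: list[int], weights: list[list[float]],
                 biases: list[float], i: int, temperature: float,
                 rng: random.Random) -> int:
    """Resample binary unit i of a Boltzmann machine in place and return it.

    The unit turns on with probability sigmoid(net_i / T). In a hardware
    version, the comparison against rng.random() would be replaced by the
    device's stochastic switching event (an assumption for illustration).
    """
    net = biases[i] + sum(weights[i][j] * state[j]
                          for j in range(len(state)) if j != i)
    state[i] = 1 if rng.random() < sigmoid(net / temperature) else 0
    return state[i]


weights = [[0.0, 0.5, -0.2], [0.5, 0.0, 0.1], [-0.2, 0.1, 0.0]]
state = [0, 1, 1]
gibbs_update(state, weights, [0.0, 0.0, 0.0], 0, temperature=1.0, rng=random.Random(7))
```

Drift in the device's switching statistics would act like an uncontrolled change in `temperature` or bias, which is why tight distribution control matters as the network grows.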

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2109.00849 [pdf]

doi 10.1109/TED.2021.3131966

An Accurate Process Induced Variability Aware Compact Model-based Circuit Performance Estimation for Design-Technology Co-optimization

Authors: Shubham Patil, Amita Rawat, Udayan Ganguly

Abstract: In sub-10 nm FinFETs, line-edge roughness (LER) and metal-gate granularity (MGG) are the two most dominant sources of variability and are mostly modeled semi-empirically. In this work, compact models of LER and MGG are used. We show an accurate process-induced variability (PIV) aware compact model-based circuit performance estimation for Design-Technology Co-optimization (DTCO). This work is carried out using an experimentally validated BSIM-CMG model on a 7 nm FinFET node. First, we benchmark the LER and MGG models against the state of the art and show a ~4x (~2.3x) accuracy improvement for NMOS (PMOS) in the estimation of device figures of merit (DFoMs). Second, ring-oscillator (RO) and SRAM circuit performance is estimated under LER and MGG variability. Further, we show a ~22% more optimistic estimate of (σ/μ)_SHM (static hold margin) than the state-of-the-art model under V_DD variation. Finally, we demonstrate that the improved DFoM accuracy translates into more accurate circuit figure of merit (CFoM) estimates. For worst-case SHM (3(σ/μ)_SHM at V_DD = 0.75 V), a ~73% (~61%) reduction in dynamic (standby) power compared with the state of the art is shown. Thus, our enhanced variability model accuracy enables more credible DTCO with significantly better performance estimates.

Submitted 2 September, 2021; originally announced September 2021.

Comments: 7 pages, 14 figures

arXiv:2108.13389 [pdf]

doi 10.1088/1361-6641/ac24e8

Exploiting the Electrothermal Timescale in PrMnO3 RRAM for a compact, clock-less neuron exhibiting biological spiking patterns

Authors: Omkar Phadke, Jayatika Sakhuja, Vivek Saraswat, Udayan Ganguly

Abstract: Spiking Neural Networks (SNNs) are gaining widespread momentum in the field of neuromorphic computing. These networks of neurons and synapses provide computational efficiency by mimicking the human brain. It is desirable to incorporate biological neuronal dynamics, including the complex spiking patterns that represent diverse brain activities, within neural networks. Earlier hardware realizations of neurons were (1) area-intensive because of large capacitors in the circuit design, and (2) demonstrated neuronal spiking patterns only with clocked neurons at the device level. To achieve more realistic biological spiking behavior, emerging memristive devices are considered promising alternatives. In this paper, we propose a PrMnO3 (PMO) RRAM device-based neuron. The voltage-controlled electrothermal timescales of the compact PMO RRAM device replace the electrical timescales of charging a large capacitor. The electrothermal timescale is used to implement an integration block with multiple voltage-controlled timescales, coupled with a refractory block, to generate biological neuronal dynamics. First, a Verilog-A implementation of the thermal device model is demonstrated, which captures the current-temperature dynamics of the PMO device. Second, driving circuitry is designed to mimic different spiking patterns of cortical neurons, including Intrinsic Bursting (IB) and Chattering (CH).
Third, a neuron circuit model, which includes the PMO RRAM device model and the driving circuitry, is simulated to demonstrate asynchronous neuron behavior. Finally, a hardware-software hybrid analysis is performed in which the PMO RRAM device is experimentally characterized to mimic neuron spiking dynamics. The work presents a realizable, more biologically realistic, and hardware-efficient solution for large-scale SNNs.

Submitted 30 August, 2021; originally announced August 2021.

arXiv:2106.16215 [pdf]

Algorithm For 3D-Chemotaxis Using Spiking Neural Network

Authors: Jayesh Choudhary, Vivek Saraswat, Udayan Ganguly

Abstract: In this work, we aim to devise an end-to-end spiking implementation of contour tracking in 3D media inspired by chemotaxis, in which the worm reaches the region with a given set-point concentration. For a planar medium, efficient contour-tracking algorithms have already been devised, but the additional degree of freedom introduces several challenges. Here, we devise an algorithm based on klinokinesis, in which the worm's motion responds to the stimulus but is not proportional to it. Thus, the path followed is not the shortest, but the set concentration is tracked successfully. We use simple LIF neurons for the neural network implementation, considering the feasibility of implementing them on neuromorphic computing hardware.
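The abstract names simple leaky integrate-and-fire (LIF) neurons as the building block. The sketch below is a minimal forward-Euler LIF model, shown only to make that building block concrete: the parameter values are hypothetical, and the klinokinesis controller itself is not reproduced.

```python
def simulate_lif(input_current: list[float], dt: float = 1e-3,
                 tau: float = 20e-3, v_rest: float = 0.0,
                 v_threshold: float = 1.0, v_reset: float = 0.0,
                 resistance: float = 1.0) -> list[int]:
    """Return the time-step indices at which a leaky integrate-and-fire neuron spikes.

    Forward-Euler integration of tau * dv/dt = -(v - v_rest) + R * I(t),
    followed by a threshold check and reset. Parameter values are illustrative.
    """
    v = v_rest
    spike_steps = []
    for step, i_in in enumerate(input_current):
        v += (dt / tau) * (-(v - v_rest) + resistance * i_in)
        if v >= v_threshold:
            spike_steps.append(step)
            v = v_reset  # instantaneous reset, no explicit refractory period
    return spike_steps


weak = simulate_lif([0.5] * 1000)    # R*I below threshold: the neuron never fires
strong = simulate_lif([2.0] * 1000)  # R*I above threshold: regular spiking
```

The leak is what separates an LIF neuron from a pure integrator: a weak stimulus settles below threshold instead of eventually firing, which is a useful property when the input is a graded concentration signal.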

Submitted 30 June, 2021; originally announced June 2021.

Comments: 12 pages, 8 figures, accepted for the '30th International Conference on Artificial Neural Networks, ICANN2021'

arXiv:2106.15420 [pdf, other]

Spiking-GAN: A Spiking Generative Adversarial Network Using Time-To-First-Spike Coding

Authors: Vineet Kotariya, Udayan Ganguly

Abstract: Spiking Neural Networks (SNNs) have shown great potential in solving deep learning problems in an energy-efficient manner. However, they are still limited to simple classification tasks. In this paper, we propose Spiking-GAN, the first spike-based Generative Adversarial Network (GAN). It employs a temporal coding scheme called time-to-first-spike (TTFS) coding. We train it using approximate backpropagation in the temporal domain. We use simple integrate-and-fire (IF) neurons with a very long refractory period, which ensures a maximum of one spike per neuron. This makes the model much sparser than a spike rate-based system. Our modified temporal loss function, called 'Aggressive TTFS', improves the inference time of the network by over 33% and reduces the number of spikes in the network by more than 11% compared to previous works. Our experiments show that, when the network is trained on the MNIST dataset using this approach, it generates high-quality samples, thereby demonstrating the potential of this framework for solving such problems in the spiking domain.
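For readers unfamiliar with time-to-first-spike coding, a minimal Python sketch of the two ideas named above may help: intensities are mapped to spike times (brighter means earlier), and an IF neuron whose refractory period is effectively infinite fires at most once. The linear intensity-to-time mapping, `t_max`, the threshold, and the function names `ttfs_encode` and `if_single_spike` are illustrative assumptions, not details taken from Spiking-GAN.

```python
import numpy as np

def ttfs_encode(intensity, t_max=100):
    """Map pixel intensities in [0, 1] to first-spike times (hypothetical scheme).

    Brighter pixels spike earlier; a pixel of intensity 0 never spikes and is
    returned as np.inf. The linear mapping and t_max are assumptions.
    """
    x = np.clip(np.asarray(intensity, dtype=float), 0.0, 1.0)
    times = np.round(t_max * (1.0 - x))
    return np.where(x > 0, times, np.inf)

def if_single_spike(input_current, threshold=1.0):
    """Integrate-and-fire neuron with an effectively infinite refractory period.

    It integrates the input until the first threshold crossing and then stays
    silent, so it emits at most one spike. Returns the spike step or None.
    """
    v = 0.0
    for t, i_t in enumerate(input_current):
        v += i_t
        if v >= threshold:
            return t  # first and only spike
    return None
```

For example, `ttfs_encode([1.0, 0.5, 0.0])` gives spike times 0, 50 and inf (no spike), and a neuron fed 0.3 per step first reaches a threshold of 1.0 at step 3.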

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2105.01358 [pdf]

Simplified Klinokinesis using Spiking Neural Networks for Resource-Constrained Navigation on the Neuromorphic Processor Loihi

Authors: Apoorv Kishore, Vivek Saraswat, Udayan Ganguly

Abstract: C. elegans shows chemotaxis using klinokinesis, where the worm senses the concentration with a single concentration sensor, computes the concentration gradient, and forages through gradient ascent/descent towards the target concentration, followed by contour tracking. The biomimetic implementation requires complex neurons with multiple ion channel dynamics as well as interneurons for control. While this is a key capability for autonomous robots, its implementation on energy-efficient neuromorphic hardware like Intel's Loihi requires adapting the network to hardware-specific constraints, which has not been achieved. In this paper, we demonstrate the adaptation of klinokinesis-based chemotaxis to Loihi by implementing the necessary neuronal dynamics with only LIF neurons, as well as a complete spike-based implementation of all functions, e.g., the Heaviside function and subtraction. Our results show that the Loihi implementation is equivalent in performance to its Python software counterpart, during both foraging and contour tracking. The Loihi results are also resilient in noisy environments. Thus, we demonstrate a successful adaptation of chemotaxis on Loihi, which can now be combined with the rich array of SNN blocks for SNN-based complex robotic control.

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2104.14264 [pdf]

Hardware-Friendly Synaptic Orders and Timescales in Liquid State Machines for Speech Classification

Authors: Vivek Saraswat, Ajinkya Gorad, Anand Naik, Aakash Patil, Udayan Ganguly

Abstract: Liquid State Machines are brain-inspired spiking neural networks (SNNs) with random reservoir connectivity and bio-mimetic neuronal and synaptic models. Reservoir computing networks have been proposed as an alternative to deep neural networks for solving temporal classification problems. Previous studies suggest that a 2nd order (double exponential) synaptic waveform is crucial for achieving high accuracy on TI-46 spoken digit recognition. Such long-time-range (ms) bio-mimetic synaptic waveforms are a challenge for compact and power-efficient neuromorphic hardware. In this work, we analyze the role of synaptic orders, namely δ (high output for a single time step), 0th (rectangular with a finite pulse width), 1st (exponential fall) and 2nd order (exponential rise and fall), and of synaptic timescales on the reservoir output response and on the TI-46 spoken digit classification accuracy under a more comprehensive parameter sweep. We find the optimal operating point to be correlated with an optimal range of spiking activity in the reservoir. Further, the proposed 0th order synapses perform on par with the biologically plausible 2nd order synapses. This is a substantial relaxation for circuit designers, as synapses are the most abundant components in an in-memory implementation of SNNs. The circuit benefits of both analog and mixed-signal realizations of the 0th order synapse are highlighted, demonstrating 2-3 orders of magnitude savings in area and power consumption by eliminating Op-Amps and Digital-to-Analog Converter circuits.
This has major implications for a complete neural network implementation, with a focus on peripheral limitations and the algorithmic simplifications that overcome them.
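The four synaptic orders compared above can be written as simple discrete-time kernels. The Python sketch below is an illustration under assumed parameters (`pulse_width`, `tau`, `tau_rise`, a unit time step, no amplitude normalization); it is not the authors' simulation code. It makes the hardware argument concrete: a 0th order synapse only has to hold a constant output for a fixed number of steps, whereas the 1st and 2nd order kernels require an exponential decay (and rise) to be generated.

```python
import numpy as np

def synaptic_kernel(order, t, pulse_width=4, tau=8.0, tau_rise=2.0):
    """Post-synaptic response to a single input spike at t = 0.

    order: "delta", 0, 1 or 2. All time constants are hypothetical and given
    in simulation steps; amplitudes are left unnormalized.
    """
    t = np.asarray(t, dtype=float)
    causal = t >= 0
    if order == "delta":
        return (t == 0).astype(float)                       # single-step pulse
    if order == 0:
        return (causal & (t < pulse_width)).astype(float)   # rectangular pulse
    if order == 1:
        return np.where(causal, np.exp(-t / tau), 0.0)      # instant rise, exponential fall
    if order == 2:
        # difference of exponentials: exponential rise and exponential fall
        return np.where(causal, np.exp(-t / tau) - np.exp(-t / tau_rise), 0.0)
    raise ValueError(f"unknown synaptic order: {order!r}")

t = np.arange(0, 40)
for order in ("delta", 0, 1, 2):
    k = synaptic_kernel(order, t)
    print(order, "peak:", round(k.max(), 3), "area:", round(k.sum(), 3))
```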

Submitted 29 April, 2021; originally announced April 2021.

arXiv:2011.11251 [pdf]

India's Rise in Nanoelectronics Research

Authors: Udayan Ganguly, Sandip Lashkare, Swaroop Ganguly

Abstract: Modern semiconductor innovation is strongly tied to scale and skill. While India has significant demand for semiconductors, it faces a daunting challenge in creating a semiconductor ecosystem. Yet, India has quietly come a long way. Starting with the Centers of Excellence in Nanoelectronics (CENs) initiated in 2006 and broad science and technology funding, India has transformed its nanoelectronics research ecosystem. From negligible contributions as late as 2011, India has risen to be a top contributor to IEEE Electron Devices journals today. Our study presents important observations on ecosystem development. First, there is a 6-year incubation time from infrastructure initiation to first papers, and then 4 more years to become globally competitive. Second, growth in experimental research is essential along with modeling and simulation. Finally, the aspirational goal of translational research contributing to the global technology roadmap requires cutting-edge manufacturing infrastructure and ecosystem access, which still need development. These lessons inform a call to action for the research ecosystem, i.e., academia, industry, and policy-makers. First, sustain and amplify the successful strategies of national research infrastructure and funding growth. Second, enhance international collaborations to add further scale and infrastructure to R&D. Finally, strengthen the industry-academia-policy consortium approach to transform into an innovation-based economy.
Ultimately, the electron devices community is entering an exciting phase where Beyond Moore offers open opportunities from materials and devices to systems and algorithms. India must build on its success to play a significant role in this new world of disruptive innovation.

Submitted 30 November, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: 11 pages, 8 figures

arXiv:2008.00317 [pdf, other]

Adaptive Chemotaxis for improved Contour Tracking using Spiking Neural Networks

Authors: Shashwat Shukla, Rohan Pathak, Vivek Saraswat, Udayan Ganguly

Abstract: In this paper we present a Spiking Neural Network (SNN) for autonomous navigation, inspired by the chemotaxis network of the worm Caenorhabditis elegans. In particular, we focus on the problem of contour tracking, wherein the bot must reach and subsequently follow a desired concentration setpoint. Past schemes that used only klinokinesis can follow the contour efficiently but take excessive time to reach the setpoint. We address this shortcoming by proposing a novel adaptive klinotaxis mechanism that builds upon a previously proposed gradient climbing circuit. We demonstrate how our klinotaxis circuit can autonomously be configured to perform gradient ascent or gradient descent, and subsequently be disabled to integrate seamlessly with the aforementioned klinokinesis circuit. We also incorporate speed regulation (orthokinesis) to further improve contour tracking performance. Thus, for the first time, we present a model that successfully integrates klinokinesis, klinotaxis and orthokinesis. We demonstrate via contour tracking simulations that our proposed scheme achieves a 2.4x reduction in the time to reach the setpoint, along with a simultaneous 8.7x reduction in average deviation from the setpoint.

Submitted 1 August, 2020; originally announced August 2020.

arXiv:2006.04636 [pdf]

doi 10.1109/TED.2020.3034289

Investigating Transient Characteristics of Volatile Hysteresis and Self-Heating of PrMnO$_3$ based RRAM

Authors: J. Sakhuja, S. Lashkare, K. Jana, U. Ganguly

Abstract: PrMnO$_3$ (PMO) based RRAM shows selector-less behavior due to its high non-linearity. Recently, this non-linearity, along with volatile hysteresis, has been demonstrated and utilized as a compact oscillator to enable highly scaled oscillatory neurons, which in turn enable oscillatory neuromorphic systems like those found in the human cortex. Hence, it is vital to understand the physical mechanisms behind such volatile hysteretic behavior to provide useful insights for developing devices for various neuromorphic applications. In this paper, we present a comprehensive investigation of the transient characteristics and propose a physical mechanism that reproduces the observations in simulation. First, we investigate the complex dynamics of the hysteresis with the voltage ramp rate. We observe that the voltage window initially increases and later decreases as the ramp rate is increased, while the current window reduces monotonically. Second, an analytical electrothermal model based on space charge limited current (SCLC) and the Fourier heat equation is proposed to model the co-dependent heat and current flow. Finally, we show that the interplay between the self-heating due to the current and the temperature dependence of the current is accurately modeled to reproduce the dependence of the hysteresis on the voltage ramp rate. Such a detailed understanding of the volatile hysteresis of PMO RRAM devices may enable the efficient device design required in neuromorphic computing applications.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2005.07398 [pdf]

doi 10.1109/TED.2020.3011387

Reaction-Drift Model for Switching Transients in Pr$_{0.7}$Ca$_{0.3}$MnO$_3$-Based Resistive RAM

Authors: Vivek Saraswat, Shankar Prasad, Abhishek Khanna, Ashwin Wagh, Ashwin Bhat, Neeraj Panwar, Sandip Lashkare, Udayan Ganguly

Abstract: Pr$_{0.7}$Ca$_{0.3}$MnO$_3$ (PCMO) based RRAM shows promising memory properties like non-volatility, low variability, multiple resistance states and scalability. From a modeling perspective, charge-carrier DC current modeling of PCMO RRAM by drift diffusion (DD) in the presence of fixed oxygen ion vacancy traps and self-heating (SH) in Technology Computer Aided Design (TCAD), but without oxygen ionic transport, was able to explain the experimentally observed space charge limited conduction (SCLC) characteristics prior to resistive switching. Further, transient analysis using the DD+SH model was able to reproduce the experimentally observed fast current increase at the ~100 ns timescale, prior to resistive switching. However, a complete quantitative model of transient current transport plus resistive switching requires the inclusion of ionic transport. We propose a Reaction-Drift (RD) model for oxygen ion vacancy related trap density variation, which is combined with the DD+SH model. We have shown earlier that, experimentally, the Set transient consists of 3 stages and the Reset transient consists of 4 stages. In this work, the DD+SH+RD model is able to reproduce the entire transient behavior over the 10 ns - 1 s timescale range for both the Set and Reset operations at different applied biases and ambient temperatures. Remarkably, a universal experimental Reset behavior, $\log(I) \propto m\log(t)$ where $m \approx -1/10$, is reproduced in simulations. This is the first model for PCMO RRAMs to substantially reproduce the transient Set/Reset behavior.
This model establishes self-heating and ionic-drift-limited resistive switching as the primary physical phenomena in these RRAMs.
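The universal Reset behavior quoted above, log(I) ∝ m log(t), is a power law I(t) ∝ t^m, so m is simply the slope of a straight-line fit on log-log axes. A minimal Python sketch of that extraction follows; the data it is applied to here is synthetic, generated with m = -0.1 purely to check the fit, and is not measured device data. The function name is hypothetical.

```python
import numpy as np

def power_law_exponent(t, current):
    """Slope m of log10(I) versus log10(t), i.e. the exponent in I ~ t**m."""
    slope, _intercept = np.polyfit(np.log10(t), np.log10(current), 1)
    return slope

# Synthetic check only: a transient spanning 10 ns to 1 s with m = -0.1.
t = np.logspace(-8, 0, 50)
current = 1e-4 * t ** -0.1
print(f"extracted m = {power_law_exponent(t, current):.3f}")
```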

Submitted 15 May, 2020; originally announced May 2020.

arXiv:2004.11120 [pdf, other]

Software-Level Accuracy Using Stochastic Computing With Charge-Trap-Flash Based Weight Matrix

Authors: Varun Bhatt, Shalini Shrivastava, Tanmay Chavan, Udayan Ganguly

Abstract: The in-memory computing paradigm with emerging memory devices has recently been shown to be a promising way to accelerate deep learning. The resistive processing unit (RPU) has been proposed to compute the vector-vector outer product in a crossbar array using a stochastic train of identical pulses, enabling a one-shot weight update and promising a large speed-up of the matrix multiplication operations that form the bulk of neural network training. However, the performance of the system suffers if the device does not satisfy the condition of linear conductance change over around 1,000 conductance levels. This is a challenge for nanoscale memories. Recently, Charge Trap Flash (CTF) memory was shown to have a large number of levels before saturation, but variable non-linearity. In this paper, we explore the trade-off between the range of conductance change and linearity. We show, through simulations, that at an optimum choice of the range, our system performs nearly as well as models trained using exact floating-point operations, with less than 1% reduction in performance. Our system reaches an accuracy of 97.9% on the MNIST dataset, and 89.1% and 70.5% on the CIFAR-10 and CIFAR-100 datasets (using pre-extracted features). We also show its use in reinforcement learning, where it is used for value function approximation in Q-Learning and learns to complete an episode of the mountain car control problem in around 146 steps. Benchmarked against the state of the art, the CTF-based RPU shows best-in-class performance, enabling software-equivalent accuracy.
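The stochastic outer-product update behind the RPU can be sketched in a few lines of Python. In this minimal sketch, each row value x_i and column value δ_j is turned into a train of `bl` Bernoulli pulses whose firing probability equals its magnitude (assumed to be already scaled to at most 1); a crosspoint changes by one step `dw_min` each time its row and column pulse in the same slot, so the expected update is `bl * dw_min * x_i * δ_j`. Signs are handled separately, and the device is assumed ideal (linear, no saturation), which is exactly the assumption that the CTF non-linearity studied in the paper violates. The function name and parameters are hypothetical.

```python
import numpy as np

def stochastic_outer_update(x, delta, bl=10, dw_min=0.01, rng=None):
    """Approximate dw_min * bl * outer(x, delta) by counting pulse coincidences.

    x, delta: row and column values, magnitudes assumed already scaled to <= 1.
    Returns the weight-update matrix of shape (len(x), len(delta)).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Bernoulli pulse trains: one row per time slot, one column per crossbar line.
    row_pulses = rng.random((bl, x.size)) < np.clip(np.abs(x), 0.0, 1.0)
    col_pulses = rng.random((bl, delta.size)) < np.clip(np.abs(delta), 0.0, 1.0)
    # For every crosspoint (i, j), count the slots in which both lines pulsed.
    coincidences = row_pulses.T.astype(int) @ col_pulses.astype(int)
    # Each coincidence moves the (ideal) device one step in the direction sign(x_i * delta_j).
    return dw_min * np.outer(np.sign(x), np.sign(delta)) * coincidences

rng = np.random.default_rng(0)
x, delta = np.array([0.5, -1.0]), np.array([0.8, 0.2])
bl = 20_000
update = stochastic_outer_update(x, delta, bl=bl, dw_min=1.0 / bl, rng=rng)
print(np.round(update, 2))   # close to np.outer(x, delta)
print(np.outer(x, delta))
```

A long pulse train is used here only to make the average visible; in practice short trains trade update noise for speed.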

Submitted 8 March, 2020; originally announced April 2020.

Comments: 8 pages, 8 figures, submitted to the International Joint Conference on Neural Networks (IJCNN) 2020

arXiv:2003.00821 [pdf]

Understanding the Location of Resistance Change in the Pr0.7Ca0.3MnO3 RRAM

Authors: Sandip Lashkare, Udayan Ganguly

Abstract: Pr1-xCaxMnO3 (PCMO) based resistance random access memory (RRAM) is attractive for large-scale memory and neuromorphic applications as it is non-filamentary and area scalable, and has multiple resistance states along with excellent endurance and retention. PCMO RRAM exhibits area-scalable resistive switching when in contact with a reactive electrode. Interface redox reaction based resistance switching is observed electrically. Yet, whether the resistance change occurs partially (close to the interface) or in the entire bulk is widely debated. Essentially, a two-terminal device is unable to provide direct evidence of the location of the resistance change in PCMO RRAM. In this paper, we propose and experimentally demonstrate a novel three-terminal RRAM device in which a thin third terminal (~20 nm) is inserted laterally into a typical vertical two-terminal RRAM device with a PCMO thickness of ~80 nm. Using the 3T-RRAM method, we show that the resistance change occurs largely in the upper bulk (near the reactive electrode interface), which is highly asymmetric. Yet it produces SCLC-based resistance change with symmetric IV characteristics. This is the first time that interface redox and bulk SCLC based resistance change have been experimentally shown to be correlated and consistent, enabled by the third terminal of the RRAM. Such a study enables a critical understanding of the device, which supports the design and development of PCMO RRAM for memory and neuromorphic computing applications.

Submitted 14 April, 2020; v1 submitted 18 February, 2020; originally announced March 2020.

Comments: 6 pages, 11 figures

arXiv:2002.00703 [pdf]

Temperature Dependence of Volatile Current shoot-up in PrMnO3 based Selector-less RRAM

Authors: S. Lashkare, A. Bhat, U. Ganguly

Abstract: PrMnO3 (PMO) based Resistance Random Access Memory (RRAM) has recently been considered for selector-less RRAM and neuromorphic computing applications by utilizing its current shoot-up. This current shoot-up in the PMO device is attributed to thermal runaway. Hence, understanding the ambient temperature dependence of the current shoot-up of the PMO device is essential for the various applications that utilize the negative differential resistance (NDR). In this paper, we characterize the ambient temperature dependence of the dc IV characteristics, accompanied by the development of an analytical model. First, the temperature-dependent current-voltage characteristics and the shift in the threshold voltage of the PMO device are shown experimentally. Second, a Joule heating based thermal feedback model coupled with current transport by space charge limited current (SCLC) is developed to explain the experimentally observed NDR region. Finally, the model successfully predicts device behavior over a range of experimental ambient temperatures. As an alternative to TCAD, such a compact and accurate dc model provides a platform for understanding and design through device- and system-level simulations of memory and neuromorphic applications.
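The Joule-heating feedback described above can be illustrated with a self-consistent loop between an SCLC-like current and the device temperature. In this minimal Python sketch, the current is assumed to follow I = a·V²·exp(-Ea/(kB·T)) (the quadratic SCLC voltage dependence with an Arrhenius prefactor) and the temperature T = T_amb + R_th·I·V; this functional form and all parameter values (`a`, `ea`, `r_th`) are hypothetical and not fitted to PMO data. When no low-temperature operating point exists, the iteration escapes to a high-temperature branch, which plays the role of the current shoot-up, and the voltage at which this first happens drops as the ambient temperature rises.

```python
import math

KB_EV = 8.617e-5  # Boltzmann constant in eV/K

def sclc_current(v, temp, a=1e-3, ea=0.2):
    """Hypothetical SCLC-like current: I = a * V**2 * exp(-Ea / (kB * T))."""
    return a * v ** 2 * math.exp(-ea / (KB_EV * temp))

def operating_point(v, t_amb, r_th=5e6, t_runaway=2000.0, max_iter=10_000):
    """Solve T = T_amb + R_th * I(V, T) * V by fixed-point iteration.

    Returns (current, temperature), or None if the temperature runs away
    (no stable low-temperature solution, i.e. a current shoot-up).
    t_runaway is only a numerical guard for detecting divergence.
    """
    temp = t_amb
    for _ in range(max_iter):
        new_temp = t_amb + r_th * sclc_current(v, temp) * v
        if new_temp > t_runaway:
            return None
        if abs(new_temp - temp) < 1e-9:
            return sclc_current(v, new_temp), new_temp
        temp = new_temp
    return None  # not converged: treated as runaway in this sketch

def threshold_voltage(t_amb, v_step=0.01, v_max=5.0):
    """Smallest swept voltage at which the thermal feedback runs away."""
    for k in range(1, int(round(v_max / v_step)) + 1):
        v = k * v_step
        if operating_point(v, t_amb) is None:
            return v
    return None

for t_amb in (300.0, 325.0, 350.0):
    print(f"T_amb = {t_amb:.0f} K -> threshold ~ {threshold_voltage(t_amb):.2f} V")
```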

Submitted 12 December, 2019; originally announced February 2020.

Comments: 4 pages, 8 figures

arXiv:1911.05943 [pdf, other]

Structured Mean-field Variational Inference and Learning in Winner-take-all Spiking Neural Networks

Authors: Shashwat Shukla, Hideaki Shimazaki, Udayan Ganguly

Abstract: The Bayesian view of the brain hypothesizes that the brain constructs a generative model of the world and uses it to make inferences via Bayes' rule. Although many types of approximate inference schemes have been proposed for hierarchical Bayesian models of the brain, the question of how these distinct inference procedures can be realized by hierarchical networks of spiking neurons remains largely unresolved. Based on a previously proposed multi-compartment neuron model in which dendrites perform logarithmic compression, and on stochastic spiking winner-take-all (WTA) circuits in which the firing probability of each neuron is normalized by the activities of other neurons, here we construct Spiking Neural Networks that perform structured mean-field variational inference and learning on hierarchical directed probabilistic graphical models with discrete random variables. In these models, we do away with the symmetric synaptic weights previously assumed for unstructured mean-field variational inference by learning the feedback and feedforward weights separately. The resulting online learning rules take the form of an error-modulated local Spike-Timing-Dependent Plasticity rule. Importantly, we consider two types of WTA circuits, in which either only one neuron is allowed to fire at a time (hard WTA) or neurons can fire independently (soft WTA), which makes neurons in these circuits operate in regimes of temporal and rate coding, respectively.
We show how the hard WTA circuits can be used to perform Gibbs sampling, whereas the soft WTA circuits can be used to implement a message passing algorithm that computes the marginals approximately. Notably, a simple change in the amount of lateral inhibition switches between the hard and soft WTA spiking regimes. Hence, the proposed network provides a unified view of the two previously disparate modes of inference and coding by spiking neurons.
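The difference between the two WTA regimes can be sketched with normalized (softmax) firing probabilities. In this minimal Python sketch, a hard WTA step draws exactly one winner from the normalized distribution (one sample per step, as in sampling-based inference), while a soft WTA step lets every neuron fire independently with its normalized probability, so the information is carried by firing rates. The explicit `mode` switch and the `rate_scale` parameter are simplifying assumptions standing in for the lateral-inhibition strength that sets the regime in the paper; the function name is hypothetical.

```python
import numpy as np

def wta_step(u, mode, rng, rate_scale=1.0):
    """One time step of a stochastic WTA circuit (hypothetical sketch).

    u: membrane potentials of the competing neurons.
    Returns a 0/1 spike vector.
    """
    u = np.asarray(u, dtype=float)
    p = np.exp(u - u.max())
    p /= p.sum()  # firing probabilities normalized across the circuit
    if mode == "hard":
        spikes = np.zeros(u.size, dtype=int)
        spikes[rng.choice(u.size, p=p)] = 1  # exactly one neuron fires
        return spikes
    if mode == "soft":
        # independent Bernoulli spikes; rates encode the normalized probabilities
        return (rng.random(u.size) < np.clip(rate_scale * p, 0.0, 1.0)).astype(int)
    raise ValueError(f"unknown mode: {mode!r}")

rng = np.random.default_rng(0)
u = np.array([1.0, 0.0, -1.0])
soft_counts = sum(wta_step(u, "soft", rng) for _ in range(10_000))
print("soft WTA rates:", soft_counts / 10_000)
print("softmax       :", np.exp(u) / np.exp(u).sum())
```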

Submitted 14 November, 2019; originally announced November 2019.

arXiv:1902.09726 [pdf]

doi 10.1109/TED.2020.2985167

Band-to-Band Tunneling based Ultra-Energy Efficient Silicon Neuron

Authors: Tanmay Chavan, Sangya Dutta, Nihar R. Mohapatra, Udayan Ganguly

Abstract: The human brain comprises about a hundred billion neurons connected through a quadrillion synapses. Spiking Neural Networks (SNNs) take inspiration from the brain to model complex cognitive and learning tasks. Neuromorphic engineering implements SNNs in hardware, aspiring to mimic the brain at scale (i.e., 100 billion neurons) with biological area and energy efficiency. The design of ultra-energy-efficient and compact neurons is essential for the large-scale implementation of SNNs in hardware. In this work, we experimentally demonstrate a Partially Depleted (PD) Silicon-On-Insulator (SOI) MOSFET based Leaky-Integrate & Fire (LIF) neuron where energy and area efficiency is enabled by two design elements: first, tunneling-based operation, and second, a compact sub-threshold SOI control circuit design. Band-to-Band Tunneling (BTBT) induced hole storage in the body is used for the "Integrate" function of the neuron. A compact control circuit "Fires" a spike when the body potential exceeds the firing threshold. The neuron then "Resets" by removing the stored holes via the body contact of the device. Additionally, the control circuit provides "Leakiness" in the neuron, which is an essential property of biological neurons. The proposed neuron provides 10x higher area efficiency compared to a CMOS design with equivalent energy/spike. Alternatively, it has 10^4x higher energy efficiency than area-equivalent neuron technologies.
Biologically comparable energy and area efficiency, along with CMOS compatibility, make the proposed device attractive for large-scale hardware implementation of SNNs.
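The integrate, fire, reset and leak functions listed above are those of a textbook leaky integrate-and-fire neuron. The discrete-time Python sketch below models them abstractly (the membrane variable stands in for the stored hole charge in the body); the parameter values and the function name are illustrative assumptions and do not describe the BTBT device physics.

```python
def lif_spike_times(input_current, v_th=1.0, leak=0.1, dt=1.0):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes.

    Update rule (hypothetical discretization): v += dt * (I - leak * v);
    when v reaches v_th the neuron fires and v is reset to 0.
    """
    v = 0.0
    spikes = []
    for t, i_in in enumerate(input_current):
        v += dt * (i_in - leak * v)  # integrate the input while leaking charge
        if v >= v_th:                # fire when the threshold is crossed
            spikes.append(t)
            v = 0.0                  # reset: remove the stored charge
    return spikes

print(lif_spike_times([0.2] * 21))    # [6, 13, 20]: strong input, regular firing
print(lif_spike_times([0.05] * 100))  # []: weak input, the leak wins
```

With a constant input I, the membrane settles at I/leak, so the neuron only fires when I/leak exceeds the threshold; this is the role the "Leakiness" plays.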

Submitted 25 February, 2019; originally announced February 2019.

arXiv:1902.09417 [pdf]

Ultra-low Energy charge trap flash based synapse enabled by parasitic leakage mitigation

Authors: Shalini Shrivastava, Tanmay Chavan, Udayan Ganguly

Abstract: Brain-inspired computation promises complex cognitive tasks at biological energy efficiencies. The brain contains $10^4$ synapses per neuron. Hence, ultra-low-energy, high-density synapses are needed for spiking neural networks (SNN). In this paper, we use a tunneling-enabled CTF (Charge Trap Flash) stack for ultra-low-energy operation (1F). Further, CTF on an SOI platform with a back-to-back connected pn diode and Zener diode (2D) prevents parasitic leakage to preserve the energy advantage in array operation. A bulk 100 μm × 100 μm CTF device offers a tunable, gradual conductance change (ΔG), i.e., $10^4$ levels, which is a 100x improvement over the literature. SPICE simulations of the 1F2D synapse show ultra-low energy (≤ 3 fJ/pulse) at the 180 nm node for long-term potentiation (LTP) and depression (LTD), which is comparable to the energy estimate for biological synapses (10 fJ). A record-low learning rate (i.e., maximum ΔG < 1% of the G-range) is observed, which is tunable. Excellent reliability (>$10^6$ endurance cycles at full conductance swing) is observed. Such a highly energy-efficient synapse with a tunable learning rate on the CMOS platform is a key enabler for human-brain-scale systems. Keywords: Spiking Neural Network; Charge trap flash; SONAS; Fowler-Nordheim Tunneling; Synapse

Submitted 25 February, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

arXiv:1901.06240 [pdf, other]

Predicting Performance using Approximate State Space Model for Liquid State Machines

Authors: Ajinkya Gorad, Vivek Saraswat, Udayan Ganguly

Abstract: The Liquid State Machine (LSM) is a brain-inspired architecture used for solving problems like speech recognition and time series prediction. An LSM comprises a randomly connected recurrent network of spiking neurons. This network propagates the non-linear neuronal and synaptic dynamics. Maass et al. have argued that the non-linear dynamics of LSMs are essential for their performance as universal computers. The Lyapunov exponent (mu), used to characterize the "non-linearity" of the network, correlates well with LSM performance. We propose a complementary approach of approximating the LSM dynamics with a linear state space representation. The spike rates from this model are well correlated with the spike rates from the LSM. Such equivalence allows the extraction of a "memory" metric (tau_M) from the state transition matrix. tau_M displays high correlation with performance. Further, systems with high tau_M require fewer epochs to achieve a given accuracy. Being computationally cheap (1800x more time-efficient than the LSM), the tau_M metric enables exploration of the vast parameter design space. We observe that the performance correlation of tau_M surpasses that of the Lyapunov exponent (mu) (2-4x improvement) in the high-performance regime over multiple datasets. In fact, while mu increases monotonically with network activity, the performance reaches a maximum at a specific activity, described in the literature as the "edge of chaos". On the other hand, tau_M remains correlated with LSM performance even as mu increases monotonically.
Hence, tau_M captures the useful memory of network activity that enables LSM performance. It also enables rapid design space exploration and fine-tuning of LSM parameters for high performance.
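The abstract does not spell out how tau_M is computed from the state transition matrix, so the Python sketch below makes one plausible assumption: fit x[t+1] ≈ A·x[t] + B·u[t] to spike-rate trajectories by least squares, then take the time constant of the slowest-decaying mode of A, tau_M = -Δt / ln ρ(A), where ρ(A) is the spectral radius. The function names are hypothetical, and the check below uses a synthetic two-state system rather than LSM data.

```python
import numpy as np

def fit_state_space(x, u):
    """Least-squares fit of x[t+1] ≈ A @ x[t] + B @ u[t].

    x: (T, n) spike-rate trajectories; u: (T, m) input rates.
    """
    z = np.hstack([x[:-1], u[:-1]])                    # regressors, (T-1, n+m)
    theta, *_ = np.linalg.lstsq(z, x[1:], rcond=None)  # (n+m, n)
    n = x.shape[1]
    return theta[:n].T, theta[n:].T                    # A (n, n), B (n, m)

def memory_time_constant(a, dt=1.0):
    """Decay time of the slowest mode of A; infinite if A is not contracting."""
    rho = np.max(np.abs(np.linalg.eigvals(a)))
    return np.inf if rho >= 1.0 else -dt / np.log(rho)

# Synthetic check: a known 2-state system driven by a random input.
rng = np.random.default_rng(0)
a_true = np.diag([0.9, 0.5])
b_true = np.array([[1.0], [0.5]])
u = rng.normal(size=(200, 1))
x = np.zeros((200, 2))
for t in range(199):
    x[t + 1] = a_true @ x[t] + b_true @ u[t]
a_hat, b_hat = fit_state_space(x, u)
print("tau_M =", memory_time_constant(a_hat))  # about -1/ln(0.9) = 9.49 steps
```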

Submitted 18 January, 2019; originally announced January 2019.

Comments: Submitted to IJCNN 2019

arXiv:1805.07053 [pdf]

doi 10.1109/TED.2018.2846360

Transient Phenomena in Sub-Band Gap Impact Ionization in Si NIPIN Diode

Authors: Bhaskar Das, J. Schulze, Udayan Ganguly

Abstract: Sub-band-gap (SBG) impact ionization (II) enables a steep subthreshold slope that allows devices to overcome the thermal limit of 60 mV/decade. This low-voltage phenomenon enables various applications in logic, memory and neuromorphic engineering. Recently, we experimentally demonstrated sub-0.2 V II in a NIPIN diode, primarily based on steady-state analysis. In this paper, we present the detailed experimental transient behavior of SBG-II in the NIPIN diode. The holes generated by SBG-II are stored in the p-well. First, we extract the leakage mechanisms from the p-well and identify two of them: (i) recombination-generation (RG) and (ii) over-the-barrier (OTB), where OTB dominates when the barrier height $\phi_b < 0.59$ eV. Second, we analytically extract the SBG-II current (Iii) at 300 K from the experimental results. The drain current (Id), the electric field (E-field) and Iii are plotted versus time. We observe that Iii increases as the E-field decreases, which indicates that the E-field is not the primary contributor to Iii. Further, Iii(Id) shows two distinct behaviors: (i) initially, Iii is constant with respect to Id, and (ii) eventually, a universal linear dependence Iii = k × Id emerges, where $k = 10^{-3}$. We also show that the electrons primarily contributing to Id cannot directly cause II because their energy is insufficient ($<E_g$). Fischetti's model showed that SBG-II is primarily caused by hot electrons that accept energy from cold drain electrons in an Auger-like process.
We speculate that the Id electrons may heat up the cold drain electrons, which would further energize the hot electrons to produce the observed universal Iii(Id) dependence.
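The late-time universal relation Iii = k × Id is a proportionality through the origin, so k can be obtained by a least-squares fit with zero intercept. The following is a minimal sketch. It assumes that Id and Iii samples for the linear regime have already been extracted. The function name and the sample values are hypothetical and synthetic, not data from the paper.

```python
import numpy as np


def fit_proportionality(i_d, i_ii):
    """Least-squares slope k for i_ii = k * i_d, with the line forced through the origin."""
    i_d = np.asarray(i_d, dtype=float)
    i_ii = np.asarray(i_ii, dtype=float)
    # Minimizing sum((i_ii - k*i_d)**2) over k gives k = <i_d, i_ii> / <i_d, i_d>.
    return float(np.dot(i_d, i_ii) / np.dot(i_d, i_d))


# Hypothetical late-time samples in amperes (synthetic, not measured data).
i_d = np.array([1e-6, 2e-6, 5e-6, 1e-5])
i_ii = 1e-3 * i_d
k = fit_proportionality(i_d, i_ii)
print(f"k = {k:.1e}")  # k = 1.0e-03
```

Forcing a zero intercept encodes the "universal" claim that Iii vanishes with Id. An ordinary fit with an intercept could be used instead to check that assumption.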

Submitted 18 May, 2018; originally announced May 2018.

Comments: 6 pages, 9 figures, journal IEEE TED

arXiv:1803.04773 [pdf]

A case for multiple and parallel RRAMs as synaptic model for training SNNs

Authors: Aditya Shukla, Sidharth Prasad, Sandip Lashkare, Udayan Ganguly

Abstract: To enable dense integration of model synapses in spiking neural network (SNN) hardware, various nanoscale devices are being considered. Besides exhibiting spike-time-dependent plasticity (STDP), such a device needs to be highly scalable, have high endurance and require low energy to transition between states. In this work, we first introduce and empirically determine two new specifications for a synapse in SNNs: the number of conductance levels per synapse and the maximum learning rate. To the best of our knowledge, no RRAM meets the latter specification. As a solution, we propose using multiple PCMO-RRAMs in parallel within a synapse. During a synaptic read, all PCMO-RRAMs are read simultaneously. For each synaptic conductance-change event, the conductance STDP mechanism is initiated for only one RRAM, randomly picked from the set. Second, to validate our solution, we experimentally demonstrate conductance STDP in a PCMO-RRAM. We then show that, because of its large learning rate, a single PCMO-RRAM fails to model a synapse in SNN training. As anticipated, network training improves as more PCMO-RRAMs are added to the synapse. Third, we discuss the circuit requirements for implementing such a scheme and conclude that they are within bounds.
Thus, our work presents specifications for synaptic devices in trainable SNNs, indicates the shortcomings of state-of-the-art synaptic contenders, provides a solution that meets the specifications extrinsically, and discusses the peripheral circuitry that implements it.
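The read/update rule described above can be sketched as follows. All N devices are summed on a read, but each conductance-change event touches only one randomly chosen device. As a result, the per-event change of the total synaptic conductance is a 1/N fraction of what a single device alone would give. This is a hypothetical behavioral model with idealized, linear, clipped device steps. It is not the PCMO-RRAM device model from the paper, and the class and parameter names are illustrative.

```python
import random


class ParallelSynapse:
    """Hypothetical synapse made of N identical devices in parallel."""

    def __init__(self, n_devices, g_min, g_max, delta_g, rng=None):
        self.g = [g_min] * n_devices  # per-device conductances
        self.g_min, self.g_max, self.delta_g = g_min, g_max, delta_g
        self.rng = rng if rng is not None else random.Random()

    def read(self):
        # All devices are read simultaneously: parallel conductances add.
        return sum(self.g)

    def update(self, potentiate):
        # Only ONE randomly picked device receives the STDP step.
        i = self.rng.randrange(len(self.g))
        step = self.delta_g if potentiate else -self.delta_g
        self.g[i] = min(self.g_max, max(self.g_min, self.g[i] + step))

    def normalized_step(self):
        # Effective learning rate: one device step relative to the full synaptic range.
        return self.delta_g / (len(self.g) * (self.g_max - self.g_min))


syn = ParallelSynapse(n_devices=4, g_min=0.0, g_max=1.0, delta_g=0.5, rng=random.Random(0))
syn.update(potentiate=True)
print(syn.read(), syn.normalized_step())  # 0.5 0.125
```

Adding devices lowers the effective learning rate and multiplies the number of distinguishable total-conductance levels. This matches the abstract's observation that training improves as more devices are added.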

Submitted 13 March, 2018; originally announced March 2018.

Comments: 8 pages, 18 figures and 1 table

arXiv:1801.00935 [pdf, other]

doi 10.1016/j.sse.2018.05.007

Analytical Modeling of Metal Gate Granularity based Threshold Voltage Variability in NWFET

Authors: P Harsha Vardhan, Sushant Mittal, Swaroop Ganguly, Udayan Ganguly

Abstract: Estimating threshold voltage ($V_T$) variability in NWFETs has been computationally expensive because analytical models are lacking. Variability estimation for NWFETs is essential for designing next-generation logic circuits. Among process-induced variability sources, Metal Gate Granularity (MGG) is of paramount importance due to its large impact on $V_T$ variability. Here, an analytical model is proposed to estimate the $V_T$ variability caused by MGG. We extend our earlier FinFET-based MGG model to a cylindrical NWFET by satisfying three additional requirements. First, the gate dielectric layer is replaced by silicon of electrostatically equivalent thickness using a long-cylinder approximation. Second, metal grains in NWFETs satisfy a periodic boundary condition in the azimuthal direction. Third, the electrostatics is solved analytically in cylindrical polar coordinates with the gate boundary condition defined by MGG. We show that quantum effects only shift the mean of the $V_T$ distribution, without significantly affecting the variability estimated by our electrostatics-based model. The $V_T$ distribution estimated by our model matches TCAD simulations. The model quantitatively captures the grain-size dependence of $\sigma(V_T)$ with excellent accuracy (6% error) compared to stochastic 3D TCAD simulations. This is a significant improvement over the state-of-the-art model, which fails to produce even qualitative agreement. The proposed model is 63 times faster than commercial TCAD simulations.
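The first requirement, replacing the dielectric with an electrostatically equivalent silicon shell, can be illustrated with the long-coaxial-cylinder capacitance $C' = 2\pi\varepsilon / \ln(r_{out}/r_{in})$. Equating $C'$ for an oxide shell and a silicon shell around a wire of radius R gives $t_{eq} = R[((R + t_{ox})/R)^{\varepsilon_{Si}/\varepsilon_{ox}} - 1]$. This is a minimal sketch. It assumes the equivalence is defined by equal capacitance per unit length, which is an assumption since the paper's exact formulation may differ. It also assumes SiO2 and Si permittivities of 3.9 and 11.7 and a hypothetical 5 nm radius with a 1 nm oxide.

```python
import math


def equivalent_si_thickness(radius, t_ox, eps_ox=3.9, eps_si=11.7):
    """Si shell thickness with the same per-unit-length capacitance as an oxide
    shell of thickness t_ox around a cylinder of the given radius (same length units)."""
    # Equal C' requires ln((R + t_eq)/R) = (eps_si/eps_ox) * ln((R + t_ox)/R).
    return radius * (((radius + t_ox) / radius) ** (eps_si / eps_ox) - 1.0)


def coax_capacitance_per_length(radius, thickness, eps_r, eps0=8.854e-12):
    """Capacitance per unit length (F/m if lengths cancel) of a coaxial shell."""
    return 2 * math.pi * eps_r * eps0 / math.log((radius + thickness) / radius)


t_eq = equivalent_si_thickness(radius=5.0, t_ox=1.0)  # nm, hypothetical geometry
print(f"{t_eq:.2f} nm")  # 3.64 nm, versus 3.00 nm for a planar (eps_si/eps_ox) scaling
```

For a thin wire, the equivalent silicon shell is thicker than the planar scaling $t_{ox}\,\varepsilon_{Si}/\varepsilon_{ox}$. The two agree only as $R \to \infty$.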

Submitted 3 January, 2018; originally announced January 2018.

arXiv:1709.02699 [pdf]

doi 10.1109/TBCAS.2018.2831618

An On-chip Trainable and Clock-less Spiking Neural Network with 1R Memristive Synapses

Authors: Aditya Shukla, Udayan Ganguly

Abstract: Spiking neural networks (SNNs) are being explored in an attempt to mimic the brain's capability to learn and recognize at low power. A crossbar architecture, with a highly scalable resistive RAM (RRAM) array serving as synaptic weights and neuronal drivers in the periphery, is an attractive option for SNNs. Recognition (akin to reading the synaptic weight) requires a small-amplitude bias across the RRAM to minimize conductance change. Learning (akin to writing or updating the synaptic weight) requires large-amplitude bias pulses to produce a conductance change. These contradictory bias-amplitude requirements make it a major challenge to perform reading and writing simultaneously and asynchronously, as in biology. Solutions suggested in the literature rely either on clock-based time-division multiplexing of read and write operations, or on approximations that ignore reading when it coincides with writing. In this work, we overcome this challenge with a clock-less approach in which reading and writing are performed in different frequency domains. This enables simultaneous learning and recognition in an SNN. We validate our scheme in a SPICE circuit simulator by translating a two-layer feed-forward SNN for Iris classification, demonstrating software-equivalent performance. System performance is not adversely affected by the voltage dependence of conductance in realistic RRAMs, despite its departure from linearity. Overall, our approach enables direct implementation of biological SNN algorithms in hardware.
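One way to picture reading and writing in different frequency domains is the following: superimpose a small, fast read tone on a large, slow write waveform, then recover the conductance by projecting the current onto the read tone, as a lock-in detector would. The abstract does not specify the demodulation circuit, so this lock-in projection is an assumed, swapped-in technique rather than the authors' circuit. The device is idealized as ohmic with a conductance that stays constant over the window. All names and values are hypothetical.

```python
import numpy as np


def lockin_read(g, v_write, v_read_amp, f_read, fs, n_samples):
    """Recover conductance g from the read-frequency component of the device current."""
    t = np.arange(n_samples) / fs
    carrier = np.sin(2 * np.pi * f_read * t)
    # Total bias = large, slow write waveform + small, fast read tone.
    v = v_write(t) + v_read_amp * carrier
    # Assumption: ohmic device whose conductance is constant over the window.
    i = g * v
    # Project the current onto the read carrier. Over an integer number of periods,
    # mean(carrier**2) = 1/2 and the write tone is orthogonal to the carrier.
    return 2.0 * np.mean(i * carrier) / v_read_amp


def write_waveform(t):
    # Hypothetical 1.5 V write tone at 1 kHz.
    return 1.5 * np.sin(2 * np.pi * 1e3 * t)


# 10 ms window: 10 write periods and 1000 read periods (50 mV at 100 kHz).
g_est = lockin_read(g=1e-4, v_write=write_waveform, v_read_amp=0.05,
                    f_read=1e5, fs=1e7, n_samples=100_000)
```

The window length is chosen to span whole periods of both tones. If it does not, spectral leakage from the large write waveform contaminates the read estimate.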

Submitted 3 November, 2017; v1 submitted 8 September, 2017; originally announced September 2017.

arXiv:1704.02012 [pdf]

doi 10.1109/IJCNN.2017.7966447

A Software-equivalent SNN Hardware using RRAM-array for Asynchronous Real-time Learning

Authors: Aditya Shukla, Vinay Kumar, Udayan Ganguly

Abstract: Spiking neural networks (SNNs) naturally inspire hardware implementation because they are based on biology. For learning, spike-time-dependent plasticity (STDP) may be implemented using energy-efficient waveform superposition on a memristor-based synapse. However, system-level implementation faces three challenges. First, there is a classic dilemma known as the simultaneous read-write dilemma. Recognition requires reading the current for short voltage spikes, and this reading is disturbed by the large voltage waveforms simultaneously applied to the same memristor for real-time learning. Second, the hardware needs to exactly replicate the software implementation so that algorithms can be easily adapted to hardware. Third, the devices used in hardware simulations must be realistic. In this paper, we present an approach that addresses these concerns. First, learning and recognition occur simultaneously in separate arrays, in real time and asynchronously, avoiding non-biomimetic, clock-based complex signal management. Second, we show that the hardware emulates the software at every stage by comparing SPICE (circuit simulator) and MATLAB (mathematical SNN algorithm implemented in software) implementations. For example, the hardware achieves 97.5% classification accuracy on the Fisher Iris dataset, equivalent to software. Third, STDP is implemented using a synaptic device model based on an HfO2 memristor.
We show that an increasingly realistic memristor model slightly reduces hardware performance (85%), which highlights the need to engineer RRAM characteristics specifically for SNNs.
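For readers unfamiliar with STDP, the weight change as a function of spike timing is often summarized by a pair-based exponential window. Causal pairs, where the post-synaptic spike follows the pre-synaptic one, potentiate, and anti-causal pairs depress. This is a minimal sketch of that generic textbook rule. It is not the HfO2 memristor model or the waveform-superposition scheme used in the paper, and the amplitudes and time constants are hypothetical.

```python
import math


def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20e-3, tau_minus=20e-3):
    """Pair-based exponential STDP window (hypothetical parameters).

    delta_t = t_post - t_pre in seconds. Positive values potentiate and
    negative values depress, with magnitude decaying exponentially in |delta_t|.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


for dt in (-20e-3, -5e-3, 5e-3, 20e-3):
    print(f"dt = {dt * 1e3:+.0f} ms -> dw = {stdp_dw(dt):+.5f}")
```

In a memristive implementation, this curve is shaped by how the overlapping pre- and post-synaptic voltage waveforms drive the device. This is why realistic device models can change the learned weights and hence the accuracy.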

Submitted 6 April, 2017; originally announced April 2017.

Comments: Eight pages, ten figures and two tables

arXiv:1612.05293 [pdf]

Reaction-Drift Model for Switching Transients in Pr$_{0.7}$Ca$_{0.3}$MnO$_3$ Based Resistive RAM

Authors: A. Khanna, S. Prasad, N. Panwar, U. Ganguly

Abstract: Earlier, DC hole-current modeling of PCMO RRAM by drift-diffusion (DD) including self-heating (SH) in TCAD (but without ionic transport) explained the experimentally observed SCLC characteristics prior to resistive switching. Further, transient analysis using the DD+SH model reproduced the experimentally observed fast current increase at the ~100 ns timescale followed by saturation, prior to resistive switching. However, resistive switching requires the inclusion of ionic transport. We propose a reaction-drift (RD) model of oxide ions, which is combined with the DD+SH model. Experimentally, SET operations consist of 3 stages and RESET operations consist of 4 stages. The DD+SH+RD model reproduces the entire transient behavior over the $10^{-8}$–1 s timescale range for both SET and RESET operations, across a range of biases and temperatures. Remarkably, a universal RESET behavior, $\log I = m \log t + \mathrm{const}$ (i.e. $I \propto t^{m}$) with $m \approx -1/10$, is reproduced. The quantitatively different voltage-time dilemma for SET and RESET is also replicated for a range of ambient temperatures. This demonstrates a comprehensive model for resistive switching in PCMO-based RRAM.
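The universal RESET law $I \propto t^{m}$ is a straight line on log-log axes, so m can be extracted as the slope of a linear fit of $\log I$ against $\log t$. This is a minimal sketch that uses `numpy.polyfit`. The transient below is synthetic, generated with $m = -0.1$ over the $10^{-8}$–1 s range quoted in the abstract. It is not measured data, and the function name is illustrative.

```python
import numpy as np


def power_law_exponent(t, i):
    """Slope m of log10(I) versus log10(t), i.e. the exponent in I ~ t**m."""
    slope, _intercept = np.polyfit(np.log10(t), np.log10(i), 1)
    return float(slope)


# Synthetic RESET transient (hypothetical prefactor, m = -0.1).
t = np.logspace(-8, 0, 50)  # seconds
i = 1e-4 * t ** (-0.1)      # amperes
m = power_law_exponent(t, i)
```

With real data, the fit window should be restricted to the time range where the log-log plot is visibly linear.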

Submitted 6 August, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

Comments: 6 pages, 9 figures

arXiv:1612.02233 [pdf]

A simple and efficient SNN and its performance & robustness evaluation method to enable hardware implementation

Authors: Anmol Biswas, Sidharth Prasad, Sandip Lashkare, Udayan Ganguly

Abstract: Spiking neural networks (SNNs) are more closely related to brain-like computation and inspire hardware implementation. Such implementation is enabled by small networks that give high performance on standard classification problems. In the literature, typical SNNs are deep and complex in terms of network structure, weight-update rules and learning algorithms, which makes them difficult to translate into hardware. In this paper, we first develop a simple two-layer network in software that is comparable to the state of the art among SNNs on four standard datasets, with improved efficiency. For example, for the Fisher Iris classification problem, it uses 3× fewer neurons, 3.5× fewer synapses and 30× fewer training epochs. The efficient network is based on effective population coding and synapse-neuron co-design. Second, we develop a computationally efficient (15000×) and accurate (correlation of 0.98) method to evaluate network performance without standard recognition tests. Third, we show that the method produces a robustness metric that can be used to evaluate noise tolerance.
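Population coding maps each real-valued input feature onto a group of neurons with overlapping receptive fields, so that nearby values excite nearby neurons. This is a minimal sketch of one common variant: Gaussian receptive fields whose responses are converted into first-spike times, with a stronger response producing an earlier spike. The abstract does not specify the encoding. The Gaussian shape, width, neuron count and time window below are all assumptions, not the paper's parameters.

```python
import numpy as np


def population_encode(x, x_min, x_max, n_neurons, t_max=10.0):
    """Encode a scalar x as first-spike times of n_neurons Gaussian receptive fields.

    Hypothetical choices: centers evenly spaced over [x_min, x_max], width equal to
    the center spacing, and spike time t_max * (1 - response).
    """
    centers = np.linspace(x_min, x_max, n_neurons)
    sigma = (x_max - x_min) / (n_neurons - 1)
    response = np.exp(-0.5 * ((x - centers) / sigma) ** 2)  # values in (0, 1]
    return t_max * (1.0 - response)  # stronger response -> earlier spike


# Encode one hypothetical feature value with 5 neurons spanning [4, 8].
times = population_encode(4.3, 4.0, 8.0, 5)
print(np.round(times, 2))
```

Each input feature gets its own small population. The total input layer size is therefore the number of features times the neurons per feature, which is one of the quantities such an efficiency comparison counts.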

Submitted 7 December, 2016; originally announced December 2016.

Comments: 9-page conference paper submitted to IJCNN 2017

arXiv:1605.08755 [pdf]

Space Charge Limited Current with Self-heating in Pr$_{0.7}$Ca$_{0.3}$MnO$_3$ based RRAM

Authors: I. Chakraborty, N. Panwar, A. Khanna, U. Ganguly

Abstract: Space-charge-limited current (SCLC) conduction has been identified in PCMO-based RRAM devices based on the observation that $I \propto V^{\alpha}$ with $\alpha \approx 2$. A critical feature of the IV characteristics is a sharp rise in current ($\alpha \gg 2$). This rise has been widely attributed to the trap-filled limit (TFL), followed by apparent trap-free SCLC conduction. In this paper, we show by TCAD analysis that the TFL is insufficient to explain the sharp current rise ($\alpha \gg 2$). As an alternative, we propose a shallow-trap SCLC model with self-heating-induced thermal runaway to explain the sharp current rise, followed by a series-resistance-dominated regime. Experimental results over 25–125 °C demonstrate all four regimes: (i) ohmic ($\alpha = 1$), (ii) shallow-trap SCLC ($\alpha \approx 2$), (iii) current shoot-up ($\alpha \gg 2$) and (iv) series resistance ($\alpha = 1$). Further, TCAD simulations with thermal modeling match the experimental IV characteristics in all regimes. Thus, a current conduction mechanism for PCMO-based RRAM, supported by a detailed TCAD model, is presented. Such a model is essential for further quantitative understanding and design of PCMO-based RRAM.
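The regimes above are identified by the local exponent $\alpha = d\ln I / d\ln V$, so α can be computed point by point from an IV sweep. This is a minimal sketch that uses `numpy.gradient` with the log-voltage array as the coordinate. The regime thresholds (1.5 and 3) are hypothetical cut-offs chosen for illustration. Note that α alone cannot separate the ohmic and series-resistance regimes, since both have $\alpha = 1$. The test sweep is synthetic.

```python
import numpy as np


def local_exponent(v, i):
    """Local power-law exponent alpha = d ln(I) / d ln(V) at each sample."""
    return np.gradient(np.log(i), np.log(v))


def label_regime(alpha):
    # Hypothetical thresholds; alpha ~ 1 is ambiguous between ohmic and series-R regimes.
    if alpha < 1.5:
        return "alpha ~ 1 (ohmic or series-resistance limited)"
    if alpha < 3.0:
        return "alpha ~ 2 (SCLC)"
    return "alpha >> 2 (current shoot-up)"


# Synthetic SCLC-like sweep: I = 3e-6 * V**2 (hypothetical prefactor).
v = np.logspace(-2, 0, 21)
i = 3e-6 * v ** 2
alpha = local_exponent(v, i)
print(label_regime(float(alpha[10])))  # alpha ~ 2 (SCLC)
```

Working in log space makes a pure power law give a constant α. Spacing the voltage points logarithmically keeps the finite-difference estimate evenly resolved across decades.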

Submitted 27 May, 2016; originally announced May 2016.

Comments: 9 pages, 5 figures, Submitted to Applied Physics Letters

arXiv:1604.04454 [pdf]

Enhanced Circuit Densities in Epitaxially Defined FinFETs (EDFinFETs) over FinFETs

Authors: Sushant Mittal, Aneesh Nainani, M. C. Abraham, Saurabh Lodha, Udayan Ganguly

Abstract: FinFET technology is prone to line-edge-roughness (LER)-based VT variation with scaling. To address this, we proposed an epitaxially defined (ED) FinFET (EDFinFET) as an alternative to the FinFET architecture for the 10 nm node and beyond. We showed by statistical simulations that EDFinFET reduces LER-based VT variability by 90% and overall variability by 59%. However, EDFinFETs have wider fins because their fin widths are not constrained by electrostatics and variability (cf. FinFETs, whose fin width is ~LG/3, where LG is the gate length). This suggests that EDFinFET-based circuits may be less dense. In this study, we show that wide fins enable taller fin heights. The ability to engineer multiple STI levels on tall fins enables different transistor widths (i.e. various W/L ratios, e.g. 1–10) in a single fin. As a result, even though individual EDFinFET devices have ~2× larger footprints than FinFETs, EDFinFETs may produce equal or higher circuit density for basic building blocks such as inverters and NAND gates for W/L of 2 and higher.
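The link between STI level and transistor width can be made concrete with the standard tri-gate approximation. The effective width of one fin is its two gated sidewalls plus its top, $W = 2H + W_{fin}$, where H is the fin height exposed above the local STI level. This is a minimal sketch that ignores corner effects. The abstract gives no dimensions, so the 20 nm gate length and 20 nm fin width below are hypothetical.

```python
def trigate_width(fin_height, fin_width):
    """Effective channel width of one tri-gate fin: two sidewalls plus the top."""
    return 2.0 * fin_height + fin_width


def w_over_l(fin_height, fin_width, gate_length):
    return trigate_width(fin_height, fin_width) / gate_length


def height_for_w_over_l(target, fin_width, gate_length):
    """Exposed fin height above the local STI level needed for a target W/L."""
    h = (target * gate_length - fin_width) / 2.0
    if h < 0:
        raise ValueError("target W/L is below what the fin top alone provides")
    return h


# Hypothetical EDFinFET-like fin: L_G = 20 nm, W_fin = 20 nm.
for ratio in (2, 5, 10):
    print(ratio, height_for_w_over_l(ratio, fin_width=20.0, gate_length=20.0), "nm")
```

Under these assumptions, a single tall fin with STI steps at 10, 40 and 90 nm below the fin top would host devices with W/L of 2, 5 and 10. Such devices would otherwise need multiple separate fins.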

Submitted 15 April, 2016; originally announced April 2016.

Showing 1–38 of 38 results for author: Ganguly, U