Dependence of the kinetic energy absorption capacity of bistable mechanical metamaterials on impactor mass and velocity

Authors: Ryan Fancher, Ian Frankel, Kyle Chin, Maroun Abi Ghanem, Brianna MacNider, Logan S. Shannahan, James F. Berry, Muge Fermen-Coker, Andrew J. Boydston, Nicholas Boechler

Abstract: Using an alternative mechanism to dissipation or scattering, bistable structures and mechanical metamaterials have shown promise for mitigating the detrimental effects of impact by reversibly locking energy into strained material. Herein, we extend prior works on impact absorption via bistable metamaterials to computationally explore the dependence of kinetic energy transmission on the velocity and mass of the impactor, with strain rates exceeding $10^2$ s$^{-1}$. We observe a large dependence on both impactor parameters, ranging from significantly better to worse performance than a comparative linear material. We then correlate the variability in performance to solitary wave formation in the system and give analytical estimates of idealized energy absorption capacity under dynamic loading. In addition, we find a significant dependence on damping accompanied by a qualitative difference in solitary wave propagation within the system. The complex dynamics revealed in this study offer potential future guidance for the application of bistable metamaterials to applications including human and engineered system shock and impact protection devices.

Submitted 22 January, 2023; originally announced January 2023.

arXiv:2212.10122 [pdf, other]

doi 10.1021/acs.nanolett.2c04936

Taming Friedrich-Wintgen interference in resonant metasurface: vortex laser emitting at on-demand tilted-angle

Authors: Raphael Mermet-Lyaudoz, Clémentine Symond, Florian Berry, Emmanuel Drouard, Céline Chevalier, Gaëlle Trippé-Allard, Emmanuelle Deleporte, Joel Bellessa, Christian Seassal, Hai Son Nguyen

Abstract: Friedrich-Wintgen (FW) interference is an atypical coupling mechanism that grants loss exchange between leaky resonances in non-Hermitian classical and quantum systems. Intriguingly, such a mechanism makes possible a destructive interference scenario in which a radiating wave becomes a bound state in the continuum (BIC) by giving away all of its losses. Here we propose and demonstrate experimentally an original concept to tailor FW-BICs as polarization singularity at on-demand wavevectors in optical metasurface. As a proof-of-concept, using hybrid organic-inorganic halide perovskite as active material, we empower this novel polarization singularity to obtain lasing emission exhibiting both highly directional emission at oblique angles and polarization vortex in momentum space. Our results pave the way to steerable coherent emission with tailored polarization pattern for applications in optical communication/manipulation in free-space, high-resolution imaging/focusing and data storage.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2203.13891 [pdf, other]

Light Management in Perovskite Photovoltaic Solar Cells: a perspective

Authors: Florian Berry, Raphaël Mermet-Lyaudoz, Jose Maria Cuevas Davila, Djihad Amina Djemmah, Hai Son Nguyen, Christian Seassal, Erwann Fourmond, Céline Chevalier, Mohamed Amara, Emmanuel Drouard

Abstract: Light Management (LM) is essential for metal-halide perovskite solar cells in their race for record performance. In this review, criteria on materials, processes and photonic engineering are established such as to enhance mainly the short circuit current density, towards high energy yields. These criteria are used to analyse a large panel of solutions envisaged in the literature for single junction cells. Moreover, a perspective based on rigorous electromagnetic simulations performed on various comparable structures is proposed in order to clarify the conclusions, and to pave the way to further performance enhancement in the case of all-perovskite, two-terminal tandem cells.

Submitted 25 March, 2022; originally announced March 2022.

arXiv:2102.01343 [pdf]

Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?

Authors: Walther Carballo-Hernández, Maxime Pelcat, François Berry

Abstract: Graphics Processing Units (GPUs) are currently the dominating programmable architecture for Deep Learning (DL) accelerators. The adoption of Field Programmable Gate Arrays (FPGAs) in DL accelerators is however gaining momentum. In this paper, we demonstrate that Direct Hardware Mapping (DHM) of a Convolutional Neural Network (CNN) on an embedded FPGA substantially outperforms a GPU implementation in terms of energy efficiency and execution time. However, DHM is highly resource intensive and cannot fully substitute the GPU when implementing a state-of-the-art CNN. We thus propose a hybrid FPGA-GPU DL acceleration method and demonstrate that heterogeneous acceleration outperforms GPU acceleration even including communication overheads. Experimental results are conducted on a heterogeneous multi-platform setup embedding an Nvidia(R) Jetson TX2 CPU-GPU board and an Intel(R) Cyclone10GX FPGA board. The SqueezeNet, MobileNetv2, and ShuffleNetv2 mobile-oriented CNNs are experimented. We show that heterogeneous FPGA-GPU acceleration outperforms GPU acceleration for classification inference task over MobileNetv2 (12%-30% energy reduction, 4% to 26% latency reduction), SqueezeNet (21%-28% energy reduction, same latency), and ShuffleNetv2 (25% energy reduction, 21% latency reduction).

Submitted 2 February, 2021; originally announced February 2021.

Comments: Presented at DATE Friday Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA 2021) (arXiv:2102.00818)

Report number: SLOHA/2021/04

arXiv:2009.05749 [pdf]

Realistic estimate of the Covid-19 incidence and mortality rate in France

Authors: Guillaume Malpuech, Anne Tournadre, Francois Berry, Laurent Gerbaud

Abstract: Large scale virological testing of SARS-Cov2 has been implemented in France since May 2020. We assume that the positivity of asymptomatic people not being contact cases (ANBC) is representative of the positivity of the whole French population, which allows estimating the real incidence. We estimate, using Santé Publique France reports, that the incidence at the beginning of August was about 0.8% and rose to about 2.4% of the total population at the beginning of September. This corresponds to about 1.6 million people simultaneously infected and 230,000 new infections each day. These evaluations allow us to deduce that intensive care units (ICU) admission rate and infection fatality rate (IFR) have dropped by one order of magnitude since March, and are currently 0.036% and 0.027% respectively. Basic simulations of the outbreak evolution based on the hypothesis of negligible reinfection probability are performed for France, Ile de France, Puy de Dome, Bouches du Rhone and Grand Est. These simulations use a reproduction rate (R) constant over time ranging from 1.3 to 1.45 (1.15 in Grand Est). They use an estimation of the number of infections that occurred during the first wave. These simulations also use the estimated incidence, which by reducing the susceptible population week after week induces a saturation of the otherwise exponential growth of the outbreak. An incidence peak of 3.5% is expected at week 39 for France. The calculated total numbers of ICU admissions and deaths during the second wave are 9000 and 7000 respectively for R=1.3.
The cumulative incidence over the two waves is computed to be close to 60% for R=1.3 and 70% for R=1.4. This suggests that if individual immunity exists, herd immunity is likely to be achieved in France by the end of October 2020. We conclude that Covid-19 is much more widespread than previously thought, but its severity has been limited since the end of the first wave.

Submitted 2 November, 2020; v1 submitted 12 September, 2020; originally announced September 2020.

Comments: 21 pages, 7 figures

arXiv:2009.03993 [pdf]

doi 10.1016/j.optlaseng.2020.106308

When Deep Learning Meets Digital Image Correlation

Authors: S. Boukhtache, K. Abdelouahab, F. Berry, B. Blaysat, M. Grediac, F. Sur

Abstract: Convolutional Neural Networks (CNNs) constitute a class of Deep Learning models which have been used in the recent past to resolve many problems in computer vision, in particular optical flow estimation. Measuring displacement and strain fields can be regarded as a particular case of this problem. However, it seems that CNNs have never been used so far to perform such measurements. This work is aimed at implementing a CNN able to retrieve displacement and strain fields from pairs of reference and deformed images of a flat speckled surface, as Digital Image Correlation (DIC) does. This paper explains how a CNN called StrainNet can be developed to reach this goal, and how specific ground truth datasets are elaborated to train this CNN. The main result is that StrainNet successfully performs such measurements, and that it achieves competing results in terms of metrological performance and computing time. The conclusion is that CNNs like StrainNet offer a viable alternative to DIC, especially for real-time applications.

Submitted 2 September, 2020; originally announced September 2020.

Comments: 35 pages, 25 figures. Accepted for publication in Optics and Lasers in Engineering on July 9, 2020

MSC Class: 74-05 ACM Class: I.2.m; I.4.m; J.2

arXiv:1807.00217 [pdf, other]

The Challenge of Multi-Operand Adders in CNNs on FPGAs: How not to solve it!

Authors: Kamel Abdelouahab, François Berry, Maxime Pelcat

Abstract: Convolutional Neural Networks (CNNs) are computationally intensive algorithms that currently require dedicated hardware to be executed. In the case of FPGA-based accelerators, we point out in this work the challenge of Multi-Operand Adders (MOAs) and their high resource utilization in an FPGA implementation of a CNN. To address this challenge, two optimization strategies that rely on time-multiplexing and approximate computing are investigated. At first glance, the two strategies looked promising to reduce the footprint of a given architectural mapping, but when synthesized on the device, neither of them gave the expected results. Experimental sections analyze the reasons for these unexpected results.

Submitted 30 June, 2018; originally announced July 2018.

Comments: Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation - SAMOS'18

arXiv:1806.01683 [pdf, other]

Accelerating CNN inference on FPGAs: A Survey

Authors: Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, François Berry

Abstract: Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs. The amount and diversity of research on the subject of CNN FPGA acceleration within the last 3 years demonstrates the tremendous industrial and academic interest. This paper presents a state-of-the-art of CNN inference accelerators over FPGAs. The computational workloads, their parallelism and the involved memory accesses are analyzed. At the level of neurons, optimizations of the convolutional and fully connected layers are explained and the performances of the different methods compared. At the network level, approximate computing and datapath optimization methods are covered and state-of-the-art approaches compared. The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel the future advances on efficient hardware deep learning.

Submitted 26 May, 2018; originally announced June 2018.

Comments: Cloning our HAL submission in ArXiv, Technical Report - Universite Clermont Auvergne, January 2018

arXiv:1712.04322 [pdf, ps, other]

doi 10.1109/LES.2017.2743247

Tactics to Directly Map CNN graphs on Embedded FPGAs

Authors: Kamel Abdelouahab, Maxime Pelcat, Jocelyn Sérot, Cédric Bourrasset, François Berry

Abstract: Deep Convolutional Neural Networks (CNNs) are the state-of-the-art in image classification. Since CNN feed forward propagation involves highly regular parallel computation, it benefits from a significant speed-up when running on fine grain parallel programmable logic devices. As a consequence, several studies have proposed FPGA-based accelerators for CNNs. However, because of the large computational power required by CNNs, none of the previous studies has proposed a direct mapping of the CNN onto the physical resources of an FPGA, allocating each processing actor to its own hardware instance. In this paper, we demonstrate the feasibility of the so-called direct hardware mapping (DHM) and discuss several tactics we explore to make DHM usable in practice. As a proof of concept, we introduce the HADDOC2 open source tool, that automatically transforms a CNN description into a synthesizable hardware description with platform-independent direct hardware mapping.

Submitted 20 November, 2017; originally announced December 2017.

Comments: IEEE Embedded Systems Letters, Institute of Electrical and Electronics Engineers, to appear, pp.1 - 1. arXiv admin note: text overlap with arXiv:1705.04543

arXiv:1705.04543 [pdf, ps, other]

Hardware Automated Dataflow Deployment of CNNs

Authors: Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, Cedric Bourrasset, Jean-Charles Quinton, François Berry

Abstract: Deep Convolutional Neural Networks (CNNs) are the state of the art systems for image classification and scene understanding. However, such techniques are computationally intensive and involve highly regular parallel computation. CNNs can thus benefit from a significant acceleration in execution time when running on fine grain programmable logic devices. As a consequence, several studies have proposed FPGA-based accelerators for CNNs. However, because of the huge amount of the required hardware resources, none of these studies was based on a direct mapping of the CNN computing elements onto the FPGA physical resources. In this work, we demonstrate the feasibility of this so-called direct hardware mapping approach and discuss several associated implementation issues. As a proof of concept, we introduce the haddoc2 open source tool, that is able to automatically transform a CNN description into a platform independent hardware description for FPGA implementation.

Submitted 29 June, 2017; v1 submitted 4 May, 2017; originally announced May 2017.

Report number: Haddoc/2016-06TR03

arXiv:1703.09779 [pdf, other]

doi 10.1145/2967413.2967430

A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA

Authors: Kamel Abdelouahab, Cedric Bourrasset, Maxime Pelcat, François Berry, Jean-Charles Quinton, Jocelyn Serot

Abstract: Deep Neural Networks are becoming the de-facto standard models for image understanding, and more generally for computer vision tasks. As they involve highly parallelizable computations, CNNs are well suited to current fine grain programmable logic devices. Thus, multiple CNN accelerators have been successfully implemented on FPGAs. Unfortunately, FPGA resources such as logic elements or DSP units remain limited. This work presents a holistic method relying on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA. This method was tested when implementing a reconfigurable OCR convolutional neural network on an Altera Stratix V device and varying both data representation and CNN topology in order to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures of 76 CNN topologies with 5 different fixed point representations. The most efficient implementation performs 883 classifications/sec at 256 x 256 resolution using 8% of the available DSP blocks.

Submitted 21 March, 2017; originally announced March 2017.

Comments: 8 pages, 6 figures

Showing 1–11 of 11 results for author: Berry, F