-
Neural Architecture Search of Hybrid Models for NPU-CIM Heterogeneous AR/VR Devices
Authors:
Yiwei Zhao,
Ziyun Li,
Win-San Khwa,
Xiaoyu Sun,
Sai Qian Zhang,
Syed Shakib Sarwar,
Kleber Hugo Stangherlin,
Yi-Lun Lu,
Jorge Tomas Gomez,
Jae-Sun Seo,
Phillip B. Gibbons,
Barbara De Salvo,
Chiao Liu
Abstract:
Low-latency and low-power edge AI is essential for virtual reality (VR) and augmented reality (AR) applications. Recent advances show that hybrid models, combining convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can pose system challenges for latency and energy efficiency because of their diverse dataflow and memory access patterns. In this work, we leverage the architectural heterogeneity of Neural Processing Units (NPU) and Compute-In-Memory (CIM) and employ diverse execution schemes to execute these hybrid models efficiently. We also introduce H4H-NAS, a Neural Architecture Search framework for designing efficient hybrid CNN/ViT models for heterogeneous edge systems with both NPU and CIM. H4H-NAS is powered by a performance estimator built from NPU performance results measured on real silicon and from CIM performance based on industry IPs. H4H-NAS searches hybrid CNN/ViT models at fine granularity and achieves a significant top-1 accuracy improvement (up to 1.34%) on the ImageNet dataset. Moreover, results from our algorithm/hardware (Algo/HW) co-design show up to 56.08% overall latency and 41.72% energy improvements from such heterogeneous computing over baseline solutions. The framework guides the design of both hybrid network architectures and the system architecture of NPU+CIM heterogeneous systems.
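The abstract describes a performance estimator that combines NPU measurements and CIM IP data. The exact form of that estimator is not given. A common pattern is a per-layer lookup table, sketched below under stated assumptions:

- Every cost value and the name `LayerCost` are hypothetical.
- Layers are assumed to run sequentially.
- NPU/CIM transfer costs are ignored.
- The greedy per-layer choice stands in for a search. It is not the H4H-NAS method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerCost:
    """Hypothetical per-layer table entry: latency (ms) and energy (mJ) on each engine."""
    npu_ms: float
    npu_mj: float
    cim_ms: float
    cim_mj: float


def estimate(layers, assignment):
    """Return (total latency ms, total energy mJ) for a layer-to-engine assignment.

    Assumes layers execute one after another and ignores data-transfer
    costs between the NPU and CIM, which a real estimator would need to model.
    """
    if len(layers) != len(assignment):
        raise ValueError("one engine per layer is required")
    total_ms = total_mj = 0.0
    for cost, engine in zip(layers, assignment):
        if engine == "npu":
            total_ms += cost.npu_ms
            total_mj += cost.npu_mj
        elif engine == "cim":
            total_ms += cost.cim_ms
            total_mj += cost.cim_mj
        else:
            raise ValueError(f"unknown engine: {engine!r}")
    return total_ms, total_mj


def fastest_assignment(layers):
    """Pick, per layer, the engine with the lower table latency (ties go to the NPU)."""
    return ["npu" if c.npu_ms <= c.cim_ms else "cim" for c in layers]


# Usage with made-up numbers for a two-layer model.
layers = [LayerCost(1.0, 0.5, 2.0, 0.25), LayerCost(3.0, 1.25, 1.5, 0.75)]
plan = fastest_assignment(layers)      # ["npu", "cim"]
latency_ms, energy_mj = estimate(layers, plan)  # (2.5, 1.25)
```

In a NAS loop, a table like this lets each candidate architecture be scored without running it on hardware. That is the main appeal of an estimator-driven search.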
Submitted 10 October, 2024;
originally announced October 2024.
-
Swin transformers are robust to distribution and concept drift in endoscopy-based longitudinal rectal cancer assessment
Authors:
Jorge Tapias Gomez,
Aneesh Rangnekar,
Hannah Williams,
Hannah Thompson,
Julio Garcia-Aguilar,
Joshua Jesse Smith,
Harini Veeraraghavan
Abstract:
Endoscopic images are used at various stages of rectal cancer treatment: cancer screening and diagnosis, assessment of response and toxicity (such as colitis) during treatment, and follow-up to detect new tumors or local regrowth (LR). However, subjective assessment is highly variable. It can underestimate the degree of response in some patients, subjecting them to unnecessary surgery, or overestimate response, placing patients at risk of disease spread. Advances in deep learning have shown the ability to produce consistent and objective response assessment from endoscopic images. However, methods for detecting cancer and regrowth and for monitoring response over the entire course of patient treatment and follow-up are lacking. This is because automated diagnosis and rectal cancer response assessment require methods that are robust to two things:

- inherent illumination variations and confounding conditions (blood, scope, blurring) in endoscopy images;
- changes to the normal lumen and tumor during treatment.

Hence, a hierarchical shifted window (Swin) transformer was trained to distinguish rectal cancer from normal lumen using endoscopy images. Swin, two convolutional models (ResNet-50, WideResNet-50) and a vision transformer (ViT) were trained and evaluated in two settings:

- on follow-up longitudinal images, to detect LR on a private dataset;
- on out-of-distribution (OOD) public colonoscopy datasets, to detect pre-/non-cancerous polyps.

Color shifts were applied using optimal transport to simulate distribution shifts. Swin and ResNet models were similarly accurate on the in-distribution dataset. Swin was more accurate than the other methods (follow-up: 0.84, OOD: 0.83) even when subjected to color shifts (follow-up: 0.83, OOD: 0.87). This indicates its capability to provide robust performance for longitudinal cancer assessment.
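The abstract states that color shifts were simulated with optimal transport but does not give the formulation. One simple instance follows from a standard result: in one dimension, the optimal transport map between two distributions (for convex costs such as squared distance) is the monotone rearrangement, i.e. quantile matching.

The sketch below applies that map independently per color channel. Per-channel 1D transport is an assumption chosen for brevity; the study may have used a joint multi-channel formulation. The function name `ot_color_shift` is hypothetical.

```python
import numpy as np


def ot_color_shift(source, reference):
    """Map the per-channel color distribution of `source` onto `reference`.

    source, reference: float arrays of shape (H, W, C); sizes may differ.
    Uses the 1D optimal-transport map (monotone quantile matching) on each
    channel independently, so pixel ordering within a channel is preserved.
    """
    out = np.empty_like(source, dtype=np.float64)
    for c in range(source.shape[-1]):
        src = source[..., c].ravel()
        ref = np.sort(reference[..., c].ravel())
        # Rank of each source pixel, converted to an empirical quantile in [0, 1].
        order = np.argsort(src, kind="stable")
        ranks = np.empty_like(order)
        ranks[order] = np.arange(src.size)
        q_src = ranks / max(src.size - 1, 1)
        q_ref = np.linspace(0.0, 1.0, ref.size)
        # Monotone rearrangement: source quantile q goes to reference quantile q.
        out[..., c] = np.interp(q_src, q_ref, ref).reshape(source.shape[:-1])
    return out
```

Applying this with a reference image drawn from a different site or scope is one way to produce color-shifted test sets like those described above.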
Submitted 26 August, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality with At-MRAM Neural Engine
Authors:
Arpan Suravi Prasad,
Moritz Scherer,
Francesco Conti,
Davide Rossi,
Alfio Di Mauro,
Manuel Eggimann,
Jorge Tómas Gómez,
Ziyun Li,
Syed Shakib Sarwar,
Zhao Wang,
Barbara De Salvo,
Luca Benini
Abstract:
Extended reality (XR) applications are Machine Learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs depends strongly on the energy of non-volatile memory (NVM) access for network weights. This work introduces Siracusa, a near-sensor heterogeneous SoC for next-generation XR devices manufactured in 16 nm CMOS. Siracusa couples an octa-core cluster of RISC-V digital signal processing cores with a novel tightly-coupled "At-Memory" integration between a state-of-the-art digital neural engine called N-EUREKA and an on-chip NVM based on magnetoresistive memory (MRAM). This integration achieves 1.7x higher throughput and 3x better energy efficiency than XR SoCs using NVM as background memory. The fabricated SoC prototype achieves an area efficiency of 65.2 GOp/s/mm² and a peak energy efficiency of 8.84 TOp/J for DNN inference. It also supports complex heterogeneous application workloads, which combine ML with conventional signal processing and control.
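The peak energy efficiency converts directly to energy per operation. This helps relate it to the "low tens of mW" power budget quoted above.

The derivation below assumes a hypothetical workload of $10^9$ operations (1 GOp) per inference at a hypothetical 30 inferences per second. It uses the peak figure, so real workloads will typically draw more. The calculation also excludes memory, I/O and control overheads outside the neural engine.

```latex
\begin{aligned}
E_{\text{op}} &= \frac{1}{8.84\ \text{TOp/J}} = \frac{1}{8.84\times 10^{12}}\ \text{J} \approx 1.13\times 10^{-13}\ \text{J} \approx 0.113\ \text{pJ/Op} \\
E_{\text{inf}} &\approx 10^{9}\ \text{Op} \times 1.13\times 10^{-13}\ \text{J/Op} \approx 1.13\times 10^{-4}\ \text{J} = 0.113\ \text{mJ} \\
P &\approx 30\ \text{s}^{-1} \times 0.113\ \text{mJ} \approx 3.4\ \text{mW}
\end{aligned}
```

Under these assumptions, DNN compute alone would use only a fraction of a tens-of-mW budget. This is consistent with the abstract's point that weight-memory access, rather than arithmetic, tends to dominate system-level power.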
Submitted 14 April, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization
Authors:
Filip Lemic,
Gerard Calvo Bartra,
Arnau Brosa López,
Jorge Torres Gómez,
Jakob Struye,
Falko Dressler,
Sergi Abadal,
Xavier Costa Perez
Abstract:
Nanodevices with Terahertz (THz)-based wireless communication capabilities are paving the way for flow-guided localization within the human bloodstream. Such localization allows the locations of sensed events to be associated with the events themselves. Benefits include early and precise diagnostics as well as reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way, usually along a single performance metric. They also ignore various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and in such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared objectively. Toward addressing this issue, we account for the environmental and scale-related peculiarities of the scenario. We then assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics, such as the accuracy and reliability of localization.
Submitted 2 August, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Focusing on Information Context for ITS using a Spatial Age of Information Model
Authors:
Julian Heinovski,
Jorge Torres Gómez,
Falko Dressler
Abstract:
New technologies for sensing and communication act as enablers for cooperative driving applications. Sensors are able to detect objects in the surrounding environment, and information such as their current location is exchanged among vehicles. To cope with the vehicles' mobility, such information is required to be as fresh as possible for proper operation of cooperative driving applications. The age of information (AoI) has been proposed as a metric for evaluating the freshness of information, and recently also in the context of intelligent transportation systems (ITS). We investigate mechanisms to reduce the AoI of data transported in the form of beacon messages while controlling their emission rate. We aim to balance packet collision probability and beacon frequency using the average peak age of information (PAoI) as a metric. This metric, however, only accounts for the generation time of the data, not for application-specific aspects such as the location of the transmitting vehicle. We thus propose a new way of interpreting the AoI by considering information context, thereby incorporating vehicles' locations. As an example, we characterize such importance using the orientation and the distance of the involved vehicles. In particular, we introduce a weighting coefficient used in combination with the PAoI to evaluate information freshness, thus emphasizing information from more important neighbors. We further design the beaconing approach to meet a given AoI requirement, thus saving resources on the wireless channel while keeping the AoI minimal. We illustrate the effectiveness of our approach in Manhattan-like urban scenarios, reaching pre-specified targets for the AoI of beacon messages.
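The average PAoI and a location-aware weighting can be made concrete with a short sketch.

The peak age seen by a receiver just before beacon $k$ arrives is the reception time of beacon $k$ minus the generation time of beacon $k-1$. Averaging these peaks gives the PAoI.

The abstract does not specify the weighting coefficient. The `importance_weight` below, built from distance and bearing, and the weight-averaged combination across neighbors are hypothetical choices for illustration only. The 50 m scale `d0_m` is also an assumed parameter.

```python
import math


def average_peak_aoi(gen_times, recv_times):
    """Average peak AoI for one sender-receiver pair.

    gen_times[k], recv_times[k]: generation and reception time (s) of the k-th
    successfully received beacon, in increasing order. Just before beacon k
    arrives, the freshest data at the receiver was generated at gen_times[k - 1],
    so the peak age is recv_times[k] - gen_times[k - 1].
    """
    if len(gen_times) != len(recv_times) or len(gen_times) < 2:
        raise ValueError("need at least two received beacons with matching times")
    peaks = [recv_times[k] - gen_times[k - 1] for k in range(1, len(gen_times))]
    return sum(peaks) / len(peaks)


def importance_weight(distance_m, bearing_rad, d0_m=50.0):
    """Hypothetical weight in [0, 1]: nearer neighbors and neighbors ahead
    (bearing 0) count more; a neighbor directly behind (bearing pi) gets 0."""
    return math.exp(-distance_m / d0_m) * (1.0 + math.cos(bearing_rad)) / 2.0


def weighted_paoi(neighbors):
    """neighbors: list of (paoi_s, distance_m, bearing_rad).
    Returns the importance-weighted average PAoI across neighbors."""
    weights = [importance_weight(d, b) for _, d, b in neighbors]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all neighbor weights are zero")
    return sum(w * p for w, (p, _, _) in zip(weights, neighbors)) / total
```

For example, beacons generated at 0.0, 0.5 and 1.0 s and received 0.25 s later each give an average PAoI of 0.75 s. A beaconing controller could then raise or lower the beacon rate until `weighted_paoi` meets a target.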
Submitted 16 June, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Toward Standardized Performance Evaluation of Flow-guided Nanoscale Localization
Authors:
Arnau Brosa López,
Filip Lemic,
Jakob Struye,
Jorge Torres Gómez,
Esteban Municio,
Carmen Delgado,
Gerard Calvo Bartra,
Falko Dressler,
Eduard Alarcón,
Jeroen Famaey,
Sergi Abadal,
Xavier Costa Pérez
Abstract:
Nanoscale devices with Terahertz (THz) communication capabilities are envisioned to be deployed within human bloodstreams. Such devices will enable fine-grained sensing-based applications for detecting early indications (i.e., biomarkers) of various health conditions, as well as actuation-based ones such as targeted drug delivery. Associating the locations of such events with the events themselves would provide additional utility for precision diagnostics and treatment. This vision yielded a new class of in-body localization coined under the term "flow-guided nanoscale localization". Such localization can be piggybacked on THz communication to detect body regions in which biological events were observed. It does so based on the duration of one circulation of a nanodevice in the bloodstream.

Decades of research on objective benchmarking of "traditional" indoor localization, as well as its eventual standardization (e.g., ISO/IEC 18305:2016), show that the performance results reported in early stages had three recurring weaknesses:

- they were often incomplete (e.g., targeting only a subset of relevant performance metrics);
- they came from benchmarking experiments in different evaluation environments and scenarios;
- they used inconsistent performance indicators.

To avoid such a "lock-in" in flow-guided localization, in this paper we propose a workflow for standardized performance evaluation of such localization. The workflow is implemented as an open-source simulation framework. The framework jointly accounts for the mobility of the nanodevices, in-body THz communication between the nanodevices and on-body anchors, and energy-related and other technological constraints (e.g., pulse-based modulation) at the nanodevice level. Accounting for these constraints, the framework generates raw data that can be fed into different flow-guided localization solutions to produce standardized performance benchmarks.
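The core idea of flow-guided localization is to infer the body region from the duration of one circulation. A toy sketch helps show this, along with two of the metrics mentioned in these abstracts.

The sketch makes several assumptions:

- The region table and its circulation times are hypothetical.
- Nearest-nominal-time matching is an assumed, simplistic estimator.
- "Accuracy" is taken as the fraction of reported events assigned to the correct region.
- "Reliability" is taken as the fraction of events whose report reached an anchor at all.

The surveyed approaches and the simulation framework may define these metrics differently.

```python
# Hypothetical nominal circulation times (s) for the path through each region;
# a real system would derive these from a blood-flow model or simulation.
NOMINAL_CIRCULATION_S = {"head": 20.0, "arm": 30.0, "leg": 45.0}


def estimate_region(circulation_s, nominal=NOMINAL_CIRCULATION_S):
    """Return the region whose nominal circulation time is closest to the measurement."""
    return min(nominal, key=lambda region: abs(nominal[region] - circulation_s))


def evaluate(events, nominal=NOMINAL_CIRCULATION_S):
    """Score a batch of detected events.

    events: list of (true_region, measured_circulation_s or None), where None
    models an event whose report never reached an on-body anchor.
    Returns (accuracy over reported events, reliability = fraction reported).
    """
    if not events:
        raise ValueError("no events to evaluate")
    reported = [(truth, t) for truth, t in events if t is not None]
    reliability = len(reported) / len(events)
    if not reported:
        return 0.0, reliability
    correct = sum(1 for truth, t in reported if estimate_region(t, nominal) == truth)
    return correct / len(reported), reliability
```

For instance, `evaluate([("head", 21.0), ("arm", 29.0), ("leg", 33.0), ("leg", None)])` gives an accuracy of 2/3 (the 33 s leg event is closer to the arm's nominal time) and a reliability of 0.75. Reporting both numbers, rather than only one, is the kind of multi-metric assessment these works argue for.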
Submitted 7 March, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Wi-Fi Meets ML: A Survey on Improving IEEE 802.11 Performance with Machine Learning
Authors:
Szymon Szott,
Katarzyna Kosek-Szott,
Piotr Gawłowicz,
Jorge Torres Gómez,
Boris Bellalta,
Anatolij Zubow,
Falko Dressler
Abstract:
Wireless local area networks (WLANs) empowered by IEEE 802.11 (Wi-Fi) hold a dominant position in providing Internet access. This is thanks to their freedom of deployment and configuration and the existence of affordable and highly interoperable devices. The Wi-Fi community is currently deploying Wi-Fi 6 and developing Wi-Fi 7, which will bring higher data rates, better multi-user and multi-AP support, and, most importantly, improved configuration flexibility. These technical innovations, including the plethora of configuration parameters, are making next-generation WLANs exceedingly complex. The dependencies between parameters and their joint optimization usually have a non-linear impact on network performance. The complexity increases further in dense deployments and under coexistence in shared bands. While classical optimization approaches fail in such conditions, machine learning (ML) is able to handle such complexity. Much research has been published on using ML to improve Wi-Fi performance, and solutions are slowly being adopted in existing deployments. In this survey, we adopt a structured approach to describe the various Wi-Fi areas where ML is applied. To this end, we analyze over 250 papers in the field, providing readers with an overview of the main trends. Based on this review, we identify specific open challenges and provide general future research directions.
Submitted 6 October, 2022; v1 submitted 10 September, 2021;
originally announced September 2021.