-
Rethinking the Role of Infrastructure in Collaborative Perception
Authors:
Hyunchul Bae,
Minhee Kang,
Minwoo Song,
Heejin Ahn
Abstract:
Collaborative Perception (CP) is a process in which an ego agent receives and fuses sensor information from surrounding vehicles and infrastructure to enhance its perception capability. To evaluate the need for infrastructure equipped with sensors, extensive and quantitative analysis of the role of infrastructure data in CP is crucial, yet remains underexplored. To address this gap, we first quantitatively assess the importance of infrastructure data in existing vehicle-centric CP, where the ego agent is a vehicle. Furthermore, we compare vehicle-centric CP with infra-centric CP, where the ego agent is now the infrastructure, to evaluate the effectiveness of each approach. Our results demonstrate that incorporating infrastructure data improves 3D detection accuracy by up to 10.87%, and infra-centric CP shows enhanced noise robustness and increases accuracy by up to 42.53% compared with vehicle-centric CP.
Submitted 15 October, 2024;
originally announced October 2024.
-
A GPT-based Decision Transformer for Multi-Vehicle Coordination at Unsignalized Intersections
Authors:
Eunjae Lee,
Minhee Kang,
Yoojin Choi,
Heejin Ahn
Abstract:
In this paper, we explore the application of the Decision Transformer, a decision-making algorithm based on the Generative Pre-trained Transformer (GPT) architecture, to multi-vehicle coordination at unsignalized intersections. We formulate the coordination problem so as to find the optimal trajectories for multiple vehicles at intersections, modeling it as a sequence prediction task to fully leverage the power of GPTs as a sequence model. Through extensive experiments, we compare our approach to a reservation-based intersection management system. Our results show that the Decision Transformer can outperform the training data in terms of total travel time and can be generalized effectively to various scenarios, including noise-induced velocity variations, continuous interaction environments, and different vehicle numbers and road configurations.
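The Decision Transformer conditions a GPT-style sequence model on returns-to-go (the sum of future rewards), interleaved with states and actions. The sketch below shows only that preprocessing step. The per-step reward of -1 (so that the return-to-go tracks the remaining travel time) is a hypothetical choice for illustration, not necessarily the reward used in the paper, and states and actions are opaque placeholders.

```python
from typing import Any, List, Tuple


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted return-to-go: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate rewards backwards so each step sees the sum of its future rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave_tokens(
    rtg: List[float], states: List[Any], actions: List[Any]
) -> List[Tuple[str, Any]]:
    """Build the (R_1, s_1, a_1, R_2, s_2, a_2, ...) sequence fed to the transformer."""
    seq: List[Tuple[str, Any]] = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("R", r), ("s", s), ("a", a)])
    return seq


if __name__ == "__main__":
    # Hypothetical reward: -1 per time step a vehicle spends in the intersection.
    rewards = [-1.0] * 4
    print(returns_to_go(rewards))  # negative remaining steps at each time
```

At inference time, a Decision Transformer is typically prompted with a desired return-to-go and the value is decremented by each observed reward, which is how a target such as a shorter total travel time can be requested.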
Submitted 8 October, 2024;
originally announced October 2024.
-
LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies
Authors:
Ameer Hamza,
Abdullah,
Yong Hyun Ahn,
Sungyoung Lee,
Seong Tae Kim
Abstract:
Generating Natural Language Explanations (NLEs) for model predictions on medical images, particularly those depicting thoracic pathologies, remains a critical and challenging task. Existing methodologies often struggle due to general models' insufficient domain-specific medical knowledge and privacy concerns associated with retrieval-based augmentation techniques. To address these issues, we propose a novel Vision-Language framework augmented with a Knowledge Graph (KG)-based datastore, which enhances the model's understanding by incorporating additional domain-specific medical knowledge essential for generating accurate and informative NLEs. Our framework employs a KG-based retrieval mechanism that not only improves the precision of the generated explanations but also preserves data privacy by avoiding direct data retrieval. The KG datastore is designed as a plug-and-play module, allowing for seamless integration with various model architectures. We introduce and evaluate three distinct frameworks within this paradigm: KG-LLaVA, which integrates the pre-trained LLaVA model with KG-RAG; Med-XPT, a custom framework combining MedCLIP, a transformer-based projector, and GPT-2; and Bio-LLaVA, which adapts LLaVA by incorporating the Bio-ViT-L vision model. These frameworks are validated on the MIMIC-NLE dataset, where they achieve state-of-the-art results, underscoring the effectiveness of KG augmentation in generating high-quality NLEs for thoracic pathologies.
Submitted 7 October, 2024;
originally announced October 2024.
-
Redefining Data Pairing for Motion Retargeting Leveraging a Human Body Prior
Authors:
Xiyana Figuera,
Soogeun Park,
Hyemin Ahn
Abstract:
We propose MR HuBo (Motion Retargeting leveraging a HUman BOdy prior), a cost-effective and convenient method to collect high-quality paired upper-body <robot, human> pose data, which is essential for data-driven motion retargeting methods. Unlike existing approaches, which collect <robot, human> pose data by converting human MoCap poses into robot poses, our method works in reverse. We first sample diverse random robot poses and then convert them into human poses. However, since random robot poses can result in extreme and infeasible human poses, we propose an additional technique to filter out extreme poses by exploiting a human body prior trained on a large amount of human pose data. Our data collection method can be used for any humanoid robot, provided one designs or optimizes the system's hyperparameters, which include a size scale factor and the joint angle ranges for sampling. In addition to this data collection method, we also present a two-stage motion retargeting neural network that can be trained via supervised learning on a large amount of paired data. Compared to other learning-based methods trained via unsupervised learning, we found that our deep neural network trained with ample high-quality paired data achieved notable performance. Our experiments also show that our data filtering method yields better retargeting results than training the model with raw and noisy data. Our code and video results are available at https://sites.google.com/view/mr-hubo/
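The reverse data-collection idea (sample robot poses, map them to human poses, keep only plausible ones) can be sketched as rejection sampling. Everything model-specific here is an assumption: `robot_to_human` stands in for the robot-to-human pose conversion, `prior_score` stands in for a log-likelihood from a learned human body prior, and the joint names, ranges, and threshold are hypothetical hyperparameters of the kind the abstract mentions.

```python
import random
from typing import Callable, Dict, List, Tuple

Pose = Dict[str, float]


def sample_robot_pose(ranges: Dict[str, Tuple[float, float]], rng: random.Random) -> Pose:
    """Draw each joint angle uniformly from its (hypothetical) sampling range."""
    return {joint: rng.uniform(lo, hi) for joint, (lo, hi) in ranges.items()}


def collect_pairs(
    ranges: Dict[str, Tuple[float, float]],
    robot_to_human: Callable[[Pose], Pose],
    prior_score: Callable[[Pose], float],
    threshold: float,
    n_samples: int,
    seed: int = 0,
) -> List[Tuple[Pose, Pose]]:
    """Sample robot poses, convert them to human poses, and keep only plausible pairs."""
    rng = random.Random(seed)
    pairs: List[Tuple[Pose, Pose]] = []
    for _ in range(n_samples):
        robot = sample_robot_pose(ranges, rng)
        human = robot_to_human(robot)
        # Reject pairs whose human pose the prior considers extreme or infeasible.
        if prior_score(human) >= threshold:
            pairs.append((robot, human))
    return pairs


if __name__ == "__main__":
    ranges = {"shoulder_pitch": (-2.0, 2.0), "elbow": (0.0, 2.0)}  # hypothetical, radians
    toy_prior = lambda pose: -sum(a * a for a in pose.values())  # stand-in log-prior
    pairs = collect_pairs(ranges, dict, toy_prior, threshold=-1.0, n_samples=200)
    print(f"kept {len(pairs)} of 200 samples")
```

The filtering step is what separates this from naive random sampling: the joint ranges control coverage of the robot's workspace, while the prior threshold trades dataset size against pose plausibility.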
Submitted 1 October, 2024; v1 submitted 20 September, 2024;
originally announced September 2024.
-
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Authors:
Sungmin Yun,
Kwanhee Kyung,
Juhwan Cho,
Jaewan Choi,
Jongmin Kim,
Byeongho Kim,
Sukhan Lee,
Kyomin Sohn,
Jung Ho Ahn
Abstract:
Large language models (LLMs) have become prominent owing to their ability to generate high-quality content across diverse contexts. To curb their rapidly growing demand for computing resources, the mixture of experts (MoE) approach has emerged. An MoE layer makes it possible to exploit a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it leads to frequent DRAM accesses in the MoE and attention layers. We observe that conventional computing devices have limitations when processing the MoE and attention layers, which dominate the total execution time and exhibit low arithmetic intensity (Op/B). Processing MoE layers only with devices targeting low Op/B, such as processing-in-memory (PIM) architectures, is challenging because continuous batching causes the Op/B of the MoE layer to fluctuate.
To address these challenges, we propose Duplex, which combines an xPU tailored for high-Op/B operations and Logic-PIM for efficient low-Op/B operations within a single device. Duplex selects the most suitable processor based on the Op/B of each layer within the LLM. Because the Op/B of the MoE layer is at least 1 and that of the attention layer is 4-8 with grouped query attention, prior PIM architectures, which place processing units inside DRAM dies and target only extremely low-Op/B (under one) operations, are not efficient. Following recent trends, Logic-PIM adds more through-silicon vias (TSVs) to enable high-bandwidth communication between the DRAM die and the logic die, and places powerful processing units on the logic die, making it best suited for low-Op/B operations ranging from a few to a few dozen. To maximally utilize the xPU and Logic-PIM, we propose expert and attention co-processing.
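The per-layer processor choice described above can be illustrated with a toy dispatcher. This is a sketch under stated assumptions: Op/B is approximated as arithmetic operations divided by weight bytes read (activation traffic is ignored), weights are 2-byte (FP16/BF16) values, and the `Layer` type, the FFN shape, and the 32 Op/B cutoff are hypothetical, not Duplex's actual policy. It shows why continuous batching makes the MoE layer's Op/B fluctuate: each expert weight is reused once per token routed to that expert, so Op/B scales with that token count.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    ops: float          # arithmetic operations performed by the layer
    bytes_moved: float  # DRAM bytes read for the layer (weights only, in this sketch)


def op_per_byte(layer: Layer) -> float:
    return layer.ops / layer.bytes_moved


def expert_layer(name: str, tokens: int, d_model: int, d_ff: int,
                 bytes_per_weight: int = 2) -> Layer:
    # Hypothetical two-matrix FFN expert (up- and down-projection).
    weights = 2 * d_model * d_ff
    # Each weight is read once and used in one multiply-add (2 ops) per routed token.
    return Layer(name, ops=2 * tokens * weights, bytes_moved=bytes_per_weight * weights)


def select_processor(layer: Layer, threshold: float = 32.0) -> str:
    # HYPOTHETICAL cutoff: below it the layer is bandwidth-bound, so use Logic-PIM.
    return "Logic-PIM" if op_per_byte(layer) < threshold else "xPU"


if __name__ == "__main__":
    for tokens in (1, 8, 64):
        layer = expert_layer(f"expert@{tokens}", tokens, d_model=1024, d_ff=4096)
        print(tokens, op_per_byte(layer), select_processor(layer))
```

With one routed token, the sketch gives an Op/B of exactly 1, matching the lower bound quoted in the abstract; as more tokens land on an expert, the same layer drifts toward the xPU's regime.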
Submitted 2 September, 2024;
originally announced September 2024.
-
Neutrino Mass Origin and Flavored-QCD axion in an Extra-Dimension
Authors:
Y. H. Ahn
Abstract:
We propose a unified flavor model with the Standard Model fields on two 3-branes within an extra-dimensional setup, incorporating a $\Gamma_N\times U(1)_X$ symmetry with a modulus and a scalar field responsible for symmetry breaking. When compactified to four dimensions, the Yukawa couplings, initially expressed as modular forms with mass dimensions, are normalized to conform to the canonical four-dimensional theory, with the Yukawa coefficients being complex numbers of unit absolute value. We show that this model naturally explains the mass and mixing hierarchies of quarks and leptons, solves the strong CP problem, provides a natural solution to the hierarchy problem, and can be inherently free of the axionic domain-wall problem. The $U(1)_X$ mixed gravitational anomaly-free condition requires that electrically neutral mirror bulk fermions couple to the normal neutrino field on the 3-brane, consistent with the boundary condition. Consequently, we demonstrate a mechanism for generating light neutrino masses, similar to the Weinberg operator, by transmitting the information of $U(1)_X$ breakdown between the two 3-branes. The scale of $U(1)_X$ breaking is estimated from neutrino data to be around $10^{15}$ GeV, leading to a QCD axion mass of approximately $2.5\times10^{-9}$ eV. Through numerical analysis, we demonstrate that the model yields results consistent with current experimental data on quarks and leptons, and it also provides predictions for neutrinos.
Submitted 20 August, 2024;
originally announced August 2024.
-
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
Authors:
Heewoong Choi,
Sangwon Jung,
Hongjoon Ahn,
Taesup Moon
Abstract:
In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods have limitations, as they often overlook the second-order preference that indicates the relative strength of preference. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL that leverages second-order preference information by constructing a Ranked List of Trajectories (RLT), which can be built efficiently using the same ternary feedback type as traditional methods. To validate the effectiveness of LiRE, we propose a new offline PbRL dataset that objectively reflects the effect of the estimated rewards. Our extensive experiments on the dataset demonstrate the superiority of LiRE, which outperforms state-of-the-art baselines even with modest feedback budgets and remains robust with respect to the amount of feedback and feedback noise. Our code is available at https://github.com/chwoong/LiRE
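The abstract does not spell out how the Ranked List of Trajectories is built, so the following is a minimal sketch under an assumption: the list is kept as groups of equally preferred trajectories, ordered from least to most preferred, and each new trajectory is placed by binary search using the same ternary feedback (worse, equal, better) that conventional PbRL collects. The `compare` oracle stands in for a human labeler and is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def insert_into_rlt(rlt: List[List[T]], traj: T, compare: Callable[[T, T], int]) -> int:
    """Insert traj into a ranked list of equal-preference groups (least preferred first).

    compare(a, b) is a ternary preference query: -1 if a is less preferred than b,
    0 if equally preferred, 1 if more preferred. Returns the number of queries used.
    """
    lo, hi = 0, len(rlt)
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        c = compare(traj, rlt[mid][0])  # compare against one representative of the group
        if c == 0:
            rlt[mid].append(traj)       # tie: join the existing group
            return queries
        if c < 0:
            hi = mid
        else:
            lo = mid + 1
    rlt.insert(lo, [traj])              # strictly between groups: start a new group
    return queries


def rank_of(rlt: List[List[T]], traj: T) -> int:
    """Group index of traj; larger gaps between ranks indicate stronger preference."""
    for i, group in enumerate(rlt):
        if traj in group:
            return i
    raise ValueError("trajectory not in list")
```

Binary search keeps the cost of each insertion logarithmic in the number of groups, and the resulting rank gaps carry the second-order information (how much more one trajectory is preferred) that a single pairwise label cannot.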
Submitted 7 August, 2024;
originally announced August 2024.
-
Convergence Speed for Fekete Points on Uniformly Polynomially Cuspidal Sets
Authors:
Hyunsoo Ahn,
Ngoc Cuong Nguyen
Abstract:
We obtain the convergence speed for Fekete points on uniformly polynomially cuspidal compact sets introduced by Pawlucki and Pleśniak. This is done by showing that these sets are $(\mathscr{C}^α, \mathscr{C}^{α'})$-regular in the sense of Dinh, Ma and Nguyen.
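For readers outside pluripotential theory, the standard objects behind this abstract can be written out as follows. This is a sketch of the usual setting (a compact set $K\subset\mathbb{C}^n$ and the space $\mathcal{P}_d$ of polynomials of degree at most $d$). The notation $N_d$, $e_i$, $\mathrm{VDM}$, $\mu_d$, and $\mu_{\mathrm{eq}}$ is ours, not the paper's. The last line is the weak convergence theorem of Berman, Boucksom, and Witt Nyström for non-pluripolar $K$; the "convergence speed" in the abstract presumably quantifies how fast this convergence happens.

```latex
\begin{aligned}
&N_d = \dim \mathcal{P}_d = \binom{n+d}{d}, \qquad
  \{e_1,\dots,e_{N_d}\} \text{ a basis of } \mathcal{P}_d \text{ (e.g.\ monomials)},\\
&\mathrm{VDM}(x_1,\dots,x_{N_d}) = \det\big[e_i(x_j)\big]_{i,j=1}^{N_d},\\
&(x_1,\dots,x_{N_d}) \in K^{N_d} \text{ is a Fekete configuration of order } d
  \iff |\mathrm{VDM}(x_1,\dots,x_{N_d})| = \max_{K^{N_d}} |\mathrm{VDM}|,\\
&\mu_d = \frac{1}{N_d}\sum_{j=1}^{N_d} \delta_{x_j}
  \;\longrightarrow\; \mu_{\mathrm{eq}}(K)
  \quad \text{weakly as } d\to\infty .
\end{aligned}
```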
Submitted 14 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Stock-driven Household Attention
Authors:
Hie Joo Ahn,
Shihan Xie
Abstract:
We investigate the effects of stockholding on households' attention to the macroeconomy. Households' attentiveness is measured by the accuracy of their inflation expectations and perceptions. Relative to non-stockholders, stockholders produce more accurate inflation forecasts and backcasts, disagree less about future inflation, and adjust their outlook more responsively to news, suggesting that stock-market participation raises households' attention. Frequent changes in stock prices incentivize stockholders to closely monitor financial markets for optimal trading, given the low cost of acquiring information. Consequently, paying attention to the macroeconomy helps hedge the risks associated with holding stocks. Therefore, attention heterogeneity driven by stockholdings can be a channel through which the distributional consequences of monetary policy are created.
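Forecast accuracy and disagreement are commonly operationalized as the absolute error of each household's inflation forecast relative to realized inflation and as the cross-sectional dispersion of forecasts within a group. The sketch below assumes exactly those two definitions; the paper's own measures may differ, and the group names and inputs are illustrative, not survey data.

```python
from statistics import mean, pstdev
from typing import Dict, List


def attention_metrics(forecasts: List[float], realized: float) -> Dict[str, float]:
    """Accuracy and disagreement for one group of households' inflation forecasts."""
    errors = [abs(f - realized) for f in forecasts]
    return {
        "mean_abs_error": mean(errors),     # lower = more accurate expectations
        "disagreement": pstdev(forecasts),  # lower = less cross-sectional dispersion
    }


def compare_groups(groups: Dict[str, List[float]], realized: float) -> Dict[str, Dict[str, float]]:
    """Apply the same metrics to, e.g., stockholders and non-stockholders."""
    return {name: attention_metrics(f, realized) for name, f in groups.items()}
```

Under these definitions, "more attentive" means a group scores lower on both metrics against the same realized inflation rate.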
Submitted 22 July, 2024;
originally announced July 2024.
-
Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs
Authors:
Jongmin Kim,
Wonseok Choi,
Jung Ho Ahn
Abstract:
Fully homomorphic encryption (FHE) is a cryptographic technology capable of resolving security and privacy problems in cloud computing by encrypting data in use. However, FHE introduces tremendous computational overhead for processing encrypted data, making FHE workloads 2-6 orders of magnitude slower than their unencrypted counterparts. To mitigate the overhead, we propose Cheddar, an FHE library for CUDA GPUs that delivers significantly higher performance than prior GPU implementations. We develop optimized functionalities at various implementation levels, ranging from efficient low-level primitives to streamlined high-level operational sequences. In particular, we improve major FHE operations, including the number-theoretic transform and base conversion, through efficient kernel designs that use a small word size of 32 bits. By these means, Cheddar achieves 2.9 to 25.6 times higher performance for representative FHE workloads than prior GPU implementations.
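The number-theoretic transform (NTT) is the modular analogue of the FFT: it turns polynomial multiplication into pointwise multiplication of residues. The sketch below is a textbook iterative radix-2 NTT over the prime 998244353 (= 119 * 2^23 + 1, primitive root 3), which fits in a 32-bit word; it is not Cheddar's implementation. In particular, it computes a cyclic transform on the CPU, whereas FHE schemes such as CKKS work in $\mathbb{Z}_q[X]/(X^N+1)$ and therefore need a negacyclic variant, typically over many RNS primes and in GPU kernels.

```python
from typing import List

P = 998244353  # NTT-friendly prime below 2**30: P = 119 * 2**23 + 1
G = 3          # primitive root modulo P


def ntt(values: List[int], invert: bool = False) -> List[int]:
    """Cyclic NTT of length n (a power of two) modulo P; invert=True gives the inverse."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [v % P for v in values]
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)      # its inverse, via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % P
                a[start + k] = (u + v) % P          # Cooley-Tukey butterfly
                a[start + k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


if __name__ == "__main__":
    # (1 + 2x) * (3 + 4x) = 3 + 10x + 8x^2, computed with zero-padding to length 4.
    fa, fb = ntt([1, 2, 0, 0]), ntt([3, 4, 0, 0])
    print(ntt([x * y % P for x, y in zip(fa, fb)], invert=True))
```

Keeping every residue below 2^32 is what lets a GPU kernel use native 32-bit integer operations (with 64-bit intermediate products); Python's arbitrary-precision integers hide that detail here.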
Submitted 17 July, 2024;
originally announced July 2024.
-
ParCon: Noise-Robust Collaborative Perception via Multi-module Parallel Connection
Authors:
Hyunchul Bae,
Minhee Kang,
Heejin Ahn
Abstract:
In this paper, we investigate improving the perception performance of autonomous vehicles through communication with other vehicles and road infrastructure. To this end, we introduce a novel collaborative perception architecture, called ParCon, which connects multiple modules in parallel, as opposed to the sequential connections used in most other collaborative perception methods. Through extensive experiments, we demonstrate that ParCon inherits the advantages of parallel connection. Specifically, ParCon is robust to noise, as the parallel architecture allows each module to manage noise independently and complement the limitations of the other modules. As a result, ParCon achieves state-of-the-art accuracy, particularly in noisy environments such as real-world datasets, increasing detection accuracy by 6.91%. Additionally, ParCon is computationally efficient, reducing floating-point operations (FLOPs) by 11.46%.
Submitted 13 October, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain
Authors:
Hyeon Bae Kim,
Yong Hyun Ahn,
Seong Tae Kim
Abstract:
Recent advancements in deep neural networks have shown promise in aiding disease diagnosis and medical decision-making. However, ensuring that AI models make decisions transparently and in compliance with regulations requires a comprehensive understanding of the model's internal workings. Yet previous methods rely heavily on expensive pixel-wise annotated datasets for interpreting the model, which is a significant drawback in medical domains. In this paper, we propose a novel medical neuron concept annotation method, named Mask-free Medical Model Interpretation (MAMMI), which addresses these challenges. By using a vision-language model, our method removes the need for pixel-level masks for neuron concept annotation. MAMMI achieves superior performance compared to other interpretation methods, demonstrating its efficacy in providing rich representations for neurons in medical image analysis. Our experiments on a model trained on NIH chest X-rays validate the effectiveness of MAMMI, showcasing its potential for transparent clinical decision-making in the medical domain. The code is available at https://github.com/ailab-kyunghee/MAMMI.
Submitted 16 July, 2024;
originally announced July 2024.
-
Telescope control software and proto-model siderostat for the SDSS-V Local Volume Mapper
Authors:
Hojae Ahn,
Florian Briegel,
Jimin Han,
Mingyu Jeon,
Thomas M. Herbst,
Sumin Lee,
Woojin Park,
Sunwoo Lee,
Inhwan Jung,
Tae-Geun Ji,
Changgon Kim,
Geon Hee Kim,
Wolfgang Gaessler,
Markus Kuhlberg,
Hyun Chul Park,
Soojong Pak,
Nicholas P. Konidaris,
Niv Drory,
José R. Sánchez-Gallego,
Cynthia S. Froning,
Solange Ramirez,
Juna A. Kollmeier
Abstract:
The fifth Sloan Digital Sky Survey (SDSS-V) Local Volume Mapper (LVM) is a wide-field integral field unit (IFU) survey that uses an array of four 160 mm fixed telescopes with siderostats to minimize the number of moving parts. Each telescope observes the science field or a calibration field independently and is synchronized with the science exposure. We developed the LVM Acquisition and Guiding Package (LVMAGP), a telescope control software package optimized for LVM observations, which can simultaneously control four focusers, three K-mirrors, one fiber selector, four mounts (siderostats), and seven guide cameras. This software is built on a hierarchical architecture and the SDSS framework and provides three key sequences: autofocus, field acquisition, and autoguide. We designed and fabricated a proto-model siderostat to test the telescope pointing model and the LVMAGP software. The mirrors of the proto-model were designed as an isogrid open-back type, which reduced their weight by 46% and enabled them to reach thermal equilibrium quickly. Additionally, deflection due to bolting torque, self-gravity, and thermal deformation was simulated, and the maximum scatter of the pointing model induced by the tilt of the optomechanics was predicted to be $4'.4$, which can be compensated for by the field acquisition sequence. We performed a real-sky test of LVMAGP with the proto-model siderostat and obtained field acquisition and autoguide accuracies of $0''.38$ and $1''.5$, respectively. The system met all requirements except the autoguide specification, which will be resolved by more precise alignment of the hardware components at Las Campanas Observatory.
Submitted 11 July, 2024;
originally announced July 2024.
-
Faster Metallic Surface Defect Detection Using Deep Learning with Channel Shuffling
Authors:
Siddiqui Muhammad Yasir,
Hyunsik Ahn
Abstract:
Deep learning has been improving steadily in recent years, and a significant number of researchers have devoted themselves to research on defect detection algorithms. Detecting and recognizing small and complex targets is still an open problem. In this research, we present an improved defect detection model for detecting small and complex defect targets on steel surfaces. During steel strip production, mechanical forces and environmental factors cause surface defects in the steel strip. Therefore, detecting such defects is key to producing high-quality products. Moreover, surface defects of the steel strip cause great economic losses to the high-tech industry. So far, few studies have explored methods of identifying these defects, and most currently available algorithms are not sufficiently effective. Therefore, this study presents an improved real-time metallic surface defect detection model based on You Only Look Once (YOLOv5), specially designed for small networks. To capture the smaller features of the target, the conventional part is replaced with depth-wise convolution and a channel shuffle mechanism. Then, assigning weights to the Feature Pyramid Network (FPN) output features and fusing them increases feature propagation and the network's characterization ability. The experimental results reveal that the improved model outperforms comparable models in terms of accuracy and detection time. The proposed model achieves an mAP of 77.5% on the Northeastern University dataset (NEU-DET) and 70.18% on the GC10-DET dataset.
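Two ingredients named in this abstract can be sketched on plain Python lists. The channel shuffle follows the ShuffleNet formulation (view the C channels as a groups × C/groups grid, transpose it, and flatten) so that information mixes across the groups of a grouped or depth-wise convolution. For the weighted FPN fusion, the abstract gives no formula; the sketch assumes the "fast normalized fusion" used in EfficientDet's BiFPN, with each feature map flattened to an equal-length list. Neither function is taken from the authors' code.

```python
from typing import List, Sequence


def channel_shuffle(channels: Sequence, groups: int) -> list:
    """ShuffleNet-style shuffle: reshape to (groups, C // groups), transpose, flatten."""
    c = len(channels)
    if groups <= 0 or c % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = c // groups
    # Read the (groups x per_group) grid column by column.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def fast_normalized_fusion(
    features: List[List[float]], weights: List[float], eps: float = 1e-4
) -> List[float]:
    """Weighted fusion O = sum_i w_i * F_i / (eps + sum_j w_j), with w_i clipped at 0."""
    w = [max(0.0, x) for x in weights]  # ReLU keeps fusion weights non-negative
    total = sum(w) + eps                # eps avoids division by zero
    length = len(features[0])
    return [sum(wi * f[k] for wi, f in zip(w, features)) / total for k in range(length)]


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))
    print(fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

In a real network both operations act on tensors of shape (N, C, H, W) and the fusion weights are learned parameters; the list version only makes the index arithmetic and the normalization explicit.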
Submitted 19 June, 2024;
originally announced June 2024.
-
3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data
Authors:
Siddiqui Muhammad Yasir,
Amin Muhammad Sadiq,
Hyunsik Ahn
Abstract:
3D object recognition is a challenging task for intelligent and robotic systems in industrial and home indoor environments. It is critical for such systems to recognize and segment the 3D object instances that they frequently encounter. The computer vision, graphics, and machine learning fields have all given this problem considerable attention. Traditionally, 3D segmentation was done with hand-crafted features and designed approaches that did not achieve acceptable performance and could not be generalized to large-scale data. Owing to their great success in 2D computer vision, deep learning approaches have lately become the preferred method for 3D segmentation. However, instance segmentation remains less explored. In this paper, we propose a novel deep-learning-based approach for efficient 3D instance segmentation using red, green, blue, and depth (RGB-D) data. The 2D region-based convolutional neural network (Mask R-CNN) with a point-based rendering module is adapted to integrate depth information to recognize and segment 3D object instances. To generate 3D point cloud coordinates (x, y, z), the segmented 2D pixels (u, v) of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image. Moreover, we conducted experiments and analysis to compare our proposed method from various viewpoints and distances. The experiments show that the proposed 3D object recognition and instance segmentation are sufficiently beneficial to support object handling in robotic and intelligent systems.
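The step that merges segmented (u, v) pixels with the depth image can be illustrated with the standard pinhole back-projection, x = (u - cx) z / fx and y = (v - cy) z / fy. This sketch assumes the depth image is registered to the RGB image (same pixel grid), depth values are stored in millimetres (hence `depth_scale=0.001`), and the intrinsics fx, fy, cx, cy come from camera calibration. These are assumptions, and the paper's exact procedure may differ.

```python
from typing import Iterable, List, Sequence, Tuple


def backproject(
    mask_pixels: Iterable[Tuple[int, int]],
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depth_scale: float = 0.001,
) -> List[Tuple[float, float, float]]:
    """Turn segmented (u, v) pixels into camera-frame 3D points (x, y, z)."""
    points: List[Tuple[float, float, float]] = []
    for u, v in mask_pixels:
        d = depth[v][u]  # depth image is indexed [row v][column u]
        if d == 0:       # 0 marks a missing depth reading; skip it
            continue
        z = d * depth_scale
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```

Applying this to every pixel inside one Mask R-CNN instance mask yields that object's point cloud, which is what downstream grasping or handling modules would consume.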
Submitted 19 June, 2024;
originally announced June 2024.
-
Deep Learning-Based 3D Instance and Semantic Segmentation: A Review
Authors:
Siddiqui Muhammad Yasir,
Hyunsik Ahn
Abstract:
The process of segmenting point cloud data into several homogeneous regions, such that points in the same region share the same attributes, is known as 3D segmentation. Segmenting point cloud data is challenging due to substantial redundancy, fluctuating sample density, and a lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping, and navigation. Many researchers have introduced various methodologies and algorithms. Deep learning, a prevailing AI method, has been applied successfully to a spectrum of 2D vision domains. However, due to the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its early stages. This study examines many strategies that have been proposed for 3D instance and semantic segmentation and gives a comprehensive assessment of current developments in deep-learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and discussed. This study evaluates the competitiveness of various segmentation algorithms on several publicly accessible datasets and covers the most commonly used pipelines, their advantages and limitations, insightful findings, and promising future research directions.
Submitted 19 June, 2024;
originally announced June 2024.
-
Empirical Evaluation of Integrated Trust Mechanism to Improve Trust in E-commerce Services
Authors:
Siddiqui Muhammad Yasir,
Hyunsik Ahn
Abstract:
There are two main approaches to trust management worldwide: strong and crisp, and soft and social. We analyze the impact of an integrated trust mechanism on three different e-commerce services. Trust is a latent element between potential users and the expert or internet systems being developed. We support our integration by conducting an experiment in a controlled laboratory environment. The model selected for the experiment is a composite of policy-based and reputation-based trust mechanisms and is widely acknowledged in the e-commerce industry. The integration between the policy and trust mechanisms was accomplished through a mapping process, so that the weakness of one is offset by the strength of the other. Furthermore, an experiment was conducted to validate the effectiveness of the implementation by separating the integrated and traditional trust mechanisms in a learning system.
Submitted 19 June, 2024;
originally announced June 2024.
-
Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data
Authors:
Zhusi Zhong,
Helen Zhang,
Fayez H. Fayad,
Andrew C. Lancaster,
John Sollee,
Shreyas Kulkarni,
Cheng Ting Lin,
Jie Li,
Xinbo Gao,
Scott Collins,
Colin Greineder,
Sun H. Ahn,
Harrison X. Bai,
Zhicheng Jiao,
Michael K. Atalay
Abstract:
Purpose: Pulmonary embolism (PE) is a significant cause of mortality in the United States. The objective of this study is to implement deep learning (DL) models using Computed Tomography Pulmonary Angiography (CTPA), clinical data, and PE Severity Index (PESI) scores to predict PE mortality. Materials and Methods: 918 patients (median age 64 years, range 13-99 years, 52% female) with 3,978 CTPAs were identified via retrospective review across three institutions. To predict survival, an AI model was used to extract disease-related imaging features from CTPAs. Imaging features and/or clinical variables were then incorporated into DL models to predict survival outcomes. Four models were developed as follows: (1) using CTPA imaging features only; (2) using clinical variables only; (3) multimodal, integrating both CTPA and clinical variables; and (4) multimodal fused with calculated PESI score. Performance and contribution from each modality were evaluated using concordance index (c-index) and Net Reclassification Improvement, respectively. Performance was compared to PESI predictions using the Wilcoxon signed-rank test. Kaplan-Meier analysis was performed to stratify patients into high- and low-risk groups. Additional factor-risk analysis was conducted to account for right ventricular (RV) dysfunction. Results: For both data sets, the PESI-fused and multimodal models achieved higher c-indices than PESI alone. Following stratification of patients into high- and low-risk groups by multimodal and PESI-fused models, mortality outcomes differed significantly (both p<0.001). A strong correlation was found between high-risk grouping and RV dysfunction. Conclusions: Multiomic DL models incorporating CTPA features, clinical data, and PESI achieved higher c-indices than PESI alone for PE survival prediction.
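The models above are compared using the concordance index (c-index). The sketch below computes the standard Harrell's c-index on right-censored data. It is not the authors' evaluation code. The function name, the choice to skip pairs tied in time, and the 0.5 credit for tied risk scores are assumptions of this illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data (sketch).

    times:  observed follow-up times
    events: 1 if death was observed at that time, 0 if censored
    risks:  predicted risk scores (higher means expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier time is an observed death;
        # pairs tied in time are skipped in this simple version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier death was ranked as higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 matches random ranking, and 0.0 means every pair is inverted.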
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
May the Dance be with You: Dance Generation Framework for Non-Humanoids
Authors:
Hyemin Ahn
Abstract:
We hypothesize that dance is a motion that forms a visual rhythm from music, where the visual rhythm can be perceived from optical flow. If an agent can recognize the relationship between visual rhythm and music, it will be able to dance by generating motion that creates a visual rhythm matching the music. Based on this, we propose a framework for any kind of non-humanoid agent to learn how to dance from human videos. Our framework consists of two stages: (1) training a reward model that perceives the relationship between optical flow (visual rhythm) and music from human dance videos, and (2) training the non-humanoid dancer with reinforcement learning based on that reward model. Our reward model consists of two feature encoders, one for optical flow and one for music. They are trained with contrastive learning, which increases the similarity between concurrent optical flow and music features. With this reward model, the agent learns to dance by receiving a higher reward when its action creates an optical flow whose feature is more similar to the given music feature. Experimental results show that the generated dance motion aligns properly with the music beat, and a user study indicates that humans prefer our framework over the baselines. To the best of our knowledge, ours is the first work in which non-humanoid agents learn to dance from human videos. An example video can be found at https://youtu.be/dOUPvo-O3QY.
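The reward model above is trained contrastively, and its output is used as a similarity-based reward. The sketch below shows a generic InfoNCE-style contrastive loss over paired feature vectors, plus a cosine-similarity reward. It assumes the encoders have already produced non-zero feature vectors. The loss form, the temperature value, and the function names are illustrative assumptions, not the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(flow_feats, music_feats, temperature=0.1):
    """InfoNCE loss: the i-th flow feature should match the i-th music feature.

    Concurrent (same-index) pairs are positives; all other music features in
    the batch act as negatives. Lower loss means better alignment.
    """
    n = len(flow_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(flow_feats[i], music_feats[j]) / temperature for j in range(n)]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


def dance_reward(flow_feat, music_feat):
    """Hypothetical reward: similarity between generated motion and music features."""
    return cosine(flow_feat, music_feat)
```

With correctly paired features the loss is near zero. Shuffling the music features raises it, which is the signal the encoders are trained on.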
Submitted 30 May, 2024;
originally announced May 2024.
-
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Authors:
Sang Keun Choe,
Hwijeen Ahn,
Juhan Bae,
Kewen Zhao,
Minsoo Kang,
Youngseog Chung,
Adithya Pratapa,
Willie Neiswanger,
Emma Strubell,
Teruko Mitamura,
Jeff Schneider,
Eduard Hovy,
Roger Grosse,
Eric Xing
Abstract:
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data point to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast training datasets has been largely limited by prohibitive compute and memory costs. In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve their scalability with an efficient gradient projection strategy called LoGra that leverages the gradient structure in backpropagation. We then provide a theoretical motivation for gradient projection approaches to influence functions to promote trust in the data valuation process. Lastly, we lower the barrier to implementing data valuation systems by introducing LogIX, a software package that can transform existing training code into data valuation code with minimal effort. In our data valuation experiments, LoGra achieves competitive accuracy against more expensive baselines while showing up to 6,500x improvement in throughput and 5x reduction in GPU memory usage when applied to Llama3-8B-Instruct and a 1B-token dataset.
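The abstract says LoGra "leverages the gradient structure in backpropagation" without giving details. One well-known instance of such structure applies to a linear layer y = W x. Its weight gradient is a sum of outer products, G = sum_t delta_t x_tᵀ, where delta_t = dL/dy_t. A projection of the form P_out G P_inᵀ therefore equals sum_t (P_out delta_t)(P_in x_t)ᵀ and never needs the full m×n gradient. The sketch below checks this identity on random matrices. It is an assumption about the kind of structure meant, not LogIX's API. The influence score is simplified to a plain inner product of projected gradients, with no inverse Hessian. All names are hypothetical.

```python
import random


def make_random_matrix(rows, cols, rng):
    """Gaussian random matrix as a list of lists (illustrative projection)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def projected_grad_naive(deltas, xs, P_out, P_in):
    """Materialize G = sum_t delta_t x_t^T (m x n), then compute P_out G P_in^T."""
    m, n = len(deltas[0]), len(xs[0])
    G = [[0.0] * n for _ in range(m)]
    for d, x in zip(deltas, xs):
        for i in range(m):
            for j in range(n):
                G[i][j] += d[i] * x[j]
    return matmul(matmul(P_out, G), transpose(P_in))


def projected_grad_factored(deltas, xs, P_out, P_in):
    """Project each factor first: sum_t (P_out delta_t)(P_in x_t)^T; G is never formed."""
    k_out, k_in = len(P_out), len(P_in)
    R = [[0.0] * k_in for _ in range(k_out)]
    for d, x in zip(deltas, xs):
        pd, px = matvec(P_out, d), matvec(P_in, x)
        for i in range(k_out):
            for j in range(k_in):
                R[i][j] += pd[i] * px[j]
    return R


def influence_score(train_proj, test_proj):
    """Simplified influence: inner product of projected gradients (identity Hessian)."""
    return sum(a * b for ra, rb in zip(train_proj, test_proj) for a, b in zip(ra, rb))
```

The factored path stores only k_out × k_in numbers per example instead of m × n. That is why structure-aware projection saves memory at LLM scale.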
Submitted 22 May, 2024;
originally announced May 2024.
-
Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems
Authors:
Yoo-Bin Bae,
Yeong-Ung Kim,
Jun-Oh Park,
Hyo-Sung Ahn
Abstract:
As multiple heterogeneous unmanned vehicle systems continue to play an increasingly important role in addressing complex real-world missions, the need for effective cooperation among unmanned vehicles becomes paramount. The concept of autonomous cooperation, wherein unmanned vehicles cooperate without human intervention or human control, offers promising avenues for enhancing the efficiency, adaptability, and intelligence of multiple-heterogeneous unmanned vehicle systems. Despite the growing interest in this domain, to the best of the authors' knowledge, there is a notable lack of comprehensive literature defining an explicit concept of autonomous cooperation and classifying its levels for multiple-heterogeneous unmanned vehicle systems. In this regard, this article aims to define the explicit concept of autonomous cooperation of multiple-heterogeneous unmanned vehicle systems. Furthermore, we provide a novel criterion to assess the technical maturity of developed unmanned vehicle systems by classifying the autonomous cooperative levels of multiple-heterogeneous unmanned vehicle systems.
Submitted 15 May, 2024;
originally announced May 2024.
-
Fixed Node Determination and Analysis in Directed Acyclic Graphs of Structured Networks
Authors:
Nam-jin Park,
Yeong-Ung Kim,
Hyo-Sung Ahn
Abstract:
This paper explores the conditions for determining fixed nodes in structured networks, specifically focusing on directed acyclic graphs (DAGs). We introduce several necessary and sufficient conditions for determining fixed nodes in $p$-layered DAGs. This is accomplished by defining the problem of maximum disjoint stems, based on the observation that all DAGs can be represented as hierarchical structures with a unique label for each layer. For structured networks, we discuss the importance of fixed nodes by considering their controllability against variations in network parameters. Moreover, we present an efficient algorithm that simultaneously performs labeling and fixed node search for $p$-layered DAGs, together with an analysis of its time complexity. The results presented in this paper have implications for the analysis of controllability at the individual node level in structured networks.
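The abstract relies on the observation that every DAG can be organized into labeled layers. As an illustration, the sketch below uses one common layering, longest-path layering computed with Kahn's topological sort. Each source gets layer 0, and every other node sits one layer below its deepest predecessor. The paper's own labeling rule may differ. The function name and the layering choice are assumptions.

```python
from collections import deque


def layer_dag(nodes, edges):
    """Longest-path layering of a DAG via Kahn's topological sort (sketch).

    Returns {node: layer}; sources get layer 0. Raises ValueError on cycles.
    """
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    layer = {v: 0 for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            # A node must lie strictly below every one of its predecessors.
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("graph has a cycle")
    return layer
```

The number of layers p is then `max(layer.values()) + 1`. The pass runs in O(|V| + |E|) time.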
Submitted 10 May, 2024;
originally announced May 2024.
-
Composition Rules for Strong Structural Controllability and Minimum Input Problem in Diffusively-Coupled Networks
Authors:
Nam-Jin Park,
Seong-Ho Kwon,
Yoo-Bin Bae,
Byeong-Yeon Kim,
Kevin L. Moore,
Hyo-Sung Ahn
Abstract:
This paper presents new results and a reinterpretation of existing conditions for strong structural controllability in a structured network determined by the zero/non-zero patterns of edges. For diffusively-coupled networks with self-loops, we first establish a necessary and sufficient condition for strong structural controllability, based on the concepts of dedicated and sharing nodes. Subsequently, we define several conditions for strong structural controllability across various graph types by decomposing them into disjoint path graphs. We further extend our findings by introducing a composition rule, facilitating the analysis of strong structural controllability in larger networks. This rule allows us to determine the strong structural controllability of connected graphs called pactus graphs (a generalization of the well-known cactus graph) by considering the strong structural controllability of their disjoint component graphs. In this process, we introduce the notion of a component input node, which is a state node that functions identically to an external input node. Based on this concept, we present an algorithm with approximate polynomial complexity to determine the minimum number of external input nodes required to maintain strong structural controllability in a diffusively-coupled network with self-loops.
Submitted 9 May, 2024;
originally announced May 2024.
-
DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands
Authors:
Hwayong Nam,
Seungmin Baek,
Minbok Wi,
Michael Jaemin Kim,
Jaehyun Park,
Chihun Song,
Nam Sung Kim,
Jung Ho Ahn
Abstract:
The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerabilities. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. Previous studies have also attempted to understand DRAM microarchitectures and the associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or by a lack of deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools that can be cross-validated: AIBs, retention time tests, and RowCopy. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on a microscopic view of the DRAM microarchitecture, such as the 6F² cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.
Submitted 3 May, 2024;
originally announced May 2024.
-
GNN-based Probabilistic Supply and Inventory Predictions in Supply Chain Networks
Authors:
Hyung-il Ahn,
Young Chol Song,
Santiago Olivar,
Hershel Mehta,
Naveen Tewari
Abstract:
Successful supply chain optimization must mitigate imbalances between supply and demand over time. While accurate demand prediction is essential for supply planning, it alone does not suffice. The key to successful supply planning for optimal and viable execution lies in maximizing predictability for both demand and supply throughout an execution horizon. Therefore, enhancing the accuracy of supply predictions is imperative to create an attainable supply plan that matches demand without overstocking or understocking. However, in complex supply chain networks with numerous nodes and edges, accurate supply predictions are challenging due to dynamic node interactions, cascading supply delays, resource availability, and production and logistics capabilities. Consequently, supply executions often deviate from their initial plans. To address this, we present the Graph-based Supply Prediction (GSP) probabilistic model. Our attention-based graph neural network (GNN) model predicts supplies, inventory, and imbalances using graph-structured historical data, demand forecasting, and original supply plan inputs. The experiments, conducted using historical data from a global consumer goods company's large-scale supply chain, demonstrate that GSP significantly improves supply and inventory prediction accuracy, potentially offering supply plan corrections to optimize executions.
Submitted 11 April, 2024;
originally announced April 2024.
-
Generative Probabilistic Planning for Optimizing Supply Chain Networks
Authors:
Hyung-il Ahn,
Santiago Olivar,
Hershel Mehta,
Young Chol Song
Abstract:
Supply chain networks in enterprises are typically composed of complex topological graphs involving various types of nodes and edges, accommodating numerous products with considerable demand and supply variability. However, as supply chain networks expand in size and complexity, traditional supply chain planning methods (e.g., those found in heuristic rule-based and operations research-based systems) tend to become locally optimal or lack computational scalability, resulting in substantial imbalances between supply and demand across nodes in the network. This paper introduces a novel Generative AI technique, which we call Generative Probabilistic Planning (GPP). GPP generates dynamic supply action plans that are globally optimized across all network nodes over the time horizon for changing objectives like maximizing profits or service levels, factoring in time-varying probabilistic demand, lead time, and production conditions. GPP leverages attention-based graph neural networks (GNN), offline deep reinforcement learning (Offline RL), and policy simulations to train generative policy models and create optimal plans through probabilistic simulations, effectively accounting for various uncertainties. Our experiments using historical data from a global consumer goods company with complex supply chain networks demonstrate that GPP accomplishes objective-adaptable, probabilistically resilient, and dynamic planning for supply chain networks, leading to significant improvements in performance and profitability for enterprises. Our work plays a pivotal role in shaping the trajectory of AI adoption within the supply chain domain.
Submitted 11 April, 2024;
originally announced April 2024.
-
Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning
Authors:
Gawon Choi,
Hyemin Ahn
Abstract:
In robotics, the use of Large Language Models (LLMs) is becoming prevalent, especially for understanding human commands. In particular, LLMs are utilized as domain-agnostic task planners for high-level human commands. LLMs are capable of Chain-of-Thought (CoT) reasoning, and this allows LLMs to be task planners. However, we need to consider that modern robots still struggle to perform complex actions, and the domains where robots can be deployed are limited in practice. This leads us to pose a question: if small LMs can be trained to reason in chains within a single domain, would even small LMs be good task planners for robots? To train smaller LMs to reason in chains, we use LLMs to build "COmmand-STeps datasets" (COST), consisting of high-level commands along with the corresponding actionable low-level steps. We release not only our datasets but also the prompt templates used to generate them, allowing anyone to build datasets for their own domain. We compare GPT-3.5 and GPT-4 with fine-tuned GPT-2 on task domains in tabletop and kitchen environments, and the results show that GPT-2 medium is comparable to GPT-3.5 for task planning in a specific domain. Our dataset, code, and more output samples can be found at https://github.com/Gawon-Choi/small-LMs-Task-Planning
Submitted 5 April, 2024;
originally announced April 2024.
-
Unified laser stabilization and isolation on a silicon chip
Authors:
Alexander D. White,
Geun Ho Ahn,
Richard Luhtaru,
Joel Guo,
Theodore J. Morin,
Abhi Saxena,
Lin Chang,
Arka Majumdar,
Kasper Van Gasse,
John E. Bowers,
Jelena Vučković
Abstract:
Rapid progress in photonics has led to an explosion of integrated devices that promise to deliver the same performance as table-top technology at the nanoscale, heralding the next generation of optical communications, sensing and metrology, and quantum technologies. However, the challenge of co-integrating the multiple components of high-performance laser systems has left the application of these nanoscale devices thwarted by bulky laser sources that are orders of magnitude larger than the devices themselves. Here we show that the two main ingredients for high-performance lasers, noise reduction and isolation, which currently require a serial combination of incompatible technologies, can be sourced simultaneously from a single, passive, CMOS-compatible nanophotonic device. To do this, we take advantage of both the long photon lifetime and the nonreciprocal Kerr nonlinearity of a high-quality-factor silicon nitride ring resonator to self-injection lock a semiconductor laser chip while also providing isolation. Additionally, we identify a previously unappreciated power-regime limitation of current on-chip laser architectures, which our system overcomes. Using our device, which we term a unified laser stabilizer, we demonstrate an on-chip integrated laser system with built-in isolation and noise reduction that operates with turnkey reliability. This approach departs from efforts to directly miniaturize and integrate traditional laser system components and serves to bridge the gap to fully integrated optical technologies.
Submitted 24 May, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Epitaxially defined Luttinger liquids on MoS$_2$ bicrystals
Authors:
Bingchen Deng,
Heonsu Ahn,
Jue Wang,
Gunho Moon,
Ninad Dongre,
Chao Lei,
Giovanni Scuri,
Jiho Sung,
Elise Brutschea,
Kenji Watanabe,
Takashi Taniguchi,
Fan Zhang,
Moon-Ho Jo,
Hongkun Park
Abstract:
A mirror twin boundary (MTB) in a transition metal dichalcogenide (TMD) monolayer can host a one-dimensional electron liquid of a topological nature with tunable interactions. Unfortunately, the electrical characterization of such boundaries has been challenging due to the paucity of samples of large enough size and high quality. Here, we report the epitaxial growth of monolayer molybdenum disulfide (MoS$_2$) bicrystals with well-isolated MTBs that are tens of micrometers long. Conductance measurements of these MTBs exhibit power-law behaviors as a function of temperature and bias voltage up to room temperature, consistent with electrons tunneling into a Luttinger liquid. Transport measurements of two distinct types of MTBs reveal the critical role of atomic-scale defects. This study demonstrates that MTBs in TMD monolayers provide an exciting new platform for studying the interplay between electronic interactions and topology.
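Power-law conductance of the form G ≈ C · T^α is the signature referred to above, and the exponent is usually read off as the slope of a log-log plot. The sketch below fits α and C by least squares on log G versus log T. The input numbers are synthetic, generated from a known exponent purely to show that the fit recovers it. They are not measurements from the paper, and the function name is hypothetical. The same fit applies to conductance versus bias voltage.

```python
import math


def power_law_exponent(xs, ys):
    """Least-squares fit of log(y) = log(C) + alpha * log(x); returns (alpha, C).

    Requires strictly positive xs and ys.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx          # slope on the log-log plot
    log_c = my - alpha * mx    # intercept on the log-log plot
    return alpha, math.exp(log_c)


# Synthetic example only: conductance generated with alpha = 0.6.
temps = [50.0, 100.0, 150.0, 200.0, 300.0]
conds = [2e-9 * t ** 0.6 for t in temps]
alpha, c = power_law_exponent(temps, conds)
```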
Submitted 20 March, 2024;
originally announced March 2024.
-
Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning
Authors:
Hongjoon Ahn,
Jinu Hyeon,
Youngmin Oh,
Bosun Hwang,
Taesup Moon
Abstract:
We argue that the negative transfer problem, which occurs when a new task to learn arrives, is an important problem that must not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive experimental validation, we demonstrate that this issue frequently exists in CRL and cannot be effectively addressed by several recent works on mitigating the plasticity loss of RL agents. To that end, we develop Reset & Distill (R&D), a simple yet highly effective method, to overcome the negative transfer problem in CRL. R&D combines a strategy of resetting the agent's online actor and critic networks to learn a new task with an offline learning step that distills knowledge from the online actor and the previous expert's action probabilities. We carried out extensive experiments on long sequences of Meta-World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.
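The offline step of R&D distills action probabilities from two teachers: the freshly reset-and-trained online actor for the new task, and stored expert probabilities for earlier tasks. The sketch below expresses that step as a sum of KL-divergence losses over discrete action distributions. The KL direction, the equal weighting of the two terms, and all function names are assumptions made for illustration. The reset itself, reinitializing the online actor and critic before each new task, is not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_probs, teacher_probs):
    """Mean KL from teacher to student over a batch of states."""
    return sum(kl_divergence(t, s) for s, t in zip(student_probs, teacher_probs)) / len(student_probs)


def rd_offline_loss(student_new, online_actor_new, student_old, expert_old):
    """Hypothetical combined objective for the offline (distilled) actor.

    New-task states: match the online actor trained from scratch after the reset.
    Old-task states: match the stored action probabilities of previous experts.
    """
    return (distillation_loss(student_new, online_actor_new)
            + distillation_loss(student_old, expert_old))
```

Because the online actor starts from a reset, it learns the new task without interference from old weights. The distilled actor is what accumulates knowledge across tasks.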
Submitted 14 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Controllable Subspaces in Structured Networks of Hierarchical Directed Acyclic Graphs: Controllability of Individual Nodes
Authors:
Nam-Jin Park,
Yeong-Ung Kim,
Koog-Hwan Oh,
Hyo-Sung Ahn
Abstract:
Within the context of structured networks, this paper introduces the concept of the Fixed Strongly Structurally Controllable Subspace (FSSCS), enabling a comprehensive characterization of controllable subspaces. From a graph-theoretical viewpoint, the paper defines Fixed Strongly Structurally Controllable (FSSC) nodes based on the FSSCS concept and establishes the necessary and sufficient conditions for their identification. This paper proposes a method for determining the exact dimension of the Strongly Structurally Controllable Subspace (SSCS) in hierarchical directed acyclic graphs, employing a blend of graph-theoretical approaches and controllability matrix analyses. This approach not only facilitates the identification of FSSC nodes but also enhances our understanding of the robustness of node controllability against variations in network parameters within structured networks, marking a significant advancement in the field of strong structural controllability of individual nodes.
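The dimension of a controllable subspace is the rank of the Kalman controllability matrix [B, AB, ..., A^(n-1)B]. The sketch below computes that rank exactly, using rational arithmetic, for a single numeric realization (A, B). Strong structural notions such as the SSCS quantify over all parameter realizations that share a given zero pattern. This code therefore checks only one realization and does not implement the paper's graph-theoretical method. Function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def rank(M):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def controllable_subspace_dim(A, B):
    """Rank of C = [B, AB, ..., A^(n-1) B] for one realization of (A, B)."""
    n = len(A)
    blocks, AkB = [], B
    for _ in range(n):
        blocks.append(AkB)
        AkB = mat_mul(A, AkB)
    # Concatenate the n x m blocks horizontally, row by row.
    C = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(C)


# Edge 1 -> 2 driven at node 1; node 3 is isolated, so the dimension is 2 of 3.
dim_partial = controllable_subspace_dim([[0, 0, 0], [1, 0, 0], [0, 0, 0]], [[1], [0], [0]])
```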
Submitted 21 August, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
Authors:
Yong Hyun Ahn,
Hyeon Bae Kim,
Seong Tae Kim
Abstract:
Recent advancements in neural networks have showcased their remarkable capabilities across various domains. Despite these successes, the "black box" problem remains. Addressing this, we propose a novel framework, WWW, that offers the 'what', 'where', and 'why' of the neural network decisions in human-understandable terms. Specifically, WWW utilizes adaptive selection for concept discovery, employing adaptive cosine similarity and thresholding techniques to effectively explain 'what'. To address the 'where' and 'why', we propose a novel combination of neuron activation maps (NAMs) with Shapley values, generating localized concept maps and heatmaps for individual inputs. Furthermore, WWW introduces a method for predicting uncertainty, leveraging heatmap similarities to estimate 'how' reliable the prediction is. Experimental evaluations of WWW demonstrate superior performance in both quantitative and qualitative metrics, outperforming existing methods in interpretability. WWW provides a unified solution for explaining 'what', 'where', and 'why', introducing a method for localized explanations from global interpretations and offering a plug-and-play solution adaptable to various architectures.
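Shapley values, which WWW combines with neuron activation maps, assign each player its average marginal contribution over all coalitions:
φ_i = Σ_{S ⊆ N∖{i}} |S|! (n − |S| − 1)! / n! · (v(S ∪ {i}) − v(S)).
The sketch below computes them exactly for a handful of players. The toy value function, standing in for a class score when a given set of neurons is active, is hypothetical. WWW's actual players and value function are not specified in the abstract. Exact enumeration costs O(2^n), so it is only practical for small n.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of coalitions of this size preceding p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy score: additive neuron weights plus an n1-n2 interaction.
weights = {"n1": 2.0, "n2": 1.0, "n3": 0.0}


def toy_value(s):
    return sum(weights[p] for p in s) + (1.0 if {"n1", "n2"} <= s else 0.0)


phi = shapley_values(["n1", "n2", "n3"], toy_value)
```

Here the interaction bonus is split evenly between n1 and n2, so phi is about {"n1": 2.5, "n2": 1.5, "n3": 0.0}. The values sum to v(all) − v(∅) = 4.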
Submitted 11 April, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
Authors:
Myung Gyo Oh,
Hong Eun Ahn,
Leo Hyun Park,
Taekyoung Kwon
Abstract:
Neural language models (LMs) are vulnerable to training data extraction attacks due to data memorization. This paper introduces a novel attack scenario wherein an attacker adversarially fine-tunes pre-trained LMs to amplify the exposure of the original training data. This strategy differs from prior studies by aiming to intensify the LM's retention of its pre-training dataset. To achieve this, the attacker needs to collect generated texts that are closely aligned with the pre-training data. However, without knowledge of the actual dataset, quantifying the amount of pre-training data within generated texts is challenging. To address this, we propose the use of pseudo-labels for these generated texts, leveraging membership approximations indicated by machine-generated probabilities from the target LM. We subsequently fine-tune the LM to favor generations with higher likelihoods of originating from the pre-training data, based on their membership probabilities. Our empirical findings indicate a remarkable outcome: LMs with over 1B parameters exhibit a four- to eight-fold increase in training data exposure. We discuss potential mitigations and suggest future research directions.
Submitted 31 August, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
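The pseudo-labeling step can be illustrated with a small sketch. It assumes each generated text already carries a membership probability (in the paper this is approximated from machine-generated probabilities of the target LM; here the scores are simply given), thresholds those scores into pseudo-labels, and pairs likely members with likely non-members as (preferred, rejected) examples for a preference-style fine-tuning step. The threshold, the pairing rule, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Generation:
    text: str
    membership_prob: float  # assumed precomputed membership score in [0, 1]


def pseudo_label(gens: List[Generation], threshold: float = 0.5) -> List[Tuple[Generation, bool]]:
    """Label a generation as a pseudo-member if its score reaches the threshold."""
    return [(g, g.membership_prob >= threshold) for g in gens]


def preference_pairs(gens: List[Generation], threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair the strongest pseudo-members with the weakest pseudo-non-members."""
    labeled = pseudo_label(gens, threshold)
    members = sorted((g for g, m in labeled if m), key=lambda g: -g.membership_prob)
    non_members = sorted((g for g, m in labeled if not m), key=lambda g: g.membership_prob)
    # zip stops at the shorter list, so unmatched generations are dropped.
    return [(m.text, n.text) for m, n in zip(members, non_members)]


gens = [Generation("a", 0.9), Generation("b", 0.2), Generation("c", 0.7), Generation("d", 0.4)]
print(preference_pairs(gens))  # [('a', 'b'), ('c', 'd')]
```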
-
Offline Imitation Learning by Controlling the Effective Planning Horizon
Authors:
Hee-Jun Ahn,
Seong-Woong Shim,
Byung-Jun Lee
Abstract:
In offline imitation learning (IL), we generally assume only a handful of expert trajectories and a supplementary offline dataset from suboptimal behaviors to learn the expert policy. While it is now common to minimize the divergence between state-action visitation distributions so that the agent also considers the future consequences of an action, a sampling error in an offline dataset may lead to erroneous estimates of state-action visitations in the offline case. In this paper, we investigate the effect of controlling the effective planning horizon (i.e., reducing the discount factor) as opposed to imposing an explicit regularizer, as previously studied. Unfortunately, it turns out that the existing algorithms suffer from magnified approximation errors when the effective planning horizon is shortened, which results in a significant degradation in performance. We analyze the main cause of the problem and propose remedies that correct the algorithm. We show that the corrected algorithm improves on popular imitation learning benchmarks by controlling the effective planning horizon rather than by imposing an explicit regularizer.
Submitted 18 January, 2024;
originally announced January 2024.
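Two quantities in this abstract can be made concrete: the effective planning horizon, commonly approximated as 1/(1 - γ) for a discount factor γ, and the normalized discounted state-action visitation d(s, a) = (1 - γ) Σ_t γ^t Pr(s_t = s, a_t = a). The sketch below estimates d from a finite set of trajectories by simple counting. It is a textbook estimator for illustration, not the corrected algorithm from the paper, and the function names are hypothetical. Because the trajectories are finite, the estimate sums to less than 1 (the missing mass belongs to steps beyond the trajectory ends).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Trajectory = List[Tuple[Hashable, Hashable]]  # sequence of (state, action)


def effective_horizon(gamma: float) -> float:
    """Rule of thumb: a discount factor gamma looks roughly 1 / (1 - gamma) steps ahead."""
    return 1.0 / (1.0 - gamma)


def discounted_visitation(trajs: List[Trajectory], gamma: float) -> Dict[Tuple[Hashable, Hashable], float]:
    """Empirical d(s, a) = (1 - gamma) * mean over trajectories of sum_t gamma^t * 1[(s_t, a_t) = (s, a)]."""
    d: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for traj in trajs:
        for t, (s, a) in enumerate(traj):
            d[(s, a)] += (1.0 - gamma) * gamma ** t / len(trajs)
    return dict(d)


traj = [("s0", "a0"), ("s1", "a1")]
print(effective_horizon(0.5))              # 2.0
print(discounted_visitation([traj], 0.5))  # {('s0', 'a0'): 0.5, ('s1', 'a1'): 0.25}
```

A smaller γ concentrates the weight on early steps, which is the knob the abstract contrasts with an explicit regularizer.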
-
Maunakea Spectroscopic Explorer exposure time calculator for end-to-end simulator: to optimizing spectrograph design and observing simulation
Authors:
Tae-Geun Ji,
Jennifer Sobeck,
Changgon Kim,
Hojae Ahn,
Mingyeong Yang,
Taeeun Kim,
Sungwook E. Hong,
Kei Szeto,
Jennifer L. Marshall,
Christian Surace,
Soojong Pak
Abstract:
The Maunakea Spectroscopic Explorer (MSE) project will provide multi-object spectroscopy in the optical and near-infrared bands using an 11.25-m aperture telescope, repurposing the original Canada-France-Hawaii Telescope (CFHT) site. MSE will observe 4,332 objects per exposure over a field of view of 1.5 square degrees, using two spectrographs with low-moderate (R$\sim$3,000, 6,000) and high (R$\approx$30,000) spectral resolution. In general, an exposure time calculator (ETC) is used to estimate the performance of an observing system by calculating the signal-to-noise ratio (S/N) and exposure time. We present the design of the MSE ETC, which has four calculation modes (S/N, exposure time, S/N trend with wavelength, and S/N trend with magnitude) and incorporates the MSE system requirements as specified in the Conceptual Design. The MSE ETC currently accepts user-defined inputs for target AB magnitude, water vapor, airmass, and sky brightness AB magnitude (additional inputs can be provided depending on the calculation mode). The ETC is built using Python 3.7 and features a graphical user interface that allows cross-platform use. ETC development follows an Agile methodology and uses Unified Modeling Language (UML) diagrams to visualize the software architecture. We also describe the testing and verification of the MSE ETC.
Submitted 16 January, 2024;
originally announced January 2024.
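The S/N and exposure-time modes can be illustrated with the standard CCD signal-to-noise equation, S/N = S t / sqrt(S t + n_pix (B + D) t + n_pix R^2), where S is the source count rate, B and D are the per-pixel sky and dark rates, R is the read noise, and n_pix is the number of pixels summed. This is a generic textbook model, not the MSE ETC's actual noise model or code: converting AB magnitudes, airmass, and throughput into count rates is omitted, and the function names are hypothetical. The exposure-time mode inverts the equation by solving a quadratic in t.

```python
import math


def snr(t, src_rate, sky_rate, dark_rate, read_noise, n_pix):
    """Signal-to-noise ratio after an exposure of t seconds (rates in electrons/s)."""
    signal = src_rate * t
    noise = math.sqrt(signal + n_pix * (sky_rate + dark_rate) * t + n_pix * read_noise ** 2)
    return signal / noise


def exposure_time(target_snr, src_rate, sky_rate, dark_rate, read_noise, n_pix):
    """Exposure time at which snr() equals target_snr.

    Squaring the S/N equation gives a*t^2 + b*t + c = 0 with the coefficients below;
    since c <= 0, the '+' root is the single non-negative solution.
    """
    a = src_rate ** 2
    b = -target_snr ** 2 * (src_rate + n_pix * (sky_rate + dark_rate))
    c = -target_snr ** 2 * n_pix * read_noise ** 2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


# Hypothetical rates: 100 e-/s from the source, 10 e-/s/pixel of sky,
# one pixel, no dark current and no read noise.
t = exposure_time(10.0, src_rate=100.0, sky_rate=10.0, dark_rate=0.0, read_noise=0.0, n_pix=1)
print(t)  # 1.1
```

The S/N-trend modes would amount to evaluating `snr` over a grid of wavelengths or magnitudes once the corresponding count rates are known.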
-
Safe Chance-constrained Model Predictive Control under Gaussian Mixture Model Uncertainty
Authors:
Kai Ren,
Colin Chen,
Hyeontae Sung,
Heejin Ahn,
Ian Mitchell,
Maryam Kamgarpour
Abstract:
We present a chance-constrained model predictive control (MPC) framework under Gaussian mixture model (GMM) uncertainty. Specifically, we consider the uncertainty that arises from predicting future behaviors of moving obstacles, which may exhibit multiple modes (for example, turning left or right). To address the multi-modal uncertainty distribution, we propose three MPC formulations: nominal chance-constrained planning, robust chance-constrained planning, and contingency planning. We prove that closed-loop trajectories generated by the three planners are safe. The approaches differ in conservativeness and in their performance guarantees. In particular, the robust chance-constrained planner is recursively feasible under certain assumptions on the propagation of prediction uncertainty. On the other hand, the contingency planner generates a less conservative closed-loop trajectory than the nominal planner. We validate our planners using state-of-the-art trajectory prediction algorithms in autonomous driving simulators.
Submitted 8 January, 2024;
originally announced January 2024.
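For a Gaussian mixture prediction, the violation probability of a single linear constraint has a closed form: if the obstacle position o follows Σ_k w_k N(μ_k, Σ_k) and a collision corresponds to the half-space a^T o > b, then P(a^T o > b) = Σ_k w_k [1 - Φ((b - a^T μ_k) / sqrt(a^T Σ_k a))]. The sketch below evaluates this for a 2D position and compares a mixture-level check against a stricter per-mode check. Reducing collision avoidance to one fixed half-space, the toy numbers, and the function names are assumptions for illustration; the paper's three planners embed such constraints inside an MPC optimization, which is not shown here.

```python
import math
from typing import List, Sequence


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mode_violation_prob(a: Sequence[float], b: float,
                        mean: Sequence[float], cov: List[List[float]]) -> float:
    """P(a^T o > b) for a 2D Gaussian o ~ N(mean, cov)."""
    m = a[0] * mean[0] + a[1] * mean[1]
    var = (a[0] * (cov[0][0] * a[0] + cov[0][1] * a[1])
           + a[1] * (cov[1][0] * a[0] + cov[1][1] * a[1]))
    return 1.0 - std_normal_cdf((b - m) / math.sqrt(var))


def gmm_violation_prob(a, b, weights, means, covs) -> float:
    """Mixture violation probability: the weighted sum of per-mode probabilities."""
    return sum(w * mode_violation_prob(a, b, m, c) for w, m, c in zip(weights, means, covs))


# Toy two-mode obstacle prediction (e.g. two maneuvers); collision if x > 0.
a, b = (1.0, 0.0), 0.0
weights = [0.7, 0.3]
means = [(-3.0, 0.0), (-1.0, 0.0)]
covs = [[[1.0, 0.0], [0.0, 1.0]]] * 2
eps = 0.05

p_mix = gmm_violation_prob(a, b, weights, means, covs)
p_modes = [mode_violation_prob(a, b, m, c) for m, c in zip(means, covs)]
print(round(p_mix, 4), p_mix <= eps)   # 0.0485 True
print(all(p <= eps for p in p_modes))  # False: mode 2 alone violates about 15.9% of the time
```

The gap between the two checks is one way to see why formulations built on the same GMM prediction can differ in conservativeness.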
-
Distributed solution methods for MPC based energy management method of interconnected microgrids: Dual ascent vs ADMM
Authors:
Viet Hoang Pham,
Hyo-Sung Ahn
Abstract:
This paper considers an optimal energy management problem for a network of interconnected microgrids. A model predictive control (MPC) approach is used to avoid capacity constraint violations and to cope with uncertainties in forecasted power demands. Using a dual ascent method and a proximal alternating direction method of multipliers (ADMM), respectively, we design two distributed methods that allow every agent to determine its own optimal control decisions using only local information. The effectiveness of the proposed methods is verified via numerical simulations.
Submitted 4 January, 2024;
originally announced January 2024.
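The dual ascent idea can be sketched on a toy version of the problem: each microgrid i picks a power level x_i to minimize a local quadratic cost (a_i/2)(x_i - c_i)^2, subject to a shared balance constraint Σ_i x_i = D. Given a price λ, each agent solves its own subproblem in closed form, x_i = c_i - λ/a_i, using only local data, and the price is then updated in proportion to the constraint violation. The quadratic costs, the single coupling constraint, the step size, and the coordinator that sums the local decisions are all assumptions; the paper's MPC formulation with capacity constraints and forecast uncertainty, and its proximal ADMM variant, are not reproduced.

```python
from typing import List, Tuple


def dual_ascent(a: List[float], c: List[float], demand: float,
                step: float, iters: int = 200) -> Tuple[List[float], float]:
    """Dual ascent for min sum_i (a_i/2)(x_i - c_i)^2  s.t.  sum_i x_i = demand."""
    lam = 0.0
    x = [0.0] * len(a)
    for _ in range(iters):
        # Local step: each agent minimizes f_i(x_i) + lam * x_i on its own.
        x = [ci - lam / ai for ai, ci in zip(a, c)]
        # Price step: gradient ascent on the dual function.
        lam += step * (sum(x) - demand)
    return x, lam


a, c, demand = [1.0, 2.0], [3.0, 1.0], 2.0
# Here the price update is a linear recursion that converges when 0 < step < 2 / sum(1/a_i).
x, lam = dual_ascent(a, c, demand, step=0.5)
print([round(v, 4) for v in x], round(lam, 4))  # [1.6667, 0.3333] 1.3333
```

The result matches the closed-form optimum λ* = (Σ c_i - D) / Σ (1/a_i) = 4/3 for these numbers.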
-
NeuJeans: Private Neural Network Inference with Joint Optimization of Convolution and Bootstrapping
Authors:
Jae Hyung Ju,
Jaiyoung Park,
Jongmin Kim,
Minsik Kang,
Donghwan Kim,
Jung Hee Cheon,
Jung Ho Ahn
Abstract:
Fully homomorphic encryption (FHE) is a promising cryptographic primitive for realizing private neural network inference (PI) services by allowing a client to fully offload the inference task to a cloud server while keeping the client data oblivious to the server. This work proposes NeuJeans, an FHE-based solution for the PI of deep convolutional neural networks (CNNs). NeuJeans tackles the critical problem of the enormous computational cost of evaluating CNNs under FHE. We introduce a novel encoding method called Coefficients-in-Slot (CinS) encoding, which enables multiple convolutions in one HE multiplication without costly slot permutations. We further observe that CinS encoding can be obtained by performing the first several steps of the Discrete Fourier Transform (DFT) on a ciphertext in conventional Slot encoding. Because bootstrapping a ciphertext starts with a DFT, this property lets us avoid a separate conversion between CinS and Slot encodings. Exploiting this, we devise optimized execution flows for various two-dimensional convolution (conv2d) operations and apply them to end-to-end CNN implementations. NeuJeans accelerates conv2d-activation sequences by up to 5.68 times compared to state-of-the-art FHE-based PI work and performs the PI of an ImageNet-scale CNN within a few seconds.
Submitted 19 September, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
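The reason coefficient-based encodings help with convolution can be shown in plaintext: in the polynomial ring Z[X]/(X^N + 1) used by common FHE schemes, multiplying two polynomials convolves their coefficient vectors, with terms that wrap past X^N coming back negated (a negacyclic convolution). When the input and kernel are zero-padded so that no term wraps, the product's coefficients equal an ordinary linear convolution. This is only an unencrypted illustration of that general property; it is not NeuJeans's CinS encoding or its DFT-based fusion with bootstrapping, and the function names are hypothetical.

```python
from typing import List


def negacyclic_mul(p: List[int], q: List[int]) -> List[int]:
    """Multiply p and q (length-N coefficient lists) modulo X^N + 1."""
    n = len(p)
    assert len(q) == n
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] += p[i] * q[j]
            else:
                # X^N = -1 in this ring, so X^k = -X^(k - N).
                out[k - n] -= p[i] * q[j]
    return out


def linear_conv(x: List[int], w: List[int]) -> List[int]:
    """Ordinary (full) 1D convolution of x with w."""
    out = [0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i + j] += xi * wj
    return out


def pad(v: List[int], n: int) -> List[int]:
    return v + [0] * (n - len(v))


x, w, n = [1, 2, 3], [1, -1], 8
print(linear_conv(x, w))                     # [1, 1, 1, -3]
print(negacyclic_mul(pad(x, n), pad(w, n)))  # [1, 1, 1, -3, 0, 0, 0, 0]
```

Without enough padding the last term wraps around negated: `negacyclic_mul([1, 2, 3], [1, -1, 0])` gives `[4, 1, 1]`.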
-
Seamless monolithic three-dimensional integration of single-crystalline films by growth
Authors:
Ki Seok Kim,
Seunghwan Seo,
Junyoung Kwon,
Doyoon Lee,
Changhyun Kim,
Jung-El Ryu,
Jekyung Kim,
Min-Kyu Song,
Jun Min Suh,
Hang-Gyo Jung,
Youhwan Jo,
Hogeun Ahn,
Sangho Lee,
Kyeongjae Cho,
Jongwook Jeon,
Minsu Seol,
Jin-Hong Park,
Sang Won Kim,
Jeehwan Kim
Abstract:
The demand for three-dimensional (3D) integration of electronic components is steadily rising. The through-silicon-via (TSV) technique has emerged as the only viable method for integrating single-crystalline device components in 3D, despite significant processing challenges. While monolithic 3D (M3D) integration schemes show promise, the seamless connection of single-crystalline semiconductors without intervening wafers has yet to be demonstrated. This challenge arises from the inherent difficulty of growing single crystals on amorphous or polycrystalline surfaces after the back-end-of-line process, at temperatures low enough to preserve the underlying circuitry. Consequently, a practical growth-based solution for M3D integration of single crystals remains elusive. Here, we present a method for growing single-crystalline channel materials, specifically transition metal dichalcogenides, on amorphous and polycrystalline surfaces at temperatures below 400 °C. Building on this technique, we demonstrate the seamless monolithic integration of vertical single-crystalline logic transistor arrays. This leads to unprecedented vertical CMOS arrays, from which we construct vertical inverters. Ultimately, this achievement paves the way for M3D integration of various electronic and optoelectronic hardware in the form of single crystals.
Submitted 6 December, 2023; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Titanium:Sapphire-on-insulator for broadband tunable lasers and high-power amplifiers on chip
Authors:
Joshua Yang,
Kasper Van Gasse,
Daniil M. Lukin,
Melissa A. Guidry,
Geun Ho Ahn,
Alexander D. White,
Jelena Vučković
Abstract:
Titanium:Sapphire (Ti:Sa) lasers have been essential for advancing fundamental research and technological applications. Ti:Sa lasers are unmatched in bandwidth and tuning range, yet their use is severely restricted by their large size, cost, and need for high optical pump powers. Here, we demonstrate a monocrystalline Ti:Sa-on-insulator (Ti:SaOI) photonics platform that enables dramatic miniaturization, cost reduction, and scalability of Ti:Sa technology. First, through fabrication of low-loss whispering gallery mode resonators, we realize a Ti:Sa laser operating with an ultra-low lasing threshold of 290 $μ$W. Then, through orders-of-magnitude improvement in mode confinement in Ti:SaOI waveguides, we realize the first integrated solid-state (i.e., non-semiconductor) optical amplifier operating below 1 $μ$m, with an ultra-wide bandwidth of 700 - 950 nm and peak gain of 64 dB/cm. We demonstrate unprecedented 17 dB distortion-free amplification of picosecond pulses to pulse energies of up to 2.3 nJ, corresponding to a peak power of 1.0 kW. Finally, we demonstrate the first tunable integrated Ti:Sa laser, featuring narrow linewidths and a 24.7 THz tuning range, which, for the first time, can be pumped with low-cost, miniature, off-the-shelf green laser diodes. This opens doors to new modalities of Ti:Sa lasers (now occupying a footprint of less than 0.15 mm$^2$), such as massively scalable Ti:Sa laser array systems for a variety of applications. As a proof-of-concept demonstration, we employ a Ti:SaOI laser array as the sole optical control for a cavity quantum electrodynamics experiment with artificial atoms in silicon carbide. This work is a key step towards the democratization of Ti:Sa technology through a three-orders-of-magnitude reduction in cost and footprint, as well as the introduction of solid-state broadband amplification of sub-micron wavelength light.
Submitted 30 November, 2023;
originally announced December 2023.
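A few of the quoted figures can be cross-checked with back-of-the-envelope arithmetic: a gain in dB converts to a linear factor as 10^(dB/10); peak power is roughly pulse energy divided by pulse duration; and a frequency tuning range Δν corresponds to a wavelength span of about λ^2 Δν / c near a center wavelength λ. The sketch assumes a rectangular pulse shape and a center wavelength of 800 nm (inside the 700 - 950 nm band stated above); real pulse shapes change the peak-power estimate by a factor of order one, so these are sanity checks rather than figures reported in the paper. The function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def db_to_linear(db: float) -> float:
    """Power gain in dB to a linear factor."""
    return 10.0 ** (db / 10.0)


def rectangular_pulse_duration(energy_j: float, peak_power_w: float) -> float:
    """Duration implied by energy / peak power, assuming a flat-top pulse."""
    return energy_j / peak_power_w


def tuning_range_nm(center_nm: float, delta_thz: float) -> float:
    """Approximate wavelength span lambda^2 * delta_nu / c, returned in nm."""
    lam_m = center_nm * 1e-9
    return lam_m ** 2 * delta_thz * 1e12 / C * 1e9


print(round(db_to_linear(17), 1))                                  # 50.1
print(round(rectangular_pulse_duration(2.3e-9, 1.0e3) * 1e12, 2))  # 2.3 (picoseconds)
print(round(tuning_range_nm(800.0, 24.7), 1))                      # 52.7
```

So 17 dB is roughly a 50-fold energy gain, and 2.3 nJ at 1.0 kW peak is consistent with pulses a few picoseconds long.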
-
DeepCompass: AI-driven Location-Orientation Synchronization for Navigating Platforms
Authors:
Jihun Lee,
SP Choi,
Bumsoo Kang,
Hyekyoung Seok,
Hyoungseok Ahn,
Sanghee Jung
Abstract:
In current navigation platforms, the user's orientation is typically estimated from the difference between two consecutive locations. In other words, the orientation cannot be identified until the second location is obtained. This asynchronous identification of location and orientation often prompts a familiar real-life question: why does my navigator show my car facing the wrong direction at the start? We propose DeepCompass, which identifies the user's orientation by bridging the gap between street-view images and user-view images. First, we explore suitable model architectures and design the corresponding input configurations. Second, we demonstrate artificial transformation techniques (e.g., style transfer and road segmentation) that minimize the disparity between the street view and the user's real-time experience. We evaluate DeepCompass extensively under various driving conditions. Unlike magnetometer-based navigators, DeepCompass requires no additional hardware and is not susceptible to external interference. This highlights the potential of DeepCompass as an add-on to existing sensor-based orientation detection methods.
Submitted 15 September, 2023;
originally announced November 2023.
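The baseline behavior described above, where orientation is derived from two consecutive location fixes, can be sketched with the standard initial-bearing formula on a sphere. The function names are hypothetical and the sketch is not part of DeepCompass; it only shows why no heading is available until a second fix arrives.

```python
import math
from typing import List, Optional, Tuple


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Bearing in degrees clockwise from north, from point 1 toward point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def heading_from_fixes(fixes: List[Tuple[float, float]]) -> Optional[float]:
    """Heading from the last two (lat, lon) fixes; None until two fixes exist."""
    if len(fixes) < 2:
        return None
    (lat1, lon1), (lat2, lon2) = fixes[-2], fixes[-1]
    return initial_bearing(lat1, lon1, lat2, lon2)


print(heading_from_fixes([(0.0, 0.0)]))                # None: no orientation yet
print(heading_from_fixes([(0.0, 0.0), (0.0, 0.001)]))  # 90.0 (due east)
```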
-
Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators
Authors:
Sean Kinzer,
Soroush Ghodrati,
Rohan Mahapatra,
Byung Hoon Ahn,
Edwin Mascarenhas,
Xiaolong Li,
Janarbek Matai,
Liang Zhang,
Hadi Esmaeilzadeh
Abstract:
Deep learning accelerators address the computational demands of Deep Neural Networks (DNNs), departing from the traditional Von Neumann execution model. They use specialized hardware aligned with the structure of the application domain. Compilers for these accelerators face challenges distinct from those for general-purpose processors, including exposing and managing more micro-architectural features, handling software-managed scratchpads for on-chip storage, explicitly managing data movement, and matching DNN layers to varying hardware capabilities. These complexities call for a new approach to compiler design, since traditional compilers mainly focus on generating fine-grained instruction sequences while abstracting away micro-architectural details. This paper introduces the Architecture Covenant Graph (ACG), an abstract representation of an architecture's components and their programmable capabilities. Letting the compiler work with the ACG enables compilation workflows that adapt to changes in accelerator design, reducing the need to redevelop the compiler from scratch. Codelets, which express the functionality of DNN operations and evolve into execution mappings on the ACG, are key to this process. The Covenant compiler efficiently targets diverse deep learning accelerators, achieving 93.8% of the performance of state-of-the-art, hand-tuned DNN layer implementations when compiling 14 DNN layers from various models on two different architectures.
Submitted 27 October, 2023;
originally announced October 2023.
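To make the idea of an Architecture Covenant Graph more concrete, here is a deliberately simplified, hypothetical data structure: nodes are accelerator components annotated with the operations they support, directed edges are data-movement paths, and "mapping a codelet" means finding a compute node that supports the operation, can be reached from the source memory, and can write back to the destination memory. All class names, fields, and the mapping rule are assumptions for illustration; the Covenant compiler's actual ACG and codelet representations are richer than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Component:
    name: str
    kind: str  # e.g. "memory" or "compute"
    capabilities: Set[str] = field(default_factory=set)


@dataclass
class ArchGraph:
    components: Dict[str, Component] = field(default_factory=dict)
    edges: Set[Tuple[str, str]] = field(default_factory=set)  # directed data paths

    def add(self, comp: Component) -> None:
        self.components[comp.name] = comp

    def connect(self, a: str, b: str) -> None:
        """Add a bidirectional data path between a and b."""
        self.edges.update({(a, b), (b, a)})

    def reachable(self, src: str, dst: str) -> bool:
        """Depth-first search over data-movement edges."""
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for a, b in self.edges:
                if a == node and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False


def map_codelet(g: ArchGraph, op: str, src_mem: str, dst_mem: str) -> Optional[str]:
    """Return a compute component that can run `op` with data moved in and out."""
    for name in sorted(g.components):
        comp = g.components[name]
        if (comp.kind == "compute" and op in comp.capabilities
                and g.reachable(src_mem, name) and g.reachable(name, dst_mem)):
            return name
    return None


g = ArchGraph()
g.add(Component("DRAM", "memory"))
g.add(Component("SPAD", "memory"))  # software-managed scratchpad
g.add(Component("SYSTOLIC", "compute", {"matmul", "conv2d"}))
g.add(Component("SIMD", "compute", {"relu", "add"}))
for src, dst in [("DRAM", "SPAD"), ("SPAD", "SYSTOLIC"), ("SPAD", "SIMD")]:
    g.connect(src, dst)

print(map_codelet(g, "conv2d", "DRAM", "DRAM"))   # SYSTOLIC
print(map_codelet(g, "softmax", "DRAM", "DRAM"))  # None: no unit supports it
```

In this toy model, retargeting a different accelerator means editing the graph rather than the mapping function, which mirrors the adaptability the abstract describes.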
-
Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption
Authors:
Jaiyoung Park,
Donghwan Kim,
Jongmin Kim,
Sangpyo Kim,
Wonkyung Jung,
Jung Hee Cheon,
Jung Ho Ahn
Abstract:
Incorporating fully homomorphic encryption (FHE) into the inference process of a convolutional neural network (CNN) has drawn considerable attention as a viable approach to private inference (PI). FHE allows the entire computation to be delegated to the server while keeping sensitive client-side data confidential. However, practical FHE implementation of a CNN faces significant hurdles, primarily due to FHE's substantial computational and memory overhead. To address these challenges, we propose a set of optimizations, including GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme. We evaluate our method using ResNet models on the CIFAR-10 and ImageNet datasets, achieving several orders of magnitude improvement over prior work and reducing the latency of encrypted CNN inference to 1.4 seconds on an NVIDIA A100 GPU. We also show that the latency drops to a mere 0.03 seconds with a custom hardware design.
Submitted 25 October, 2023;
originally announced October 2023.
-
Coexistence of Anomalous Hall Effect and Weak Net Magnetization in Collinear Antiferromagnet MnTe
Authors:
K. P. Kluczyk,
K. Gas,
M. J. Grzybowski,
P. Skupiński,
M. A. Borysiewicz,
T. Fąs,
J. Suffczyński,
J. Z. Domagala,
K. Grasza,
A. Mycielski,
M. Baj,
K. H. Ahn,
K. Výborný,
M. Sawicki,
M. Gryglas-Borysiewicz
Abstract:
The anomalous Hall effect (AHE) plays an important role in the rapidly developing field of antiferromagnetic spintronics. It has recently been discussed that the AHE can be a feature not only of uncompensated magnetic systems but also of altermagnetic materials. Hexagonal MnTe belongs to this appealing group of compounds exhibiting the AHE and is commonly perceived as magnetically compensated. Here, we demonstrate that bulk MnTe exhibits a small but detectable magnetic moment that correlates with the hysteretic behaviour of the AHE. We formulate a phenomenological model that explains how this moment creates an imbalance between states with opposite Néel vectors and prevents the AHE signal from averaging out to zero. Moreover, we show how the dependence of the AHE on the Néel vector arises at the microscopic level and highlight the differences in Berry curvature between magnetically compensated and uncompensated systems.
Submitted 13 October, 2023;
originally announced October 2023.
-
Making Scalable Meta Learning Practical
Authors:
Sang Keun Choe,
Sanket Vaibhav Mehta,
Hwijeen Ahn,
Willie Neiswanger,
Pengtao Xie,
Emma Strubell,
Eric Xing
Abstract:
Despite its flexibility in learning diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information and by exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA achieves up to 1.7x/4.8x higher throughput and 2.0x/3.8x lower memory consumption on single-/multi-GPU setups, respectively, compared with other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
Submitted 23 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
A Sign Language Recognition System with Pepper, Lightweight-Transformer, and LLM
Authors:
JongYoon Lim,
Inkyu Sa,
Bruce MacDonald,
Ho Seok Ahn
Abstract:
This research explores using lightweight deep neural network architectures to enable the humanoid robot Pepper to understand American Sign Language (ASL) and facilitate non-verbal human-robot interaction. First, we introduce a lightweight and efficient model for ASL understanding optimized for embedded systems, ensuring rapid sign recognition while conserving computational resources. Building upon this, we employ large language models (LLMs) for intelligent robot interactions. Through intricate prompt engineering, we tailor interactions to allow the Pepper Robot to generate natural Co-Speech Gesture responses, laying the foundation for more organic and intuitive humanoid-robot dialogues. Finally, we present an integrated software pipeline, embodying advancements in a socially aware AI interaction model. Leveraging the Pepper Robot's capabilities, we demonstrate the practicality and effectiveness of our approach in real-world scenarios. The results highlight a profound potential for enhancing human-robot interaction through non-verbal interactions, bridging communication gaps, and making technology more accessible and understandable.
Submitted 28 September, 2023;
originally announced September 2023.
-
An inverse-designed nanophotonic interface for excitons in atomically thin materials
Authors:
Ryan J. Gelly,
Alexander D. White,
Giovanni Scuri,
Xing Liao,
Geun Ho Ahn,
Bingchen Deng,
Kenji Watanabe,
Takashi Taniguchi,
Jelena Vučković,
Hongkun Park
Abstract:
Efficient nanophotonic devices are essential for applications in quantum networking, optical information processing, sensing, and nonlinear optics. Extensive research efforts have focused on integrating two-dimensional (2D) materials into photonic structures, but this integration is often limited by size and material quality. Here, we use hexagonal boron nitride (hBN), a benchmark choice for encapsulating atomically thin materials, as a waveguiding layer while simultaneously improving the optical quality of the embedded films. When combined with photonic inverse design, it becomes a complete nanophotonic platform to interface with optically active 2D materials. Grating couplers and low-loss waveguides provide optical interfacing and routing, tunable cavities provide a large exciton-photon coupling to transition metal dichalcogenides (TMD) monolayers through Purcell enhancement, and metasurfaces enable the efficient detection of TMD dark excitons. This work paves the way for advanced 2D-material nanophotonic structures for classical and quantum nonlinear optics.
Submitted 25 August, 2023;
originally announced August 2023.
-
Nano-Imaging of Chiro-Optical Force
Authors:
Junsuke Yamanishi,
Hyo-Yong Ahn,
Hiromi Okamoto
Abstract:
Nanoscopic observation of chiro-optical phenomena is essential in a wide range of scientific fields but is technically difficult; hence, the underlying physics remains poorly understood. Currently, chiro-optical phenomena are mostly investigated through far-field measurements with polarized light or through predictions from theoretical simulations. To fully understand the physics of chiro-optical systems and realize their full potential, in situ observation of the chiro-optical effect from individual parts is essential, because the macroscopic chiro-optical effect cannot be translated directly into microscopic effects. In the present study, we observed chiro-optical responses at the nanoscale by detecting chiro-optical forces, which were generated by illuminating the material/probe system with circularly polarized light. The induced optical force depended on the handedness of the incident circularly polarized light and correlated well with the electromagnetically simulated differential intensity of the longitudinal electric field. Our results help clarify chiro-optical phenomena at the nanoscale and could advance chiro-optical nanotechnologies.
Submitted 17 August, 2023;
originally announced August 2023.
-
Seeing the Fruit for the Leaves: Robotically Mapping Apple Fruitlets in a Commercial Orchard
Authors:
Ans Qureshi,
David Smith,
Trevor Gee,
Mahla Nejati,
Jalil Shahabi,
JongYoon Lim,
Ho Seok Ahn,
Ben McGuinness,
Catherine Downes,
Rahul Jangali,
Kale Black,
Hin Lim,
Mike Duke,
Bruce MacDonald,
Henry Williams
Abstract:
Aotearoa New Zealand has a strong and growing apple industry but struggles to find workers for skilled, seasonal tasks such as thinning. To thin effectively and make informed decisions on a per-tree basis, it is crucial to accurately measure the crop load of individual apple trees. However, this is challenging because dense foliage hides the fruitlets within the tree structure. In this paper, we introduce the vision system of an automated apple fruitlet thinning robot, developed to tackle the labor shortage. We present the initial design, implementation, and evaluation of the system. The platform straddles the 3.4 m tall 2D apple canopy structures to create an accurate map of the fruitlets on each tree. We show that this platform can measure the fruitlet load on an apple tree by scanning both sides of the branch. The need for an overarching platform is justified because two-sided scans achieved a higher counting accuracy (81.17%) than one-sided scans (73.7%). The system also produced size estimates within 5.9% RMSE of the true fruitlet size.
Submitted 14 August, 2023;
originally announced August 2023.