-
SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation
Authors:
Chen-Hsiu Huang,
Ja-Ling Wu
Abstract:
Digital image manipulation and advancements in Generative AI, such as Deepfake, have raised significant concerns regarding the authenticity of images shared on social media. Traditional image forensic techniques, while helpful, are often passive and insufficient against sophisticated tampering methods. This paper introduces the Secure Learned Image Codec (SLIC), a novel active approach to ensuring image authenticity through watermark embedding in the compressed domain. SLIC leverages neural network-based compression to embed watermarks as adversarial perturbations in the latent space, creating images that degrade in quality upon re-compression if tampered with. This degradation acts as a defense mechanism against unauthorized modifications. Our method involves fine-tuning a neural encoder/decoder to balance watermark invisibility with robustness, ensuring minimal quality loss for non-watermarked images. Experimental results demonstrate SLIC's effectiveness in generating visible artifacts in tampered images, thereby preventing their redistribution. This work represents a significant step toward developing secure image codecs that can be widely adopted to safeguard digital image integrity.
Submitted 19 October, 2024;
originally announced October 2024.
-
Constraining a relativistic mean field model using neutron star mass-radius measurements II: Hyperonic models
Authors:
Chun Huang,
Laura Tolos,
Constança Providência,
Anna Watts
Abstract:
We investigate whether measurements of the neutron star mass and radius or the tidal deformability can provide information about the presence of hyperons inside a neutron star. This is achieved by considering two inference models, with and without hyperons, based on a field-theoretical approach. While current observations do not distinguish between the two scenarios, we have shown that data simulating expected observations from future large area X-ray timing telescopes could provide some information through Bayes factors. Inference using simulated data generated from an EOS containing hyperons decisively favours the hyperonic model over the nucleonic model. However, a 2\% uncertainty in the mass and radius determination may not be sufficient to constrain the parameters of the model when only six neutron star mass-radius measurements are considered.
Submitted 18 October, 2024;
originally announced October 2024.
-
HD 28185 Revisited: An Outer Planet, Instead of a Brown Dwarf, On a Saturn-like Orbit
Authors:
Alexander Venner,
Qier An,
Chelsea X. Huang,
Timothy D. Brandt,
Robert A. Wittenmyer,
Andrew Vanderburg
Abstract:
As exoplanet surveys reach ever-higher sensitivities and durations, planets analogous to the solar system giant planets are increasingly within reach. HD 28185 is a Sun-like star known to host a $m\sin i=6 M_J$ planet on an Earth-like orbit; more recently, a brown dwarf with a more distant orbit has been claimed. In this work we present a comprehensive reanalysis of the HD 28185 system, based on 22 years of radial velocity observations and precision Hipparcos-Gaia astrometry. We confirm the previous characterisation of HD 28185 b as a temperate giant planet, with its $385.92^{+0.06}_{-0.07}$ day orbital period giving it an Earth-like incident flux. In contrast, we substantially revise the parameters of HD 28185 c; with a new mass of $m=6.0\pm0.6 M_J$ we reclassify this companion as a super-jovian planet. HD 28185 c has an orbital period of $24.9^{+1.3}_{-1.1}$ years, a semi-major axis of $8.50^{+0.29}_{-0.26}$ AU, and a modest eccentricity of $0.15\pm0.04$, resulting in one of the most Saturn-like orbits of any known exoplanet. HD 28185 c lies at the current intersection of detection limits for RVs and direct imaging, and highlights how the discovery of giant planets at $\approx$10 AU separations is becoming increasingly routine.
Submitted 18 October, 2024;
originally announced October 2024.
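As a rough consistency check on the orbit quoted for HD 28185 c, Kepler's third law in solar units (semi-major axis in AU, period in years) gives the total system mass in solar masses. The sketch below uses only the central values from the abstract and ignores the quoted uncertainties; the function name is illustrative.

```python
def total_mass_solar(semi_major_axis_au: float, period_yr: float) -> float:
    """Kepler's third law in solar units: M_total [M_sun] = a^3 / P^2."""
    return semi_major_axis_au ** 3 / period_yr ** 2


# Central values quoted for HD 28185 c (uncertainties ignored).
mass = total_mass_solar(8.50, 24.9)
print(f"Implied total mass: {mass:.2f} solar masses")
```

The implied mass comes out close to one solar mass, in line with the description of HD 28185 as a Sun-like star.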
-
Vacancy-induced suppression of CDW order and its impact on magnetic order in kagome antiferromagnet FeGe
Authors:
Mason L. Klemm,
Saif Siddique,
Yuan-Chun Chang,
Sijie Xu,
Yaofeng Xie,
Tanner Legvold,
Mehrdad T. Kiani,
Feng Ye,
Huibo Cao,
Yiqing Hao,
Wei Tian,
Hubertus Luetkens,
Masaaki Matsuda,
Douglas Natelson,
Zurab Guguchia,
Chien-Lung Huang,
Ming Yi,
Judy J. Cha,
Pengcheng Dai
Abstract:
Two-dimensional (2D) kagome lattice metals are interesting because they display flat electronic bands, Dirac points, Van Hove singularities, and can have interplay between charge density wave (CDW), magnetic order, and superconductivity. In kagome lattice antiferromagnet FeGe, a short-range CDW order was found deep within an antiferromagnetically ordered state, interacting with the magnetic order. Surprisingly, post-growth annealing of FeGe at 560$^{\circ}$C can suppress the CDW order while annealing at 320$^{\circ}$C induces a long-range CDW order, with the ability to cycle between the states repeatedly by annealing. Here we perform transport, neutron scattering, scanning transmission electron microscopy (STEM), and muon spin rotation ($μ$SR) experiments to unveil the microscopic mechanism of the annealing process and its impact on magneto-transport, CDW, and magnetic properties of FeGe. We find that 560$^{\circ}$C annealing creates germanium vacancies uniformly distributed throughout the FeGe kagome lattice, which prevent the formation of Ge-Ge dimers necessary for the CDW order. Upon annealing at 320$^{\circ}$C, the system segregates into stoichiometric FeGe regions with long-range CDW order and regions with stacking faults that act as nucleation sites for the CDW. The presence or absence of CDW order greatly affects the anomalous Hall effect, incommensurate magnetic order, and spin-lattice coupling in FeGe, thus placing FeGe as the only known kagome lattice material with a tunable CDW and magnetic order, potentially useful for sensing and information transmission.
Submitted 17 October, 2024;
originally announced October 2024.
-
TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness
Authors:
Cheng Huang,
Pan Mu,
Cong Bai,
Peter AG Watson
Abstract:
Precipitation from tropical cyclones (TCs) can cause disasters such as flooding, mudslides, and landslides. Predicting such precipitation in advance is crucial, giving people time to prepare and defend against these precipitation-induced disasters. Developing deep learning (DL) rainfall prediction methods offers a new way to predict potential disasters. However, one problem is that most existing methods suffer from cumulative errors and lack physical consistency. Second, these methods overlook the importance of meteorological factors in TC rainfall and their integration with the numerical weather prediction (NWP) model. Therefore, we propose Tropical Cyclone Precipitation Diffusion (TCP-Diffusion), a multi-modal model for global tropical cyclone precipitation forecasting. It forecasts TC rainfall around the TC center for the next 12 hours at 3 hourly resolution based on past rainfall observations and multi-modal environmental variables. Adjacent residual prediction (ARP) changes the training target from the absolute rainfall value to the rainfall trend and gives our model the ability of rainfall change awareness, reducing cumulative errors and ensuring physical consistency. Considering the influence of TC-related meteorological factors and the useful information from NWP model forecasts, we propose a multi-model framework with specialized encoders to extract richer information from environmental variables and results provided by NWP models. The results of extensive experiments show that our method outperforms other DL methods and the NWP method from the European Centre for Medium-Range Weather Forecasts (ECMWF).
Submitted 16 October, 2024;
originally announced October 2024.
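The idea behind adjacent residual prediction can be sketched in a few lines: instead of regressing the absolute rainfall at each step, the training target is the change between adjacent steps, and absolute values are recovered by adding the predicted changes onto the last observation. This is a minimal illustration under that reading of the abstract, not the authors' implementation; the function names and sample values are hypothetical.

```python
from itertools import accumulate


def to_residual_targets(last_observed: float, future: list[float]) -> list[float]:
    """Convert absolute future values into step-to-step changes."""
    previous = [last_observed] + future[:-1]
    return [cur - prev for cur, prev in zip(future, previous)]


def from_residuals(last_observed: float, residuals: list[float]) -> list[float]:
    """Rebuild absolute values by accumulating changes onto the last observation."""
    return list(accumulate(residuals, initial=last_observed))[1:]


# Hypothetical rainfall rates (mm/h) at 3-hour steps over a 12-hour horizon.
last_obs = 4.0
future = [5.0, 7.5, 6.0, 3.0]
residuals = to_residual_targets(last_obs, future)
restored = from_residuals(last_obs, residuals)
print(residuals)
print(restored)
```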
-
RS-MOCO: A deep learning-based topology-preserving image registration method for cardiac T1 mapping
Authors:
Chiyi Huang,
Longwei Sun,
Dong Liang,
Haifeng Liang,
Hongwu Zeng,
Yanjie Zhu
Abstract:
Cardiac T1 mapping can evaluate various clinical symptoms of myocardial tissue. However, there is currently a lack of effective, robust, and efficient methods for motion correction in cardiac T1 mapping. In this paper, we propose a deep learning-based and topology-preserving image registration framework for motion correction in cardiac T1 mapping. Notably, our proposed implicit consistency constraint dubbed BLOC, to some extent preserves the image topology in registration by bidirectional consistency constraint and local anti-folding constraint. To address the contrast variation issue, we introduce a weighted image similarity metric for multimodal registration of cardiac T1-weighted images. Besides, a semi-supervised myocardium segmentation network and a dual-domain attention module are integrated into the framework to further improve the performance of the registration. Numerous comparative experiments, as well as ablation studies, demonstrated the effectiveness and high robustness of our method. The results also indicate that the proposed weighted image similarity metric, specifically crafted for our network, contributes a lot to the enhancement of the motion correction efficacy, while the bidirectional consistency constraint combined with the local anti-folding constraint ensures a more desirable topology-preserving registration mapping.
Submitted 15 October, 2024;
originally announced October 2024.
-
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark
Authors:
Bin Shan,
Xiang Fei,
Wei Shi,
An-Lan Wang,
Guozhi Tang,
Lei Liao,
Jingqun Tang,
Xiang Bai,
Can Huang
Abstract:
The comprehension of text-rich visual scenes has become a focal point for evaluating Multi-modal Large Language Models (MLLMs) due to their widespread applications. Current benchmarks tailored to the scenario emphasize perceptual capabilities, while overlooking the assessment of cognitive abilities. To address this limitation, we introduce a Multimodal benchmark towards Text-rich visual scenes, to evaluate the Cognitive capabilities of MLLMs through visual reasoning and content-creation tasks (MCTBench). To mitigate potential evaluation bias from the varying distributions of datasets, MCTBench incorporates several perception tasks (e.g., scene text recognition) to ensure a consistent comparison of both the cognitive and perceptual capabilities of MLLMs. To improve the efficiency and fairness of content-creation evaluation, we conduct an automatic evaluation pipeline. Evaluations of various MLLMs on MCTBench reveal that, despite their impressive perceptual capabilities, their cognition abilities require enhancement. We hope MCTBench will offer the community an efficient resource to explore and enhance cognitive capabilities towards text-rich visual scenes.
Submitted 15 October, 2024;
originally announced October 2024.
-
Complex-valued solutions of the mKdV equations in generalized Fourier-Lebesgue spaces
Authors:
Zijun Chen,
Zihua Guo,
Chunyan Huang
Abstract:
We study the \emph{complex-valued} solutions to the Cauchy problem of the modified Korteweg-de Vries equation on the real line. To study the low-regularity problems, we introduce a generalized Fourier-Lebesgue space $\widehat{M}^{s}_{r,q}(\mathbb{R})$ that unifies the modulation spaces and the Fourier-Lebesgue spaces. We then prove sharp local well-posedness results in this space by perturbation arguments using $X^{s,b}$-type spaces. Our results improve the previous one in \cite{GV}.
Submitted 15 October, 2024;
originally announced October 2024.
-
TESS Giants Transiting Giants. VII. A Hot Saturn Orbiting an Oscillating Red Giant Star
Authors:
Nicholas Saunders,
Samuel K. Grunblatt,
Daniel Huber,
J. M. Joel Ong,
Kevin C. Schlaufman,
Daniel Hey,
Yaguang Li,
R. P. Butler,
Jeffrey D. Crane,
Steve Shectman,
Johanna K. Teske,
Samuel N. Quinn,
Samuel W. Yee,
Rafael Brahm,
Trifon Trifonov,
Andrés Jordán,
Thomas Henning,
David K. Sing,
Meredith MacGregor,
Emma Page,
David Rapetti,
Ben Falk,
Alan M. Levine,
Chelsea X. Huang,
Michael B. Lund
, et al. (4 additional authors not shown)
Abstract:
We present the discovery of TOI-7041 b (TIC 201175570 b), a hot Saturn transiting a red giant star with measurable stellar oscillations. We observe solar-like oscillations in TOI-7041 with a frequency of maximum power of $ν_{\rm max} = 218.50\pm2.23$ $μ$Hz and a large frequency separation of $Δν= 16.5282\pm0.0186$ $μ$Hz. Our asteroseismic analysis indicates that TOI-7041 has a radius of $4.10 \pm 0.06$(stat) $\pm$ 0.05(sys) $R_\odot$, making it one of the largest stars around which a transiting planet has been discovered with the Transiting Exoplanet Survey Satellite (TESS), and the mission's first oscillating red giant with a transiting planet. TOI-7041 b has an orbital period of $9.691 \pm 0.006$ days and a low eccentricity of $e = 0.04 \pm 0.04$. We measure a planet radius of $1.02 \pm 0.03$ $R_J$ with photometry from TESS, and a planet mass of $0.36 \pm 0.16$ $M_J$ ($114 \pm 51$ $M_\oplus$) with ground-based radial velocity measurements. TOI-7041 b appears less inflated than similar systems receiving equivalent incident flux, and its circular orbit indicates that it is not undergoing tidal heating due to circularization. The asteroseismic analysis of the host star provides some of the tightest constraints on stellar properties for a TESS planet host and enables precise characterization of the hot Saturn. This system joins a small number of TESS-discovered exoplanets orbiting stars that exhibit clear stellar oscillations and indicates that extended TESS observations of evolved stars will similarly provide a path to improved exoplanet characterization.
Submitted 14 October, 2024;
originally announced October 2024.
-
Focus On What Matters: Separated Models For Visual-Based RL Generalization
Authors:
Di Zhang,
Bowen Lv,
Hai Zhang,
Feifan Yang,
Junqiao Zhao,
Hang Yu,
Chang Huang,
Hongtu Zhou,
Chen Ye,
Changjun Jiang
Abstract:
A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction due to concerns about exacerbating overfitting to task-irrelevant features during training. Perceiving the pre-eminence of image reconstruction in representation learning, we propose SMG (Separated Models for Generalization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperative reconstruction. Built upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby avoiding overfitting. Extensive experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications.
Submitted 29 September, 2024;
originally announced October 2024.
-
Boosting Camera Motion Control for Video Diffusion Transformers
Authors:
Soon Yau Cheong,
Duygu Ceylan,
Armin Mustafa,
Andrew Gilbert,
Chun-Hao Paul Huang
Abstract:
Recent advancements in diffusion models have significantly enhanced the quality of video generation. However, fine-grained control over camera pose remains a challenge. While U-Net-based models have shown promising results for camera control, transformer-based diffusion models (DiT), the preferred architecture for large-scale video generation, suffer from severe degradation in camera motion accuracy. In this paper, we investigate the underlying causes of this issue and propose solutions tailored to DiT architectures. Our study reveals that camera control performance depends heavily on the choice of conditioning methods rather than on camera pose representations, as is commonly believed. To address the persistent motion degradation in DiT, we introduce Camera Motion Guidance (CMG), based on classifier-free guidance, which boosts camera control by over 400%. Additionally, we present a sparse camera control pipeline, significantly simplifying the process of specifying camera poses for long videos. Our method universally applies to both U-Net and DiT models, offering improved camera control for video generation tasks.
Submitted 14 October, 2024;
originally announced October 2024.
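Camera Motion Guidance is described as being based on classifier-free guidance. The standard classifier-free guidance rule extrapolates from an unconditional prediction toward a conditional one by a guidance scale. The sketch below shows only that generic rule on plain lists; how CMG defines its conditional and unconditional branches for camera input is not specified in the abstract, and the names and values here are illustrative.

```python
def guided_prediction(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical denoiser outputs without and with the camera condition.
eps_uncond = [0.0, 1.0, -2.0]
eps_cond = [1.0, 1.0, 0.0]

# A scale of 1 reproduces the conditional output unchanged.
same_as_cond = guided_prediction(eps_uncond, eps_cond, 1.0)
# Larger scales push the prediction further in the direction of the condition.
amplified = guided_prediction(eps_uncond, eps_cond, 4.0)
print(same_as_cond, amplified)
```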
-
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
Authors:
Dejia Xu,
Yifan Jiang,
Chen Huang,
Liangchen Song,
Thorsten Gernoth,
Liangliang Cao,
Zhangyang Wang,
Hao Tang
Abstract:
In recent years there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image into multiple spatiotemporally consistent videos. Our framework extends the spatial and temporal attention modules into view-integrated attention modules, improving both viewpoint and temporal consistency. This flexible design allows for joint training with diverse curated data sources, including scene-level static videos, object-level synthetic multi-view dynamic videos, and real-world monocular dynamic videos. To our best knowledge, Cavia is the first of its kind that allows the user to precisely specify camera motion while obtaining object motion. Extensive experiments demonstrate that Cavia surpasses state-of-the-art methods in terms of geometric consistency and perceptual quality. Project Page: https://ir1d.github.io/Cavia/
Submitted 14 October, 2024;
originally announced October 2024.
-
Relating Translation functor and Jacquet functor via Chan-Wong's comparison functor
Authors:
Chang Huang
Abstract:
Kei Yuen Chan and Kayue Daniel Wong constructed a functor from the category of Harish-Chandra modules of $\GL(n, \C)$ to the category of modules over the graded Hecke algebra $\H_m$ of type A. This functor has several nice properties, such as compatibility with parabolic induction and preservation of standard and irreducible objects. Based on their results, we show that this functor relates the translation functor on the real side to the Jacquet functor on the $p$-adic side.
Submitted 13 October, 2024;
originally announced October 2024.
-
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning
Authors:
Chengsong Huang,
Langlin Huang,
Jiaxin Huang
Abstract:
In-Context Learning (ICL) emerges as a key feature for Large Language Models (LLMs), allowing them to adapt to new tasks by leveraging task-specific examples without updating model parameters. However, ICL faces challenges with increasing numbers of examples due to performance degradation and quadratic computational costs. In this paper, we propose Logit Arithmetic Reweighting Approach (LARA), a novel framework that enhances ICL by using logit-based ensembling of multiple demonstrations. Our approach divides long input demonstrations into parallelizable shorter inputs to significantly reduce memory requirements, and then effectively aggregates the information by reweighting logits of each group via a non-gradient optimization approach. We further introduce Binary LARA (B-LARA), a variant that constrains weights to binary values to simplify the search space and reduce memory usage by filtering out less informative demonstration groups. Experiments on BBH and MMLU demonstrate that LARA and B-LARA outperform all baseline methods in both accuracy and memory efficiency. We also conduct extensive analysis to show that LARA generalizes well to scenarios of varying numbers of examples from limited to many-shot demonstrations.
Submitted 13 October, 2024;
originally announced October 2024.
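The logit-ensembling idea can be illustrated with a toy example: each demonstration group yields its own logits over the candidate labels, the logits are combined with per-group weights, and binary weights simply drop uninformative groups. The abstract does not specify the non-gradient optimizer, so the exhaustive search below is a stand-in chosen for brevity; all function names and numbers are hypothetical.

```python
from itertools import product


def combine_logits(group_logits, weights):
    """Weighted sum of per-group logits over the same candidate labels."""
    n_labels = len(group_logits[0])
    return [sum(w * g[i] for w, g in zip(weights, group_logits)) for i in range(n_labels)]


def predict(group_logits, weights):
    """Index of the label with the largest combined logit."""
    combined = combine_logits(group_logits, weights)
    return max(range(len(combined)), key=combined.__getitem__)


def search_binary_weights(examples, n_groups):
    """Try every non-zero 0/1 weight vector and keep the most accurate one.

    Stand-in for the unspecified non-gradient optimizer; only practical
    for a small number of groups.
    """
    best, best_acc = None, -1.0
    for weights in product([0, 1], repeat=n_groups):
        if not any(weights):
            continue  # at least one group must contribute
        hits = sum(predict(logits, weights) == label for logits, label in examples)
        acc = hits / len(examples)
        if acc > best_acc:
            best, best_acc = weights, acc
    return best, best_acc


# Two held-out examples, each with logits from two demonstration groups
# over two candidate labels, paired with the correct label index.
examples = [
    ([[2.0, 0.0], [0.0, 3.0]], 0),
    ([[0.0, 1.0], [0.5, 0.0]], 1),
]
best_weights, best_accuracy = search_binary_weights(examples, n_groups=2)
print(best_weights, best_accuracy)
```

In this toy data the second group is misleading on both examples, so the search keeps only the first group.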
-
Revealing Fano Resonance in Dirac Materials ZrTe5 through Raman Scattering
Authors:
Di Cheng,
Tao Jiang,
Feng Zhang,
Genda Gu,
Liang Luo,
Chuankun Huang,
Boqun Song,
Martin Mootz,
Avinash Khatri,
Joong-Mok Park,
Qiang Li,
Yongxin Yao,
Jigang Wang
Abstract:
We explore the Fano resonance in ZrTe5, using Raman scattering measurements. We identified two closely spaced B2g phonon modes, B2g I and B2g II, around 9 meV and 10 meV, respectively. Interestingly, only B2g I exhibited the Fano resonance, an outcome of quantum interference between discrete phonon modes and continuous electronic excitations. This is consistent with the much stronger electron-phonon coupling of B2g I mode demonstrated by first-principles calculations. Additionally, temperature-dependent measurements highlight an enhanced Fano asymmetry at elevated temperatures, originating from the thermal effect on the joint electron-hole density of states. This study offers insights into the complex interrelation of electron-phonon coupling, thermal effect, and Fano resonances in ZrTe5.
Submitted 16 October, 2024; v1 submitted 13 October, 2024;
originally announced October 2024.
-
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Authors:
Jixuan Leng,
Chengsong Huang,
Banghua Zhu,
Jiaxin Huang
Abstract:
Language model calibration refers to the alignment between the confidence of the model and the actual performance of its responses. While previous studies point out the overconfidence phenomenon in Large Language Models (LLMs) and show that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) are overconfident with a more sharpened output probability, in this study, we reveal that RLHF tends to lead models to express verbalized overconfidence in their own responses. We investigate the underlying cause of this overconfidence and demonstrate that reward models used for Proximal Policy Optimization (PPO) exhibit inherent biases towards high-confidence scores regardless of the actual quality of responses. Building upon this insight, we propose two PPO variants: PPO-M: PPO with Calibrated Reward Modeling and PPO-C: PPO with Calibrated Reward Calculation. PPO-M integrates explicit confidence scores in reward model training, which calibrates reward models to better capture the alignment between response quality and verbalized confidence. PPO-C adjusts the reward score during PPO based on the difference between the current reward and the moving average of past rewards. Both PPO-M and PPO-C can be seamlessly integrated into the current PPO pipeline and do not require additional golden labels. We evaluate our methods on both Llama3-8B and Mistral-7B across six diverse datasets including multiple-choice and open-ended generation. Experiment results demonstrate that both of our methods can reduce calibration error and maintain performance comparable to standard PPO. We further show that they do not compromise model capabilities in open-ended conversation settings.
Submitted 13 October, 2024;
originally announced October 2024.
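The PPO-C description, adjusting the reward using the gap between the current reward and a moving average of past rewards, can be sketched as follows. The abstract does not give the exact rule, so two assumptions are made here and should be read as such: the moving average is exponential, and the gap is scaled by how far a verbalized confidence in [0, 1] sits from 0.5. All names and parameters are hypothetical.

```python
class MovingAverageReward:
    """Tracks an exponential moving average of past reward scores (assumed form)."""

    def __init__(self, decay: float = 0.9):
        self.decay = decay
        self.average = None  # no rewards seen yet

    def delta(self, reward: float) -> float:
        """Return reward minus the average of past rewards, then update the average."""
        if self.average is None:
            self.average = reward
            return 0.0
        diff = reward - self.average
        self.average = self.decay * self.average + (1.0 - self.decay) * reward
        return diff


def adjusted_reward(reward: float, delta: float, confidence: float, weight: float = 1.0) -> float:
    """Hypothetical rule: boost confident answers that beat the running average,
    and penalize confident answers that fall below it."""
    return reward + weight * delta * (confidence - 0.5)


tracker = MovingAverageReward(decay=0.5)
deltas = [tracker.delta(r) for r in [1.0, 3.0, 2.0]]
print(deltas)
print(adjusted_reward(3.0, deltas[1], confidence=1.0))
print(adjusted_reward(3.0, deltas[1], confidence=0.0))
```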
-
Coupling of Electronic Transitions to Ferroelectric Order in a 2D Semiconductor
Authors:
Chun-Ying Huang,
Daniel G. Chica,
Zhi-Hao Cui,
Taketo Handa,
Morgan Thinel,
Nicholas Olsen,
Yufeng Liu,
Michael E. Ziebel,
Guiying He,
Yinming Shao,
Connor A. Occhialini,
Jonathan Pelliciari,
Dmitri N. Basov,
Matthew Sfeir,
Abhay Pasupathy,
Valentina Bisogni,
David R. Reichman,
Xavier Roy,
Xiaoyang Zhu
Abstract:
A ferroelectric material often exhibits a soft transvers optical (TO) phonon mode which governs it phase transition. Charge coupling to this ferroelectric soft mode may further mediate emergent physical properties, including superconductivity and defect tolerance. However, direct experimental evidence for such coupling is scarce. Here we show that a photo-launched coherent phonon couples strongly…
▽ More
A ferroelectric material often exhibits a soft transvers optical (TO) phonon mode which governs it phase transition. Charge coupling to this ferroelectric soft mode may further mediate emergent physical properties, including superconductivity and defect tolerance. However, direct experimental evidence for such coupling is scarce. Here we show that a photo-launched coherent phonon couples strongly to electronic transitions across the bandgap in the van der Waals (vdW) two-dimensional (2D) ferroelectric semiconductor NbOI2. Using terahertz time-domain spectroscopy and first-principles calculations, we identify this mode as the TO phonon responsible for ferroelectric order. This exclusive coupling occurs only with above-gap electronic transition and is absent in the valence band as revealed by resonant inelastic X-ray scattering. Our findings suggest a new role of the soft TO phonon mode in electronic and optical properties of ferroelectric semiconductors.
Submitted 11 October, 2024;
originally announced October 2024.
-
Hyperspectral fluorescence imaging using a high-speed silicon photomultiplier array
Authors:
Chi Z. Huang,
Vincent D. Ching-Roa,
Connor M. Heckman,
Sherrif F. Ibrahim,
Michael G. Giacomelli
Abstract:
High-speed multiplex imaging of fluorescent probes is limited by a combination of spectral resolution, sensitivity, high cost and low light throughput of detectors, and filters. In this work, we present a hyperspectral detection system based on a silicon photomultiplier array that enables high-speed, high-light throughput hyperspectral imaging at low cost. We demonstrate 16 spectral channel imaging at 50 MP/s (800M spectra per second) with a conventional two photon microscope combined with a generalized spectral unmixing model that enables extraction of spectrally overlapping fluorophores. We show that the high spectral resolution combined with high throughput enables the multiplexing of multiple contrast agents over large areas and the detection of subtle spectral shifts associated with molecular binding. Silicon photomultiplier arrays may be a promising method to extend multiplex fluorescence imaging in a variety of scenarios.
Submitted 11 October, 2024;
originally announced October 2024.
-
Signature of Superconductivity in Pressurized Trilayer-nickelate Pr$_4$Ni$_3$O$_{10-δ}$
Authors:
Xing Huang,
Hengyuan Zhang,
Jingyuan Li,
Mengwu Huo,
Junfeng Chen,
Zhengyang Qiu,
Peiyue Ma,
Chaoxin Huang,
Hualei Sun,
Meng Wang
Abstract:
The discovery of high-temperature superconductivity in La$_3$Ni$_2$O$_7$ and La$_4$Ni$_3$O$_{10}$ under pressure has drawn extensive attention. Herein, we report systematic investigations on the evolutions of structure, magnetism, and electrical resistance of Pr$_4$Ni$_3$O$_{10-δ}$ polycrystalline samples under various pressures. Pr$_4$Ni$_3$O$_{10-δ}$ exhibits density wave transitions on Ni and Pr sublattices at about 158 K and 4.3 K, respectively, and the density wave can be progressively suppressed by pressure. A structural transformation from the monoclinic $P2_1/a$ space group to the tetragonal $I4/mmm$ occurs at around 20 GPa. An apparent drop in resistance with evident magnetic field dependence is observed at pressures above 20 GPa, indicating the emergence of superconductivity in Pr$_4$Ni$_3$O$_{10-δ}$ polycrystalline samples. The discovery of the signature of superconductivity in Pr$_4$Ni$_3$O$_{10-δ}$ broadens the family of nickelate superconductors and provides a new platform for investigating the mechanisms of superconductivity in the Ruddlesden-Popper phases of nickelates.
Submitted 10 October, 2024;
originally announced October 2024.
-
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Authors:
Susan Liang,
Chao Huang,
Yapeng Tian,
Anurag Kumar,
Chenliang Xu
Abstract:
In this paper, we introduce a novel task called language-guided joint audio-visual editing. Given an audio and image pair of a sounding event, this task aims at generating new audio-visual content by editing the given sounding event conditioned on the language guidance. For instance, we can alter the background environment of a sounding object while keeping its appearance unchanged, or we can add new sounds contextualized to the visual content. To address this task, we propose a new diffusion-based framework for joint audio-visual editing and introduce two key ideas. Firstly, we propose a one-shot adaptation approach to tailor generative diffusion models for audio-visual content editing. With as few as one audio-visual sample, we jointly transfer the audio and vision diffusion models to the target domain. After fine-tuning, our model enables consistent generation of this audio-visual sample. Secondly, we introduce a cross-modal semantic enhancement approach. We observe that when using language as content editing guidance, the vision branch may overlook editing requirements. This phenomenon, termed catastrophic neglect, hampers audio-visual alignment during content editing. We therefore enhance semantic consistency between language and vision to mitigate this issue. Extensive experiments validate the effectiveness of our method in language-based audio-visual editing and highlight its superiority over several baseline approaches. We recommend that readers visit our project page for more details: https://liangsusan-git.github.io/project/avedit/.
Submitted 9 October, 2024;
originally announced October 2024.
-
First Very Long Baseline Interferometry Detections at 870μm
Authors:
Alexander W. Raymond,
Sheperd S. Doeleman,
Keiichi Asada,
Lindy Blackburn,
Geoffrey C. Bower,
Michael Bremer,
Dominique Broguiere,
Ming-Tang Chen,
Geoffrey B. Crew,
Sven Dornbusch,
Vincent L. Fish,
Roberto García,
Olivier Gentaz,
Ciriaco Goddi,
Chih-Chiang Han,
Michael H. Hecht,
Yau-De Huang,
Michael Janssen,
Garrett K. Keating,
Jun Yi Koay,
Thomas P. Krichbaum,
Wen-Ping Lo,
Satoki Matsushita,
Lynn D. Matthews,
James M. Moran
, et al. (254 additional authors not shown)
Abstract:
The first very long baseline interferometry (VLBI) detections at 870$μ$m wavelength (345$\,$GHz frequency) are reported, achieving the highest diffraction-limited angular resolution yet obtained from the surface of the Earth, and the highest-frequency example of the VLBI technique to date. These include strong detections for multiple sources observed on inter-continental baselines between telescopes in Chile, Hawaii, and Spain, obtained during observations in October 2018. The longest-baseline detections approach 11$\,$G$λ$ corresponding to an angular resolution, or fringe spacing, of 19$μ$as. The Allan deviation of the visibility phase at 870$μ$m is comparable to that at 1.3$\,$mm on the relevant integration time scales between 2 and 100$\,$s. The detections confirm that the sensitivity and signal chain stability of stations in the Event Horizon Telescope (EHT) array are suitable for VLBI observations at 870$μ$m. Operation at this short wavelength, combined with anticipated enhancements of the EHT, will lead to a unique high angular resolution instrument for black hole studies, capable of resolving the event horizons of supermassive black holes in both space and time.
Submitted 9 October, 2024;
originally announced October 2024.
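Note on the quoted figures: the wavelength, frequency, and angular-resolution values in the abstract above can be cross-checked with two textbook relations, ν = c/λ and fringe spacing θ ≈ 1/(baseline length measured in wavelengths). The sketch below is only a sanity check under those standard relations; the function names are illustrative and do not come from the paper.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
# One radian expressed in microarcseconds.
MICROARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0 * 1e6


def frequency_ghz(wavelength_m: float) -> float:
    """Observing frequency in GHz for a given wavelength in metres."""
    return SPEED_OF_LIGHT_M_PER_S / wavelength_m / 1e9


def fringe_spacing_uas(baseline_in_wavelengths: float) -> float:
    """Fringe spacing (1 / baseline in wavelengths) in microarcseconds."""
    return MICROARCSEC_PER_RADIAN / baseline_in_wavelengths


freq = frequency_ghz(870e-6)          # 870 micrometre wavelength
spacing = fringe_spacing_uas(11e9)    # 11 giga-wavelength baseline
print(f"{freq:.1f} GHz, {spacing:.2f} microarcseconds")
```

The frequency comes out just under 345 GHz and the fringe spacing just under 19 microarcseconds, consistent with the rounded values quoted in the abstract.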
-
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
Authors:
Wanchao Liang,
Tianyu Liu,
Less Wright,
Will Constable,
Andrew Gu,
Chien-Chin Huang,
Iris Zhang,
Wei Feng,
Howard Huang,
Junjie Wang,
Sanket Purandare,
Gokul Nadathur,
Stratos Idreos
Abstract:
The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort.
This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling, providing comprehensive logging, checkpointing, and debugging tools for production-ready training. It also incorporates hardware-software co-designed solutions, leveraging features like Float8 training and SymmetricMemory. As a flexible test bed, TorchTitan facilitates custom recipe curation and comparison, allowing us to develop optimized training recipes for Llama 3.1 and provide guidance on selecting techniques for maximum efficiency based on our experiences.
We thoroughly assess TorchTitan on the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its exceptional performance, modular composability, and elastic scalability. By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.
Submitted 8 October, 2024;
originally announced October 2024.
-
Provable Accuracy Bounds for Hybrid Dynamical Optimization and Sampling
Authors:
Matthew X. Burns,
Qingyuan Hou,
Michael C. Huang
Abstract:
Analog dynamical accelerators (DXs) are a growing sub-field in computer architecture research, offering order-of-magnitude gains in power efficiency and latency over traditional digital methods in several machine learning, optimization, and sampling tasks. However, limited-capacity accelerators require hybrid analog/digital algorithms to solve real-world problems, commonly using large-neighborhood local search (LNLS) frameworks. Unlike fully digital algorithms, hybrid LNLS has no non-asymptotic convergence guarantees and no principled hyperparameter selection schemes, particularly limiting cross-device training and inference.
In this work, we provide non-asymptotic convergence guarantees for hybrid LNLS by reducing to block Langevin Diffusion (BLD) algorithms. Adapting tools from classical sampling theory, we prove exponential KL-divergence convergence for randomized and cyclic block selection strategies using ideal DXs. With finite device variation, we provide explicit bounds on the 2-Wasserstein bias in terms of step duration, noise strength, and function parameters. Our BLD model provides a key link between established theory and novel computing platforms, and our theoretical results provide a closed-form expression linking device variation, algorithm hyperparameters, and performance.
Submitted 8 October, 2024;
originally announced October 2024.
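Note on block Langevin updates: the abstract above analyzes block Langevin diffusion with randomized and cyclic block selection. As a rough illustration only, the sketch below simulates a discretized (Euler-Maruyama) block update with cyclic selection on a toy quadratic potential. The discretization, the step size, the potential, and all names are assumptions made for this example; the paper itself concerns continuous-time analog devices and may define the dynamics differently.

```python
import math
import random
from typing import Callable, List, Sequence


def block_langevin(
    grad: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    blocks: Sequence[Sequence[int]],
    step: float,
    n_steps: int,
    seed: int = 0,
) -> List[List[float]]:
    """Cyclic block Langevin sampler: only one block of coordinates moves per step."""
    rng = random.Random(seed)
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    samples: List[List[float]] = []
    for t in range(n_steps):
        block = blocks[t % len(blocks)]   # cyclic block selection
        g = grad(x)                       # gradient at the current full state
        for i in block:
            # Langevin step on the selected coordinates; the rest stay fixed.
            x[i] += -step * g[i] + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(list(x))
    return samples


def quadratic_grad(x: Sequence[float]) -> List[float]:
    """Gradient of f(x) = 0.5 * ||x||^2, whose target density is a standard Gaussian."""
    return list(x)


chain = block_langevin(quadratic_grad, [0.0] * 4, [[0, 1], [2, 3]], step=0.05, n_steps=20000, seed=1)
kept = chain[2000:]  # discard an initial burn-in segment
second_moment = sum(v * v for s in kept for v in s) / (len(kept) * 4)
```

For this toy target the per-coordinate second moment should settle near 1, with a small upward bias from the finite step size.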
-
LightRAG: Simple and Fast Retrieval-Augmented Generation
Authors:
Zirui Guo,
Lianghao Xia,
Yanhua Yu,
Tu Ao,
Chao Huang
Abstract:
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs. However, existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers that fail to capture complex inter-dependencies. To address these challenges, we propose LightRAG, which incorporates graph structures into text indexing and retrieval processes. This innovative framework employs a dual-level retrieval system that enhances comprehensive information retrieval from both low-level and high-level knowledge discovery. Additionally, the integration of graph structures with vector representations facilitates efficient retrieval of related entities and their relationships, significantly improving response times while maintaining contextual relevance. This capability is further enhanced by an incremental update algorithm that ensures the timely integration of new data, allowing the system to remain effective and responsive in rapidly changing data environments. Extensive experimental validation demonstrates considerable improvements in retrieval accuracy and efficiency compared to existing approaches. We have made our LightRAG open-source and available at the link: https://github.com/HKUDS/LightRAG.
Submitted 8 October, 2024;
originally announced October 2024.
-
Experimental realization of direct entangling gates between dual-type qubits
Authors:
Chenxi Wang,
Chuanxin Huang,
Hongxuan Zhang,
Hongyuan Hu,
Zhichao Mao,
Panyu Hou,
Yukai Wu,
Zichao Zhou,
Luming Duan
Abstract:
Dual-type qubits have become a promising way to suppress the crosstalk error of auxiliary operations in large-scale ion trap quantum computation. Here we demonstrate a direct entangling gate between dual-type qubits encoded in the $S_{1/2}$ and $D_{5/2}$ hyperfine manifolds of $^{137}\mathrm{Ba}^{+}$ ions. Our scheme is economical in hardware, requiring only a single $532\,$nm laser system to entangle both qubit types by driving their Raman transitions. We achieve a Bell state fidelity of $96.3(4)\%$ for the dual-type Mølmer-Sørensen gate between an $S$-$D$ ion pair, comparable to that for the same-type $S$-$S$ or $D$-$D$ gates. This technique can reduce the overhead for back-and-forth conversions between dual-type qubits in the quantum circuit, with wide applications in quantum error correction and ion-photon quantum networks.
Submitted 7 October, 2024;
originally announced October 2024.
-
Energy calibration of GTM on ground
Authors:
Chien-You Huang,
Hsiang-Kuang Chang,
Chih-Hsun Lin,
Che-Chih Tsao,
Chin-Ping Hu,
Hao-Min Chang,
Yan-Fu Chen,
An-Hsuan Feng,
Yi-Wen Huang,
Tzu-Hsuan Lin,
Yi-Ning Tsao,
Chih-En Wu,
Chun-Wei Wu
Abstract:
The Gamma-ray Transients Monitor (GTM) on board the Formosat-8B (FS-8B) satellite is designed to detect and localize Gamma-Ray Bursts (GRBs). By utilizing 2+2 CITIROC chips to manipulate 4+4 detectors, which are composed of GAGG(Ce) scintillators coupled with Silicon Photomultipliers (SiPMs) and oriented in various directions to achieve all-sky coverage, the GRB saturation fluences of GTM in the 50 keV to 1 MeV range for Short GRBs (SGRBs) and Long GRBs (LGRBs) were estimated to be about $3.1 \times 10^{-4}$ and $5.0 \times 10^{-3}\ {\rm erg/cm^2}$, respectively, based on simulations. To precisely interpret the GTM readout signal in terms of energy, several measurements for isotope and gain calibration were conducted. Despite encountering issues with crosstalk and SiPM saturation effect in the data, the energy spectrum can still be recovered by appropriately discarding channel noise and mapping with the correct ADC-to-energy relation. This paper summarizes the energy resolution of GTM and the linear variations in the relationship between photon energy and readout signal. At 662 keV, the energy resolution is about 16 %. Also, it demonstrates that greater gain is achieved by increasing voltage or decreasing temperature.
Submitted 7 October, 2024;
originally announced October 2024.
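Note on the energy-resolution figure: the abstract above quotes a resolution of about 16% at 662 keV without stating the convention. As a small worked example, assuming the common definition of resolution as FWHM divided by peak energy and a Gaussian photopeak, the corresponding peak widths can be computed as follows. Both assumptions and the function names are illustrative and are not confirmed by the source.

```python
import math

# For a Gaussian peak, FWHM = 2 * sqrt(2 * ln 2) * sigma.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_kev(energy_kev: float, resolution: float) -> float:
    """Full width at half maximum, assuming resolution = FWHM / E."""
    return resolution * energy_kev


def sigma_kev(energy_kev: float, resolution: float) -> float:
    """Gaussian standard deviation implied by the same resolution."""
    return fwhm_kev(energy_kev, resolution) / FWHM_PER_SIGMA


width = fwhm_kev(662.0, 0.16)
sigma = sigma_kev(662.0, 0.16)
print(f"FWHM = {width:.1f} keV, sigma = {sigma:.1f} keV")
```

Under these assumptions the 662 keV line would have a FWHM of roughly 106 keV and a Gaussian sigma of roughly 45 keV.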
-
SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems
Authors:
Ismail Alkhouri,
Shijun Liang,
Cheng-Han Huang,
Jimmy Dai,
Qing Qu,
Saiprasad Ravishankar,
Rongrong Wang
Abstract:
Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse imaging problems (IPs), the reverse sampling steps of DMs are typically modified to approximately sample from a measurement-conditioned distribution in the image space. However, these modifications may be unsuitable for certain settings (such as in the presence of measurement noise) and non-linear tasks, as they often struggle to correct errors from earlier sampling steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces the standard data manifold measurement consistency and forward diffusion consistency, as seen in previous studies, but also incorporates backward diffusion consistency that maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions, either implicitly or explicitly, our sampler requires significantly fewer reverse steps. Therefore, we refer to our accelerated method as Step-wise Triple-Consistent Sampling (SITCOM). Compared to existing state-of-the-art baseline methods, under different levels of measurement noise, our extensive experiments across five linear and three non-linear image restoration tasks demonstrate that SITCOM achieves competitive or superior results in terms of standard image similarity metrics while requiring a significantly reduced run-time across all considered tasks.
Submitted 6 October, 2024;
originally announced October 2024.
-
LHAASO detection of very-high-energy gamma-ray emission surrounding PSR J0248+6021
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
We report the detection of an extended very-high-energy (VHE) gamma-ray source coincident with the location of the middle-aged (62.4 kyr) pulsar PSR J0248+6021, using 796 days of LHAASO-WCDA live-time data and 1216 days of LHAASO-KM2A live-time data. A significant excess of gamma-ray induced showers is observed both by WCDA in the 1-25 TeV energy band and by KM2A in the $>$ 25 TeV energy band, with 7.3 $σ$ and 13.5 $σ$ significance, respectively. The best-fit position derived through WCDA data is R.A. = 42.06$^\circ \pm$ 0.12$^\circ$ and Dec. = 60.24$^\circ \pm $ 0.13$^\circ$ with an extension of 0.69$^\circ\pm$0.15$^\circ$ and that of the KM2A data is R.A.= 42.29$^\circ \pm $ 0.13$^\circ$ and Dec. = 60.38$^\circ \pm$ 0.07$^\circ$ with an extension of 0.37$^\circ\pm$0.07$^\circ$. No clear extended multiwavelength counterpart of this LHAASO source has been found from the radio band to the GeV band. The most plausible explanation of the VHE gamma-ray emission is the inverse Compton process of highly relativistic electrons and positrons injected by the pulsar. These electrons/positrons are hypothesized to be either confined within the pulsar wind nebula or to have already escaped into the interstellar medium, forming a pulsar halo.
Submitted 6 October, 2024;
originally announced October 2024.
-
General recipe for immediate entanglement death-birth transitions via Bell states: environmental Heisenberg exchange as an example
Authors:
Son-Hsien Chen,
Seng Ghee Tan,
Che-Chun Huang
Abstract:
Environment is known to play a dual role in both extinguishing and establishing entanglement, leading to entanglement sudden death (ESD) and entanglement sudden birth (ESB). In this paper, we propose a recipe for the initial states of two qubits to undergo ESD, ESB, or transition of finite duration (TFD) between them. While this recipe is generally independent of the interaction, a spin-star model with environmental Heisenberg exchange is chosen for illustration. Utilizing the Bell states, we introduce the entanglement switch parameter (ESP), whose sign indicates whether the qubit bipartite entanglement is switched on or off. The classical (quantum) weighting of the Bell states encodes the ESP for initial mixed (pure) states. When more than two Bell states are adopted, the ESP permits states to penetrate through the entanglement-unentanglement boundary. In this case, the penetrability of a small ESP ensures the immediate occurrence of ESD or ESB and indicates the TFD if the local time-even symmetry in the entanglement monotone is also satisfied. When no more than two Bell states are employed, the penetrability is lost, and TFD is only identified in some mixed states but not in pure states; here for pure states, the environmental quantum degrees of freedom are associated with the number of Bell states. Thanks to the simplicity of this model, analytic results are provided. We also analyze the symmetries that can convert or alter ESD into ESB, and vice versa. The recipe enhances the controllability of entanglement dynamics and facilitates entanglement engineering.
Submitted 21 October, 2024; v1 submitted 6 October, 2024;
originally announced October 2024.
-
Spontaneous Symmetry Breaking In Nonlinear Binary Periodic Systems
Authors:
Ruihan Peng,
Qidong Fu,
Yejia Chen,
Weidong Luo,
Changming Huang,
Fangwei Ye
Abstract:
Spontaneous symmetry breaking (SSB) occurs when modes of asymmetric profile appear in a symmetric, double-well potential, due to the nonlinearity of the potential exceeding a critical value. In this study, we examine SSB in a periodic potential where the unit cell itself is a symmetric double-well, in both one-dimensional and two-dimensional periodic systems. Using the tight-binding model, we derive the analytical form that predicts the critical power at which SSB occurs for both 1D and 2D systems. The results show that the critical power depends significantly on the quasi-momentum of the Bloch mode, and as the modulus of momentum increases, the SSB threshold decreases rapidly, potentially dropping to zero. These analytical findings are supported by numerical nonlinear eigenmode analysis and direct propagation simulations of Bloch modes.
Submitted 5 October, 2024;
originally announced October 2024.
-
Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments
Authors:
Simon Sinong Zhan,
Qingyuan Wu,
Philip Wang,
Yixuan Wang,
Ruochen Jiao,
Chao Huang,
Qi Zhu
Abstract:
In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments where theoretical results cannot hold and performance is degraded. To address this issue, we propose a novel method which infuses the dynamics information into the reward shaping with the theoretical guarantee for the induced optimal policy in the stochastic environments. Incorporating our novel model-enhanced rewards, we present a novel Model-Enhanced AIRL framework, which integrates transition model estimation directly into reward shaping. Furthermore, we provide a comprehensive theoretical analysis of the reward error bound and performance difference bound for our method. The experimental results in MuJoCo benchmarks show that our method can achieve superior performance in stochastic environments and competitive performance in deterministic environments, with significant improvement in sample efficiency, compared to existing baselines.
Submitted 4 October, 2024;
originally announced October 2024.
-
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Authors:
Jiayi Ye,
Yanbo Wang,
Yue Huang,
Dongping Chen,
Qihui Zhang,
Nuno Moniz,
Tian Gao,
Werner Geyer,
Chao Huang,
Pin-Yu Chen,
Nitesh V Chawla,
Xiangliang Zhang
Abstract:
LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and has served as supervised rewards in model training. However, despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
Submitted 3 October, 2024; v1 submitted 3 October, 2024;
originally announced October 2024.
-
Simulation Results of Center-Manifold-Based Identification of Polynomial Nonlinear Systems with Uncontrollable Linearization
Authors:
Chao Huang,
Hao Zhang,
Zhuping Wang
Abstract:
A system identification method based on the center manifold was recently proposed to identify polynomial nonlinear systems with uncontrollable linearization. This note presents a numerical example to show the effectiveness of this method.
Submitted 2 October, 2024;
originally announced October 2024.
-
OmniSR: Shadow Removal under Direct and Indirect Lighting
Authors:
Jiamin Xu,
Zelong Li,
Yuxin Zheng,
Chenyu Huang,
Renshu Gu,
Weiwei Xu,
Gang Xu
Abstract:
Shadows can originate from occlusions in both direct and indirect illumination. Although most current shadow removal research focuses on shadows caused by direct illumination, shadows from indirect illumination are often just as pervasive, particularly in indoor scenes. A significant challenge in removing shadows from indirect illumination is obtaining shadow-free images to train the shadow removal network. To overcome this challenge, we propose a novel rendering pipeline for generating shadowed and shadow-free images under direct and indirect illumination, and create a comprehensive synthetic dataset that contains over 30,000 image pairs, covering various object types and lighting conditions. We also propose an innovative shadow removal network that explicitly integrates semantic and geometric priors through concatenation and attention mechanisms. The experiments show that our method outperforms state-of-the-art shadow removal techniques and can effectively generalize to indoor and outdoor scenes under various lighting conditions, enhancing the overall effectiveness and applicability of shadow removal methods.
Submitted 2 October, 2024;
originally announced October 2024.
-
FactAlign: Long-form Factuality Alignment of Large Language Models
Authors:
Chao-Wei Huang,
Yun-Nung Chen
Abstract:
Large language models have demonstrated significant potential as the next-generation information access engines. However, their reliability is hindered by issues of hallucination and generating non-factual content. This is particularly problematic in long-form responses, where assessing and ensuring factual accuracy is complex. In this paper, we address this gap by proposing FactAlign, a novel alignment framework designed to enhance the factuality of LLMs' long-form responses while maintaining their helpfulness. We introduce fKTO, a fine-grained, sentence-level alignment algorithm that extends the Kahneman-Tversky Optimization (KTO) alignment method. Leveraging recent advances in automatic factuality evaluation, FactAlign utilizes fine-grained factuality assessments to guide the alignment process. Our experiments on open-domain prompts and information-seeking questions demonstrate that FactAlign significantly improves the factual accuracy of LLM responses while also improving their helpfulness. Further analyses identify that FactAlign is capable of training LLMs to provide more information without losing factual precision, thus improving the factual F1 score. Our source code, datasets, and trained models are publicly available at https://github.com/MiuLab/FactAlign
Submitted 2 October, 2024;
originally announced October 2024.
-
Measuring Global Urban Complexity from the Perspective of Living Structure
Authors:
Andy Jingqian Xue,
Chenyu Huang,
Bin Jiang
Abstract:
As urban critic Jane Jacobs conceived, a city is essentially the problem of organized complexity. What underlies the complexity is a structural factor, called living structure, which is defined as a mathematical structure composed of hierarchically organized substructures. Through these substructures, the complexity of cities, or equivalently the livingness of urban space (L), can be measured by multiplying the number of cities or substructures (S) by their scaling hierarchy (H), indicating that complexity is about both the quantity of cities and how well the city is organized hierarchically. In other words, complexity emerges from a hierarchical structure where there are far more small cities or substructures than large ones across all scales, and cities are more or less similar within each individual hierarchical level. In this paper, we conduct comprehensive case studies to investigate urban complexity on a global scale using multisource geospatial data. We develop an efficient approach to recursively identifying all natural cities with their inner hotspots worldwide through connected component analysis. To characterize urban complexity, urban space is initially represented as a hierarchy of recursively defined natural cities, and all the cities are then represented as a network for measuring the degree of complexity or livingness of the urban space. The results show the Earth's surface is growing more complex from an economic perspective, and the dynamics of urban complexity are more explicit from nighttime light imagery than from population data. We further discuss the implications in city science, aiming to help create and recreate urban environments that are more resilient and livable by fostering organized complexity from the perspective of living structure.
Submitted 16 September, 2024;
originally announced October 2024.
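The abstract above measures livingness as the product of the number of substructures (S) and their scaling hierarchy (H). The sketch below illustrates that product on a toy list of city sizes. How H is obtained is not stated in the abstract; the head/tail-breaks rule used here, the 40% head limit, and the function names are assumptions made for illustration only, not the authors' implementation.

```python
from statistics import mean


def scaling_hierarchy(sizes, head_limit=0.4):
    """Count hierarchical levels H with a head/tail-breaks rule (assumed)."""
    levels = 1
    values = list(sizes)
    while len(values) > 1:
        m = mean(values)
        head = [v for v in values if v > m]
        # Stop when nothing lies above the mean or the head is no longer a minority.
        if not head or len(head) / len(values) > head_limit:
            break
        levels += 1
        values = head
    return levels


def livingness(sizes):
    """Return L = S * H for a list of substructure sizes."""
    s = len(sizes)                 # S: number of substructures
    h = scaling_hierarchy(sizes)   # H: number of hierarchical levels
    return s * h


# Toy data: many small cities, few large ones.
toy_sizes = [1, 1, 1, 1, 1, 1, 2, 2, 3, 10]
print(livingness(toy_sizes))
```

With equal-sized substructures the hierarchy collapses to a single level, so L reduces to S; a heavy-tailed size distribution raises H and therefore L.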
-
PairDistill: Pairwise Relevance Distillation for Dense Retrieval
Authors:
Chao-Wei Huang,
Yun-Nung Chen
Abstract:
Effective information retrieval (IR) from vast datasets relies on advanced techniques to extract relevant information in response to queries. Recent advancements in dense retrieval have showcased remarkable efficacy compared to traditional sparse retrieval methods. To further enhance retrieval performance, knowledge distillation techniques, often leveraging robust cross-encoder rerankers, have been extensively explored. However, existing approaches primarily distill knowledge from pointwise rerankers, which assign absolute relevance scores to documents, thus facing challenges related to inconsistent comparisons. This paper introduces Pairwise Relevance Distillation (PairDistill) to leverage pairwise reranking, offering fine-grained distinctions between similarly relevant documents to enrich the training of dense retrieval models. Our experiments demonstrate that PairDistill outperforms existing methods, achieving new state-of-the-art results across multiple benchmarks. This highlights the potential of PairDistill in advancing dense retrieval techniques effectively. Our source code and trained models are released at https://github.com/MiuLab/PairDistill
Submitted 2 October, 2024;
originally announced October 2024.
-
Text Clustering as Classification with LLMs
Authors:
Chen Huang,
Guoxiu He
Abstract:
Text clustering remains valuable in real-world applications where manual labeling is cost-prohibitive. It facilitates efficient organization and analysis of information by grouping similar texts based on their representations. However, implementing this approach necessitates fine-tuned embedders for downstream data and sophisticated similarity metrics. To address this issue, this study presents a novel framework for text clustering that effectively leverages the in-context learning capacity of Large Language Models (LLMs). Instead of fine-tuning embedders, we propose to transform text clustering into a classification task via an LLM. First, we prompt the LLM to generate potential labels for a given dataset. Second, after integrating similar labels generated by the LLM, we prompt the LLM to assign the most appropriate label to each sample in the dataset. Our framework has been experimentally shown to achieve comparable or superior performance to state-of-the-art clustering methods that employ embeddings, without requiring complex fine-tuning or clustering algorithms. Our code is publicly available at https://anonymous.4open.science/r/Text-Clustering-via-LLM-E500.
Submitted 30 September, 2024;
originally announced October 2024.
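The procedure in the abstract above (generate candidate labels, integrate similar labels, then assign one label per sample) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `llm` is a hypothetical callable that takes a prompt string and returns the model's text reply, and the prompt wording, batching, and fallback rule are illustrative choices, not taken from the paper or its repository.

```python
from typing import Callable, Dict, List


def propose_labels(texts: List[str], llm: Callable[[str], str], batch_size: int = 20) -> List[str]:
    """Step 1: ask the LLM for candidate labels, batch by batch."""
    labels: List[str] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        prompt = "Suggest short topic labels, one per line, for these texts:\n" + "\n".join(batch)
        for line in llm(prompt).splitlines():
            label = line.strip()
            if label and label not in labels:
                labels.append(label)
    return labels


def merge_labels(labels: List[str], llm: Callable[[str], str]) -> List[str]:
    """Step 2: ask the LLM to integrate labels that mean the same thing."""
    prompt = "Merge labels that mean the same thing; return one label per line:\n" + "\n".join(labels)
    merged = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return merged or labels  # keep the originals if the reply is empty


def assign_labels(texts: List[str], labels: List[str], llm: Callable[[str], str]) -> Dict[str, List[str]]:
    """Step 3: ask the LLM to pick one label for each text."""
    clusters: Dict[str, List[str]] = {label: [] for label in labels}
    for text in texts:
        prompt = "Choose the best label from [" + ", ".join(labels) + "] for this text:\n" + text
        answer = llm(prompt).strip()
        # Assumed fallback: an unrecognized reply goes to the first label.
        clusters[answer if answer in clusters else labels[0]].append(text)
    return clusters


def cluster_as_classification(texts: List[str], llm: Callable[[str], str]) -> Dict[str, List[str]]:
    labels = merge_labels(propose_labels(texts, llm), llm)
    return assign_labels(texts, labels, llm)
```

Any real LLM client can be wrapped into the `llm` callable; a deterministic stub is enough to exercise the control flow without network access.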
-
E-Healthcare Systems: Integrated Sensing, Computing, and Semantic Communication with Physical Layer Security
Authors:
Yinchao Yang,
Zhaohui Yang,
Weijie Yuan,
Fan Liu,
Xiaowen Cao,
Chongwen Huang,
Zhaoyang Zhang,
Mohammad Shikh-Bahaei
Abstract:
This paper introduces an integrated sensing, computing, and semantic communication (ISCSC) framework tailored for smart healthcare systems. The framework is evaluated in the context of smart healthcare, optimising the transmit beamforming matrix and semantic extraction ratio for improved data rates, sensing accuracy, and general data protection regulation (GDPR) compliance, while considering IoRT device computing capabilities. Semantic metrics such as semantic transmission rate and semantic secrecy rate are derived to evaluate data rate performance and GDPR risk, respectively, while the Cramér-Rao Bound (CRB) assesses sensing performance. Simulation results demonstrate the framework's effectiveness in ensuring reliable sensing, high data rates, and secure communication.
Submitted 30 September, 2024;
originally announced September 2024.
-
Nematic correlations and nematic Berezinskii-Kosterlitz-Thouless transition in spin-1 kagomé lattice antiferromagnets
Authors:
Chun-Jiong Huang,
Xu-Ping Yao,
Gang v. Chen
Abstract:
Nematicity plays an important role in strongly correlated electron systems. We explore the spin nematicity for the spin-1 kagomé lattice antiferromagnet with the bilinear-biquadratic model and the single-ion anisotropy using a generalized semi-classical approximation and Monte Carlo simulations. We reveal a rich ground state phase diagram, characterized by two main regions: a pure spin nematic phase, and a region featuring the coexistence of classical spin liquid and ferroicities for both dipolar and quadrupolar moments. The thermal fluctuation melts the spin nematic order into a critical phase with a quasi-long-range nematic order. Due to the fluctuating vortices of the spin nematic order, this critical phase further undergoes a nematic Berezinskii-Kosterlitz-Thouless transition to a paramagnetic phase, marked by an anomalous stiffness jump. Additionally, the single-ion anisotropy leads to a weak ferromagnetism, resulting in a spontaneous time-reversal symmetry breaking at very low temperatures. Remarkably, this weak ferromagnetic regime is accompanied by classical spin liquid behaviors. Our results provide an intriguing glimpse into the interplay between geometric frustration and intertwining spin orders with different ranks, and are expected to stimulate further studies on the spin-1 systems and relevant materials.
Submitted 27 September, 2024;
originally announced September 2024.
-
Bayesian Insights into post-Glitch Dynamics: Model comparison and parameter constraint from decades long observation data of the Crab pulsar
Authors:
Chun Huang,
Xiao-Ping Zheng
Abstract:
The Crab Pulsar has exhibited numerous glitches accompanied by persistent shifts in its spin-down rate. The explanation of the observed persistent shifts remains a challenge. We perform a detailed Bayesian analysis to compare four data-fitting models, ranging from a simple linear model to more complex power-law and logarithmic models, using a dataset of observed glitches and persistent shifts. Our results show that the large observed events are difficult to explain by the usually assumed linear model due to starquakes. A particularly notable finding is that the logarithmic model provides the best fit to the observational data, but the two power-law models come close to it. The detailed differences between these models may be further clarified by a better understanding of the internal physics of neutron stars.
Submitted 26 September, 2024;
originally announced September 2024.
-
First Use of a Polarized $^3$He Neutron Spin Filter on the Back-n White Neutron Source of CSNS
Authors:
Mofan Zhang,
Zhou Yang,
Junpei Zhang,
Chuyi Huang,
Tianhao Wang,
Yonghao Chen,
Ruirui Fan,
W. Michael Snow
Abstract:
Polarized eV neutrons can address interesting scientific questions in nuclear physics, particle physics, and astrophysics/cosmology. We present the first experiment to polarize the neutrons on the Back-n beamline at the Chinese Spallation Neutron Source (CSNS) using an in-situ neutron spin filter (NSF) based on spin-exchange optical pumping (SEOP) of $^3$He. A $^3$He polarization of 68% $\pm$ 0.7% for this in-situ NSF was measured through the neutron transmission method at Back-n. This is high enough to enable new experiments on the Back-n beamline.
Keywords: Polarized neutron, Polarized nuclei, CSNS, White neutron, Fundamental physics research
PACS number(s): 24.80.+y, 67.30.ep, 29.27.Hj, 24.70.+s, 32.10.Dk
Submitted 24 September, 2024;
originally announced September 2024.
-
Reactive Multi-Robot Navigation in Outdoor Environments Through Uncertainty-Aware Active Learning of Human Preference Landscape
Authors:
Chao Huang,
Wenshuo Zang,
Carlo Pinciroli,
Zhi Jane Li,
Taposh Banerjee,
Lili Su,
Rui Liu
Abstract:
Compared with single robots, Multi-Robot Systems (MRS) can perform missions more efficiently due to the presence of multiple members with diverse capabilities. However, deploying an MRS in wide real-world environments is still challenging due to uncertain and varied obstacles (e.g., building clusters and trees). With a limited understanding of how environmental uncertainty affects performance, an MRS cannot flexibly adjust its behaviors (e.g., teaming, load sharing, trajectory planning) to ensure both environment adaptation and task accomplishment. In this work, a novel joint preference landscape learning and behavior adjusting framework (PLBA) is designed. PLBA efficiently integrates real-time human guidance into MRS coordination and utilizes Sparse Variational Gaussian Processes with Varying Output Noise to quickly assess human preferences by leveraging spatial correlations between environment characteristics. An optimization-based behavior-adjusting method then safely adapts MRS behaviors to environments. To validate PLBA's effectiveness in MRS behavior adaptation, a flood disaster search and rescue task was designed. Twenty human users provided 1,764 feedback items based on human preferences obtained from MRS behaviors related to "task quality", "task progress", and "robot safety". The prediction accuracy and adaptation speed results show the effectiveness of PLBA in preference learning and MRS behavior adaptation.
Submitted 24 September, 2024;
originally announced September 2024.
-
RIS-aided Trajectory Optimization in Layered Urban Air Mobility
Authors:
Kai Xiong,
Supeng Leng,
Liyuan Chen,
Dapei Zhang,
Chongwen Huang,
Chau Yuen
Abstract:
Urban Air Mobility (UAM) relies on developing aerospace industries, where safe aviation and efficient communication are critical features of aircraft. However, it is challenging for aircraft to sustain efficient air-ground communication in urban environments. Without continuous air-ground communication, aircraft may experience course deviation and safety accidents. To address these problems, a reconfigurable intelligent surface (RIS)-aided trajectory optimization scheme is proposed, enabling efficient air-ground communication and safe aviation in UAM with a layered airspace structure. This paper first devises a dual-plane RIS communication scheme for layered airspace. It fully engages the omnidirectional and directional signal attributes to reduce the transmission delay of the air-ground communication. Based on the dual-plane RIS configuration, we jointly develop the intra- and inter-layer trajectory scheme to optimize communication and safe aviation. In the intra-layer trajectory optimization, we propose a dual-time-scale flight scheme to improve communication capacity and horizontal flight safety. Meanwhile, we propose a safe layer-switching method to ensure collision avoidance during vertical flight in the inter-layer trajectory optimization. Compared with the benchmarks in the layered airspace, the proposed scheme improves the communication load by 40% and reduces the time needed to restore safe separation by 66%.
Submitted 24 September, 2024;
originally announced September 2024.
-
Federated Large Language Models: Current Progress and Future Directions
Authors:
Yuhang Yao,
Jianyi Zhang,
Junda Wu,
Chengkai Huang,
Yu Xia,
Tong Yu,
Ruiyi Zhang,
Sungchul Kim,
Ryan Rossi,
Ang Li,
Lina Yao,
Julian McAuley,
Yiran Chen,
Carlee Joe-Wong
Abstract:
Large language models are rapidly gaining popularity and have been widely adopted in real-world applications. While the quality of training data is essential, privacy concerns arise during data collection. Federated learning offers a solution by allowing multiple clients to collaboratively train LLMs without sharing local data. However, FL introduces new challenges, such as model convergence issues due to heterogeneous data and high communication costs. A comprehensive study is required to address these challenges and guide future research. This paper surveys Federated learning for LLMs (FedLLM), highlighting recent advances and future directions. We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and associated research challenges. We finally propose potential research directions for federated LLMs, including pre-training and how LLMs can further enhance federated learning.
Submitted 24 September, 2024;
originally announced September 2024.
-
From Our Lab to Their Homes: Learnings from Longitudinal Field Research with Older Adults
Authors:
Amama Mahmood,
Chien-Ming Huang
Abstract:
Conducting research with older adults in their home environments presents unique opportunities and challenges that differ significantly from traditional lab-based studies. In this paper, we share our experiences from year-long research activities aiming to design and evaluate conversational voice assistants for older adults through longitudinal deployment, interviews, co-design workshops, and evaluation studies. We discuss the benefits of bringing the lab to their home, including producing realistic and contextual interactions, creating stronger researcher-participant bonds, and enabling participant growth with the research over time. We also detail the difficulties encountered in various aspects of the research process, including recruitment, scheduling, logistics, following study protocols, and study closure. These learnings highlight the complex, yet rewarding, nature of longitudinal home-based research with older adults, offering lessons for future studies aiming to achieve real-world applicability.
Submitted 23 September, 2024;
originally announced September 2024.
-
Voice Assistants for Health Self-Management: Designing for and with Older Adults
Authors:
Amama Mahmood,
Shiye Cao,
Maia Stiber,
Victor Nikhil Antony,
Chien-Ming Huang
Abstract:
Supporting older adults in health self-management is crucial for promoting independent aging, particularly given the growing strain on healthcare systems. While voice assistants (VAs) hold the potential to support aging in place, they often lack tailored assistance and present usability challenges. We addressed these issues through a five-stage design process with older adults to develop a personal health assistant. Starting with in-home interviews (N=17), we identified two primary challenges in older adults' health self-management: health awareness and medical adherence. To address these challenges, we developed a high-fidelity LLM-powered VA prototype to debrief doctor's visit notes and generate tailored medication reminders. We refined our prototype with feedback from co-design workshops (N=10) and validated its usability through in-home studies (N=5). Our work highlights key design features for personal health assistants and provides broader insights into desirable VA characteristics, including personalization, adapting to user context, and respect for user autonomy.
Submitted 23 September, 2024;
originally announced September 2024.
-
PackageIntel: Leveraging Large Language Models for Automated Intelligence Extraction in Package Ecosystems
Authors:
Wenbo Guo,
Chengwei Liu,
Limin Wang,
Jiahui Wu,
Zhengzi Xu,
Cheng Huang,
Yong Fang,
Yang Liu
Abstract:
The rise of malicious packages in public registries poses a significant threat to software supply chain (SSC) security. Although academia and industry employ methods like software composition analysis (SCA) to address this issue, existing approaches often lack timely and comprehensive intelligence updates. This paper introduces PackageIntel, a novel platform that revolutionizes the collection, processing, and retrieval of malicious package intelligence. By utilizing exhaustive search techniques, snowball sampling from diverse sources, and large language models (LLMs) with specialized prompts, PackageIntel ensures enhanced coverage, timeliness, and accuracy. We have developed a comprehensive database containing 20,692 malicious NPM and PyPI packages sourced from 21 distinct intelligence repositories. Empirical evaluations demonstrate that PackageIntel achieves a precision of 98.6% and an F1 score of 92.0 in intelligence extraction. Additionally, it detects threats on average 70% earlier than leading databases like Snyk and OSV, and operates cost-effectively at $0.094 per intelligence piece. The platform has successfully identified and reported over 1,000 malicious packages in downstream package manager mirror registries. This research provides a robust, efficient, and timely solution for identifying and mitigating threats within the software supply chain ecosystem.
Submitted 27 September, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
On demand single photon generation and coherent control of excitons from resonantly driven nanowire quantum dots
Authors:
Jun Gao,
Govind Krishna,
Edith Yeung,
Lingxi Yu,
Sayan Gangopadhyay,
Kai-Sum Chan,
Chiao-Tzu Huang,
Thomas Descamps,
Michael E. Reimer,
Philip J. Poole,
Dan Dalacu,
Val Zwiller,
Ali W. Elshaari
Abstract:
Coherent control of single photon sources is a key requirement for the advancement of photonic quantum technologies. Among them, nanowire-based quantum dot sources are popular due to their potential for on-chip hybrid integration. Here we demonstrate on-demand single-photon generation ($g^{(2)}(0)(X^{*}) =0.078$ and $g^{(2)}(0)(X)= 0.03$) from resonantly excited InAsP/InP nanowire quantum dots and observe Rabi oscillations in the dot emission, indicating successful coherent manipulation of the excitonic states in the nanowire. We also measure a low emission time jitter for resonant excitation as compared to above-band excitation. This work addresses the long-standing challenge of resonantly exciting nanowire-quantum dots. It paves the way for hybrid quantum photonic integration, enabling spin-photon entanglement and matter memories on-chip.
Submitted 23 September, 2024;
originally announced September 2024.
-
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Authors:
Zheng Hui,
Zhaoxiao Guo,
Hang Zhao,
Juanyong Duan,
Congrui Huang
Abstract:
In different NLP tasks, detecting harmful content is crucial for online environments, especially with the growing influence of social media. However, previous research has two main issues: 1) a lack of data in low-resource settings, and 2) inconsistent definitions and criteria for judging harmful content, requiring classification models to be robust to spurious features and diverse. We propose ToxiCraft, a novel framework for synthesizing datasets of harmful information to address these weaknesses. With only a small amount of seed data, our framework can generate a wide variety of synthetic, yet remarkably realistic, examples of toxic information. Experimentation across various datasets showcases a notable enhancement in detection model robustness and adaptability, surpassing or approaching the gold labels. We will release the generated data on GitHub upon acceptance.
Submitted 23 September, 2024;
originally announced September 2024.