VoiceBench: Benchmarking LLM-Based Voice Assistants

Authors: Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li

Abstract: Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering a significantly improved user experience compared to traditional text-based interactions. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress in the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge evaluation with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics, environmental and content factors. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench also includes both real and synthetic spoken instructions that incorporate the above three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.

Submitted 22 October, 2024; originally announced October 2024.

Comments: Work in progress. Data is available at https://github.com/MatthewCYM/VoiceBench

arXiv:2410.13351 [pdf, other]

Representation Learning of Structured Data for Medical Foundation Models

Authors: Vijay Prakash Dwivedi, Viktor Schlegel, Andy T. Liu, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Jeng Wei, Wei-Hsian Yin, Stefan Winkler, Robby T. Tan

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.

Submitted 17 October, 2024; originally announced October 2024.

Comments: NeurIPS 2024 Workshop on Unifying Representations in Neural Models (UniReps 2024)

arXiv:2410.10121 [pdf, other]

Interaction-Guided Two-Branch Image Dehazing Network

Authors: Huichun Liu, Xiaosong Li, Tianshu Tan

Abstract: Image dehazing aims to restore clean images from hazy ones. Convolutional Neural Networks (CNNs) and Transformers have demonstrated exceptional performance in local and global feature extraction, respectively, and currently represent the two mainstream frameworks in image dehazing. In this paper, we propose a novel dual-branch image dehazing framework that guides CNN and Transformer components interactively. We reconsider the complementary characteristics of CNNs and Transformers by leveraging the differential relationships between global and local features for interactive guidance. This approach enables the capture of local feature positions through global attention maps, allowing the CNN to focus solely on feature information at effective positions. The single-branch Transformer design ensures the network's global information recovery capability. Extensive experiments demonstrate that our proposed method yields competitive qualitative and quantitative evaluation performance on both synthetic and real public datasets. Code is available at https://github.com/Feecuin/Two-Branch-Dehazing

Submitted 13 October, 2024; originally announced October 2024.

Comments: Accepted by ACCV 2024

arXiv:2410.08638 [pdf]

Leveraging reconfigurable micro-resonator soliton crystals for Intensity-Modulated Direct Detection Data Transmission

Authors: Xavier X. Chia, Kenny Y. K. Ong, A. Aadhi, George F. R. Chen, Ju Won Choi, Byoung-Uk Sohn, Amdad Chowdury, Dawn T. H. Tan

Abstract: The perennial demand for highly efficient short-haul communications is evidenced by a sustained explosion of growth in data center infrastructure that is predicted to continue for the foreseeable future. In these relatively compact networks, cost-sensitivity is of particular importance, which limits options to direct detection schemes that are more cost efficient than their coherent counterparts. Since their initial demonstration, multi-soliton states in optical microresonators have been observed to manifest in self-organised ensembles where soliton pulses are equally spaced around the resonators. In the spectral domain, these states, dubbed soliton crystals (SCs), result in significant enhancements to individual comb lines depending on the crystal state, making them well suited to intensity-modulated direct detection (IMDD) schemes. In this work, we experimentally demonstrate adiabatic, deterministic access to lower-order soliton crystal states using an auxiliary-assisted cavity pumping method, attaining up to 19.6 dB enhancement of the comb lines in the 7-SC configuration compared to the single-soliton state. Seven comb lines across the C-band, each carrying 46 Gbaud/s pulse amplitude modulation 4 (PAM4), are transmitted over 4 km of fiber with bit-error-rates (BER) as low as 5E-5. Our demonstration shows a promising way of using soliton crystal states as future integrated sources for highly stable Terabaud/s datacenter communications.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.05125 [pdf, other]

Dense Plasma Opacity from Excited States Method

Authors: C. E. Starrett, C. J. Fontes, H. B. Tran Tan, J. M. Kasper, J. R. White

Abstract: The self-consistent inclusion of plasma effects in opacity calculations is a significant modeling challenge. As density increases, such effects can no longer be treated perturbatively. Building on a recently published model that addresses this challenge, we calculate opacities of oxygen at solar interior conditions. The new model includes the effects of treating the free electrons consistently with the bound electrons, and the influence of free electron energy and entropy variations is explored. It is found that, relative to a state-of-the-art model that does not include these effects, the bound-free opacity of the oxygen plasmas considered can increase by 10%.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.01753 [pdf, other]

$^{229}\mathrm{ThF}_4$ thin films for solid-state nuclear clocks

Authors: Chuankun Zhang, Lars von der Wense, Jack F. Doyle, Jacob S. Higgins, Tian Ooi, Hans U. Friebel, Jun Ye, R. Elwell, J. E. S. Terhune, H. W. T. Morgan, A. N. Alexandrova, H. B. Tran Tan, Andrei Derevianko, Eric R. Hudson

Abstract: After nearly fifty years of searching, the vacuum ultraviolet $^{229}$Th nuclear isomeric transition has recently been directly laser excited [1,2] and measured with high spectroscopic precision [3]. Nuclear clocks based on this transition are expected to be more robust [4,5] than and may outperform [6,7] current optical atomic clocks. They also promise sensitive tests for new physics beyond the standard model [5,8,9]. In light of these important advances and applications, a dramatic increase in the need for $^{229}$Th spectroscopy targets in a variety of platforms is anticipated. However, the growth and handling of high-concentration $^{229}$Th-doped crystals [5] used in previous measurements [1-3,10] are challenging due to the scarcity and radioactivity of the $^{229}$Th material. Here, we present a potentially scalable solution to these problems by demonstrating laser excitation of the nuclear transition in $^{229}$ThF$_4$ thin films grown with a physical vapor deposition process, consuming only micrograms of $^{229}$Th material. The $^{229}$ThF$_4$ thin films are intrinsically compatible with photonics platforms and nanofabrication tools for integration with laser sources and detectors, paving the way for an integrated and field-deployable solid-state nuclear clock with radioactivity up to three orders of magnitude smaller than typical $^{229}$Th-doped crystals [1-3,10]. The high nuclear emitter density in $^{229}$ThF$_4$ also potentially enables quantum optics studies in a new regime.
Finally, we describe the operation and present the estimation of the performance of a nuclear clock based on a defect-free ThF$_4$ crystal.

Submitted 2 October, 2024; originally announced October 2024.

Comments: 15 pages, 3 figures

arXiv:2409.19745 [pdf, other]

PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead

Authors: Tao Tan, Yining Qian, Ang Lv, Hongzhan Lin, Songhao Wu, Yongbo Wang, Feng Wang, Jingtong Wu, Xin Lu, Rui Yan

Abstract: Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this paper, we propose Position-Embedding-Agnostic attention Re-weighting (PEAR), which enhances the context awareness of LLMs with zero inference overhead. Specifically, on a proxy task focused on context copying, we first detect heads which suppress the models' context awareness thereby diminishing RAG performance. To weaken the impact of these heads, we re-weight their outputs with learnable coefficients. The LLM (with frozen parameters) is optimized by adjusting these coefficients to minimize loss on the proxy task. As a result, the coefficients are optimized to values less than one, thereby reducing their tendency to suppress RAG performance. During inference, the optimized coefficients are fixed to re-weight these heads, regardless of the specific task at hand. Our proposed PEAR offers two major advantages over previous approaches: (1) It introduces zero additional inference overhead in terms of memory usage or inference time, while outperforming competitive baselines in accuracy and efficiency across various RAG tasks. (2) It is independent of position embedding algorithms, ensuring broader applicability.

Submitted 7 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

Comments: preprint

arXiv:2409.18680 [pdf, other]

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

Authors: Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

Abstract: Various audio-LLMs (ALLMs) have been explored recently for tackling different audio tasks simultaneously using a single, unified model. While existing evaluations of ALLMs primarily focus on single-audio tasks, real-world applications often involve processing multiple audio streams simultaneously. To bridge this gap, we propose the first multi-audio evaluation (MAE) benchmark that consists of 20 datasets from 11 multi-audio tasks encompassing both speech and sound scenarios. Comprehensive experiments on MAE demonstrate that the existing ALLMs, while being powerful in comprehending primary audio elements in individual audio inputs, struggle to handle multi-audio scenarios. To this end, we propose a novel multi-audio-LLM (MALLM) to capture audio context among multiple similar audios using discriminative learning on our proposed synthetic data. The results demonstrate that the proposed MALLM outperforms all baselines and achieves high data efficiency using synthetic data without requiring human annotations. The proposed MALLM opens the door for ALLMs toward the multi-audio processing era and brings us closer to replicating human auditory capabilities in machines.

Submitted 1 October, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

Comments: EMNLP24 Findings

arXiv:2409.18129 [pdf, other]

TOI-5005 b: A super-Neptune in the savanna near the ridge

Authors: A. Castro-González, J. Lillo-Box, D. J. Armstrong, L. Acuña, A. Aguichine, V. Bourrier, S. Gandhi, S. G. Sousa, E. Delgado-Mena, A. Moya, V. Adibekyan, A. C. M. Correia, D. Barrado, M. Damasso, J. N. Winn, N. C. Santos, K. Barkaoui, S. C. C. Barros, Z. Benkhaldoun, F. Bouchy, C. Briceño, D. A. Caldwell, K. A. Collins, Z. Essack, M. Ghachoui, et al. (16 additional authors not shown)

Abstract: The Neptunian desert and savanna have been recently found to be separated by a ridge, an overdensity of planets in the $\simeq$3-5 days period range. These features are thought to be shaped by dynamical and atmospheric processes. However, their relative roles are not yet well understood. We intend to confirm and characterise the super-Neptune TESS candidate TOI-5005.01, which orbits a moderately bright (V = 11.8) solar-type star (G2 V) with an orbital period of 6.3 days. We confirm TOI-5005 b to be a transiting super-Neptune with a radius of $R_{\rm p}$ = $6.25\pm 0.24$ $\rm R_{\rm \oplus}$ ($R_{\rm p}$ = $0.558\pm 0.021$ $\rm R_{\rm J}$) and a mass of $M_{\rm p}$ = $32.7\pm 5.9$ $\rm M_{\oplus}$ ($M_{\rm p}$ = $0.103\pm 0.018$ $\rm M_{\rm J}$), which corresponds to a mean density of $\rho_{\rm p}$ = $0.74 \pm 0.16$ $\rm g \, cm^{-3}$. Our internal structure modelling indicates that the overall metal mass fraction is well constrained to a value slightly lower than that of Neptune and Uranus ($Z_{\rm planet}$ = $0.76^{+0.04}_{-0.11}$). We also estimated the present-day atmospheric mass-loss rate of TOI-5005 b but found contrasting predictions depending on the choice of photoevaporation model. At a population level, we find statistical evidence ($p$-value = $0.0092^{+0.0184}_{-0.0066}$) that planets in the savanna such as TOI-5005 b tend to show lower densities than planets in the ridge, with a dividing line around 1 $\rm g \, cm^{-3}$, which supports the hypothesis of different evolutionary pathways populating both regimes.
TOI-5005 b is located in a key region of the period-radius space to study the transition between the Neptunian ridge and the savanna. It orbits the brightest star of all such planets, which makes it a target of interest for atmospheric and orbital architecture observations that will bring a clearer picture of its overall evolution.

Submitted 26 September, 2024; originally announced September 2024.

Comments: Accepted for publication in A&A. Abstract shortened. 35 pages, 26 figures
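
The mean density quoted for TOI-5005 b can be cross-checked from the reported mass and radius. The sketch below is only an illustration, not part of the paper: the function name `mean_density` and the Earth reference constants are assumptions introduced here, and the planet is treated as a uniform sphere.

```python
import math

# Assumed reference values (not from the abstract): Earth's mass and mean radius.
EARTH_MASS_G = 5.972e27     # grams
EARTH_RADIUS_CM = 6.371e8   # centimetres


def mean_density(mass_earth: float, radius_earth: float) -> float:
    """Mean density in g/cm^3 of a sphere given mass and radius in Earth units."""
    mass_g = mass_earth * EARTH_MASS_G
    radius_cm = radius_earth * EARTH_RADIUS_CM
    volume_cm3 = (4.0 / 3.0) * math.pi * radius_cm ** 3
    return mass_g / volume_cm3


# Central values reported in the abstract: 32.7 Earth masses, 6.25 Earth radii.
rho = mean_density(32.7, 6.25)
print(f"TOI-5005 b mean density: {rho:.2f} g/cm^3")
```

With the central values this reproduces the quoted 0.74 g cm^-3; the uncertainty of 0.16 g cm^-3 is not propagated here.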

arXiv:2409.17682 [pdf, other]

Dark Miner: Defend against unsafe generation for text-to-image diffusion models

Authors: Zheling Meng, Bo Peng, Xiaochuan Jin, Yue Jiang, Jing Dong, Wei Wang, Tieniu Tan

Abstract: Text-to-image diffusion models have been shown to produce unsafe content, such as violent, sexual, and shocking images, because of unfiltered large-scale training data, necessitating the erasure of unsafe concepts. Most existing methods focus on modifying the generation probabilities conditioned on the texts containing unsafe descriptions. However, they fail to guarantee safe generation for texts unseen in the training phase, especially for the prompts from adversarial attacks. In this paper, we re-analyze the erasure task and point out that existing methods cannot guarantee the minimization of the total probabilities of unsafe generation. To tackle this problem, we propose Dark Miner. It entails a recurring three-stage process that comprises mining, verifying, and circumventing. It greedily mines embeddings with maximum generation probabilities of unsafe concepts and reduces unsafe generation more effectively. In the experiments, we evaluate its performance on two inappropriate concepts, two objects, and two styles. Compared with 6 previous state-of-the-art methods, our method achieves better erasure and defense results in most cases, especially under 4 state-of-the-art attacks, while preserving the model's native generation capability. Our code will be available on GitHub.

Submitted 26 September, 2024; originally announced September 2024.

arXiv:2409.17558 [pdf, other]

Demonstration of entanglement distribution over 155 km metropolitan fiber using a silicon nanophotonic chip

Authors: Jinyi Du, Xingjian Zhang, George F. R. Chen, Hongwei Gao, Dawn T. H. Tan, Alexander Ling

Abstract: Transmitting an entangled state over an extended distance is crucial for the development of quantum networks. Previous demonstrations of transmitting entangled photons over long distances using satellites or fibers have used entangled photon pairs generated from bulk crystal arrangements. An alternative approach would be to generate photon pairs using silicon-on-insulator (SOI) chips. Despite numerous proof-of-concept studies, no long range distribution has been achieved using this platform because of the challenge of getting sufficient off-chip brightness. We report a SOI platform that provides an off-chip entangled photon pair brightness of between 8,000 and 460,000 pairs per second. This exceeds previous reports by three orders of magnitude in brightness. The entanglement fidelity is 99.85(6)% and 97.90(3)%, respectively. Measuring one photon locally, and transmitting the other over 93 km of deployed fiber (link loss of 40 dB), achieves a count rate of 132 pairs per second with an entanglement fidelity of 93.3(3)%, after solving the additional challenges of chromatic dispersion. The source can be pumped harder to enable transmission of entangled photons over 155 km of deployed fiber (link loss of 66 dB) at a rate of 0.7 pairs per second, with an entanglement fidelity of 87.6(5)%. These results demonstrate that SOI nanophotonic chips can perform competitively with bulk crystal sources and represent an important step toward building quantum networks using integrated nanophotonic platforms.

Submitted 26 September, 2024; originally announced September 2024.
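
To give a sense of scale for the link losses quoted in the entry above (40 dB over 93 km, 66 dB over 155 km), the following sketch converts a loss in decibels to the fraction of photons that survive the link. It uses only the standard decibel definition; the function name `db_loss_to_transmission` is hypothetical, and the conversion does not by itself reproduce the reported pair rates, which also depend on pump power and detection details not given in the abstract.

```python
def db_loss_to_transmission(loss_db: float) -> float:
    """Fraction of optical power (or photons) transmitted for a given loss in dB."""
    return 10 ** (-loss_db / 10)


# Link lengths and losses as quoted in the abstract.
for length_km, loss_db in [(93, 40), (155, 66)]:
    fraction = db_loss_to_transmission(loss_db)
    print(f"{length_km} km link, {loss_db} dB loss: transmitted fraction = {fraction:.2e}")
```

A 40 dB loss passes about one photon in ten thousand, while 66 dB passes roughly one in four million, which is consistent with the abstract's note that the source had to be pumped harder for the longer link.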

arXiv:2409.14818 [pdf, other]

MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding

Authors: Qinzhuo Wu, Weikai Xu, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Shuo Shang

Abstract: Recently, mobile AI agents based on VLMs have been gaining increasing attention. These works typically utilize a VLM as a foundation, fine-tuning it with instruction-based mobile datasets. However, these VLMs are typically pre-trained on general-domain data, which often results in a lack of fundamental capabilities specific to the mobile domain. Therefore, they may struggle to recognize specific UI elements and understand intra-UI fine-grained information. In addition, the current fine-tuning task focuses on interacting with the most relevant element for the given instruction. These fine-tuned VLMs may still ignore the relationships between UI pages, neglect the roles of elements in page transitions and lack inter-UI understanding. To address these issues, we propose a VLM called MobileVLM, which includes two additional pre-training stages to enhance both intra- and inter-UI understanding. We defined four UI-based pre-training tasks, enabling the model to better perceive fine-grained elements and capture page transition actions. To address the lack of mobile pre-training data, we built a large Chinese mobile dataset Mobile3M from scratch, which contains 3 million UI pages, and real-world transition actions, forming a directed graph structure. Experimental results show MobileVLM excels on both our test set and public mobile benchmarks, outperforming existing VLMs.

Submitted 3 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.13015 [pdf, other]

First Resolution of Microlensed Images of a Binary-Lens Event

Authors: Zexuan Wu, Subo Dong, A. Mérand, Christopher S. Kochanek, Przemek Mróz, Jinyi Shangguan, Grant Christie, Thiam-Guan Tan, Thomas Bensby, Joss Bland-Hawthorn, Sven Buder, Frank Eisenhauer, Andrew P. Gould, Janez Kos, Tim Natusch, Sanjib Sharma, Andrzej Udalski, J. Woillez, David A. H. Buckley, I. B. Thompson, Karim Abd El Dayem, Evelyne Alecian, Carine Babusiaux, Anthony Berdeu, Jean-Philippe Berger, et al. (53 additional authors not shown)

Abstract: We resolve the multiple images of the binary-lens microlensing event ASASSN-22av using the GRAVITY instrument of the Very Large Telescope Interferometer (VLTI). The light curves show weak binary perturbations, complicating the analysis, but the joint modeling with the VLTI data breaks several degeneracies, arriving at a strongly favored solution. Thanks to precise measurements of angular Einstein radius θ_E = 0.726 +/- 0.002 mas and microlens parallax, we determine that the lens system consists of two M dwarfs with masses of M_1 = 0.261 +/- 0.009 M_sun and M_2 = 0.252 +/- 0.017 M_sun, a projected separation of r_\perp = 7.42 +/- 0.33 AU and a distance of D_L = 2.31 +/- 0.09 kpc. The successful VLTI observations of ASASSN-22av open up a new path for studying intermediate-separation (i.e., a few AUs) stellar-mass binaries, including those containing dark compact objects such as neutron stars and stellar-mass black holes.

Submitted 19 September, 2024; originally announced September 2024.

Comments: see the ancillary file for animation associated with Fig. 8

arXiv:2409.08651 [pdf, other]

Light-induced cortical excitability reveals programmable shape dynamics in starfish oocytes

Authors: Jinghui Liu, Tom Burkart, Alexander Ziepke, John Reinhard, Yu-Chen Chao, Tzer Han Tan, S. Zachary Swartz, Erwin Frey, Nikta Fakhri

Abstract: Chemo-mechanical waves on active deformable surfaces are a key component for many vital cellular functions. In particular, these waves play a major role in force generation and long-range signal transmission in cells that dynamically change shape, as encountered during cell division or morphogenesis. Reconstituting and controlling such chemically controlled cell deformations is a crucial but unsolved challenge for the development of synthetic cells. Here, we develop an optogenetic method to elucidate the mechanism responsible for coordinating surface contraction waves that occur in oocytes of the starfish Patiria miniata during meiotic cell division. Using spatiotemporally-patterned light stimuli as a control input, we create chemo-mechanical cortical excitations that are decoupled from meiotic cues and drive diverse shape deformations ranging from local pinching to surface contraction waves and cell lysis. We develop a quantitative model that entails the hierarchy of chemical and mechanical dynamics, which allows us to relate the variety of mechanical responses to optogenetic stimuli. Our framework systematically predicts and explains transitions of programmed shape dynamics. Finally, we qualitatively map the observed shape dynamics to elucidate how the versatility of intracellular protein dynamics can give rise to a broad range of mechanical phenomenologies. More broadly, our results pave the way toward real-time control over dynamical deformations in living organisms and can advance the design of synthetic cells and life-like cellular functions.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 36 pages, 16 figures, 11 movies

arXiv:2409.07520 [pdf, other]

The inflated, eccentric warm Jupiter TOI-4914 b orbiting a metal-poor star, and the hot Jupiters TOI-2714 b and TOI-2981 b

Authors: G. Mantovan, T. G. Wilson, L. Borsato, T. Zingales, K. Biazzo, D. Nardiello, L. Malavolta, S. Desidera, F. Marzari, A. Collier Cameron, V. Nascimbeni, F. Z. Majidi, M. Montalto, G. Piotto, K. G. Stassun, J. N. Winn, J. M. Jenkins, L. Mignon, A. Bieryla, D. W. Latham, K. Barkaoui, K. A. Collins, P. Evans, M. M. Fausnaugh, V. Granata, et al. (10 additional authors not shown)

Abstract: Recent observations of giant planets have revealed unexpected bulk densities. Hot Jupiters, in particular, appear larger than expected for their masses compared to planetary evolution models, while warm Jupiters seem denser than expected. These differences are often attributed to the influence of the stellar incident flux, but could they also result from different planet formation processes? Is there a trend linking the planetary density to the chemical composition of the host star? In this work we present the confirmation of three giant planets in orbit around solar analogue stars. TOI-2714 b ($P \simeq 2.5$ d, $R_{\rm p} \simeq 1.22 R_{\rm J}$, $M_{\rm p} = 0.72 M_{\rm J}$) and TOI-2981 b ($P \simeq 3.6$ d, $R_{\rm p} \simeq 1.2 R_{\rm J}$, $M_{\rm p} = 2 M_{\rm J}$) are hot Jupiters on nearly circular orbits, while TOI-4914 b ($P \simeq 10.6$ d, $R_{\rm p} \simeq 1.15 R_{\rm J}$, $M_{\rm p} = 0.72 M_{\rm J}$) is a warm Jupiter with a significant eccentricity ($e = 0.41 \pm 0.02$) that orbits a star more metal-poor ([Fe/H]$~= -0.13$) than most of the stars known to host giant planets. Our radial velocity (RV) follow-up with the HARPS spectrograph allows us to detect their Keplerian signals at high significance (7, 30, and 23$σ$, respectively) and to place a strong constraint on the eccentricity of TOI-4914 b (18$σ$). TOI-4914 b, with its large radius and low insolation flux ($F_\star < 2 \times 10^8~{\rm erg~s^{-1}~cm^{-2}}$), appears to be more inflated than what is supported by current theoretical models for giant planets. Moreover, it does not conform to the previously noted trend that warm giant planets orbiting metal-poor stars have low eccentricities. This study thus provides insights into the diverse orbital characteristics and formation processes of giant exoplanets, in particular the role of stellar metallicity in the evolution of planetary systems.
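As a rough illustration of the bulk densities implied by the masses and radii quoted in this abstract, the sketch below scales Jupiter's mean density by $M_{\rm p}/R_{\rm p}^3$. The function name and the reference value of 1.326 g cm$^{-3}$ for Jupiter's mean density are assumptions of this example, not values from the paper, and the inputs are the rounded figures from the abstract, so the results are indicative only.

```python
# Illustrative only: bulk density from the rounded masses and radii in the abstract.
RHO_JUPITER = 1.326  # g/cm^3, assumed reference mean density of Jupiter


def bulk_density(mass_mj: float, radius_rj: float) -> float:
    """Return the mean density in g/cm^3 for a mass in M_J and a radius in R_J."""
    return RHO_JUPITER * mass_mj / radius_rj ** 3


# (mass in M_J, radius in R_J), as quoted in the abstract
planets = {
    "TOI-2714 b": (0.72, 1.22),
    "TOI-2981 b": (2.0, 1.2),
    "TOI-4914 b": (0.72, 1.15),
}

densities = {name: bulk_density(m, r) for name, (m, r) in planets.items()}

for name, rho in densities.items():
    print(f"{name}: {rho:.2f} g/cm^3")
```

With these rounded inputs, TOI-2714 b and TOI-4914 b come out less dense than Jupiter, while TOI-2981 b comes out denser.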

Submitted 11 September, 2024; originally announced September 2024.

Comments: Accepted for publication in Astronomy & Astrophysics. 21 pages, 26 figures, and 8 tables. Abstract abridged

arXiv:2409.06887 [pdf, other]

Ordinal Learning: Longitudinal Attention Alignment Model for Predicting Time to Future Breast Cancer Events from Mammograms

Authors: Xin Wang, Tao Tan, Yuan Gao, Eric Marcus, Luyi Han, Antonio Portaluri, Tianyu Zhang, Chunyao Lu, Xinglong Liang, Regina Beets-Tan, Jonas Teuwen, Ritse Mann

Abstract: Precision breast cancer (BC) risk assessment is crucial for developing individualized screening and prevention. Despite the promising potential of recent mammogram (MG) based deep learning models in predicting BC risk, they mostly overlook the 'time-to-future-event' ordering among patients and exhibit limited explorations into how they track history changes in breast tissue, thereby limiting their clinical application. In this work, we propose a novel method, named OA-BreaCR, to precisely model the ordinal relationship of the time to and between BC events while incorporating longitudinal breast tissue changes in a more explainable manner. We validate our method on public EMBED and in-house datasets, comparing with existing BC risk prediction and time prediction methods. Our ordinal learning method OA-BreaCR outperforms existing methods in both BC risk and time-to-future-event prediction tasks. Additionally, ordinal heatmap visualizations show the model's attention over time. Our findings underscore the importance of interpretable and precise risk assessment for enhancing BC screening and prevention efforts. The code will be accessible to the public.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.06775 [pdf, other]

Wavefunction approach to the fractional anomalous Hall crystal

Authors: Tixuan Tan, Julian May-Mann, Trithep Devakul

Abstract: We propose fractional anomalous Hall crystals (FAHCs) as possible ground states of strongly interacting electrons in parent bands with Berry curvature. FAHCs are exotic states of matter that spontaneously break continuous translation symmetry to form a fractional Chern insulator. We construct a unified family of variational wavefunctions that describe FAHCs and their competing states in the presence of uniform parent Berry curvature. We calculate their variational energy with Coulomb interactions semi-analytically in the thermodynamic limit. Our analysis reveals that FAHCs can be energetically favorable over both Wigner crystals and integer anomalous Hall crystals for sufficiently strong interactions or flat dispersion.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.05455 [pdf, other]

Universal Quantum Gate Set for Gottesman-Kitaev-Preskill Logical Qubits

Authors: V. G. Matsos, C. H. Valahu, M. J. Millican, T. Navickas, X. C. Kolesnikow, M. J. Biercuk, T. R. Tan

Abstract: The realisation of a universal quantum computer at scale promises to deliver a paradigm shift in information processing, providing the capability to solve problems that are intractable with conventional computers. A key limiting factor of realising fault-tolerant quantum information processing (QIP) is the large ratio of physical-to-logical qubits, which outstrips device sizes available in the near future. An alternative approach proposed by Gottesman, Kitaev, and Preskill (GKP) encodes a single logical qubit into a single harmonic oscillator, alleviating this hardware overhead in exchange for a more complex encoding. Owing to this complexity, current experiments with GKP codes have been limited to single-qubit encodings and operations. Here, we report on the experimental demonstration of a universal gate set for the GKP code, which includes single-qubit gates and -- for the first time -- a two-qubit entangling gate between logical code words. Our scheme deterministically implements energy-preserving quantum gates on finite-energy GKP states encoded in the mechanical motion of a trapped ion. This is achieved by a novel optimal control strategy that dynamically modulates an interaction between the ion's spin and motion. We demonstrate single-qubit gates with a logical process fidelity as high as 0.960 and a two-qubit entangling gate with a logical process fidelity of 0.680. We also directly create a GKP Bell state from the oscillators' ground states in a single step with a logical state fidelity of 0.842. The overall scheme is compatible with existing hardware architectures, highlighting the opportunity to leverage optimal control strategies as a key accelerant towards fault tolerance.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05289 [pdf, other]

Developing Path Planning with Behavioral Cloning and Proximal Policy Optimization for Path-Tracking and Static Obstacle Nudging

Authors: Mingyan Zhou, Biao Wang, Tian Tan, Xiatao Sun

Abstract: In autonomous driving, end-to-end methods utilizing Imitation Learning (IL) and Reinforcement Learning (RL) are becoming more and more common. However, unlike the classic robotics workflow, they involve neither explicit reasoning nor planning with horizons, resulting in implicit and myopic strategies. In this paper, we introduce a path planning method that uses Behavioral Cloning (BC) for path-tracking and Proximal Policy Optimization (PPO) for static obstacle nudging. It outputs lateral offset values to adjust the given reference waypoints and supplies the modified path to different controllers. Experimental results show that the algorithm can follow paths in a way that mimics the expert performance of path-tracking controllers, and avoids collisions with fixed obstacles. The method makes a good attempt at planning with learning-based methods in path planning problems of autonomous driving.

Submitted 22 October, 2024; v1 submitted 8 September, 2024; originally announced September 2024.

Comments: 6 pages, 8 figures

arXiv:2409.04044 [pdf, other]

Experimental Quantum Simulation of Chemical Dynamics

Authors: T. Navickas, R. J. MacDonell, C. H. Valahu, V. C. Olaya-Agudelo, F. Scuccimarra, M. J. Millican, V. G. Matsos, H. L. Nourse, A. D. Rao, M. J. Biercuk, C. Hempel, I. Kassal, T. R. Tan

Abstract: Simulating chemistry is likely to be among the earliest applications of quantum computing. However, existing digital quantum algorithms for chemical simulation require many logical qubits and gates, placing practical applications beyond existing technology. Here, we use an analog approach to carry out the first quantum simulations of chemical reactions. In particular, we simulate photoinduced non-adiabatic dynamics, one of the most challenging classes of problems in quantum chemistry because they involve strong coupling and entanglement between electronic and nuclear motions. We use a mixed-qudit-boson (MQB) analog simulator, which encodes information in both the electronic and vibrational degrees of freedom of a trapped ion. We demonstrate its programmability and versatility by simulating the dynamics of three different molecules as well as open-system dynamics in the condensed phase, all with the same quantum resources. Our approach requires orders of magnitude fewer resources than equivalent digital quantum simulations, demonstrating the potential of analog quantum simulators for near-term simulations of complex chemical reactions.

Submitted 6 September, 2024; originally announced September 2024.

arXiv:2409.01341 [pdf, other]

Enhancing Test Time Adaptation with Few-shot Guidance

Authors: Siqi Luo, Yi Xin, Yuntao Du, Zhongwei Wan, Tao Tan, Guangtao Zhai, Xiaohong Liu

Abstract: Deep neural networks often encounter significant performance drops when facing domain shifts between training (source) and test (target) data. To address this issue, Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to handle out-of-distribution streaming target data. Although these methods offer some relief, they lack a reliable mechanism for domain shift correction, which can often be erratic in real-world applications. In response, we develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA. Adhering to the principle of few inputs, big gains, FS-TTA reduces blind exploration in unseen target domains. Furthermore, we propose a two-stage framework to tackle FS-TTA, including (i) fine-tuning the pre-trained source model with a few-shot support set, along with using a feature diversity augmentation module to avoid overfitting, and (ii) implementing test time adaptation based on prototype memory bank guidance to produce high-quality pseudo-labels for model adaptation. Through extensive experiments on three cross-domain classification benchmarks, we demonstrate the superior performance and reliability of our FS-TTA and framework.

Submitted 2 September, 2024; originally announced September 2024.

Comments: 8 pages, 7 figures

arXiv:2408.13257 [pdf, other]

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Authors: Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Abstract: Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure the significant challenges that models face in the real world, including: 1) small data scale leads to a large performance variance; 2) reliance on model-based annotations results in restricted data quality; 3) insufficient task difficulty, especially caused by the limited image resolution. To tackle these issues, we introduce MME-RealWorld. Specifically, we collect more than $300$K images from public datasets and the Internet, filtering $13,366$ high-quality images for annotation. This involves the efforts of $25$ professional annotators and $7$ experts in MLLMs, contributing to $29,429$ question-answer pairs that cover $43$ subtasks across $5$ real-world scenarios, extremely challenging even for humans. As far as we know, MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. We further conduct a thorough evaluation involving $28$ prominent MLLMs, such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Our results show that even the most advanced models struggle with our benchmarks, where none of them reach $60\%$ accuracy. The challenges of perceiving high-resolution images and understanding complex real-world scenarios remain urgent issues to be addressed. The data and evaluation code are released at https://mme-realworld.github.io/ .

Submitted 11 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

Comments: Project Page: https://mme-realworld.github.io/

arXiv:2408.12141 [pdf, other]

TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model

Authors: Yuhao Wang, Chao Hao, Yawen Cui, Xinqi Su, Weicheng Xie, Tao Tan, Zitong Yu

Abstract: The vision-language modeling capability of multi-modal large language models has attracted wide attention from the community. However, in the medical domain, radiology report generation using vision-language models still faces significant challenges due to the imbalanced data distribution caused by numerous negated descriptions in radiology reports and issues such as rough alignment between radiology reports and radiography. In this paper, we propose a truthful radiology report generation framework, namely TRRG, based on stage-wise training for cross-modal disease clue injection into large language models. During the pre-training stage, contrastive learning is employed to enhance the ability of the visual encoder to perceive fine-grained disease details. In the fine-tuning stage, the clue injection module we propose significantly enhances the disease-oriented perception capability of the large language model by effectively incorporating the robust zero-shot disease perception. Finally, through the cross-modal clue interaction module, our model effectively achieves the multi-granular interaction of visual embeddings and an arbitrary number of disease clue embeddings. This significantly enhances the report generation capability and clinical effectiveness of multi-modal large language models in the field of radiology report generation. Experimental results demonstrate that our proposed pre-training and fine-tuning framework achieves state-of-the-art performance in radiology report generation on datasets such as IU-Xray and MIMIC-CXR. Further analysis indicates that our proposed method can effectively enhance the model's ability to perceive diseases and improve its clinical effectiveness.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.12095 [pdf, other]

uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

Authors: Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan

Abstract: Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as faithfulness and informativeness without considering advanced methods like model reasoning and self-improvement. Moreover, the field lacks a unified benchmark, hindering systematic evaluation due to varied metrics and datasets. This paper addresses these gaps by presenting a comprehensive benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics. Building on these findings, we propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition, ensuring both faithfulness and informativeness. Our work improves upon previous GPT-4-based state-of-the-art (SOTA) medical summarization methods, significantly outperforming them in both quantitative metrics and qualitative domain expert evaluations. Notably, we achieve an average relative performance improvement of 11.8% in reference-free metrics over the previous SOTA. Doctors prefer uMedSum's summaries 6 times more than previous SOTA in difficult cases where there are chances of confabulations or missing information. These results highlight uMedSum's effectiveness and generalizability across various datasets and metrics, marking a significant advancement in medical summarization.

Submitted 25 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: 12 pages

arXiv:2408.09144 [pdf, other]

SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

Abstract: Sparse view NeRF is challenging because limited input images lead to an under constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads to NeRF producing images with undesired artifacts. To address these artifacts and enhance robustness, we propose SSNeRF, a sparse view semi supervised NeRF method based on a teacher student framework. Our key idea is to challenge the NeRF module with progressively severe sparse view degradation while providing high confidence pseudo labels. This approach helps the NeRF model become aware of noise and incomplete information associated with sparse views, thus improving its robustness. The novelty of SSNeRF lies in its sparse view specific augmentations and semi supervised learning mechanism. In this approach, the teacher NeRF generates novel views along with confidence scores, while the student NeRF, perturbed by the augmented input, learns from the high confidence pseudo labels. Our sparse view degradation augmentation progressively injects noise into volume rendering weights, perturbs feature maps in vulnerable layers, and simulates sparse view blurriness. These augmentation strategies force the student NeRF to recognize degradation and produce clearer rendered views. By transferring the student's parameters to the teacher, the teacher gains increased robustness in subsequent training iterations. Extensive experiments demonstrate the effectiveness of our SSNeRF in generating novel views with less sparse view degradation. We will release code upon acceptance.

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.07516 [pdf, other]

DiffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution

Authors: Yuanbo Zhou, Xinlin Zhang, Wei Deng, Tao Wang, Tao Tan, Qinquan Gao, Tong Tong

Abstract: We introduce DiffSteISR, a pioneering framework for reconstructing real-world stereo images. DiffSteISR utilizes the powerful prior knowledge embedded in a pre-trained text-to-image model to efficiently recover the lost texture details in low-resolution stereo images. Specifically, DiffSteISR implements a time-aware stereo cross attention with temperature adapter (TASCATA) to guide the diffusion process, ensuring that the generated left and right views exhibit high texture consistency, thereby reducing disparity error between the super-resolved images and the ground truth (GT) images. Additionally, a stereo omni attention control network (SOA ControlNet) is proposed to enhance the consistency of super-resolved images with GT images in the pixel, perceptual, and distribution space. Finally, DiffSteISR incorporates a stereo semantic extractor (SSE) to capture unique viewpoint soft semantic information and shared hard tag semantic information, thereby effectively improving the semantic accuracy and consistency of the generated left and right images. Extensive experimental results demonstrate that DiffSteISR accurately reconstructs natural and precise textures from low-resolution stereo images while maintaining a high consistency of semantics and texture between the left and right views.

Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.05053 [pdf, ps, other]

Odd Covers of Complete Graphs and Hypergraphs

Authors: Imre Leader, Ta Sheng Tan

Abstract: The `odd cover number' of a complete graph is the smallest size of a family of complete bipartite graphs that covers each edge an odd number of times. For $n$ odd, Buchanan, Clifton, Culver, Nie, O'Neill, Rombach and Yin showed that the odd cover number of $K_n$ is equal to $(n+1)/2$ or $(n+3)/2$, and they conjectured that it is always $(n+1)/2$. We prove this conjecture. For $n$ even, Babai and Frankl showed that the odd cover number of $K_n$ is always at least $n/2$, and the above authors and Radhakrishnan, Sen and Vishwanathan gave some values of $n$ for which equality holds. We give some new examples. Our constructions arise from some very symmetric constructions for the corresponding problem for complete hypergraphs. Thus the odd cover number of the complete 3-graph $K_n^{(3)}$ is the smallest number of complete 3-partite 3-graphs such that each 3-set is in an odd number of them. We show that the odd cover number of $K_n^{(3)}$ is exactly $n/2$ for even $n$, and we show that for odd $n$ it is $(n-1)/2$ for infinitely many values of $n$. We also show that for $r=3$ and $r=4$ the odd cover number of $K_n^{(r)}$ is strictly less than the partition number, answering a question of Buchanan, Clifton, Culver, Nie, O'Neill, Rombach and Yin for those values of $r$.
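The definition of an odd cover in this abstract can be checked mechanically on small cases. The sketch below is a hypothetical helper, not code from the paper: it represents each complete bipartite graph as a pair of disjoint vertex sets and tests whether every edge of $K_n$ lies in an odd number of them. The example family for $K_3$ is chosen here for illustration.

```python
from itertools import combinations


def is_odd_cover(n, bicliques):
    """Check that every edge of K_n lies in an odd number of the given bicliques.

    Each biclique is a pair (left, right) of disjoint vertex sets; it contains
    every edge with one endpoint in `left` and the other in `right`.
    """
    for left, right in bicliques:
        if set(left) & set(right):
            raise ValueError("the two sides of a biclique must be disjoint")
    for u, v in combinations(range(n), 2):
        # Count the bicliques that contain the edge {u, v}.
        count = sum(
            (u in left and v in right) or (v in left and u in right)
            for left, right in bicliques
        )
        if count % 2 == 0:
            return False
    return True


# Two bicliques on the vertices {0, 1, 2} of K_3.
k3_cover = [({0}, {1, 2}), ({1}, {2})]
print(is_odd_cover(3, k3_cover))
```

This family has two members, which matches the value $(n+1)/2$ at $n = 3$; a brute-force check like this only confirms a given family and says nothing about minimality.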

Submitted 9 August, 2024; originally announced August 2024.

Comments: 7 pages

MSC Class: 05C70

arXiv:2408.02653 [pdf, other]

Importance of electron-phonon coupling near the electron-liquid to Wigner-crystal transition in two-dimensional atomically thin materials

Authors: Tixuan Tan, Vladimir Calvera, Steven A. Kivelson

Abstract: We study the effect of electron-phonon coupling on the location of the Fermi Liquid to Wigner Crystal transition in the two-dimensional electron gas realized in various material platforms. Based on dimensional estimates of the relevant parameters, we conclude that (as conventionally assumed) phonons are negligible in traditional semiconductor quantum well systems, but likely play a significant role in various recently synthesized atomically thin two-dimensional materials.

Submitted 5 August, 2024; originally announced August 2024.

Comments: 4 + 4 pages, 2 figures

arXiv:2407.21650 [pdf, other]

doi 10.3847/1538-3881/ad543b

TESS Giants Transiting Giants. VI. Newly Discovered Hot Jupiters Provide Evidence for Efficient Obliquity Damping after the Main Sequence

Authors: Nicholas Saunders, Samuel K. Grunblatt, Ashley Chontos, Fei Dai, Daniel Huber, Jingwen Zhang, Gudmundur Stefansson, Jennifer L. van Saders, Joshua N. Winn, Daniel Hey, Andrew W. Howard, Benjamin Fulton, Howard Isaacson, Corey Beard, Steven Giacalone, Judah van Zandt, Joseph M. Akana Murphey, Malena Rice, Sarah Blunt, Emma Turtelboom, Paul A. Dalba, Jack Lubin, Casey Brinkman, Emma M. Louden, Emma Page, et al. (31 additional authors not shown)

Abstract: The degree of alignment between a star's spin axis and the orbital plane of its planets (the stellar obliquity) is related to interesting and poorly understood processes that occur during planet formation and evolution. Hot Jupiters orbiting hot stars ($\gtrsim$6250 K) display a wide range of obliquities, while similar planets orbiting cool stars are preferentially aligned. Tidal dissipation is expected to be more rapid in stars with thick convective envelopes, potentially explaining this trend. Evolved stars provide an opportunity to test the damping hypothesis, particularly stars that were hot on the main sequence and have since cooled and developed deep convective envelopes. We present the first systematic study of the obliquities of hot Jupiters orbiting subgiants that recently developed convective envelopes using Rossiter-McLaughlin observations. Our sample includes two newly discovered systems in the Giants Transiting Giants Survey (TOI-6029 b, TOI-4379 b). We find that the orbits of hot Jupiters orbiting subgiants that have cooled below $\sim$6250 K are aligned or nearly aligned with the spin-axis of their host stars, indicating rapid tidal realignment after the emergence of a stellar convective envelope. We place an upper limit for the timescale of realignment for hot Jupiters orbiting subgiants at $\sim$500 Myr. Comparison with a simplified tidal evolution model shows that obliquity damping needs to be $\sim$4 orders of magnitude more efficient than orbital period decay to damp the obliquity without destroying the planet, which is consistent with recent predictions for tidal dissipation from inertial waves excited by hot Jupiters on misaligned orbits.

Submitted 31 July, 2024; originally announced July 2024.

Comments: 22 pages, 14 figures, 3 tables

Journal ref: AJ, 168, 2 (2024)

arXiv:2407.19735 [pdf, other]

Scalable High-Dimensional Multipartite Entanglement with Trapped Ions

Authors: Harsh Vardhan Upadhyay, Sanket Kumar Tripathy, Ting Rei Tan, Baladitya Suri, Athreya Shankar

Abstract: We propose a protocol for the preparation of generalized Greenberger-Horne-Zeilinger (GHZ) states of $N$ atoms each with $d=3$ or $4$ internal levels. We generalize the celebrated one-axis twisting (OAT) Hamiltonian for $N$ qubits to qudits by including OAT interactions of equal strengths between every pair of qudit levels, a protocol we call balanced OAT (BOAT). Analogous to OAT for qubits, we find that starting from a product state of an arbitrary number of atoms $N$, dynamics under BOAT leads to the formation of GHZ states for qutrits ($d=3$) and ququarts ($d=4$). While BOAT could potentially be realized on several platforms where all-to-all coupling is possible, here we propose specific implementations using trapped ion systems. We show that preparing these states with a fidelity above a threshold value rules out lower dimensional entanglement than that of the generalized GHZ states. For qutrits, we also propose a protocol to bound the fidelity that requires only global addressing of the ion crystal and single-shot readout of one of the levels. Our results open a path for the scalable generation and certification of high-dimensional multipartite entanglement on current atom-based quantum hardware.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 12 pages, 1 figure; comments welcome

arXiv:2407.19088 [pdf]

Shaping Integrity: Why Generative Artificial Intelligence Does Not Have to Undermine Education

Authors: Myles Joshua Toledo Tan, Nicholle Mae Amor Tan Maravilla

Abstract: This paper examines the role of generative artificial intelligence (GAI) in promoting academic integrity within educational settings. It explores how AI can be ethically integrated into classrooms to enhance learning experiences, foster intrinsic motivation, and support voluntary behavior change among students. By analyzing established ethical frameworks and educational theories such as deontological ethics, consequentialism, constructivist learning, and Self-Determination Theory (SDT), the paper argues that GAI, when used responsibly, can enhance digital literacy, encourage genuine knowledge construction, and uphold ethical standards in education. This research highlights the potential of GAI to create enriching, personalized learning environments that prepare students to navigate the complexities of the modern world ethically and effectively.

Submitted 10 October, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

Comments: 18 pages, 0 figures

arXiv:2407.18242 [pdf, other]

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

Abstract: Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathematically equivalent to full fine-tuning using a low-rank gradient for parameter updates. This low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of these low-rank matrices. This adjustment allows the low-rank gradient to more accurately approximate the full fine-tuning gradient, thereby narrowing the performance gap between LoRA and full fine-tuning. Furthermore, we theoretically derive the optimal solutions for adjusting the gradients of the low-rank matrices, applying them during fine-tuning in LoRA-Pro. We conduct extensive experiments across natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification tasks, demonstrating that LoRA-Pro substantially improves LoRA's performance, effectively narrowing the gap with full fine-tuning. Code is publicly available at https://github.com/mrflogs/LoRA-Pro.

Submitted 15 October, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17819 [pdf, other]

Simulating open-system molecular dynamics on analog quantum computers

Authors: V. C. Olaya-Agudelo, B. Stewart, C. H. Valahu, R. J. MacDonell, M. J. Millican, V. G. Matsos, F. Scuccimarra, T. R. Tan, I. Kassal

Abstract: Interactions of molecules with their environment influence the course and outcome of almost all chemical reactions. However, classical computers struggle to accurately simulate complicated molecule-environment interactions because of the steep growth of computational resources with both molecule size and environment complexity. Therefore, many quantum-chemical simulations are restricted to isolated molecules, whose dynamics can dramatically differ from what happens in an environment. Here, we show that analog quantum simulators can simulate open molecular systems by using the native dissipation of the simulator and injecting additional controllable dissipation. By exploiting the native dissipation to simulate the molecular dissipation -- rather than seeing it as a limitation -- our approach enables longer simulations of open systems than are possible for closed systems. In particular, we show that trapped-ion simulators using a mixed qudit-boson (MQB) encoding could simulate molecules in a wide range of condensed phases by implementing widely used dissipative processes within the Lindblad formalism, including pure dephasing and both electronic and vibrational relaxation. The MQB open-system simulations require significantly fewer additional quantum resources compared to both classical and digital quantum approaches.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.15451 [pdf, other]

Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

Authors: Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

Abstract: Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require paired well-lit and low-light images with ground truths for training, which is impractical due to the inherent challenges of annotating low-light images. To this end, we introduce a novel approach that eliminates the need for low-light ground truths. Our primary novelty lies in leveraging two complementary teacher networks to generate more reliable pseudo labels, enabling our model to achieve competitive performance on extremely low-light images without training on low-light ground truths. Our framework consists of two stages. In the first stage, our model is trained on well-lit data with low-light augmentations. In the second stage, we propose a dual-teacher framework to utilize the unlabeled low-light data, where a center-based main teacher produces the pseudo labels for relatively visible cases, while a keypoints-based complementary teacher focuses on producing the pseudo labels for the persons missed by the main teacher. With the pseudo labels from both teachers, we propose a person-specific low-light augmentation to challenge a student model in training to outperform the teachers. Experimental results on a real low-light dataset (ExLPose-OCN) show that our method achieves a 6.8% (2.4 AP) improvement over the state-of-the-art (SOTA) method, even though no low-light ground-truth data is used in our approach, in contrast to the SOTA method. Our code will be available at: https://github.com/ayh015-dev/DA-LLPose.

Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: 18 pages, 3 figures. Accepted by ECCV24

arXiv:2407.14825 [pdf]

3D-printed axicon enables extended depth-of-focus intravascular optical coherence tomography

Authors: Pavel Ruchka, Alok Kushwaha, Jessica A. Marathe, Lei Xiang, Rouyan Chen, Rodney Kirk, Joanne T. M. Tan, Christina A. Bursill, Johan Verjans, Simon Thiele, Robert Fitridge, Robert A. McLaughlin, Peter J. Psaltis, Harald Giessen, Jiawen Li

Abstract: A fundamental challenge in endoscopy is how to fabricate a small fiber-optic probe that can achieve comparable function to probes with large, complicated optics (e.g., high resolution and extended depth of focus). To achieve high resolution over an extended depth of focus (DOF), the application of needle-like beams has been proposed. However, existing methods using miniaturized needle beam designs fail to adequately correct astigmatism and other monochromatic aberrations, limiting the resolution of at least one axis. Here, we describe a novel approach to realize freeform beam-shaping endoscopic probes via two-photon direct laser writing, also known as micro 3D-printing. We present a design achieving approximately 8-micron resolution with a DOF of >0.8 mm at a central wavelength of 1310 nm. The probe has a diameter of 0.25 mm (without the catheter sheaths) and is fabricated using a single printing step directly on the optical fiber. We demonstrate our device in intravascular imaging of living atherosclerotic pigs at multiple time points, as well as human arteries with plaques ex vivo. This is the first step to enable beam-tailoring endoscopic probes which achieve diffraction-limited resolution over a large DOF.

Submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.12445 [pdf, other]

A Comprehensive Sustainable Framework for Machine Learning and Artificial Intelligence

Authors: Roberto Pagliari, Peter Hill, Po-Yu Chen, Maciej Dabrowny, Tingsheng Tan, Francois Buet-Golfouse

Abstract: In financial applications, regulations or best practices often lead to specific requirements in machine learning relating to four key pillars: fairness, privacy, interpretability and greenhouse gas emissions. These all sit in the broader context of sustainability in AI, an emerging practical AI topic. However, although these pillars have been individually addressed by past literature, none of these works have considered all the pillars. There are inherent trade-offs between each of the pillars (for example, accuracy vs fairness or accuracy vs privacy), making it even more important to consider them together. This paper outlines a new framework for Sustainable Machine Learning and proposes FPIG, a general AI pipeline that allows for these critical topics to be considered simultaneously to learn the trade-offs between the pillars better. Based on the FPIG framework, we propose a meta-learning algorithm to estimate the four key pillars given a dataset summary, model architecture, and hyperparameters before model training. This algorithm allows users to select the optimal model architecture for a given dataset and a given set of user requirements on the pillars. We illustrate the trade-offs under the FPIG model on three classical datasets and demonstrate the meta-learning approach with an example of real-world datasets and models with different interpretability, showcasing how it can aid model selection.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 8 pages, 3 figures, 4 tables, ECAI 24'

ACM Class: I.2.0

arXiv:2407.11536 [pdf, other]

Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

Authors: Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan

Abstract: Large Language Models (LLMs) have been widely applied in various professional fields. Fine-tuning these models on domain-specific question-and-answer datasets has significantly improved their professional domain knowledge and Q&A abilities; for example, medical LLMs fine-tuned on doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities. However, we observed that despite improvements in specific domain knowledge, the performance of medical LLMs in long-context understanding has significantly declined, especially compared to general language models with similar parameters. The purpose of this study is to investigate this reduced long-context understanding in medical LLMs. We designed a series of experiments that conduct open-book professional knowledge exams on all models to evaluate their ability to read long contexts. By adjusting the proportion and quantity of general data and medical data in the process of fine-tuning, we can determine the best data composition to optimize the professional model and achieve a balance between long-context performance and specific domain knowledge.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 5 pages, 1 figure. Accepted by the Workshop on Long-Context Foundation Models (LCFM) at ICML 2024

arXiv:2407.10767 [pdf, other]

Magnetic and nematic order of Bose-Fermi mixtures in moiré superlattices of 2D semiconductors

Authors: Feng-Ren Fan, Tixuan Tan, Chengxin Xiao, Wang Yao

Abstract: We investigate the magnetic orders in a mixture of Boson (exciton) and Fermion (electron or hole) trapped in transition-metal dichalcogenides moiré superlattices. A sizable antiferromagnetic exchange interaction is found between a carrier and an interlayer exciton trapped at different high symmetry points of the moiré supercell. This interaction at a distance much shorter than the carrier-carrier separation dominates the magnetic order in the Bose-Fermi mixture, where the carrier sublattice develops ferromagnetism opposite to that in the exciton sublattice. We demonstrate the possibility of increasing the Curie temperature of moiré carriers through electrical tuning of the exciton density in the ground state. In a trilayer moiré system with a p-n-p type band alignment, the exciton-carrier interplay can establish a layered antiferromagnetism for holes confined in the two outer layers. We further reveal a spontaneous nematic order in the Bose-Fermi mixture, arising from the interference between the Coulomb interaction and p-wave interlayer tunneling dictated by the stacking registry.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures

arXiv:2407.07666 [pdf]

A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability

Authors: Ting Fang Tan, Kabilan Elangovan, Jasmine Ong, Nigam Shah, Joseph Sung, Tien Yin Wong, Lan Xue, Nan Liu, Haibo Wang, Chang Fu Kuo, Simon Chesterman, Zee Kin Yeong, Daniel SW Ting

Abstract: A comprehensive qualitative evaluation framework for large language models (LLMs) in healthcare that expands beyond traditional accuracy and quantitative metrics is needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06857 [pdf, other]

Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load

Authors: Vijay Babu Pamshetti, Wei Zhang, Andy Man-Fai Ng, Qingyu Yan, Kuan Tak Tan

Abstract: Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the network performance through energy losses and voltage deviation. We propose Bach for battery degradation-aware scheduling based on e-constraint and fuzzy logic methods. Bach is implemented for the IEEE 33-bus network for an experimental study. The results show the effectiveness of Bach in optimizing costs and performance simultaneously with battery degradation awareness and demonstrate the flexibility of further customization.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 3 figures

arXiv:2407.04675 [pdf, other]

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.

Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.02911 [pdf, other]

Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from the common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation. Experiments using BraTS2021 dataset show that our non-adversarial model outperforms other GAN-based methods, and VQC latent space aids our model to achieve (1) anti-interference ability, which can eliminate the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential of one-shot segmentation. Our code is publicly available.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.00993 [pdf, other]

Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions within a singular application lack adequacy for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents. (3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents. First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate the efficiency of task completion. Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs. To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios. Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.18144 [pdf, other]

doi 10.1007/s11263-024-02153-0

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

Abstract: Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored and underutilised in insect farming. Here we present a comprehensive review of the selective breeding framework in the context of insect production. We systematically evaluate adjustments of selective breeding techniques to the realm of insects and highlight the essential components integral to the breeding process. The discussion covers every step of a conventional breeding scheme, such as formulation of breeding objectives, phenotyping, estimation of genetic parameters and breeding values, selection of appropriate breeding strategies, and mitigation of issues associated with genetic diversity depletion and inbreeding. This review combines knowledge from diverse disciplines, bridging the gap between animal breeding, quantitative genetics, evolutionary biology, and entomology, offering an integrated view of the insect breeding research area and uniting knowledge which has previously remained scattered across diverse fields of expertise.

Submitted 26 June, 2024; originally announced June 2024.

Journal ref: International Journal of Computer Vision (IJCV), 2024

arXiv:2406.15704 [pdf, other]

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required by speech understanding, while keeping efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model. Moreover, dedicated training approaches including the diversity loss and the unpaired audio-visual mixed training scheme are proposed to avoid frames or modality dominance. On the introduced speech-audio-visual evaluation benchmark, video-SALMONN achieves more than 25% absolute accuracy improvements on the video-QA task and over 30% absolute accuracy improvements on audio-visual QA tasks with human speech. In addition, video-SALMONN demonstrates remarkable video comprehension and reasoning abilities on tasks that are unprecedented by other av-LLMs. Our training code and model checkpoints are available at https://github.com/bytedance/SALMONN/.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

arXiv:2406.12996 [pdf, other]

TOI-2374 b and TOI-3071 b: two metal-rich sub-Saturns well within the Neptunian desert

Authors: Alejandro Hacker, Rodrigo F. Díaz, David J. Armstrong, Jorge Fernández Fernández, Simon Müller, Elisa Delgado-Mena, Sérgio G. Sousa, Vardan Adibekyan, Keivan G. Stassun, Karen A. Collins, Samuel W. Yee, Daniel Bayliss, Allyson Bieryla, François Bouchy, R. Paul Butler, Jeffrey D. Crane, Xavier Dumusque, Joel D. Hartman, Ravit Helled, Jon Jenkins, Marcelo Aron F. Keniger, Hannah Lewis, Jorge Lillo-Box, Michael B. Lund, Louise D. Nielsen , et al. (18 additional authors not shown)

Abstract: We report the discovery of two transiting planets detected by the Transiting Exoplanet Survey Satellite (TESS), TOI-2374 b and TOI-3071 b, orbiting a K5V and an F8V star, respectively, with periods of 4.31 and 1.27 days, respectively. We confirm and characterize these two planets with a variety of ground-based and follow-up observations, including photometry, precise radial velocity monitoring and high-resolution imaging. The planetary and orbital parameters were derived from a joint analysis of the radial velocities and photometric data. We found that the two planets have masses of $(57 \pm 4)$ $M_\oplus$ or $(0.18 \pm 0.01)$ $M_J$, and $(68 \pm 4)$ $M_\oplus$ or $(0.21 \pm 0.01)$ $M_J$, respectively, and they have radii of $(6.8 \pm 0.3)$ $R_\oplus$ or $(0.61 \pm 0.03)$ $R_J$ and $(7.2 \pm 0.5)$ $R_\oplus$ or $(0.64 \pm 0.05)$ $R_J$, respectively. These parameters correspond to sub-Saturns within the Neptunian desert, both planets being hot and highly irradiated, with $T_{\rm eq} \approx 745$ $K$ and $T_{\rm eq} \approx 1812$ $K$, respectively, assuming a Bond albedo of 0.5. TOI-3071 b has the hottest equilibrium temperature of all known planets with masses between $10$ and $300$ $M_\oplus$ and radii less than $1.5$ $R_J$. By applying gas giant evolution models we found that both planets, especially TOI-3071 b, are very metal-rich. This challenges standard formation models which generally predict lower heavy-element masses for planets with similar characteristics. We studied the evolution of the planets' atmospheres under photoevaporation and concluded that both are stable against evaporation due to their large masses and likely high metallicities in their gaseous envelopes.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 24 pages, 22 figures, 10 tables, accepted for publication in MNRAS

arXiv:2406.12447 [pdf, other]

Text-aware Speech Separation for Multi-talker Keyword Spotting

Authors: Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu

Abstract: For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail party problem where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To address it, this paper proposes a novel Text-aware Permutation Determinization Training method for multi-talker KWS with a clue-based Speech Separation front-end (TPDT-SS). Our research highlights the critical role of SS front-ends and shows that incorporating keyword-specific clues into these models can greatly enhance the effectiveness. TPDT-SS shows remarkable success in addressing permutation problems in mixed keyword speech, thereby greatly boosting the performance of the backend. Additionally, fine-tuning our system on unseen mixed speech results in further performance improvement.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH2024

arXiv:2406.11369 [pdf, other]

Approximation Algorithms for Smallest Intersecting Balls

Authors: Jiaqi Zheng, Tiow-Seng Tan

Abstract: We study a general smallest intersecting ball problem and its soft-margin variant in high-dimensional Euclidean spaces, which only require the input objects to be compact and convex. These two problems link and unify a series of fundamental problems in computational geometry and machine learning, including smallest enclosing ball, polytope distance, intersection radius, $\ell_1$-loss support vector machine, $\ell_1$-loss support vector data description, and so on. Two general approximation algorithms are presented respectively, and implementation details are given for specific inputs of convex polytopes, reduced polytopes, axis-aligned bounding boxes, balls, and ellipsoids. For most of these inputs, our algorithms are the first results in high-dimensional spaces, and also the first approximation methods. To achieve this, we develop a novel framework for approximating zero-sum games in Euclidean Jordan algebra systems, which may be useful in its own right.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08481 [pdf, other]

Enhancing End-to-End Autonomous Driving with Latent World Model

Authors: Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan

Abstract: End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels. Specifically, our framework \textbf{LAW} uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the actually observed features in the future. This supervision jointly optimizes the latent feature learning and action prediction, which greatly enhances the driving performance. As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07914 [pdf, other]

Can Large Language Models Understand Spatial Audio?

Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and localisation-informed speech extraction (LSE), achieving notable progress in each task. For SSL, our approach achieves an MAE of $2.70^{\circ}$ on the Spatial LibriSpeech dataset, substantially surpassing the prior benchmark of about $6.60^{\circ}$. Moreover, our model can employ spatial cues to improve FSR accuracy and execute LSE by selectively attending to sounds originating from a specified direction via text prompts, even amidst overlapping speech. These findings highlight the potential of adapting LLMs to grasp physical audio concepts, paving the way for LLM-based agents in 3D environments.

Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

Showing 1–50 of 703 results for author: Tan, T