Showing 1–50 of 703 results for author: Tan, T

  1. arXiv:2410.17196  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    VoiceBench: Benchmarking LLM-Based Voice Assistants

    Authors: Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li

    Abstract: Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering a significantly improved user experience compared to traditional text-based interactions. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress of LLM-based v…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: Work in progress. Data is available at https://github.com/MatthewCYM/VoiceBench

  2. arXiv:2410.13351  [pdf, other]

    cs.CL cs.AI cs.LG

    Representation Learning of Structured Data for Medical Foundation Models

    Authors: Vijay Prakash Dwivedi, Viktor Schlegel, Andy T. Liu, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Jeng Wei, Wei-Hsian Yin, Stefan Winkler, Robby T. Tan

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing me…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 Workshop on Unifying Representations in Neural Models (UniReps 2024)

  3. arXiv:2410.10121  [pdf, other]

    cs.CV

    Interaction-Guided Two-Branch Image Dehazing Network

    Authors: Huichun Liu, Xiaosong Li, Tianshu Tan

    Abstract: Image dehazing aims to restore clean images from hazy ones. Convolutional Neural Networks (CNNs) and Transformers have demonstrated exceptional performance in local and global feature extraction, respectively, and currently represent the two mainstream frameworks in image dehazing. In this paper, we propose a novel dual-branch image dehazing framework that guides CNN and Transformer components int…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: Accepted by ACCV 2024

  4. arXiv:2410.08638  [pdf]

    physics.optics

    Leveraging reconfigurable micro-resonator soliton crystals for Intensity-Modulated Direct Detection Data Transmission

    Authors: Xavier X. Chia, Kenny Y. K. Ong, A. Aadhi, George F. R. Chen, Ju Won Choi, Byoung-Uk Sohn, Amdad Chowdury, Dawn T. H. Tan

    Abstract: The perennial demand for highly efficient short-haul communications is evidenced by a sustained explosion of growth in data center infrastructure that is predicted to continue for the foreseeable future. In these relatively compact networks, cost-sensitivity is of particular importance, which limits options to direct detection schemes that are more cost efficient than their coherent counterparts.…

    Submitted 11 October, 2024; originally announced October 2024.

  5. arXiv:2410.05125  [pdf, other]

    physics.plasm-ph

    Dense Plasma Opacity from Excited States Method

    Authors: C. E. Starrett, C. J. Fontes, H. B. Tran Tan, J. M. Kasper, J. R. White

    Abstract: The self-consistent inclusion of plasma effects in opacity calculations is a significant modeling challenge. As density increases, such effects can no longer be treated perturbatively. Building on a recently published model that addresses this challenge, we calculate opacities of oxygen at solar interior conditions. The new model includes the effects of treating the free electrons consistently wit…

    Submitted 7 October, 2024; originally announced October 2024.

  6. arXiv:2410.01753  [pdf, other]

    physics.atom-ph nucl-ex physics.optics quant-ph

    $^{229}\mathrm{ThF}_4$ thin films for solid-state nuclear clocks

    Authors: Chuankun Zhang, Lars von der Wense, Jack F. Doyle, Jacob S. Higgins, Tian Ooi, Hans U. Friebel, Jun Ye, R. Elwell, J. E. S. Terhune, H. W. T. Morgan, A. N. Alexandrova, H. B. Tran Tan, Andrei Derevianko, Eric R. Hudson

    Abstract: After nearly fifty years of searching, the vacuum ultraviolet $^{229}$Th nuclear isomeric transition has recently been directly laser excited [1,2] and measured with high spectroscopic precision [3]. Nuclear clocks based on this transition are expected to be more robust [4,5] than and may outperform [6,7] current optical atomic clocks. They also promise sensitive tests for new physics beyond the s…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 15 pages, 3 figures

  7. arXiv:2409.19745  [pdf, other]

    cs.CL cs.AI

    PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead

    Authors: Tao Tan, Yining Qian, Ang Lv, Hongzhan Lin, Songhao Wu, Yongbo Wang, Feng Wang, Jingtong Wu, Xin Lu, Rui Yan

    Abstract: Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this p…

    Submitted 7 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: preprint

  8. arXiv:2409.18680  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

    Authors: Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

    Abstract: Various audio-LLMs (ALLMs) have been explored recently for tackling different audio tasks simultaneously using a single, unified model. While existing evaluations of ALLMs primarily focus on single-audio tasks, real-world applications often involve processing multiple audio streams simultaneously. To bridge this gap, we propose the first multi-audio evaluation (MAE) benchmark that consists of 20 d…

    Submitted 1 October, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

    Comments: EMNLP24 Findings

  9. arXiv:2409.18129  [pdf, other]

    astro-ph.EP

    TOI-5005 b: A super-Neptune in the savanna near the ridge

    Authors: A. Castro-González, J. Lillo-Box, D. J. Armstrong, L. Acuña, A. Aguichine, V. Bourrier, S. Gandhi, S. G. Sousa, E. Delgado-Mena, A. Moya, V. Adibekyan, A. C. M. Correia, D. Barrado, M. Damasso, J. N. Winn, N. C. Santos, K. Barkaoui, S. C. C. Barros, Z. Benkhaldoun, F. Bouchy, C. Briceño, D. A. Caldwell, K. A. Collins, Z. Essack, M. Ghachoui, et al. (16 additional authors not shown)

    Abstract: The Neptunian desert and savanna have been recently found to be separated by a ridge, an overdensity of planets in the $\simeq$3-5 days period range. These features are thought to be shaped by dynamical and atmospheric processes. However, their relative roles are not yet well understood. We intend to confirm and characterise the super-Neptune TESS candidate TOI-5005.01, which orbits a moderately b…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted for publication in A&A. Abstract shortened. 35 pages, 26 figures

  10. arXiv:2409.17682  [pdf, other]

    cs.CV

    Dark Miner: Defend against unsafe generation for text-to-image diffusion models

    Authors: Zheling Meng, Bo Peng, Xiaochuan Jin, Yue Jiang, Jing Dong, Wei Wang, Tieniu Tan

    Abstract: Text-to-image diffusion models have been demonstrated with unsafe generation due to unfiltered large-scale training data, such as violent, sexual, and shocking images, necessitating the erasure of unsafe concepts. Most existing methods focus on modifying the generation probabilities conditioned on the texts containing unsafe descriptions. However, they fail to guarantee safe generation for unseen…

    Submitted 26 September, 2024; originally announced September 2024.

  11. arXiv:2409.17558  [pdf, other]

    quant-ph

    Demonstration of entanglement distribution over 155 km metropolitan fiber using a silicon nanophotonic chip

    Authors: Jinyi Du, Xingjian Zhang, George F. R. Chen, Hongwei Gao, Dawn T. H. Tan, Alexander Ling

    Abstract: Transmitting an entangled state over an extended distance is crucial for the development of quantum networks. Previous demonstrations of transmitting entangled photons over long distance using satellites or fibers have used entangled photon pairs generated from bulk crystal arrangements. An alternative approach would be to generate photon pairs using silicon-on-insulator (SOI) chips. Despite numero…

    Submitted 26 September, 2024; originally announced September 2024.

  12. arXiv:2409.14818  [pdf, other]

    cs.CL cs.AI

    MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding

    Authors: Qinzhuo Wu, Weikai Xu, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Shuo Shang

    Abstract: Recently, mobile AI agents based on VLMs have been gaining increasing attention. These works typically utilize VLM as a foundation, fine-tuning it with instruction-based mobile datasets. However, these VLMs are typically pre-trained on general-domain data, which often results in a lack of fundamental capabilities specific to the mobile domain. Therefore, they may struggle to recognize specific UI…

    Submitted 3 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  13. arXiv:2409.13015  [pdf, other]

    astro-ph.SR astro-ph.GA astro-ph.IM

    First Resolution of Microlensed Images of a Binary-Lens Event

    Authors: Zexuan Wu, Subo Dong, A. Mérand, Christopher S. Kochanek, Przemek Mróz, Jinyi Shangguan, Grant Christie, Thiam-Guan Tan, Thomas Bensby, Joss Bland-Hawthorn, Sven Buder, Frank Eisenhauer, Andrew P. Gould, Janez Kos, Tim Natusch, Sanjib Sharma, Andrzej Udalski, J. Woillez, David A. H. Buckley, I. B. Thompson, Karim Abd El Dayem, Evelyne Alecian, Carine Babusiaux, Anthony Berdeu, Jean-Philippe Berger, et al. (53 additional authors not shown)

    Abstract: We resolve the multiple images of the binary-lens microlensing event ASASSN-22av using the GRAVITY instrument of the Very Large Telescope Interferometer (VLTI). The light curves show weak binary perturbations, complicating the analysis, but the joint modeling with the VLTI data breaks several degeneracies, arriving at a strongly favored solution. Thanks to precise measurements of angular Einstein…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: see the ancillary file for animation associated with Fig. 8

  14. arXiv:2409.08651  [pdf, other]

    physics.bio-ph

    Light-induced cortical excitability reveals programmable shape dynamics in starfish oocytes

    Authors: Jinghui Liu, Tom Burkart, Alexander Ziepke, John Reinhard, Yu-Chen Chao, Tzer Han Tan, S. Zachary Swartz, Erwin Frey, Nikta Fakhri

    Abstract: Chemo-mechanical waves on active deformable surfaces are a key component for many vital cellular functions. In particular, these waves play a major role in force generation and long-range signal transmission in cells that dynamically change shape, as encountered during cell division or morphogenesis. Reconstituting and controlling such chemically controlled cell deformations is a crucial but unsol…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 36 pages, 16 figures, 11 movies

  15. arXiv:2409.07520  [pdf, other]

    astro-ph.EP astro-ph.SR

    The inflated, eccentric warm Jupiter TOI-4914 b orbiting a metal-poor star, and the hot Jupiters TOI-2714 b and TOI-2981 b

    Authors: G. Mantovan, T. G. Wilson, L. Borsato, T. Zingales, K. Biazzo, D. Nardiello, L. Malavolta, S. Desidera, F. Marzari, A. Collier Cameron, V. Nascimbeni, F. Z. Majidi, M. Montalto, G. Piotto, K. G. Stassun, J. N. Winn, J. M. Jenkins, L. Mignon, A. Bieryla, D. W. Latham, K. Barkaoui, K. A. Collins, P. Evans, M. M. Fausnaugh, V. Granata, et al. (10 additional authors not shown)

    Abstract: Recent observations of giant planets have revealed unexpected bulk densities. Hot Jupiters, in particular, appear larger than expected for their masses compared to planetary evolution models, while warm Jupiters seem denser than expected. These differences are often attributed to the influence of the stellar incident flux, but could they also result from different planet formation processes? Is th…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted for publication in Astronomy & Astrophysics. 21 pages, 26 figures, and 8 tables. Abstract abridged

  16. arXiv:2409.06887  [pdf, other]

    eess.IV cs.CV

    Ordinal Learning: Longitudinal Attention Alignment Model for Predicting Time to Future Breast Cancer Events from Mammograms

    Authors: Xin Wang, Tao Tan, Yuan Gao, Eric Marcus, Luyi Han, Antonio Portaluri, Tianyu Zhang, Chunyao Lu, Xinglong Liang, Regina Beets-Tan, Jonas Teuwen, Ritse Mann

    Abstract: Precision breast cancer (BC) risk assessment is crucial for developing individualized screening and prevention. Despite the promising potential of recent mammogram (MG) based deep learning models in predicting BC risk, they mostly overlook the 'time-to-future-event' ordering among patients and exhibit limited explorations into how they track history changes in breast tissue, thereby limiting their…

    Submitted 10 September, 2024; originally announced September 2024.

  17. arXiv:2409.06775  [pdf, other]

    cond-mat.mes-hall cond-mat.str-el

    Wavefunction approach to the fractional anomalous Hall crystal

    Authors: Tixuan Tan, Julian May-Mann, Trithep Devakul

    Abstract: We propose fractional anomalous Hall crystals (FAHCs) as possible ground states of strongly interacting electrons in parent bands with Berry curvature. FAHCs are exotic states of matter that spontaneously break continuous translation symmetry to form a fractional Chern insulator. We construct a unified family of variational wavefunctions that describe FAHCs and their competing states in the presen…

    Submitted 10 September, 2024; originally announced September 2024.

  18. arXiv:2409.05455  [pdf, other]

    quant-ph

    Universal Quantum Gate Set for Gottesman-Kitaev-Preskill Logical Qubits

    Authors: V. G. Matsos, C. H. Valahu, M. J. Millican, T. Navickas, X. C. Kolesnikow, M. J. Biercuk, T. R. Tan

    Abstract: The realisation of a universal quantum computer at scale promises to deliver a paradigm shift in information processing, providing the capability to solve problems that are intractable with conventional computers. A key limiting factor of realising fault-tolerant quantum information processing (QIP) is the large ratio of physical-to-logical qubits that outstrip device sizes available in the near f…

    Submitted 9 September, 2024; originally announced September 2024.

  19. arXiv:2409.05289  [pdf, other]

    cs.RO eess.SY

    Developing Path Planning with Behavioral Cloning and Proximal Policy Optimization for Path-Tracking and Static Obstacle Nudging

    Authors: Mingyan Zhou, Biao Wang, Tian Tan, Xiatao Sun

    Abstract: In autonomous driving, end-to-end methods utilizing Imitation Learning (IL) and Reinforcement Learning (RL) are becoming more and more common. However, they do not involve explicit reasoning like classic robotics workflow and planning with horizons, resulting in implicit and myopic strategies. In this paper, we introduce a path planning method that uses Behavioral Cloning (BC) for path-tracking an…

    Submitted 22 October, 2024; v1 submitted 8 September, 2024; originally announced September 2024.

    Comments: 6 pages, 8 figures

  20. arXiv:2409.04044  [pdf, other]

    quant-ph

    Experimental Quantum Simulation of Chemical Dynamics

    Authors: T. Navickas, R. J. MacDonell, C. H. Valahu, V. C. Olaya-Agudelo, F. Scuccimarra, M. J. Millican, V. G. Matsos, H. L. Nourse, A. D. Rao, M. J. Biercuk, C. Hempel, I. Kassal, T. R. Tan

    Abstract: Simulating chemistry is likely to be among the earliest applications of quantum computing. However, existing digital quantum algorithms for chemical simulation require many logical qubits and gates, placing practical applications beyond existing technology. Here, we use an analog approach to carry out the first quantum simulations of chemical reactions. In particular, we simulate photoinduced non-…

    Submitted 6 September, 2024; originally announced September 2024.

  21. arXiv:2409.01341  [pdf, other]

    cs.CV

    Enhancing Test Time Adaptation with Few-shot Guidance

    Authors: Siqi Luo, Yi Xin, Yuntao Du, Zhongwei Wan, Tao Tan, Guangtao Zhai, Xiaohong Liu

    Abstract: Deep neural networks often encounter significant performance drops when facing domain shifts between training (source) and test (target) data. To address this issue, Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to handle out-of-distribution streaming target data. Although these methods offer some relief, they lack a reliable mechanism for domain shi…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 8 pages, 7 figures

  22. arXiv:2408.13257  [pdf, other]

    cs.CV

    MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

    Authors: Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure the significant challenges that models face in the real world, including: 1) small data scale leads to a large performance variance; 2) reliance on mo…

    Submitted 11 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: Project Page: https://mme-realworld.github.io/

  23. arXiv:2408.12141  [pdf, other]

    cs.CV

    TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model

    Authors: Yuhao Wang, Chao Hao, Yawen Cui, Xinqi Su, Weicheng Xie, Tao Tan, Zitong Yu

    Abstract: The vision-language modeling capability of multi-modal large language models has attracted wide attention from the community. However, in the medical domain, radiology report generation using vision-language models still faces significant challenges due to the imbalanced data distribution caused by numerous negated descriptions in radiology reports and issues such as rough alignment between radiology…

    Submitted 22 August, 2024; originally announced August 2024.

  24. arXiv:2408.12095  [pdf, other]

    cs.CL cs.AI cs.LG

    uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

    Authors: Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan

    Abstract: Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as fai…

    Submitted 25 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages

  25. arXiv:2408.09144  [pdf, other]

    cs.CV

    SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

    Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

    Abstract: Sparse view NeRF is challenging because limited input images lead to an under-constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads to NeRF producing images with undesired artifacts. To address these arti…

    Submitted 17 August, 2024; originally announced August 2024.

  26. arXiv:2408.07516  [pdf, other]

    cs.CV eess.IV

    DIffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution

    Authors: Yuanbo Zhou, Xinlin Zhang, Wei Deng, Tao Wang, Tao Tan, Qinquan Gao, Tong Tong

    Abstract: We introduce DiffSteISR, a pioneering framework for reconstructing real-world stereo images. DiffSteISR utilizes the powerful prior knowledge embedded in a pre-trained text-to-image model to efficiently recover the lost texture details in low-resolution stereo images. Specifically, DiffSteISR implements a time-aware stereo cross attention with temperature adapter (TASCATA) to guide the diffusion pro…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  27. arXiv:2408.05053  [pdf, ps, other]

    math.CO

    Odd Covers of Complete Graphs and Hypergraphs

    Authors: Imre Leader, Ta Sheng Tan

    Abstract: The `odd cover number' of a complete graph is the smallest size of a family of complete bipartite graphs that covers each edge an odd number of times. For $n$ odd, Buchanan, Clifton, Culver, Nie, O'Neill, Rombach and Yin showed that the odd cover number of $K_n$ is equal to $(n+1)/2$ or $(n+3)/2$, and they conjectured that it is always $(n+1)/2$. We prove this conjecture. For $n$ even, Babai and…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 7 pages

    MSC Class: 05C70

  28. arXiv:2408.02653  [pdf, other]

    cond-mat.str-el cond-mat.mes-hall

    Importance of electron-phonon coupling near the electron-liquid to Wigner-crystal transition in two-dimensional atomically thin materials

    Authors: Tixuan Tan, Vladimir Calvera, Steven A. Kivelson

    Abstract: We study the effect of electron-phonon coupling on the location of the Fermi Liquid to Wigner Crystal transition in the two-dimensional electron gas realized in various material platforms. Based on dimensional estimates of the relevant parameters, we conclude that (as conventionally assumed) phonons are negligible in traditional semiconductor quantum well systems, but likely play a significant rol…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 4 + 4 pages, 2 figures

  29. arXiv:2407.21650  [pdf, other]

    astro-ph.EP astro-ph.SR

    TESS Giants Transiting Giants. VI. Newly Discovered Hot Jupiters Provide Evidence for Efficient Obliquity Damping after the Main Sequence

    Authors: Nicholas Saunders, Samuel K. Grunblatt, Ashley Chontos, Fei Dai, Daniel Huber, Jingwen Zhang, Gudmundur Stefansson, Jennifer L. van Saders, Joshua N. Winn, Daniel Hey, Andrew W. Howard, Benjamin Fulton, Howard Isaacson, Corey Beard, Steven Giacalone, Judah van Zandt, Joseph M. Akana Murphey, Malena Rice, Sarah Blunt, Emma Turtelboom, Paul A. Dalba, Jack Lubin, Casey Brinkman, Emma M. Louden, Emma Page, et al. (31 additional authors not shown)

    Abstract: The degree of alignment between a star's spin axis and the orbital plane of its planets (the stellar obliquity) is related to interesting and poorly understood processes that occur during planet formation and evolution. Hot Jupiters orbiting hot stars ($\gtrsim$6250 K) display a wide range of obliquities, while similar planets orbiting cool stars are preferentially aligned. Tidal dissipation is ex…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 22 pages, 14 figures, 3 tables

    Journal ref: AJ, 168, 2 (2024)

  30. arXiv:2407.19735  [pdf, other]

    quant-ph

    Scalable High-Dimensional Multipartite Entanglement with Trapped Ions

    Authors: Harsh Vardhan Upadhyay, Sanket Kumar Tripathy, Ting Rei Tan, Baladitya Suri, Athreya Shankar

    Abstract: We propose a protocol for the preparation of generalized Greenberger-Horne-Zeilinger (GHZ) states of $N$ atoms each with $d=3$ or $4$ internal levels. We generalize the celebrated one-axis twisting (OAT) Hamiltonian for $N$ qubits to qudits by including OAT interactions of equal strengths between every pair of qudit levels, a protocol we call balanced OAT (BOAT). Analogous to OAT for qubits, we…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 12 pages, 1 figure; comments welcome

  31. arXiv:2407.19088  [pdf]

    cs.CY

    Shaping Integrity: Why Generative Artificial Intelligence Does Not Have to Undermine Education

    Authors: Myles Joshua Toledo Tan, Nicholle Mae Amor Tan Maravilla

    Abstract: This paper examines the role of generative artificial intelligence (GAI) in promoting academic integrity within educational settings. It explores how AI can be ethically integrated into classrooms to enhance learning experiences, foster intrinsic motivation, and support voluntary behavior change among students. By analyzing established ethical frameworks and educational theories such as deontologi…

    Submitted 10 October, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: 18 pages, 0 figures

  32. arXiv:2407.18242  [pdf, other]

    cs.LG cs.AI cs.CL

    LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

    Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathema…

    Submitted 15 October, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  33. arXiv:2407.17819  [pdf, other]

    quant-ph physics.chem-ph

    Simulating open-system molecular dynamics on analog quantum computers

    Authors: V. C. Olaya-Agudelo, B. Stewart, C. H. Valahu, R. J. MacDonell, M. J. Millican, V. G. Matsos, F. Scuccimarra, T. R. Tan, I. Kassal

    Abstract: Interactions of molecules with their environment influence the course and outcome of almost all chemical reactions. However, classical computers struggle to accurately simulate complicated molecule-environment interactions because of the steep growth of computational resources with both molecule size and environment complexity. Therefore, many quantum-chemical simulations are restricted to isolate…

    Submitted 25 July, 2024; originally announced July 2024.

  34. arXiv:2407.15451  [pdf, other]

    cs.CV

    Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

    Authors: Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

    Abstract: Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require the use of paired well-lit and low-light images with ground truths for training, which are impractical due to the inherent challenges associated with annotat…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 18 pages, 3 figures. Accepted by ECCV24

  35. arXiv:2407.14825  [pdf]

    physics.optics physics.bio-ph physics.med-ph

    3D-printed axicon enables extended depth-of-focus intravascular optical coherence tomography

    Authors: Pavel Ruchka, Alok Kushwaha, Jessica A. Marathe, Lei Xiang, Rouyan Chen, Rodney Kirk, Joanne T. M. Tan, Christina A. Bursill, Johan Verjans, Simon Thiele, Robert Fitridge, Robert A. McLaughlin, Peter J. Psaltis, Harald Giessen, Jiawen Li

    Abstract: A fundamental challenge in endoscopy is how to fabricate a small fiber-optic probe that can achieve comparable function to probes with large, complicated optics (e.g., high resolution and extended depth of focus). To achieve high resolution over an extended depth of focus (DOF), the application of needle-like beams has been proposed. However, existing methods using miniaturized needle beam designs…

    Submitted 20 July, 2024; originally announced July 2024.

  36. arXiv:2407.12445  [pdf, other]

    cs.LG cs.CY

    A Comprehensive Sustainable Framework for Machine Learning and Artificial Intelligence

    Authors: Roberto Pagliari, Peter Hill, Po-Yu Chen, Maciej Dabrowny, Tingsheng Tan, Francois Buet-Golfouse

    Abstract: In financial applications, regulations or best practices often lead to specific requirements in machine learning relating to four key pillars: fairness, privacy, interpretability and greenhouse gas emissions. These all sit in the broader context of sustainability in AI, an emerging practical AI topic. However, although these pillars have been individually addressed by past literature, none of thes…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, 4 tables, ECAI 24'

    ACM Class: I.2.0

  37. arXiv:2407.11536  [pdf, other]

    cs.CL cs.AI

    Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

    Authors: Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan

    Abstract: Large Language Models (LLMs) have been widely applied in various professional fields. By fine-tuning the models using domain specific question and answer datasets, the professional domain knowledge and Q&A abilities of these models have significantly improved, for example, medical professional LLMs that use fine-tuning of doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 1 figure. Accepted by the Workshop on Long-Context Foundation Models (LCFM) at ICML 2024

  38. arXiv:2407.10767  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Magnetic and nematic order of Bose-Fermi mixtures in moiré superlattices of 2D semiconductors

    Authors: Feng-Ren Fan, Tixuan Tan, Chengxin Xiao, Wang Yao

    Abstract: We investigate the magnetic orders in a mixture of Boson (exciton) and Fermion (electron or hole) trapped in transition-metal dichalcogenides moiré superlattices. A sizable antiferromagnetic exchange interaction is found between a carrier and an interlayer exciton trapped at different high symmetry points of the moiré supercell. This interaction at a distance much shorter than the carrier-carrier…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures

  39. arXiv:2407.07666  [pdf]

    cs.CL cs.AI

    A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability

    Authors: Ting Fang Tan, Kabilan Elangovan, Jasmine Ong, Nigam Shah, Joseph Sung, Tien Yin Wong, Lan Xue, Nan Liu, Haibo Wang, Chang Fu Kuo, Simon Chesterman, Zee Kin Yeong, Daniel SW Ting

    Abstract: A comprehensive qualitative evaluation framework for large language models (LLM) in healthcare that expands beyond traditional accuracy and quantitative metrics is needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models…

    Submitted 10 July, 2024; originally announced July 2024.

  40. arXiv:2407.06857  [pdf, other]

    eess.SY

    Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load

    Authors: Vijay Babu Pamshetti, Wei Zhang, Andy Man-Fai Ng, Qingyu Yan, Kuan Tak Tan

    Abstract: Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the netwo…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 3 figures

  41. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  42. arXiv:2407.02911  [pdf, other]

    eess.IV cs.CV

    Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

    Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

    Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the rec…

    Submitted 3 July, 2024; originally announced July 2024.

  43. arXiv:2407.00993  [pdf, other]

    cs.AI cs.CL

    Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

    Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

    Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions…

    Submitted 1 July, 2024; originally announced July 2024.

  44. Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

    Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

    Abstract: Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored…

    Submitted 26 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (IJCV), 2024

  45. arXiv:2406.15704  [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  46. arXiv:2406.12996  [pdf, other]

    astro-ph.EP

    TOI-2374 b and TOI-3071 b: two metal-rich sub-Saturns well within the Neptunian desert

    Authors: Alejandro Hacker, Rodrigo F. Díaz, David J. Armstrong, Jorge Fernández Fernández, Simon Müller, Elisa Delgado-Mena, Sérgio G. Sousa, Vardan Adibekyan, Keivan G. Stassun, Karen A. Collins, Samuel W. Yee, Daniel Bayliss, Allyson Bieryla, François Bouchy, R. Paul Butler, Jeffrey D. Crane, Xavier Dumusque, Joel D. Hartman, Ravit Helled, Jon Jenkins, Marcelo Aron F. Keniger, Hannah Lewis, Jorge Lillo-Box, Michael B. Lund, Louise D. Nielsen, et al. (18 additional authors not shown)

    Abstract: We report the discovery of two transiting planets detected by the Transiting Exoplanet Survey Satellite (TESS), TOI-2374 b and TOI-3071 b, orbiting a K5V and an F8V star, respectively, with periods of 4.31 and 1.27 days, respectively. We confirm and characterize these two planets with a variety of ground-based and follow-up observations, including photometry, precise radial velocity monitoring and…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 22 figures, 10 tables, accepted for publication in MNRAS

  47. arXiv:2406.12447  [pdf, other]

    eess.AS

    Text-aware Speech Separation for Multi-talker Keyword Spotting

    Authors: Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu

    Abstract: For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail party problem where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To ad…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  48. arXiv:2406.11369  [pdf, other]

    cs.CG cs.DS

    Approximation Algorithms for Smallest Intersecting Balls

    Authors: Jiaqi Zheng, Tiow-Seng Tan

    Abstract: We study a general smallest intersecting ball problem and its soft-margin variant in high-dimensional Euclidean spaces, which only require the input objects to be compact and convex. These two problems link and unify a series of fundamental problems in computational geometry and machine learning, including smallest enclosing ball, polytope distance, intersection radius, $\ell_1$-loss support vecto…

    Submitted 17 June, 2024; originally announced June 2024.

  49. arXiv:2406.08481  [pdf, other]

    cs.CV

    Enhancing End-to-End Autonomous Driving with Latent World Model

    Authors: Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan

    Abstract: End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to e…

    Submitted 12 June, 2024; originally announced June 2024.

  50. arXiv:2406.07914  [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Understand Spatial Audio?

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and lo…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024