Showing 1–50 of 85 results for author: Deng, K

  1. arXiv:2410.15272

    cs.IR cs.AI

    Performance-Driven QUBO for Recommender Systems on Quantum Annealers

    Authors: Jiayang Niu, Jie Li, Ke Deng, Mark Sanderson, Yongli Ren

    Abstract: We propose Counterfactual Analysis Quadratic Unconstrained Binary Optimization (CAQUBO) to solve QUBO problems for feature selection in recommender systems. CAQUBO leverages counterfactual analysis to measure the impact of individual features and feature combinations on model performance and employs the measurements to construct the coefficient matrix for a quantum annealer to select the optimal f…

    Submitted 20 October, 2024; originally announced October 2024.

  2. arXiv:2410.13854

    cs.CL cs.AI cs.CV cs.CY

    Can MLLMs Understand the Deep Implication Behind Chinese Images?

    Authors: Chenhao Zhang, Xi Feng, Yuelin Bai, Xinrun Du, Jinchang Hou, Kaixin Deng, Guangzeng Han, Qinrui Li, Bingli Wang, Jiaheng Liu, Xingwei Qu, Yifei Zhang, Qixuan Zhao, Yiming Liang, Ziqiang Liu, Feiteng Fang, Min Yang, Wenhao Huang, Chenghua Lin, Ge Zhang, Shiwen Ni

    Abstract: As the capabilities of Multimodal Large Language Models (MLLMs) continue to improve, the need for higher-order capability evaluation of MLLMs is increasing. However, there is a lack of work evaluating MLLMs for higher-order perception and understanding of Chinese visual content. To fill the gap, we introduce the Chinese Image Implication understanding Benchmark, CII-Bench, which…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 32 pages, 18 figures. Project Page: https://cii-bench.github.io/ Code: https://github.com/MING_X/CII-Bench Dataset: https://huggingface.co/datasets/m-a-p/CII-Bench

  3. arXiv:2410.11710

    cs.CL

    MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

    Authors: Pei Wang, Yanan Wu, Zekun Wang, Jiaheng Liu, Xiaoshuai Song, Zhongyuan Peng, Ken Deng, Chenchen Zhang, Jiakai Wang, Junran Peng, Ge Zhang, Hangyu Guo, Zhaoxiang Zhang, Wenbo Su, Bo Zheng

    Abstract: Large Language Models (LLMs) have displayed massive improvements in reasoning and decision-making skills and can hold natural conversations with users. Recently, many tool-use benchmark datasets have been proposed. However, existing datasets have the following limitations: (1). Insufficient evaluation scenarios (e.g., only cover limited tool-use scenes). (2). Extensive evaluation costs (e.g., GPT…

    Submitted 15 October, 2024; originally announced October 2024.

  4. arXiv:2410.06885

    eess.AS cs.SD

    F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

    Authors: Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen

    Abstract: This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as duration model, text encoder, and phoneme alignment, the text input is simply padded with filler tokens to the same length as input speech, and then the denoising is performed for speech generation, which was originally pr…

    Submitted 15 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

  5. arXiv:2409.19510

    cs.CL

    CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

    Authors: Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin

    Abstract: Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a spee…

    Submitted 28 September, 2024; originally announced September 2024.

  6. arXiv:2409.19506

    cs.MM cs.CV

    IWN: Image Watermarking Based on Idempotency

    Authors: Kaixin Deng

    Abstract: In the expanding field of digital media, maintaining the strength and integrity of watermarking technology is becoming increasingly challenging. This paper, inspired by the Idempotent Generative Network (IGN), explores the prospects of introducing idempotency into image watermark processing and proposes an innovative neural network model - the Idempotent Watermarking Network (IWN). The proposed mo…

    Submitted 28 September, 2024; originally announced September 2024.

  7. arXiv:2409.15273

    cs.CV

    MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors

    Authors: Yehonathan Litman, Or Patashnik, Kangle Deng, Aviral Agrawal, Rushikesh Zawar, Fernando De la Torre, Shubham Tulsiani

    Abstract: Recent works in inverse rendering have shown promise in using multi-view images of an object to recover shape, albedo, and materials. However, the recovered components often fail to render accurately under new lighting conditions due to the intrinsic challenge of disentangling albedo and material properties from input images. To address this challenge, we introduce MaterialFusion, an enhanced conv…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Project Page: https://yehonathanlitman.github.io/material_fusion

  8. arXiv:2407.16154

    cs.CL

    DDK: Distilling Domain Knowledge for Efficient Large Language Models

    Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques…

    Submitted 22 July, 2024; originally announced July 2024.

  9. arXiv:2407.02839

    cs.IR cs.AI

    CRUISE on Quantum Computing for Feature Selection in Recommender Systems

    Authors: Jiayang Niu, Jie Li, Ke Deng, Yongli Ren

    Abstract: Using Quantum Computers to solve problems in Recommender Systems that classical computers cannot address is a worthwhile research topic. In this paper, we use Quantum Annealers to address the feature selection problem in recommendation algorithms. This feature selection problem is a Quadratic Unconstrained Binary Optimization (QUBO) problem. By incorporating Counterfactual Analysis, we significantl…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted by QuantumCLEF 2024

  10. arXiv:2407.01598

    cs.LG cs.AI

    Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast

    Authors: Yifan Hu, Fukang Yin, Weimin Zhang, Kaijun Ren, Junqiang Song, Kefeng Deng, Di Zhang

    Abstract: Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods find it difficult to learn small-scale dynamics. In this paper, we reveal that the universal mechanism for these instabilities is not only related to spectral bias but also to distortions brought by proce…

    Submitted 25 June, 2024; originally announced July 2024.

  11. arXiv:2407.00488

    cs.CL cs.AI

    PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models

    Authors: Kunquan Deng, Zeyu Huang, Chen Li, Chenghua Lin, Min Gao, Wenge Rong

    Abstract: Large Language Models (LLMs) excel in fluency but risk producing inaccurate content, called "hallucinations." This paper outlines a standardized process for categorizing fine-grained hallucination types and proposes an innovative framework--the Progressive Fine-grained Model Editor (PFME)--specifically designed to detect and correct fine-grained hallucinations in LLMs. PFME consists of two collabo…

    Submitted 29 June, 2024; originally announced July 2024.

  12. arXiv:2406.17720

    cs.CV

    Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity

    Authors: Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab, Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh, Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian

    Abstract: We introduce Arboretum, the largest publicly accessible dataset designed to advance AI for biodiversity applications. This dataset, curated from the iNaturalist community science platform and vetted by domain experts to ensure accuracy, includes 134.6 million images, surpassing existing datasets in scale by an order of magnitude. The dataset encompasses image-language paired data for a diverse set…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Preprint under review

  13. arXiv:2406.04541

    cs.CL eess.AS

    Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation

    Authors: Keqi Deng, Philip C. Woodland

    Abstract: While the neural transducer is popular for online speech recognition, simultaneous speech translation (SST) requires both streaming and re-ordering capabilities. This paper presents the LS-Transducer-SST, a label-synchronous neural transducer for SST, which naturally possesses these two properties. The LS-Transducer-SST dynamically decides when to emit translation tokens based on an Auto-regressiv…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 Main Conference

  14. arXiv:2406.01359

    cs.CL cs.SE

    R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    Authors: Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

    Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.00522

    eess.AS cs.SD

    Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning

    Authors: Keqi Deng, Guangzhi Sun, Philip C. Woodland

    Abstract: Wav2Prompt is proposed which allows straightforward integration between spoken input and a text-based large language model (LLM). Wav2Prompt uses a simple training process with only the same data used to train an automatic speech recognition (ASR) model. After training, Wav2Prompt learns continuous representations from speech and uses them as LLM prompts. To avoid task over-fitting issues found in…

    Submitted 1 June, 2024; originally announced June 2024.

  16. arXiv:2404.14934

    cs.MM cs.CV cs.HC

    G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

    Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

    Abstract: Millimeter wave radar is gaining traction recently as a promising modality for enabling pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in developing generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we resort to desi…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 18 pages, 29 figures

  17. arXiv:2404.04992

    cs.CV stat.AP

    Efficient Surgical Tool Recognition via HMM-Stabilized Deep Learning

    Authors: Haifeng Wang, Hao Xu, Jun Wang, Jian Zhou, Ke Deng

    Abstract: Recognizing various surgical tools, actions and phases from surgery videos is an important problem in computer vision with exciting clinical applications. Existing deep-learning-based methods for this problem either process each surgical video as a series of independent images without considering their dependence, or rely on complicated deep learning models to account for the dependence of video frames.…

    Submitted 7 April, 2024; originally announced April 2024.

  18. arXiv:2403.01325

    cs.CV

    NeRF-VPT: Learning Novel View Representations with Neural Radiance Fields via View Prompt Tuning

    Authors: Linsheng Chen, Guangrun Wang, Liuchun Yuan, Keze Wang, Ken Deng, Philip H. S. Torr

    Abstract: Neural Radiance Fields (NeRF) have garnered remarkable success in novel view synthesis. Nonetheless, the task of generating high-quality images for novel views persists as a critical challenge. While the existing efforts have exhibited commendable progress, capturing intricate details, enhancing textures, and achieving superior Peak Signal-to-Noise Ratio (PSNR) metrics warrant further focused atte…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: AAAI 2024

  19. arXiv:2402.13251

    cs.GR cs.CV cs.LG

    FlashTex: Fast Relightable Mesh Texturing with LightControlNet

    Authors: Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala

    Abstract: Manually creating textures for 3D meshes is time-consuming, even for expert visual content creators. We propose a fast approach for automatically texturing an input 3D mesh based on a user-provided text prompt. Importantly, our approach disentangles lighting from surface material/reflectance in the resulting texture so that the mesh can be properly relit and rendered in any lighting environment. W…

    Submitted 17 October, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Project page: https://flashtex.github.io/

  20. arXiv:2402.10711

    cs.RO

    StableLego: Stability Analysis of Block Stacking Assembly

    Authors: Ruixuan Liu, Kangle Deng, Ziwei Wang, Changliu Liu

    Abstract: Recent advancements in robotics enable robots to accomplish complex assembly tasks. However, designing an assembly requires a non-trivial effort since a slight variation in the design could significantly affect the task feasibility. It is critical to ensure the physical feasibility of the assembly design so that the assembly task can be successfully executed. To address the challenge, this paper s…

    Submitted 16 February, 2024; originally announced February 2024.

  21. arXiv:2402.03357

    cs.SI cs.AI cs.LG

    Harnessing Network Effect for Fake News Mitigation: Selecting Debunkers via Self-Imitation Learning

    Authors: Xiaofei Xu, Ke Deng, Michael Dann, Xiuzhen Zhang

    Abstract: This study aims to minimize the influence of fake news on social networks by deploying debunkers to propagate true news. This is framed as a reinforcement learning problem, where, at each stage, one user is selected to propagate true news. A challenging issue is episodic reward where the "net" effect of selecting individual debunkers cannot be discerned from the interleaving information propagatio…

    Submitted 28 January, 2024; originally announced February 2024.

    Comments: 10 pages, full version of this paper is accepted by AAAI'24

  22. arXiv:2312.09100

    eess.AS cs.SD

    FastInject: Injecting Unpaired Text Data into CTC-based ASR training

    Authors: Keqi Deng, Philip C. Woodland

    Abstract: Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the development of self-supervised learning. However, E2E ASR models trained on paired speech-text data often suffer from domain shifts from training to testing. To alleviate this issue, this paper proposes a flat-start joint train…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP2024

  23. arXiv:2312.08367

    cs.CV

    ViLA: Efficient Video-Language Alignment for Video Question Answering

    Authors: Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming Lin, Shan Yang

    Abstract: In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model addresses both efficient frame sampling and effective cross-modal alignment in a unified way. In our ViLA network, we design a new learnable text-guided Frame-Prompter together with a new cross-modal distillation (QFormer-Distiller) module. Pre-trained large image-language models have shown promising resu…

    Submitted 1 October, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: ECCV 2024

  24. arXiv:2311.07797

    cs.LG

    Explainable History Distillation by Marked Temporal Point Process

    Authors: Sishun Liu, Ke Deng, Yan Wang, Xiuzhen Zhang

    Abstract: Explainability of machine learning models is mandatory when researchers introduce these models, commonly regarded as black boxes, to real-world tasks, especially high-stakes ones. In this paper, we build a machine learning system to automatically generate explanations of past events from history by CA based on the temporal point process (TPP). Specifically, we propose a new task called explainable history distillation (EHD). This task req…

    Submitted 13 November, 2023; originally announced November 2023.

  25. arXiv:2308.16741

    cs.AI cs.CV

    Socratis: Are large multimodal models emotionally aware?

    Authors: Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko

    Abstract: Existing emotion prediction benchmarks contain coarse emotion labels which do not consider the diversity of emotions that an image and text can elicit in humans due to various reasons. Learning diverse reactions to multimodal content is important as intelligent machines take a central role in generating and delivering content to society. To address this gap, we propose Socratis, a societal reactio…

    Submitted 2 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 WECIA

  26. arXiv:2308.15395

    cs.LG q-bio.MN q-bio.QM

    The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data

    Authors: Mathieu Chevalley, Jacob Sackett-Sanders, Yusuf Roohani, Pascal Notin, Artemy Bakulin, Dariusz Brzezinski, Kaiwen Deng, Yuanfang Guan, Justin Hong, Michael Ibrahim, Wojciech Kotlowski, Marcin Kowiel, Panagiotis Misiakos, Achille Nazaret, Markus Püschel, Chris Wendler, Arash Mehrjou, Patrick Schwab

    Abstract: In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. This helps formulate hypotheses regarding molecular mechanisms that could potentially be targeted by future medicines. The CausalBench Challenge was an initiative to invite the machine learning community to advance the state of the art in constructing gene-gene interaction networks. These network…

    Submitted 29 August, 2023; originally announced August 2023.

  27. arXiv:2308.13345

    eess.AS cs.CL cs.SD

    Decoupled Structure for Improved Adaptability of End-to-End Models

    Authors: Keqi Deng, Philip C. Woodland

    Abstract: Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications. The E2E ASR model implicitly learns an internal language model (LM) which characterises the training distribution of the source domain, and the E2E trainable n…

    Submitted 25 August, 2023; originally announced August 2023.

  28. arXiv:2308.08761

    cs.CR

    Privacy-Preserving Detection Method for Transmission Line Based on Edge Collaboration

    Authors: Quan Shi, Kaiyuan Deng

    Abstract: Unmanned aerial vehicles (UAVs) are commonly used for edge collaborative computing in current transmission line object detection, where computationally intensive tasks generated by user nodes are offloaded to more powerful edge servers for processing. However, performing edge collaborative processing on transmission line image data may result in serious privacy breaches. To address this issue, we…

    Submitted 16 August, 2023; originally announced August 2023.

  29. arXiv:2308.02360

    cs.LG stat.ML

    Intensity-free Integral-based Learning of Marked Temporal Point Processes

    Authors: Sishun Liu, Ke Deng, Xiuzhen Zhang, Yongli Ren

    Abstract: In the marked temporal point processes (MTPP), a core problem is to parameterize the conditional joint PDF (probability density function) $p^*(m,t)$ for inter-event time $t$ and mark $m$, conditioned on the history. The majority of existing studies predefine intensity functions. Their utility is challenged by specifying the intensity function's proper form, which is critical to balance expres…

    Submitted 7 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

  30. arXiv:2307.07445

    cs.NI cs.AI cs.LG eess.SY

    TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling

    Authors: Ke Deng, Zhiyuan He, Hao Zhang, Haohan Lin, Desheng Wang

    Abstract: In future 6G Mobile Edge Computing (MEC), autopilot systems require the capability of processing multimodal data with strong interdependencies. However, traditional heuristic algorithms are inadequate for real-time scheduling due to their requirement for multiple iterations to derive the optimal scheme. We propose a novel TSNet-SAC based on Transformer, that utilizes heuristic algorithms solely to…

    Submitted 16 June, 2023; originally announced July 2023.

  31. arXiv:2306.08417

    cs.NI eess.SY

    A Novel Channel-Constrained Model for 6G Vehicular Networks with Traffic Spikes

    Authors: Ke Deng, Zhiyuan He, Haohan Lin, Hao Zhang, Desheng Wang

    Abstract: Mobile Edge Computing (MEC) holds excellent potential in Congestion Management (CM) of 6G vehicular networks. A reasonable schedule of MEC ensures a more reliable and efficient CM system. Unfortunately, existing parallel and sequential models cannot cope with scarce computing resources and constrained channels, especially during traffic rush hour. In this paper, we propose a channel-constrained mu…

    Submitted 14 June, 2023; originally announced June 2023.

  32. arXiv:2306.04769

    math.OC cs.DC

    Achieving Consensus over Compact Submanifolds

    Authors: Jiang Hu, Jiaojiao Zhang, Kangkang Deng

    Abstract: We consider the consensus problem in a decentralized network, focusing on a compact submanifold that acts as a nonconvex constraint set. By leveraging the proximal smoothness of the compact submanifold, which encompasses the local singleton property and the local Lipschitz continuity of the projection operator on the manifold, and establishing the connection between the projection operator and gen…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 25 pages

  33. arXiv:2306.02507

    cs.CV

    Deep learning powered real-time identification of insects using citizen science data

    Authors: Shivani Chiranjeevi, Mojdeh Sadaati, Zi K Deng, Jayanth Koushik, Talukder Z Jubery, Daren Mueller, Matthew E O Neal, Nirav Merchant, Aarti Singh, Asheesh K Singh, Soumik Sarkar, Arti Singh, Baskar Ganapathysubramanian

    Abstract: Insect-pests significantly impact global agricultural productivity and quality. Effective management involves identifying the full insect community, including beneficial insects and harmful pests, to develop and implement integrated pest management strategies. Automated identification of insects under real-world conditions presents several challenges, including differentiating similar-looking spec…

    Submitted 4 June, 2023; originally announced June 2023.

  34. arXiv:2304.14294

    cs.RO

    Deep Imitation Learning for Automated Drop-In Gamma Probe Manipulation

    Authors: Kaizhong Deng, Baoru Huang, Daniel S. Elson

    Abstract: The increasing prevalence of prostate cancer has led to the widespread adoption of Robotic-Assisted Surgery (RAS) as a treatment option. Sentinel lymph node biopsy (SLNB) is a crucial component of prostate cancer surgery and requires accurate diagnostic evidence. This procedure can be improved by using a drop-in gamma probe, SENSEI system, to distinguish cancerous tissue from normal tissue. Howeve…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in Hamlyn Symposium on Medical Robotics, 2023

  35. arXiv:2304.12317

    cs.CV cs.GR cs.LG

    Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis

    Authors: Chonghyuk Song, Gengshan Yang, Kangle Deng, Jun-Yan Zhu, Deva Ramanan

    Abstract: We explore the task of embodied view synthesis from monocular videos of deformable scenes. Given a minute-long RGBD video of people interacting with their pets, we render the scene from novel camera trajectories derived from the in-scene motion of actors: (1) egocentric cameras that simulate the point of view of a target actor and (2) 3rd-person cameras that follow the actor. Building such a syste…

    Submitted 2 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 camera-ready version. Project page with code, models, and data: https://andrewsonga.github.io/totalrecon

  36. arXiv:2304.03556

    cs.CV

    Construction of unbiased dental template and parametric dental model for precision digital dentistry

    Authors: Lei Ma, Jingyang Zhang, Ke Deng, Peng Xue, Zhiming Cui, Yu Fang, Minhui Tang, Yue Zhao, Min Zhu, Zhongxiang Ding, Dinggang Shen

    Abstract: Dental template and parametric dental models are important tools for various applications in digital dentistry. However, constructing an unbiased dental template and accurate parametric dental models remains a challenging task due to the complex anatomical and morphological dental structures and also low volume ratio of the teeth. In this study, we develop an unbiased dental template by constructi…

    Submitted 7 April, 2023; originally announced April 2023.

  37. arXiv:2303.09611

    math.OC cs.LG

    Decentralized Riemannian natural gradient methods with Kronecker-product approximations

    Authors: Jiang Hu, Kangkang Deng, Na Li, Quanzheng Li

    Abstract: With a computationally efficient approximation of the second-order information, natural gradient methods have been successful in solving large-scale structured optimization problems. We study the natural gradient methods for the large-scale decentralized optimization problems on Riemannian manifolds, where the local objective function defined by the local dataset is of a log-probability type. By u…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 17 pages

  38. arXiv:2302.08579

    eess.AS cs.SD

    Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax

    Authors: Keqi Deng, Philip C. Woodland

    Abstract: End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of paired audio-transcript training data. However, it still suffers from domain shifts from training to testing, and domain adaptation is still challenging. To alleviate this problem, this paper designs a replaceable internal language model (RILM) method, which makes it feasible to directly replac…

    Submitted 14 March, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP2023

  39. arXiv:2302.08509

    cs.CV cs.GR cs.LG

    3D-aware Conditional Image Synthesis

    Authors: Kangle Deng, Gengshan Yang, Deva Ramanan, Jun-Yan Zhu

    Abstract: We propose pix2pix3D, a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, our model learns to synthesize a corresponding image from different viewpoints. To enable explicit 3D user control, we extend conditional generative models with neural radiance fields. Given widely-available monocular images and la…

    Submitted 1 May, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Project Page: https://www.cs.cmu.edu/~pix2pix3D/

  40. arXiv:2302.06793

    cs.CV

    HR-NeuS: Recovering High-Frequency Surface Geometry via Neural Implicit Surfaces

    Authors: Erich Liang, Kenan Deng, Xi Zhang, Chun-Kai Wang

    Abstract: Recent advances in neural implicit surfaces for multi-view 3D reconstruction primarily focus on improving large-scale surface reconstruction accuracy, but often produce over-smoothed geometries that lack fine surface details. To address this, we present High-Resolution NeuS (HR-NeuS), a novel neural implicit surface reconstruction method that recovers high-frequency surface geometry while maintain…

    Submitted 13 February, 2023; originally announced February 2023.

  41. arXiv:2207.08609

    cs.CV

    ExAgt: Expert-guided Augmentation for Representation Learning of Traffic Scenarios

    Authors: Lakshman Balasubramanian, Jonas Wurst, Robin Egolf, Michael Botsch, Wolfgang Utschick, Ke Deng

    Abstract: Representation learning in recent years has been addressed with self-supervised learning methods. The input data is augmented into two distorted views and an encoder learns the representations that are invariant to distortions -- cross-view prediction. Augmentation is one of the key components in cross-view self-supervised learning frameworks to learn visual representations. This paper presents Ex…

    Submitted 20 July, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted as a conference paper in ITSC 2022, Macau, China

  42. arXiv:2207.02495

    eess.AS cs.SD

    Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

    Authors: Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan

    Abstract: There is often a trade-off between performance and latency in streaming automatic speech recognition (ASR). Traditional methods such as look-ahead and chunk-based methods usually require information from future frames to improve recognition accuracy, which incurs inevitable latency even if the computation is fast enough. A causal model that computes without any future frames can avoid this latenc…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  43. arXiv:2204.08920  [pdf, other]

    cs.CL cs.SD eess.AS

    Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

    Authors: Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora

    Abstract: Although Transformers have gained success in several speech processing tasks like spoken language understanding (SLU) and speech translation (ST), achieving online processing while keeping competitive performance is still essential for real-world interaction. In this paper, we take the first step on streaming SLU and simultaneous ST using a blockwise streaming Transformer, which is based on contex…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  44. arXiv:2204.07434  [pdf, other]

    cs.CL cs.AI

    ERGO: Event Relational Graph Transformer for Document-level Event Causality Identification

    Authors: Meiqi Chen, Yixin Cao, Kunquan Deng, Mukai Li, Kun Wang, Jing Shao, Yan Zhang

    Abstract: Document-level Event Causality Identification (DECI) aims to identify causal relations between event pairs in a document. It poses a great challenge of cross-sentence reasoning without clear causal indicators. In this paper, we propose a novel Event Relational Graph TransfOrmer (ERGO) framework for DECI, which improves on existing state-of-the-art (SOTA) methods in two aspects. First, we formulate…

    Submitted 15 April, 2022; originally announced April 2022.

  45. Identifying Cost-effective Debunkers for Multi-stage Fake News Mitigation Campaigns

    Authors: Xiaofei Xu, Ke Deng, Xiuzhen Zhang

    Abstract: Online social networks have become a fertile ground for spreading fake news. Methods to automatically mitigate fake news propagation have been proposed. Some studies focus on selecting top k influential users on social networks as debunkers, but the social influence of debunkers may not translate to wide mitigation information propagation as expected. Other studies assume a given set of debunkers…

    Submitted 31 March, 2022; originally announced March 2022.

  46. arXiv:2203.16250  [pdf, other]

    cs.CV

    PP-YOLOE: An evolved version of YOLO

    Authors: Shangliang Xu, Xinxin Wang, Wenyu Lv, Qinyao Chang, Cheng Cui, Kaipeng Deng, Guanzhong Wang, Qingqing Dang, Shengyu Wei, Yuning Du, Baohua Lai

    Abstract: In this report, we present PP-YOLOE, an industrial state-of-the-art object detector with high performance and friendly deployment. We build on the previous PP-YOLOv2, using an anchor-free paradigm, a more powerful backbone and neck equipped with CSPRepResStage, ET-head, and the dynamic label assignment algorithm TAL. We provide s/m/l/x models for different practice scenarios. As a result, PP…

    Submitted 11 December, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: 7 pages, 3 figures, 4 tables

  47. Methods2Test: A dataset of focal methods mapped to test cases

    Authors: Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages of development and prevent regressions. Machine learning has emerged as a viable approach to help software developers generate automated unit tests. However, generating reliable unit test cases that are semantically correct and capable of catching software bugs or un…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted for publication in the proceedings of The 2022 Mining Software Repositories Conference (MSR 2022) - Data and Tool track

  48. arXiv:2203.03582  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

    Authors: Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

    Abstract: Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence assumption, CTC-based models are always weaker than attention-based encoder-decoder models and require the assistance of external language models (LMs). To solve this is…

    Submitted 22 February, 2022; originally announced March 2022.

    Comments: ICASSP 2022

  49. arXiv:2202.09609  [pdf, other]

    eess.IV cs.CV

    A Lightweight Dual-Domain Attention Framework for Sparse-View CT Reconstruction

    Authors: Chang Sun, Ken Deng, Yitong Liu, Hongwen Yang

    Abstract: Computed Tomography (CT) plays an essential role in clinical diagnosis. Due to the adverse effects of radiation on patients, the radiation dose is expected to be reduced as low as possible. Sparse sampling is an effective way, but it will lead to severe artifacts on the reconstructed CT image, thus sparse-view CT image reconstruction has been a prevailing and challenging research area. With the po…

    Submitted 19 February, 2022; originally announced February 2022.

  50. arXiv:2201.10103  [pdf, other]

    eess.AS cs.SD

    Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

    Authors: Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang

    Abstract: While Transformers have achieved promising results in end-to-end (E2E) automatic speech recognition (ASR), their autoregressive (AR) structure becomes a bottleneck for speeding up the decoding process. For real-world deployment, ASR systems are desired to be highly accurate while achieving fast inference. Non-autoregressive (NAR) models have become a popular alternative due to their fast inference…

    Submitted 26 January, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted by ICASSP 2022