Cross-Document Event-Keyed Summarization

Authors: William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White

Abstract: Event-keyed summarization (EKS) requires generating a summary about a specific event described in a document, given the document and an event representation extracted from it. In this work, we extend EKS to the cross-document setting (CDEKS), in which summaries must synthesize information from accounts of the same event given by multiple sources. We introduce SEAMUS (Summaries of Events Across Multiple Sources), a high-quality dataset for CDEKS based on an expert reannotation of the FAMUS dataset for cross-document argument extraction. We present a suite of baselines on SEAMUS, covering both smaller, fine-tuned models, as well as zero- and few-shot prompted LLMs, along with detailed ablations, and a human evaluation study, showing SEAMUS to be a valuable benchmark for this new task.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.09746 [pdf, other]

Autoionization-enhanced Rydberg dressing by fast contaminant removal

Authors: Alec Cao, Theodor Lukin Yelin, William J. Eckner, Nelson Darkwah Oppong, Adam M. Kaufman

Abstract: Rydberg dressing is a powerful tool for entanglement generation in long-lived atomic states. While already employed effectively in several demonstrations, a key challenge for this technique is the collective loss triggered by blackbody-radiation-driven transitions to contaminant Rydberg states of opposite parity. We demonstrate the rapid removal of such contaminants using autoionization (AI) transitions found in alkaline-earth-like atoms. The AI is shown to be compatible with coherent operation of an array of optical clock qubits. By incorporating AI pulses into a stroboscopic Rydberg dressing (SRD) sequence, we enhance lifetimes by an order of magnitude for system sizes of up to 144 atoms, while maintaining an order of magnitude larger duty cycle than previously achieved. To highlight the utility of our approach, we use the AI-enhanced SRD protocol to improve the degree of spin-squeezing achieved during early-time dressing dynamics. These results bring Rydberg dressing lifetimes closer to fundamental limits, opening the door to previously infeasible dressing proposals.

Submitted 13 October, 2024; originally announced October 2024.

Comments: 8 pages, 4 figures + Supplementary Material 7 pages, 6 figures

arXiv:2410.08211 [pdf, other]

LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts

Authors: Anh-Quan Cao, Maximilian Jaritz, Matthieu Guillaumin, Raoul de Charette, Loris Bazzani

Abstract: Large-scale vision-language pre-trained (VLP) models (e.g., CLIP) are renowned for their versatility, as they can be applied to diverse applications in a zero-shot setup. However, when these models are used in specific domains, their performance often falls short due to domain gaps or the under-representation of these domains in the training data. While fine-tuning VLP models on custom datasets with human-annotated labels can address this issue, annotating even a small-scale dataset (e.g., 100k samples) can be an expensive endeavor, often requiring expert annotators if the task is complex. To address these challenges, we propose LatteCLIP, an unsupervised method for fine-tuning CLIP models on classification with known class names in custom domains, without relying on human annotations. Our method leverages Large Multimodal Models (LMMs) to generate expressive textual descriptions for both individual images and groups of images. These provide additional contextual information to guide the fine-tuning process in the custom domains. Since LMM-generated descriptions are prone to hallucination or missing details, we introduce a novel strategy to distill only the useful information and stabilize the training. Specifically, we learn rich per-class prototype representations from noisy generated texts and dual pseudo-labels. Our experiments on 10 domain-specific datasets show that LatteCLIP outperforms pre-trained zero-shot methods by an average improvement of +4.74 points in top-1 accuracy and other state-of-the-art unsupervised methods by +3.45 points.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2409.08307 [pdf, other]

MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation

Authors: Aaron Cao, Zongyu Li, Jordan Jomsky, Andrew F. Laine, Jia Guo

Abstract: Widely used traditional pipelines for subcortical brain segmentation are often inefficient and slow, particularly when processing large datasets. Furthermore, deep learning models face challenges due to the high resolution of MRI images and the large number of anatomical classes involved. To address these limitations, we developed a 3D patch-based hybrid CNN-Mamba model that leverages Mamba's selective scan algorithm, thereby enhancing segmentation accuracy and efficiency for 3D inputs. This retrospective study utilized 1784 T1-weighted MRI scans from a diverse, multi-site dataset of healthy individuals. The dataset was divided into training, validation, and testing sets with a 1076/345/363 split. The scans were obtained from 1.5T and 3T MRI machines. Our model's performance was validated against several benchmarks, including other CNN-Mamba, CNN-Transformer, and pure CNN networks, using FreeSurfer-generated ground truths. We employed the Dice Similarity Coefficient (DSC), Volume Similarity (VS), and Average Symmetric Surface Distance (ASSD) as evaluation metrics. Statistical significance was determined using the Wilcoxon signed-rank test with a threshold of P < 0.05. The proposed model achieved the highest overall performance across all metrics (DSC 0.88383; VS 0.97076; ASSD 0.33604), significantly outperforming all non-Mamba-based models (P < 0.001). While the model did not show significant improvement in DSC or VS compared to another Mamba-based model (P-values of 0.114 and 0.425), it demonstrated a significant enhancement in ASSD (P < 0.001) with approximately 20% fewer parameters. In conclusion, our proposed hybrid CNN-Mamba architecture offers an efficient and accurate approach for 3D subcortical brain segmentation, demonstrating potential advantages over existing methods. Code is available at: https://github.com/aaroncao06/MedSegMamba.

Submitted 13 October, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: 14 pages, 8 figures

arXiv:2408.16924 [pdf, other]

Enhancing Autism Spectrum Disorder Early Detection with the Parent-Child Dyads Block-Play Protocol and an Attention-enhanced GCN-xLSTM Hybrid Deep Learning Framework

Authors: Xiang Li, Lizhou Fan, Hanbo Wu, Kunping Chen, Xiaoxiao Yu, Chao Che, Zhifeng Cai, Xiuhong Niu, Aihua Cao, Xin Ma

Abstract: Autism Spectrum Disorder (ASD) is a rapidly growing neurodevelopmental disorder. Performing a timely intervention is crucial for the growth of young children with ASD, but traditional clinical screening methods lack objectivity. This study introduces an innovative approach to early detection of ASD. The contributions are threefold. First, this work proposes a novel Parent-Child Dyads Block-Play (PCB) protocol, grounded in kinesiological and neuroscientific research, to identify behavioral patterns distinguishing ASD from typically developing (TD) toddlers. Second, we have compiled a substantial video dataset, featuring 40 ASD and 89 TD toddlers engaged in block play with parents. This dataset exceeds previous efforts on both the scale of participants and the length of individual sessions. Third, our approach to action analysis in videos employs a hybrid deep learning framework, integrating a two-stream graph convolution network with attention-enhanced xLSTM (2sGCN-AxLSTM). This framework is adept at capturing dynamic interactions between toddlers and parents by extracting spatial features correlated with upper body and head movements and focusing on global contextual information of action sequences over time. By learning these global features with spatio-temporal correlations, our 2sGCN-AxLSTM effectively analyzes dynamic human behavior patterns and demonstrates an unprecedented accuracy of 89.6% in early detection of ASD. Our approach shows strong potential for enhancing early ASD diagnosis by accurately analyzing parent-child interactions, providing a critical tool to support timely and informed clinical decision-making.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 18 pages, 8 figures, and 4 tables

arXiv:2407.09922 [pdf]

Transcranial low-level laser stimulation in near infrared-II region for brain safety and protection

Authors: Zhilin Li, Yongheng Zhao, Yiqing Hu, Yang Li, Keyao Zhang, Zhibing Gao, Lirou Tan, Hanli Liu, Xiaoli Li, Aihua Cao, Zaixu Cui, Chenguang Zhao

Abstract: Background: The use of near-infrared lasers for transcranial photobiomodulation (tPBM) offers a non-invasive method for influencing brain activity and is beneficial for various neurological conditions. Objective: To investigate the safety and neuroprotective properties of tPBM using near-infrared (NIR)-II laser stimulation. Methods: We conducted thirteen experiments involving multidimensional and quantitative methods and measured serum neurobiomarkers, performed electroencephalogram (EEG) and magnetic resonance imaging (MRI) scans, assessed executive functions, and collected a subjective questionnaire. Results: Significant reductions (n=15) in neuron specific enolase (NSE) levels were observed after treatment, indicating neuroprotective effects. No structural or functional brain abnormalities were observed, confirming the safety of tPBM. Additionally, cognitive and executive functions were not impaired, with participants' feedback indicating minimal discomfort. Conclusions: Our data indicate that NIR-II tPBM is safe with specific parameters, highlighting its potential for brain protection.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.02599 [pdf, other]

Meta 3D Gen

Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates key technical components, Meta 3D AssetGen and Meta 3D TextureGen, that we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. The integration of these two techniques achieves a win rate of 68% with respect to the single-stage model. We compare 3DGen to numerous industry baselines, and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.16248 [pdf]

Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully utilized. To address this gap, we develop a computer-aided diagnostic model focusing on white matter regions in brain MRI by employing radiomics and machine learning methods. This study introduced a MultiUNet model for segmenting white matter, leveraging the UNet architecture and utilizing manually segmented MRI images as the training data. Subsequently, we extracted white matter features using the Pyradiomics toolkit and applied different machine learning models such as Support Vector Machine, Random Forest, Logistic Regression, and K-Nearest Neighbors to predict autism. The prediction sets all exceeded 80% accuracy. Additionally, we employed a Convolutional Neural Network to analyze segmented white matter images, achieving a prediction accuracy of 86.84%. Notably, Support Vector Machine demonstrated the highest prediction accuracy at 89.47%. These findings not only underscore the efficacy of the models but also establish a link between white matter abnormalities and autism. Our study contributes to a comprehensive evaluation of various diagnostic models for autism and introduces a computer-aided diagnostic algorithm for early and objective autism diagnosis based on MRI white matter regions.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.09150 [pdf, other]

Curriculum Dataset Distillation

Authors: Zhiheng Ma, Anjia Cao, Funing Yang, Xing Wei

Abstract: Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K. The source code will be released to the community.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.19760 [pdf, other]

Lightplane: Highly-Scalable Components for Neural 3D Fields

Authors: Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny

Abstract: Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. However, current designs for these 2D-3D mappings are memory-intensive, posing a significant bottleneck for existing methods and hindering new applications. In response, we propose a pair of highly scalable components for 3D neural fields: Lightplane Render and Splatter, which significantly reduce memory usage in 2D-3D mapping. These innovations enable the processing of vastly more and higher resolution images with small memory and computational costs. We demonstrate their utility in various applications, from benefiting single-scene optimization with image-level losses to realizing a versatile pipeline for dramatically scaling 3D reconstruction and generation. Code: https://github.com/facebookresearch/lightplane.

Submitted 30 April, 2024; originally announced April 2024.

Comments: Project Page: https://lightplane.github.io/ Code: https://github.com/facebookresearch/lightplane

arXiv:2404.10279 [pdf, other]

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

Abstract: We present EucliDreamer, a simple and effective method to generate textures for 3D models given text prompts and meshes. The texture is parametrized as an implicit function on the 3D surface, which is optimized with the Score Distillation Sampling (SDS) process and differentiable rendering. To generate high-quality textures, we leverage a depth-conditioned Stable Diffusion model guided by the depth image rendered from the mesh. We tested our approach on 3D models in Objaverse and conducted a user study, which shows its superior quality compared to existing texturing methods like Text2Tex. In addition, our method converges 2 times faster than DreamFusion. Through text prompting, textures of diverse art styles can be produced. We hope EucliDreamer provides a viable solution to automate a labor-intensive stage in 3D content creation.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Short version of arXiv:2311.15573

arXiv:2404.02928 [pdf, other]

Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

Authors: Jiachen Ma, Anda Cao, Zhiqing Xiao, Yijiang Li, Jie Zhang, Chao Ye, Junbo Zhao

Abstract: Text-to-image (T2I) models can be maliciously used to generate harmful content such as sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous attacks largely depend on the availability of the diffusion model or involve a lengthy optimization process. In this work, we investigate a more practical and universal attack that does not require the presence of a target model and demonstrate that the high-dimensional text embedding space inherently contains NSFW concepts that can be exploited to generate harmful images. We present the Jailbreaking Prompt Attack (JPA). JPA first searches for the target malicious concepts in the text embedding space using a group of antonyms generated by ChatGPT. Subsequently, a prefix prompt is optimized in the discrete vocabulary space to align malicious concepts semantically in the text embedding space. We further introduce a soft assignment with gradient masking technique that allows us to perform gradient ascent in the discrete vocabulary space. We perform extensive experiments with open-sourced T2I models (e.g., stable-diffusion-v1-4) and closed-sourced online services (e.g., DALLE2 and Midjourney) with black-box safety checkers. Results show that JPA (1) bypasses both text and image safety checkers, (2) preserves high semantic alignment with the target prompt, and (3) runs much faster than previous methods and can be executed in a fully automated manner. These merits render it a valuable tool for robustness evaluation in future text-to-image generation research.

Submitted 4 September, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2402.16289 [pdf, other]

doi 10.1038/s41586-024-07913-z

Multi-qubit gates and Schrödinger cat states in an optical clock

Authors: Alec Cao, William J. Eckner, Theodor Lukin Yelin, Aaron W. Young, Sven Jandura, Lingfeng Yan, Kyungtae Kim, Guido Pupillo, Jun Ye, Nelson Darkwah Oppong, Adam M. Kaufman

Abstract: Many-particle entanglement is a key resource for achieving the fundamental precision limits of a quantum sensor. Optical atomic clocks, the current state-of-the-art in frequency precision, are a rapidly emerging area of focus for entanglement-enhanced metrology. Augmenting tweezer-based clocks featuring microscopic control and detection with the high-fidelity entangling gates developed for atom-array information processing offers a promising route towards leveraging highly entangled quantum states for improved optical clocks. Here we develop and employ a family of multi-qubit Rydberg gates to generate Schrödinger cat states of the Greenberger-Horne-Zeilinger (GHZ) type with up to 9 optical clock qubits in a programmable atom array. In an atom-laser comparison at sufficiently short dark times, we demonstrate a fractional frequency instability below the standard quantum limit using GHZ states of up to 4 qubits. However, due to their reduced dynamic range, GHZ states of a single size fail to improve the achievable clock precision at the optimal dark time compared to unentangled atoms. Towards overcoming this hurdle, we simultaneously prepare a cascade of varying-size GHZ states to perform unambiguous phase estimation over an extended interval. These results demonstrate key building blocks for approaching Heisenberg-limited scaling of optical atomic clock precision.

Submitted 13 October, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: 22 pages, 7 figures, 2 tables, corrected typo in Eq. (13) and added journal reference

Journal ref: Nature 634, 315-320 (2024)

arXiv:2402.14816 [pdf, other]

Universal quantum dynamics of Bose polarons

Authors: Jiří Etrych, Gevorg Martirosyan, Alec Cao, Christopher J. Ho, Zoran Hadzibabic, Christoph Eigen

Abstract: Predicting the emergent properties of impurities immersed in a quantum bath is a fundamental challenge that can defy quasiparticle treatments. Here, we measure the spectral properties and real-time dynamics of mobile impurities injected into a homogeneous Bose-Einstein condensate, using two Feshbach resonances to tune both the impurity-bath and intrabath interactions. We map out both attractive and repulsive branches of polaron quasiparticles, resolving the repulsive polaron and the molecular state associated with the Feshbach resonance in the strongly interacting regime, and show that the latter also has a many-body character. Our measurements reveal remarkably universal behavior, controlled by the bath density and a single dimensionless interaction parameter; for near-resonant interactions the polarons are no longer well defined, but the universality still holds.

Submitted 22 February, 2024; originally announced February 2024.

Comments: Main text (6 pages, 5 figures), Supplementary Material (4 pages, 8 figures)

arXiv:2312.17142 [pdf, other]

DreamGaussian4D: Generative 4D Gaussian Splatting

Authors: Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

Abstract: 4D content generation has achieved remarkable progress recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low quality of details. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation. Moreover, video generation methods have the potential to offer valuable spatial-temporal priors, enhancing the high-quality 4D generation. Specifically, we propose an integral framework with two major modules: 1) Image-to-4D GS - we initially generate static GS with DreamGaussianHD, followed by HexPlane-based dynamic generation with Gaussian deformation; and 2) Video-to-Video Texture Refinement - we refine the generated UV-space texture maps and meanwhile enhance their temporal consistency by utilizing a pre-trained image-to-video diffusion model. Notably, DG4D reduces the optimization time from several hours to just a few minutes, allows the generated 3D motion to be visually controlled, and produces animated meshes that can be realistically rendered in 3D engines.

Submitted 10 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: Technical report. Project page is at https://jiawei-ren.github.io/projects/dreamgaussian4d Code is at https://github.com/jiawei-ren/dreamgaussian4d

arXiv:2312.08267 [pdf, other]

TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propose TABSurfer, a novel 3D patch-based CNN-Transformer hybrid deep learning model designed for superior subcortical segmentation compared to existing state-of-the-art tools. To evaluate, we first demonstrate TABSurfer's consistent performance across various T1w MRI datasets with significantly shorter processing times compared to FreeSurfer. Then, we validate against manual segmentations, where TABSurfer outperforms FreeSurfer based on the manual ground truth. In each test, we also establish TABSurfer's advantage over a leading deep learning benchmark, FastSurferVINN. Together, these studies highlight TABSurfer's utility as a powerful tool for fully automated subcortical segmentation with high fidelity.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2312.05279 [pdf]

Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning technology could leverage it, which can accurately estimate clinical perfusion parameters compared to traditional clinical approaches. Therefore, this study presents a perfusion parameters estimation network that considers spatial and temporal information, the Spatiotemporal Network (ST-Net), for the first time. The proposed network comprises a designed physical loss function to enhance model performance further. The results indicate that the network can accurately estimate perfusion parameters, including cerebral blood volume (CBV), cerebral blood flow (CBF), and time to maximum of the residual function (Tmax). The structural similarity index (SSIM) mean values for CBV, CBF, and Tmax parameters were 0.952, 0.943, and 0.863, respectively. The DICE score for the hypo-perfused region reached 0.859, demonstrating high consistency. The proposed model also maintains time efficiency, closely approaching the performance of commercial gold-standard software.

Submitted 8 December, 2023; originally announced December 2023.
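
For readers unfamiliar with the DICE score quoted in the abstract above, the following is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), between a predicted and a reference binary mask. The function name `dice_score`, the flat 0/1 list representation, and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the two mask sizes.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (hypothetical data): 3 voxels each, 2 of them shared.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(dice_score(pred, truth))  # 2 * 2 / (3 + 3), about 0.667
```

A score of 1 means the two regions coincide exactly and 0 means they do not overlap at all.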

arXiv:2312.02158 [pdf, other]

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Authors: Anh-Quan Cao, Angela Dai, Raoul de Charette

Abstract: We propose the task of Panoptic Scene Completion (PSC) which extends the recently popular Semantic Scene Completion (SSC) task with instance-level information to produce a richer understanding of the 3D scene. Our PSC proposal utilizes a hybrid mask-based technique on the non-empty voxels from sparse multi-scale completions. Whereas the SSC literature overlooks uncertainty which is critical for robotics applications, we instead propose an efficient ensembling to estimate both voxel-wise and instance-wise uncertainties along PSC. This is achieved by building on a multi-input multi-output (MIMO) strategy, while improving performance and yielding better uncertainty for little additional compute. Additionally, we introduce a technique to aggregate permutation-invariant mask predictions. Our experiments demonstrate that our method surpasses all baselines in both Panoptic Scene Completion and uncertainty estimation on three large-scale autonomous driving datasets. Our code and data are available at https://astra-vision.github.io/PaSCo .

Submitted 25 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Oral - Best paper award candidate. Project page: https://astra-vision.github.io/PaSCo

arXiv:2311.15573 [pdf, other]

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth

Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

Abstract: This paper presents a novel method to generate textures for 3D models given text prompts and 3D meshes. Additional depth information is taken into account to perform the Score Distillation Sampling (SDS) process with depth conditional Stable Diffusion. We ran our model over the open-source dataset Objaverse and conducted a user study to compare the results with those of various 3D texturing methods. We have shown that our model can generate more satisfactory results and produce various art styles for the same object. In addition, we achieved faster time when generating textures of comparable quality. We also conduct thorough ablation studies of how different factors may affect generation quality, including sampling steps, guidance scale, negative prompts, data augmentation, elevation range, and alternatives to SDS.

Submitted 13 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.01037 [pdf, other]

doi 10.1109/TGRS.2024.3371503

SeisT: A foundational deep learning model for earthquake monitoring tasks

Authors: Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu

Abstract: Seismograms, the fundamental seismic records, have revolutionized earthquake research and monitoring. Recent advancements in deep learning have further enhanced seismic signal processing, leading to even more precise and effective earthquake monitoring capabilities. This paper introduces a foundational deep learning model, the Seismogram Transformer (SeisT), designed for a variety of earthquake monitoring tasks. SeisT combines multiple modules tailored to different tasks and exhibits impressive out-of-distribution generalization performance, outperforming or matching state-of-the-art models in tasks like earthquake detection, seismic phase picking, first-motion polarity classification, magnitude estimation, back-azimuth estimation, and epicentral distance estimation. The performance scores on the tasks are 0.96, 0.96, 0.68, 0.95, 0.86, 0.55, and 0.81, respectively. The most significant improvements, in comparison to existing models, are observed in phase-P picking, phase-S picking, and magnitude estimation, with gains of 1.7%, 9.5%, and 8.0%, respectively. Our study, through rigorous experiments and evaluations, suggests that SeisT has the potential to contribute to the advancement of seismic signal processing and earthquake research.

Submitted 26 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

arXiv:2305.01151 [pdf, ps, other]

Early Classifying Multimodal Sequences

Authors: Alexander Cao, Jean Utke, Diego Klabjan

Abstract: Often pieces of information are received sequentially over time. When did one collect enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems that have recently gained attention as a means of adapting classification to more dynamic environments. However, so far results have been limited to unimodal sequences. In this pilot study, we expand into early classifying multimodal sequences by combining existing methods. We show our new method yields experimental AUC advantages of up to 8.7%.

Submitted 1 May, 2023; originally announced May 2023.

Comments: 7 pages, 5 figures

arXiv:2304.06697 [pdf, other]

doi 10.1103/PhysRevLett.132.113401

Observation of subdiffusive dynamic scaling in a driven and disordered Bose gas

Authors: Gevorg Martirosyan, Christopher J. Ho, Jiří Etrych, Yansheng Zhang, Alec Cao, Zoran Hadzibabic, Christoph Eigen

Abstract: We explore the dynamics of a tuneable box-trapped Bose gas under strong periodic forcing in the presence of weak disorder. In absence of interparticle interactions, the interplay of the drive and disorder results in an isotropic nonthermal momentum distribution that shows subdiffusive dynamic scaling, with sublinear energy growth and the universal scaling function captured well by a compressed exponential. We explain that this subdiffusion in momentum space can naturally be understood as a random walk in energy space. We also experimentally show that for increasing interaction strength, the gas behavior smoothly crosses over to wave turbulence characterized by a power-law momentum distribution, which opens new possibilities for systematic studies of the interplay of disorder and interactions in driven quantum systems.

Submitted 17 December, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: Main text (5 pages, 5 figures), Supplemental Material (2 pages, 4 figures)

Journal ref: Phys. Rev. Lett. 132, 113401 (2024)

arXiv:2304.03463 [pdf, ps, other]

A Policy for Early Sequence Classification

Authors: Alexander Cao, Jean Utke, Diego Klabjan

Abstract: Sequences are often not received in their entirety at once, but instead, received incrementally over time, element by element. Early predictions yielding a higher benefit, one aims to classify a sequence as accurately as possible, as soon as possible, without having to wait for the last element. For this early sequence classification, we introduce our novel classifier-induced stopping. While previous methods depend on exploration during training to learn when to stop and classify, ours is a more direct, supervised approach. Our classifier-induced stopping achieves an average Pareto frontier AUC increase of 11.8% over multiple experiments.

Submitted 6 April, 2023; originally announced April 2023.

Comments: 12 pages, 6 figures

arXiv:2303.11989 [pdf, other]

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

Authors: Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

Abstract: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. To this end, we leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model. The core idea of our approach is a tailored viewpoint selection such that the content of each image can be fused into a seamless, textured 3D mesh. More specifically, we propose a continuous alignment strategy that iteratively fuses scene frames with the existing geometry to create a seamless mesh. Unlike existing works that focus on generating single objects or zoom-out trajectories from text, our method generates complete 3D scenes with multiple objects and explicit 3D geometry. We evaluate our approach using qualitative and quantitative metrics, demonstrating it as the first method to generate room-scale 3D geometry with compelling textures from only text as input.

Submitted 10 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted to ICCV 2023 (Oral). Video: https://youtu.be/fjRnFL91EZc Project page: https://lukashoel.github.io/text-to-room/ Code: https://github.com/lukasHoel/text2room

arXiv:2303.08078 [pdf, other]

doi 10.1038/s41586-023-06360-6

Realizing spin squeezing with Rydberg interactions in a programmable optical clock

Authors: William J. Eckner, Nelson Darkwah Oppong, Alec Cao, Aaron W. Young, William R. Milner, John M. Robinson, Jun Ye, Adam M. Kaufman

Abstract: Neutral-atom arrays trapped in optical potentials are a powerful platform for studying quantum physics, combining precise single-particle control and detection with a range of tunable entangling interactions. For example, these capabilities have been leveraged for state-of-the-art frequency metrology as well as microscopic studies of entangled many-particle states. In this work, we combine these applications to realize spin squeezing - a widely studied operation for producing metrologically useful entanglement - in an optical atomic clock based on a programmable array of interacting optical qubits. In this first demonstration of Rydberg-mediated squeezing with a neutral-atom optical clock, we generate states that have almost 4 dB of metrological gain. Additionally, we perform a synchronous frequency comparison between independent squeezed states and observe a fractional frequency stability of $1.087(1)\times 10^{-15}$ at one-second averaging time, which is 1.94(1) dB below the standard quantum limit, and reaches a fractional precision at the $10^{-17}$ level during a half-hour measurement. We further leverage the programmable control afforded by optical tweezer arrays to apply local phase shifts in order to explore spin squeezing in measurements that operate beyond the relative coherence time with the optical local oscillator. The realization of this spin-squeezing protocol in a programmable atom-array clock opens the door to a wide range of quantum-information inspired techniques for optimal phase estimation and Heisenberg-limited optical atomic clocks.

Submitted 23 July, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 13 pages, 4 figures; Supplementary Information

Journal ref: Nature 621, 734 (2023)

arXiv:2301.09632 [pdf, other]

HexPlane: A Fast Representation for Dynamic Scenes

Authors: Ang Cao, Justin Johnson

Abstract: Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior approaches build on NeRF and rely on implicit representations. This is slow since it requires many MLP evaluations, constraining real-world applications. We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane. A HexPlane computes features for points in spacetime by fusing vectors extracted from each plane, which is highly efficient. Pairing a HexPlane with a tiny MLP to regress output colors and training via volume rendering gives impressive results for novel view synthesis on dynamic scenes, matching the image quality of prior work but reducing training time by more than $100\times$. Extensive ablations confirm our HexPlane design and show that it is robust to different feature fusion mechanisms, coordinate systems, and decoding mechanisms. HexPlane is a simple and effective solution for representing 4D volumes, and we hope they can broadly contribute to modeling spacetime for dynamic 3D scenes.

Submitted 27 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: CVPR 2023, Camera Ready. Project page: https://caoang327.github.io/HexPlane
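
The six-plane lookup described in the HexPlane abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the plane names, the pairing of complementary planes (XY with ZT, XZ with YT, YZ with XT), the nearest-neighbour lookup, the multiply-then-concatenate fusion, and the random values standing in for learned features. The abstract itself notes that the design is robust to different fusion mechanisms, so this shows one possible choice rather than the authors' implementation.

```python
import random

# Assumed pairing of complementary axis planes (hypothetical naming).
PLANE_PAIRS = (("xy", "zt"), ("xz", "yt"), ("yz", "xt"))


def make_planes(spatial_res, time_res, channels, seed=0):
    """Six randomly initialised feature planes (stand-ins for learned features)."""
    rng = random.Random(seed)
    planes = {}
    for pair in PLANE_PAIRS:
        for name in pair:
            rows = time_res if name[0] == "t" else spatial_res
            cols = time_res if name[1] == "t" else spatial_res
            planes[name] = [[[rng.uniform(-1.0, 1.0) for _ in range(channels)]
                             for _ in range(cols)] for _ in range(rows)]
    return planes


def lookup(plane, u, v):
    """Nearest-neighbour lookup for normalised coordinates u, v in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    i = min(int(u * rows), rows - 1)
    j = min(int(v * cols), cols - 1)
    return plane[i][j]


def hexplane_features(planes, x, y, z, t):
    """Fuse the six per-plane vectors into one feature for the point (x, y, z, t)."""
    coords = {"x": x, "y": y, "z": z, "t": t}
    fused = []
    for first, second in PLANE_PAIRS:
        f1 = lookup(planes[first], coords[first[0]], coords[first[1]])
        f2 = lookup(planes[second], coords[second[0]], coords[second[1]])
        # Multiply complementary planes element-wise, then concatenate the pairs.
        fused.extend(a * b for a, b in zip(f1, f2))
    return fused


planes = make_planes(spatial_res=8, time_res=4, channels=3)
features = hexplane_features(planes, x=0.2, y=0.5, z=0.9, t=0.1)
print(len(features))  # 9 = 3 plane pairs x 3 channels
```

In a full model the fused vector would be passed to a small MLP that regresses colour and density; that stage is omitted here.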

arXiv:2212.08652 [pdf, other]

doi 10.1038/s41586-023-06240-z

Universal equation of state for wave turbulence in a quantum gas

Authors: Lena H. Dogra, Gevorg Martirosyan, Timon A. Hilker, Jake A. P. Glidden, Jiří Etrych, Alec Cao, Christoph Eigen, Robert P. Smith, Zoran Hadzibabic

Abstract: Boyle's 1662 observation that the volume of a gas is, at constant temperature, inversely proportional to pressure, offered a prototypical example of how an equation of state (EoS) can succinctly capture key properties of a many-particle system. Such relations are now cornerstones of equilibrium thermodynamics. Extending thermodynamic concepts to far-from-equilibrium systems is of great interest in various contexts including glasses, active matter, and turbulence, but is in general an open problem. Here, using a homogeneous ultracold atomic Bose gas, we experimentally construct an EoS for a turbulent cascade of matter waves. Under continuous forcing at a large length scale and dissipation at a small one, the gas exhibits a non-thermal, but stationary state, which is characterised by a power-law momentum distribution sustained by a scale-invariant momentum-space energy flux. We establish the amplitude of the momentum distribution and the underlying energy flux as equilibrium-like state variables, related by an EoS that does not depend on the details of the energy injection or dissipation, or the history of the system. Moreover, we show that the equations of state for a wide range of interaction strengths and gas densities can be empirically scaled onto each other. This results in a universal dimensionless EoS that sets benchmarks for the theory and should also be relevant for other turbulent systems.

Submitted 5 May, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: 6 pages, 5 figures

Journal ref: Nature 620, 521-524 (2023)

arXiv:2212.02501 [pdf, other]

SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields

Authors: Anh-Quan Cao, Raoul de Charette

Abstract: 3D reconstruction from a single 2D image was extensively covered in the literature but relies on depth supervision at training time, which limits its applicability. To relax the dependence on depth we propose SceneRF, a self-supervised monocular scene reconstruction method using only posed image sequences for training. Fueled by the recent progress in neural radiance fields (NeRF) we optimize a radiance field though with explicit depth optimization and a novel probabilistic sampling strategy to efficiently handle large scenes. At inference, a single input image suffices to hallucinate novel depth views which are fused together to obtain 3D scene reconstruction. Thorough experiments demonstrate that we outperform all baselines for novel depth views synthesis and scene reconstruction, on indoor BundleFusion and outdoor SemanticKITTI. Code is available at https://astra-vision.github.io/SceneRF .

Submitted 24 August, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: ICCV 2023. Project page: https://astra-vision.github.io/SceneRF

arXiv:2210.01784 [pdf, other]

COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation

Authors: Rong Li, Anh-Quan Cao, Raoul de Charette

Abstract: Annotation of large-scale 3D data is notoriously cumbersome and costly. As an alternative, weakly-supervised learning alleviates such a need by reducing the annotation by several orders of magnitude. We propose COARSE3D, a novel architecture-agnostic contrastive learning strategy for 3D segmentation. Since contrastive learning requires rich and diverse examples as keys and anchors, we leverage a prototype memory bank capturing class-wise global dataset information efficiently into a small number of prototypes acting as keys. An entropy-driven sampling technique then allows us to select good pixels from predictions as anchors. Experiments on three projection-based backbones show we outperform baselines on three challenging real-world outdoor datasets, working with as low as 0.001% annotations.

Submitted 7 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2208.13766 [pdf, other]

doi 10.1103/PhysRevResearch.5.013174

Pinpointing Feshbach Resonances and Testing Efimov Universalities in $^{39}$K

Authors: Jiří Etrych, Gevorg Martirosyan, Alec Cao, Jake A. P. Glidden, Lena H. Dogra, Jeremy M. Hutson, Zoran Hadzibabic, Christoph Eigen

Abstract: Using a combination of bound-state spectroscopy and loss spectroscopy, we pinpoint eight intrastate Feshbach resonances in $^{39}$K, as well as six previously unexplored interstate ones. We also perform a detailed characterization of four of the intrastate resonances and two of the interstate ones. We carry out coupled-channel scattering calculations and find good agreement with experiment. The combination of experiment and theory provides a faithful map of the scattering length $a$ and permits precision measurements of the signatures of Efimov physics across four intermediate-strength resonances. We measure the modulation of the $a^4$ scaling of the three-body loss coefficient for both $a<0$ and $a>0$, as well as the many-body loss dynamics at unitarity (where $a$ diverges). The absolute positions of the observed Efimov features confirm a ubiquitous breakdown of Efimov--van-der-Waals universality in $^{39}$K, while their relative positions are in agreement with the universal Efimov ratios. The loss dynamics at the three broadest Feshbach resonances are universal within experimental uncertainties, consistent with observing little variation in the Efimov width parameters.

Submitted 29 August, 2022; originally announced August 2022.

Comments: 12 pages, 7 figures (including appendices)

Journal ref: Phys. Rev. Res. 5, 013174 (2023)

arXiv:2206.08355 [pdf, other]

FWD: Real-time Novel View Synthesis with Forward Warping and Depth

Authors: Ang Cao, Chris Rockwell, Justin Johnson

Abstract: Novel view synthesis (NVS) is a challenging task requiring systems to generate photorealistic images of scenes from new viewpoints, where both quality and speed are important for applications. Previous image-based rendering (IBR) methods are fast, but have poor quality when input views are sparse. Recent Neural Radiance Fields (NeRF) and generalizable variants give impressive results but are not real-time. In our paper, we propose a generalizable NVS method with sparse inputs, called FWD, which gives high-quality synthesis in real-time. With explicit depth and differentiable rendering, it achieves competitive results to the SOTA methods with 130-1000x speedup and better perceptual quality. If available, we can seamlessly integrate sensor depth during either training or inference to improve image quality while retaining real-time speed. With the growing prevalence of depth sensors, we hope that methods making use of depth will become increasingly useful.

Submitted 5 August, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: CVPR 2022. Project website https://caoang327.github.io/FWD/

arXiv:2203.09442 [pdf, other]

Anomalous localization and multifractality in a kicked quasicrystal

Authors: Toshihiko Shimasaki, Max Prichard, H. Esat Kondakci, Jared Pagett, Yifei Bai, Peter Dotti, Alec Cao, Tsung-Cheng Lu, Tarun Grover, David M. Weld

Abstract: Multifractal states offer a "third way" for quantum matter, neither fully localized nor ergodic, exhibiting singular continuous spectra, self-similar wavefunctions, and transport and entanglement scaling exponents intermediate between extended and localized states. While multifractality in equilibrium systems generally requires fine-tuning to a critical point, externally driven quantum matter can exhibit multifractal states with no equilibrium counterpart. We report the experimental observation of multifractal matter and anomalous localization in a kicked Aubry-André-Harper quasicrystal. Our cold-atom realization of this previously-unexplored model is enabled by apodized Floquet engineering techniques which expand the accessible phase diagram by five orders of magnitude. This kicked quantum quasicrystal exhibits a rich phase diagram including not only fully localized and fully delocalized phases but also an extended region comprising an intricate nested pattern of localized, delocalized, and multifractal states. Mapping transport properties throughout the phase diagram, we observe disorder-driven re-entrant delocalization and sub-ballistic transport, and present a theoretical explanation of these phenomena based on eigenstate multifractality. These results open up the exploration of new states of matter characterized by an intricate interplay of fractal structure and quantum dynamics.

Submitted 3 May, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: 22 pages, 16 figures (including supp. info)

arXiv:2201.02923 [pdf, ps, other]

Open-Set Recognition of Breast Cancer Treatments

Authors: Alexander Cao, Diego Klabjan, Yuan Luo

Abstract: Open-set recognition generalizes a classification task by classifying test samples as one of the known classes from training or "unknown." As novel cancer drug cocktails with improved treatment are continually discovered, predicting cancer treatments can naturally be formulated in terms of an open-set recognition problem. Drawbacks, due to modeling unknown samples during training, arise from straightforward implementations of prior work in healthcare open-set learning. Accordingly, we reframe the problem methodology and apply a recent existing Gaussian mixture variational autoencoder model, which achieves state-of-the-art results for image datasets, to breast cancer patient data. Not only do we obtain more accurate and robust classification results, with a 24.5% average F1 increase compared to a recent method, but we also reexamine open-set recognition in terms of deployability to a clinical setting.

Submitted 8 January, 2022; originally announced January 2022.

Comments: 22 pages, 9 figures and 9 tables

arXiv:2112.00726 [pdf, other]

MonoScene: Monocular 3D Semantic Scene Completion

Authors: Anh-Quan Cao, Raoul de Charette

Abstract: MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Different from the SSC literature, relying on 2.5 or 3D input, we solve the complex problem of 2D to 3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D features projection inspired by optics and introduces a 3D context relation prior to enforce spatio-semantic consistency. Along with architectural contributions, we introduce novel global scene and local frustums losses. Experiments show we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://github.com/cv-rits/MonoScene.

Submitted 29 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted at CVPR 2022. Project page: https://cv-rits.github.io/MonoScene/

arXiv:2110.01269 [pdf, other]

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

Authors: Anh-Quan Cao, Gilles Puy, Alexandre Boulch, Renaud Marlet

Abstract: Rigid registration of point clouds with partial overlaps is a longstanding problem usually solved in two steps: (a) finding correspondences between the point clouds; (b) filtering these correspondences to keep only the most reliable ones to estimate the transformation. Recently, several deep nets have been proposed to solve these steps jointly. We build upon these works and propose PCAM: a neural network whose key element is a pointwise product of cross-attention matrices that permits mixing both low-level geometric and high-level contextual information to find point correspondences. These cross-attention matrices also permit the exchange of context information between the point clouds, at each layer, allowing the network to construct better matching features within the overlapping regions. The experiments show that PCAM achieves state-of-the-art results among methods which, like ours, solve steps (a) and (b) jointly via deep nets. Our code and trained models are available at https://github.com/valeoai/PCAM.

Submitted 4 October, 2021; originally announced October 2021.

Comments: ICCV21

arXiv:2109.11168 [pdf, other]

Unified Signal Compression Using a GAN with Iterative Latent Representation Optimization

Authors: Bowen Liu, Changwoo Lee, Ang Cao, Hun-Seok Kim

Abstract: We propose a unified signal compression framework that uses a generative adversarial network (GAN) to compress heterogeneous signals. The compressed signal is represented as a latent vector and fed into a generator network that is trained to produce high quality realistic signals that minimize a target objective function. To efficiently quantize the compressed signal, non-uniformly quantized optimal latent vectors are identified by iterative back-propagation with alternating direction method of multipliers (ADMM) optimization performed for each iteration. The performance of the proposed signal compression method is assessed using multiple metrics including PSNR and MS-SSIM for image compression and also PESR, Kaldi, LSTM, and MLP performance for speech compression. Test results show that the proposed work outperforms recent state-of-the-art hand-crafted and deep learning-based signal compression methods.

Submitted 23 September, 2021; originally announced September 2021.

Comments: 13 pages, 10 figures

arXiv:2109.03083 [pdf, ps, other]

Chasing the Threshold Bias of the 3-AP Game

Authors: Albert Cao, Felix Christian Clemen, Sean English, Xiaojian Li, Tatum Schmidt, Leeann Xoubi, Weian Yin

Abstract: In a Maker-Breaker game there are two players, Maker and Breaker, where Maker wins if they create a specified structure while Breaker wins if they prevent Maker from winning indefinitely. A $3$-term arithmetic progression, or $3$-AP, is a sequence of three distinct integers $a, b, c$ such that $b-a = c-b$. The $3$-AP game is a biased Maker-Breaker game played on $[n]$ where in every round Breaker selects $q$ unclaimed integers for each single integer selected by Maker. Maker is trying to select points such that they have a $3$-AP and Breaker is trying to prevent this. The main question of interest is determining the threshold bias $q^*(n)$, that is, the minimum value of $q=q(n)$ for which Breaker has a winning strategy. Kusch, Rué, Spiegel and Szabó initially asked this question and proved $\sqrt{n/12-1/6}\leq q^*(n)\leq \sqrt{3n}$. We find new strategies for both Maker and Breaker which improve the existing bounds to \[ (1+o(1))\sqrt{\frac{n}{5.6}} \leq q^*(n) \leq \sqrt{2n} +O(1). \]

Submitted 12 January, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

arXiv:2109.00696 [pdf, other]

doi 10.1103/PhysRevX.12.011035

Observation of the Quantum Boomerang Effect

Authors: Roshan Sajjad, Jeremy L. Tanlimco, Hector Mas, Alec Cao, Eber Nolasco-Martinez, Ethan Q. Simmons, Flávio L. N. Santos, Patrizia Vignolo, Tommaso Macrì, David M. Weld

Abstract: A particle in an Anderson-localized system, if launched in any direction, should on average return to its starting point and stay there. Despite the central role played by Anderson localization in the modern understanding of condensed matter, this "quantum boomerang" effect, an essential feature of the localized state, was only recently theoretically predicted and has not previously been observed. We report the experimental observation of the quantum boomerang effect. Using a degenerate gas and a phase-shifted pair of optical lattices, we probe the role of time reversal symmetry breaking, Floquet gauge, and initial state symmetry in supporting or disrupting the boomerang effect. Highlighting the key role of localization, we observe that as stochastic kicking destroys dynamical localization, the quantum boomerang effect also disappears. Measured dynamics are in agreement with analytical and numerical predictions. These results showcase a unique experimental probe of the underlying quantum nature of Anderson localized matter.

Submitted 23 February, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: 10 pages including appendices, 4 figures

Journal ref: Phys. Rev. X 12, 011035 (2022)

arXiv:2106.13933 [pdf, other]

Inverting and Understanding Object Detectors

Authors: Ang Cao, Justin Johnson

Abstract: As a core problem in computer vision, the performance of object detection has improved drastically in the past few years. Despite their impressive performance, object detectors suffer from a lack of interpretability. Visualization techniques have been developed and widely applied to introspect the decisions made by other kinds of deep learning models; however, visualizing object detectors has been underexplored. In this paper, we propose using inversion as a primary tool to understand modern object detectors and develop an optimization-based approach to layout inversion, allowing us to generate synthetic images recognized by trained detectors as containing a desired configuration of objects. We reveal intriguing properties of detectors by applying our layout inversion technique to a variety of modern object detectors, and further investigate them via validation experiments: they rely on qualitatively different features for classification and regression; they learn canonical motifs of commonly co-occurring objects; they use different visual cues to recognize objects of varying sizes. We hope our insights can help practitioners improve object detectors.

Submitted 25 June, 2021; originally announced June 2021.

Comments: Preprints

arXiv:2106.09698 [pdf, other]

doi 10.1038/s41567-022-01724-7

Interaction-driven breakdown of dynamical localization in a kicked quantum gas

Authors: Alec Cao, Roshan Sajjad, Hector Mas, Ethan Q. Simmons, Jeremy L. Tanlimco, Eber Nolasco-Martinez, Toshihiko Shimasaki, H. Esat Kondakci, Victor Galitski, David M. Weld

Abstract: Quantum interference can terminate energy growth in a continually kicked system, via a single-particle ergodicity-breaking mechanism known as dynamical localization. The effect of many-body interactions on dynamically localized states, while important to a fundamental understanding of quantum decoherence, has remained unexplored despite a quarter-century of experimental studies. We report the experimental realization of a tunably-interacting kicked quantum rotor ensemble using a Bose-Einstein condensate in a pulsed optical lattice. We observe signatures of a prethermal localized plateau, followed for interacting samples by interaction-induced anomalous diffusion with an exponent near one half. Echo-type time reversal experiments establish the role of interactions in destroying reversibility. These results quantitatively elucidate the dynamical transition to many-body quantum chaos, advance our understanding of quantum anomalous diffusion, and delimit some possibilities for protecting quantum information in interacting driven systems.

Submitted 13 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 17 pages including supp info

arXiv:2006.02003 [pdf, other]

Open-Set Recognition with Gaussian Mixture Variational Autoencoders

Authors: Alexander Cao, Yuan Luo, Diego Klabjan

Abstract: In inference, open-set classification is to either classify a sample into a known class from training or reject it as an unknown class. Existing deep open-set classifiers train explicit closed-set classifiers, in some cases disjointly utilizing reconstruction, which we find dilutes the latent representation's ability to distinguish unknown classes. In contrast, we train our model to cooperatively learn reconstruction and perform class-based clustering in the latent space. With this, our Gaussian mixture variational autoencoder (GMVAE) achieves more accurate and robust open-set classification results, with an average F1 improvement of 29.5%, through extensive experiments aided by analytical results.

Submitted 2 June, 2020; originally announced June 2020.

Comments: 12 pages including 8 figures and 4 tables, plus 6 pages of supplementary material

arXiv:2006.01612 [pdf, other]

doi 10.1103/PhysRevResearch.2.032032

Transport controlled by Poincaré orbit topology in a driven inhomogeneous lattice gas

Authors: Alec Cao, Roshan Sajjad, Ethan Q. Simmons, Cora J. Fujiwara, Toshihiko Shimasaki, David M. Weld

Abstract: In periodic quantum systems which are both homogeneously tilted and driven, the interplay between drive and Bloch oscillations controls transport dynamics. Using a quantum gas in a modulated optical lattice, we show experimentally that inhomogeneity of the applied force leads to a rich new variety of dynamical behaviors controlled by the drive phase, from self-parametrically-modulated Bloch epicycles to adaptive driving of transport against a force gradient to modulation-enhanced monopole modes. Matching experimental observations to fit-parameter-free numerical predictions of time-dependent band theory, we show that these phenomena can be quantitatively understood as manifestations of an underlying inhomogeneity-induced phase space structure, in which topological classification of stroboscopic Poincaré orbits controls the transport dynamics.

Submitted 2 June, 2020; originally announced June 2020.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. Research 2, 032032 (2020)

arXiv:2005.07969 [pdf]

doi 10.1103/PhysRevB.102.104412

All-optical switching of magnetic domains in Co/Gd heterostructures with Dzyaloshinskii-Moriya Interaction

Authors: Anni Cao, Youri L. W. van Hees, Reinoud Lavrijsen, Weisheng Zhao, Bert Koopmans

Abstract: Given the development of hybrid spintronic-photonic devices and chiral magnetic structures, a combined interest in all-optical switching (AOS) of magnetization and current-induced domain wall motion in synthetic ferrimagnetic structures with strong Dzyaloshinskii-Moriya Interaction (DMI) is emerging. In this study, we report on single-pulse all-optical toggle switching and asymmetric bubble expansion in specially engineered Co/Gd-based multilayer structures. In the absence of any external magnetic fields, we look into the AOS properties and the potential role of the DMI on the AOS process as well as the stability of optically written micro-magnetic domains. In particular, interesting dynamics are observed in moon-shaped structures written by two successive laser pulses. The stability of domains resulting from an interplay of the dipolar interaction and domain-wall energy is compared to simple analytical models and micromagnetic simulations.

Submitted 25 May, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

Journal ref: Phys. Rev. B 102, 104412 (2020)

arXiv:2005.05389 [pdf]

Citations versus expert opinions: Citation analysis of Featured Reviews of the American Mathematical Society

Authors: Lawrence Smolinsky, Daniel S. Sage, Aaron J. Lercher, Aaron Cao

Abstract: Peer review and citation metrics are two means of gauging the value of scientific research, but the lack of publicly available peer review data makes the comparison of these methods difficult. Mathematics can serve as a useful laboratory for considering these questions because as an exact science, there is a narrow range of reasons for citations. In mathematics, virtually all published articles are post-publication reviewed by mathematicians in Mathematical Reviews (MathSciNet) and so the data set was essentially the Web of Science mathematics publications from 1993 to 2004. For a decade, especially important articles were singled out in Mathematical Reviews for featured reviews. In this study, we analyze the bibliometrics of elite articles selected by peer review and by citation count. We conclude that the two notions of significance described by being a featured review article and being highly cited are distinct. This indicates that peer review and citation counts give largely independent determinations of highly distinguished articles. We also consider whether hiring patterns of subfields and mathematicians' interest in subfields reflect subfields of featured review or highly cited articles. We reexamine data from two earlier studies in light of our methods for implications on the peer review/citation count relationship to a diversity of disciplines.

Submitted 16 December, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

Comments: 21 pages, 3 figures, 4 tables

arXiv:2002.03051 [pdf, other]

doi 10.1515/zna-2020-0020

Non-exponential decay in Floquet-Bloch bands

Authors: Alec Cao, Cora J. Fujiwara, Roshan Sajjad, Ethan Q. Simmons, Eva Lindroth, David M. Weld

Abstract: Exponential decay laws describe systems ranging from unstable nuclei to fluorescent molecules, in which the probability of jumping to a lower-energy state in any given time interval is static and history-independent. These decays, involving only a metastable state and fluctuations of the quantum vacuum, are the most fundamental nonequilibrium process, and provide a microscopic model for the origins of irreversibility. Despite the fact that the apparently universal exponential decay law has been precisely tested in a variety of physical systems, it is a surprising truth that quantum mechanics requires that spontaneous decay processes have non-exponential time dependence at both very short and very long times. Cold-atom experiments both classic and recent have proven to be powerful probes of fundamental decay processes; in this paper, we propose the use of Bose condensates in Floquet-Bloch bands as a probe of long-time non-exponential decay in single isolated emitters. We identify a range of parameters that should enable observation of long-time deviations, and experimentally demonstrate a key element of the scheme: tunable decay between quasienergy bands in a driven optical lattice.

Submitted 7 February, 2020; originally announced February 2020.

Comments: 7 pages, 5 figures

arXiv:1912.05590 [pdf, other]

Peek Inside the Closed World: Evaluating Autoencoder-Based Detection of DDoS to Cloud

Authors: Hang Guo, Xun Fan, Anh Cao, Geoff Outhred, John Heidemann

Abstract: Machine-learning-based anomaly detection (ML-based AD) has been successful at detecting DDoS events in the lab. However, published evaluations of ML-based AD have used only limited data and provided minimal insight into why it works. To address limited evaluation against real-world data, we apply autoencoder, an existing ML-AD model, to 57 DDoS attack events captured at 5 cloud IPs from a major cloud provider. We show that our models detect nearly all malicious flows for 2 of the 4 cloud IPs under attack (at least 99.99%) and detect most malicious flows (94.75% and 91.37%) for the remaining 2 IPs. Our models also maintain near-zero false positives on benign flows to all 5 IPs. Our primary contribution is to improve our understanding of why ML-based AD works on some malicious flows but not others. We interpret our detection results with feature attribution and counterfactual explanation. We show that our models are better at detecting malicious flows with anomalies on allow-listed features (those with only a few benign values) than flows with anomalies on deny-listed features (those with mostly benign values) because our models are more likely to learn correct normality for allow-listed features. We then show that our models are better at detecting malicious flows with anomalies on unordered features (that have no ordering among their values) than flows with anomalies on ordered features because even with incomplete normality, our models could still detect anomalies on unordered features with high recall.
Lastly, we summarize the implications of what we learn on applying autoencoder-based AD in production: training with noisy real-world data is possible, autoencoder can reliably detect real-world anomalies on well-represented unordered features and combinations of autoencoder-based AD and heuristic-based filters can help both.

Submitted 20 June, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

arXiv:1912.03734 [pdf, other]

Unified Signal Compression Using Generative Adversarial Networks

Authors: Bowen Liu, Ang Cao, Hun-seok Kim

Abstract: We propose a unified compression framework that uses generative adversarial networks (GAN) to compress image and speech signals. The compressed signal is represented by a latent vector fed into a generator network which is trained to produce high quality signals that minimize a target objective function. To efficiently quantize the compressed signal, non-uniformly quantized optimal latent vectors are identified by iterative back-propagation with ADMM optimization performed for each iteration. Our experiments show that the proposed algorithm outperforms prior signal compression methods for both image and speech compression quantified in various metrics including bit rate, PSNR, and neural network based signal classification accuracy.

Submitted 8 December, 2019; originally announced December 2019.

Comments: submitted to ICASSP 2020

arXiv:1910.13159 [pdf]

Inward-growth plating of lithium driven by solid-solution based alloy phase for highly reversible lithium metal anode

Authors: Song Jin, Yadong Ye, Yijie Niu, Yansong Xu, Hongchang Jin, Jinxi Wang, Zhaowei Sun, Anmin Cao, Xiaojun Wu, Yi Luo, Hengxing Ji, Li-Jun Wan

Abstract: Lithium metal batteries (LMB) are vital devices for high-energy-density energy storage, but the Li metal anode is highly reactive with electrolyte and forms uncontrolled dendrites that can cause undesirable parasitic reactions, and thus poor cycling stability, and raise safety concerns. Despite remarkable progress made to partly solve these issues, the Li metal still plates at the electrode/electrolyte interface where the parasitic reactions and dendrite formation invariably occur. Here we demonstrate the inward-growth plating of Li into a metal foil while avoiding surface deposition, which is driven by the reversible solid-solution based alloy phase change. Lithiation of the solid solution alloy phase facilitates the freshly generated Li atoms at the surface to sink into the foil, while the reversible alloy phase change is accompanied by the dealloying reaction during delithiation, which extracts Li atoms from inside of the foil. The yielded dendrite-free Li anode produces an enhanced Coulombic efficiency of $99.5 \pm 0.2$% with a reversible capacity of 1660 mA h g$^{-1}$ (3.3 mA h cm$^{-2}$).

Submitted 29 October, 2019; originally announced October 2019.

Comments: 21 pages, 4 figures

arXiv:1908.03237 [pdf, other]

Image-based marker tracking and registration for intraoperative 3D image-guided interventions using augmented reality

Authors: Andong Cao, Ali Dhanaliwala, Jianbo Shi, Terence Gade, Brian Park

Abstract: Augmented reality has the potential to improve operating room workflow by allowing physicians to "see" inside a patient through the projection of imaging directly onto the surgical field. For this to be useful the acquired imaging must be quickly and accurately registered with the patient and the registration must be maintained. Here we describe a method for projecting a CT scan with Microsoft Hololens and then aligning that projection to a set of fiducial markers. Radio-opaque stickers with unique QR-codes are placed on an object prior to acquiring a CT scan. The locations of the markers in the CT scan are extracted and the CT scan is converted into a 3D surface object. The 3D object is then projected using the Hololens onto a table on which the same markers are placed. We designed an algorithm that aligns the markers on the 3D object with the markers on the table. To extract the markers and convert the CT into a 3D object took less than 5 seconds. To align three markers, it took $0.9 \pm 0.2$ seconds to achieve an accuracy of $5 \pm 2$ mm. These findings show that it is feasible to use a combined radio-opaque optical marker, placed on a patient prior to a CT scan, to subsequently align the acquired CT scan with the patient.

Submitted 8 August, 2019; originally announced August 2019.

arXiv:1908.00487 [pdf, other]

A Partial Differential Equation for the Mean--Return-Time Phase of Planar Stochastic Oscillators

Authors: Alexander Cao, Benjamin Lindner, Peter J. Thomas

Abstract: Stochastic oscillations are ubiquitous in many systems. For deterministic systems, the oscillator's phase has been widely used as an effective one-dimensional description of a higher dimensional dynamics, particularly for driven or coupled systems. Hence, efforts have been made to generalize the phase concept to the stochastic framework. One notion of phase due to Schwabedal and Pikovsky is based on the mean-return time (MRT) of the oscillator but has so far been described only in terms of a numerical algorithm. Here we develop the boundary condition by which the partial differential equation for the MRT has to be complemented in order to obtain the isochrons (lines of equal phase) of a two-dimensional stochastic oscillator, and rigorously establish the existence and uniqueness of the MRT isochron function (up to an additive constant). We illustrate the method with a number of examples: the stochastic heteroclinic oscillator (which would not oscillate in the absence of noise); the isotropic Stuart-Landau oscillator, the Newby-Schwemmer oscillator, and the Stuart-Landau oscillator with polarized noise. For selected isochrons we confirm by extensive stochastic simulations that the return time from an isochron to the same isochron (after one complete rotation) is always the mean first-passage time (irrespective of the initial position on the isochron). Put differently, we verify that Schwabedal and Pikovsky's criterion for an isochron is satisfied. In addition, we discuss how to extend the construction to arbitrary finite dimensions.
Our results will enable development of analytical tools to study and compare different notions of phase for stochastic oscillators.

Submitted 30 July, 2019; originally announced August 2019.

Comments: 32 pages, 10 figures, includes supplementary materials

Showing 1–50 of 66 results for author: Cao, A