-
From Tokens to Materials: Leveraging Language Models for Scientific Discovery
Authors:
Yuwei Wan,
Tong Xie,
Nan Wu,
Wenjie Zhang,
Chunyu Kit,
Bram Hoex
Abstract:
Exploring the predictive capabilities of language models in materials science is an ongoing interest. This study investigates the application of language model embeddings to enhance material property prediction in materials science. By evaluating various contextual embedding methods and pre-trained models, including Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformers (GPT), we demonstrate that domain-specific models, particularly MatBERT, significantly outperform general-purpose models in extracting implicit knowledge from compound names and material properties. Our findings reveal that information-dense embeddings from the third layer of MatBERT, combined with a context-averaging approach, offer the most effective method for capturing material-property relationships from the scientific literature. We also identify a crucial "tokenizer effect," highlighting the importance of specialized text processing techniques that preserve complete compound names while maintaining consistent token counts. These insights underscore the value of domain-specific training and tokenization in materials science applications and offer a promising pathway for accelerating the discovery and development of new materials through AI-driven approaches.
Submitted 21 October, 2024;
originally announced October 2024.
-
DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting
Authors:
Eric Hanchen Jiang,
Zhi Zhang,
Dinghuai Zhang,
Andrew Lizarraga,
Chenheng Xu,
Yasi Zhang,
Siyan Zhao,
Zhengjie Xu,
Peiyu Yu,
Yuer Tang,
Deqian Kong,
Ying Nian Wu
Abstract:
Advancements in reinforcement learning have led to the development of sophisticated models capable of learning complex decision-making tasks. However, efficiently integrating world models with decision transformers remains a challenge. In this paper, we introduce a novel approach that combines the Dreamer algorithm's ability to generate anticipatory trajectories with the adaptive learning strengths of the Online Decision Transformer. Our methodology enables parallel training where Dreamer-produced trajectories enhance the contextual decision-making of the transformer, creating a bidirectional enhancement loop. We empirically demonstrate the efficacy of our approach on a suite of challenging benchmarks, achieving notable improvements in sample efficiency and reward maximization over existing methods. Our results indicate that the proposed integrated framework not only accelerates learning but also showcases robustness in diverse and dynamic scenarios, marking a significant step forward in model-based reinforcement learning.
Submitted 15 October, 2024;
originally announced October 2024.
-
DSparsE: Dynamic Sparse Embedding for Knowledge Graph Completion
Authors:
Chuhong Yang,
Bin Li,
Nan Wu
Abstract:
Addressing the incompleteness problem in knowledge graphs remains a significant challenge. Current knowledge graph completion methods have their limitations. For example, ComDensE is prone to overfitting and degrades as network depth increases, while InteractE is limited in feature interaction and interpretability. To this end, we propose a new method called dynamic sparse embedding (DSparsE) for knowledge graph completion. The proposed model embeds the input entity-relation pairs with a shallow encoder composed of a dynamic layer and a relation-aware layer. Subsequently, the concatenated output of the dynamic layer and relation-aware layer is passed through a projection layer and a deep decoder with a residual connection structure. This design ensures network robustness and maintains the capability of feature extraction. Furthermore, the conventional dense layers are replaced by randomly initialized sparse connection layers, which can mitigate overfitting. Finally, comprehensive experiments are conducted on the FB15k-237, WN18RR and YAGO3-10 datasets. The proposed method achieves state-of-the-art performance in terms of Hits@1 compared to existing baseline approaches. An ablation study examines the effects of the dynamic layer and relation-aware layer, showing that the combined model achieves the best performance.
Submitted 22 September, 2024;
originally announced October 2024.
-
A Benchmark on Directed Graph Representation Learning in Hardware Designs
Authors:
Haoyu Wang,
Yinan Huang,
Nan Wu,
Pan Li
Abstract:
To keep pace with the rapid advancements in design complexity within modern computing systems, directed graph representation learning (DGRL) has become crucial, particularly for encoding circuit netlists, computational graphs, and developing surrogate models for hardware performance prediction. However, DGRL remains relatively unexplored, especially in the hardware domain, mainly due to the lack of comprehensive and user-friendly benchmarks. This study presents a novel benchmark comprising five hardware design datasets and 13 prediction tasks spanning various levels of circuit abstraction. We evaluate 21 DGRL models, employing diverse graph neural networks and graph transformers (GTs) as backbones, enhanced by positional encodings (PEs) tailored for directed graphs. Our results highlight that bidirected (BI) message passing neural networks (MPNNs) and robust PEs significantly enhance model performance. Notably, the top-performing models include PE-enhanced GTs interleaved with BI-MPNN layers and BI-Graph Isomorphism Network, both surpassing baselines across the 13 tasks. Additionally, our investigation into out-of-distribution (OOD) performance emphasizes the urgent need to improve OOD generalization in DGRL models. This benchmark, implemented with a modular codebase, streamlines the evaluation of DGRL models for both hardware and ML practitioners.
Submitted 8 October, 2024;
originally announced October 2024.
-
Long-range gene expression prediction with token alignment of large language model
Authors:
Edouardo Honig,
Huixin Zhan,
Ying Nian Wu,
Zijun Frank Zhang
Abstract:
Gene expression is a cellular process that plays a fundamental role in human phenotypical variations and diseases. Despite advances in deep learning models for gene expression prediction, recent benchmarks have revealed their inability to learn distal regulatory grammar. Here, we address this challenge by leveraging a pretrained large language model to enhance gene expression prediction. We introduce Genetic sequence Token Alignment (GTA), which aligns genetic sequence features with natural language tokens, allowing for symbolic reasoning of genomic sequence features via the frozen language model. This cross-modal adaptation learns the regulatory grammar and allows us to further incorporate gene-specific human annotations as prompts, enabling in-context learning that is not possible with existing models. Trained on lymphoblastoid cells, GTA was evaluated on cells from the Geuvadis consortium and outperforms state-of-the-art models such as Enformer, achieving a Spearman correlation of 0.65, a 10% improvement. Additionally, GTA offers improved interpretation of long-range interactions through the identification of the most meaningful sections of the input genetic context. GTA represents a powerful and novel cross-modal approach to gene expression prediction by utilizing a pretrained language model, in a paradigm shift from conventional gene expression models trained only on sequence data.
Submitted 1 October, 2024;
originally announced October 2024.
-
Autonomous tip-induced chemical reactions in scanning probe microscopy
Authors:
Nian Wu,
Markus Aapro,
Joakim S. Jestilä,
Robert Drost,
Miguel Martínez García,
Tomas Torres,
Feifei Xiang,
Nan Cao,
Zhijie He,
Giovanni Bottari,
Peter Liljeroth,
Adam S. Foster
Abstract:
Scanning Probe Microscopy (SPM) techniques have shown great potential in fabricating nanoscale structures endowed with exotic quantum properties achieved through various manipulations of atoms and molecules. However, the selection of proper manipulation parameters requires extensive domain knowledge, which is not necessarily transferable to new systems. Therefore, efficient and autonomous SPM techniques are needed to reduce the reliance on user supervision and learn optimal strategies for new systems, in particular for the challenge of controlling chemical reactions. In this paper, we developed a software infrastructure named AutoOSS (Autonomous On-Surface Synthesis) to automate bromine removal from Zn(II)-5,15-bis(4-bromo-2,6-dimethylphenyl)porphyrin (ZnBr2Me4DPP) on Au(111), using neural network models to interpret STM outputs and deep reinforcement learning models to optimize manipulation parameters. This is further supported by Bayesian Optimization Structure Search (BOSS) and Density Functional Theory (DFT) computations to explore 3D structures and reaction mechanisms based on STM images.
Submitted 30 September, 2024;
originally announced September 2024.
-
ControlMath: Controllable Data Generation Promotes Math Generalist Models
Authors:
Nuo Chen,
Ning Wu,
Jianhui Chang,
Jia Li
Abstract:
Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates diverse equations, which the Problem-Crafter agent then transforms into math word problems. The Reverse-Agent filters and selects high-quality data, adhering to the "less is more" principle, achieving better results with fewer data points. This approach enables the generation of diverse math problems, not limited to specific domains or distributions. As a result, we collect ControlMathQA, which involves 190k math word problems. Extensive results prove that combining our dataset with in-domain datasets like GSM8K can help improve the model's mathematical ability to generalize, leading to improved performances both within and beyond specific domains.
Submitted 19 September, 2024;
originally announced September 2024.
-
LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space
Authors:
Michael Gilbert,
Yannan Nellie Wu,
Joel S. Emer,
Vivienne Sze
Abstract:
Latency and energy consumption are key metrics in the performance of deep neural network (DNN) accelerators. A significant factor contributing to latency and energy is data transfers. One method to reduce transfers of data is reusing data when multiple operations use the same data. Fused-layer accelerators reuse data across operations in different layers by retaining intermediate data in on-chip buffers, which has been shown to reduce energy consumption and latency. Moreover, the intermediate data is often tiled (i.e., broken into chunks) to reduce the on-chip buffer capacity required to reuse the data. Because on-chip buffer capacity is frequently more limited than computation units, fused-layer dataflow accelerators may also recompute certain parts of the intermediate data instead of retaining them in a buffer. Achieving efficient trade-offs between on-chip buffer capacity, off-chip transfers, and recomputation requires systematic exploration of the fused-layer dataflow design space. However, prior work only explored a subset of the design space, and more efficient designs are left unexplored.
In this work, we propose (1) a more extensive design space that has more choices in terms of tiling, data retention, and recomputation and, importantly, allows us to explore them in combination, (2) a taxonomy to systematically specify designs, and (3) a model, LoopTree, to evaluate the latency, energy consumption, buffer capacity requirements, and off-chip transfers of designs in this design space. We validate our model against a representative set of prior architectures, achieving a worst-case 4% error. Finally, we present case studies that show how exploring this larger space results in more efficient designs (e.g., up to a 10× buffer capacity reduction to achieve the same off-chip transfers).
Submitted 14 October, 2024; v1 submitted 20 September, 2024;
originally announced September 2024.
-
High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Magnetic Control in HL-3 Tokamak
Authors:
Niannian Wu,
Zongyu Yang,
Rongpeng Li,
Ning Wei,
Yihang Chen,
Qianyun Dong,
Jiyuan Li,
Guohui Zheng,
Xinwen Gong,
Feng Gao,
Bo Li,
Min Xu,
Zhifeng Zhao,
Wulyu Zhong
Abstract:
The drive to control tokamaks, a prominent technology in nuclear fusion, is essential due to its potential to provide a virtually unlimited source of clean energy. Reinforcement learning (RL) promises improved flexibility to manage the intricate and non-linear dynamics of the plasma encapsulated in a tokamak. However, RL typically requires substantial interaction with a simulator capable of accurately evolving the high-dimensional plasma state. Compared to first-principle-based simulators, whose intense computations lead to sluggish RL training, we devise an effective method to acquire a fully data-driven simulator, by mitigating the arising compounding error issue due to the underlying autoregressive nature. With high accuracy and appealing extrapolation capability, this high-fidelity dynamics model subsequently enables the rapid training of a qualified RL agent to directly generate engineering-reasonable magnetic coil commands, aiming at the desired long-term targets of plasma current and last closed flux surface. Together with a surrogate magnetic equilibrium reconstruction model EFITNN, the RL agent successfully maintains a 100-ms, 1 kHz trajectory control with accurate waveform tracking on the HL-3 tokamak. Furthermore, it also demonstrates the feasibility of zero-shot adaptation to changed triangularity targets, confirming the robustness of the developed data-driven dynamics model. Our work underscores the advantage of fully data-driven dynamics models in yielding RL-based trajectory control policies at a sufficiently fast pace, an anticipated engineering requirement in daily discharge practices for the upcoming ITER device.
Submitted 13 September, 2024;
originally announced September 2024.
-
Think Twice Before You Act: Improving Inverse Problem Solving With MCMC
Authors:
Yaxuan Zhu,
Zehao Dou,
Haoxin Zheng,
Yasi Zhang,
Ying Nian Wu,
Ruiqi Gao
Abstract:
Recent studies demonstrate that diffusion models can serve as a strong prior for solving inverse problems. A prominent example is Diffusion Posterior Sampling (DPS), which approximates the posterior distribution of data given the measure using Tweedie's formula. Despite the merits of being versatile in solving various inverse problems without re-training, the performance of DPS is hindered by the fact that this posterior approximation can be inaccurate, especially for high noise levels. Therefore, we propose Diffusion Posterior MCMC (DPMC), a novel inference algorithm based on Annealed MCMC to solve inverse problems with pretrained diffusion models. We define a series of intermediate distributions inspired by the approximated conditional distributions used by DPS. Through annealed MCMC sampling, we encourage the samples to follow each intermediate distribution more closely before moving to the next distribution at a lower noise level, and therefore reduce the accumulated error along the path. We test our algorithm in various inverse problems, including super resolution, Gaussian deblurring, motion deblurring, inpainting, and phase retrieval. Our algorithm outperforms DPS with fewer evaluations across nearly all tasks, and is competitive among existing approaches.
Submitted 13 September, 2024;
originally announced September 2024.
-
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Authors:
Jiaxing Wu,
Lin Ning,
Luyang Liu,
Harrison Lee,
Neo Wu,
Chao Wang,
Sushant Prakash,
Shawn O'Banion,
Bradley Green,
Jun Xie
Abstract:
LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users' behavior from their past activities. However, their effectiveness often hinges on the ability to leverage extensive user history data, which is inherently noisy and long. Existing pretrained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills extensive user history data while preserving essential information for downstream tasks. Our empirical evaluation demonstrates significant improvements in both extrinsic downstream task utility and intrinsic summary quality, surpassing baseline methods by up to 22% on downstream task performance and achieving up to an 84.59% win rate on Factuality, Abstractiveness, and Readability. RLPF also achieves a remarkable 74% reduction in context length while improving performance on 16 out of 19 unseen tasks and/or datasets, showcasing its generalizability. This approach offers a promising solution for enhancing LLM personalization by effectively transforming long, noisy user histories into informative and human-readable representations.
Submitted 6 September, 2024;
originally announced September 2024.
-
Latent Space Energy-based Neural ODEs
Authors:
Sheng Cheng,
Deqian Kong,
Jianwen Xie,
Kookjin Lee,
Ying Nian Wu,
Yezhou Yang
Abstract:
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data. This family of models generates each data point in the time series by a neural emission model, which is a non-linear transformation of a latent state vector. The trajectory of the latent states is implicitly described by a neural ordinary differential equation (ODE), with the initial state following an informative prior distribution parameterized by an energy-based model. Furthermore, we can extend this model to disentangle dynamic states from underlying static factors of variation, represented as time-invariant variables in the latent space. We train the model using maximum likelihood estimation with Markov chain Monte Carlo (MCMC) in an end-to-end manner, without requiring additional assisting components such as an inference network. Our experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts, and can generalize to new dynamic parameterization, enabling long-horizon predictions.
Submitted 5 September, 2024;
originally announced September 2024.
-
UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches
Authors:
Chao Wang,
Neo Wu,
Lin Ning,
Jiaxing Wu,
Luyang Liu,
Jun Xie,
Shawn O'Banion,
Bradley Green
Abstract:
Large language models (LLMs) have shown remarkable capabilities in generating user summaries from a long list of raw user activity data. These summaries capture essential user information such as preferences and interests, and therefore are invaluable for LLM-based personalization applications, such as explainable recommender systems. However, the development of new summarization techniques is hindered by the lack of ground-truth labels, the inherent subjectivity of user summaries, and human evaluation, which is often costly and time-consuming. To address these challenges, we introduce UserSumBench, a benchmark framework designed to facilitate iterative development of LLM-based summarization approaches. This framework offers two key components: (1) A reference-free summary quality metric. We show that this metric is effective and aligned with human preferences across three diverse datasets (MovieLens, Yelp and Amazon Review). (2) A novel robust summarization method that leverages time-hierarchical summarizer and self-critique verifier to produce high-quality summaries while eliminating hallucination. This method serves as a strong baseline for further innovation in summarization techniques.
Submitted 5 September, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Evolution of two-magnon bound states in a higher-spin ferromagnetic chain with single-ion anisotropy: A complete solution
Authors:
Xinlan Lou,
Jiawei Li,
Ning Wu
Abstract:
Few-magnon bound states in quantum spin chains have long been studied and have attracted much recent attention. For a higher-spin ferromagnetic XXZ chain with single-ion anisotropy, several features regarding the evolution of the low-lying two-magnon bound states with varying wave number were observed in the literature. However, most of these observations are only qualitatively understood due to the lack of analytical tools. By combining a set of exact two-magnon Bloch states and a plane-wave ansatz, we achieve a complete solution of the two-magnon problem in such a system. We identify parameter regions that support different types of two-magnon bound states, with the boundaries defined by algebraic equations. We discover for the first time a narrow region in which two single-ion bound states coexist. We show that the phase diagrams for distinct wave numbers are similar to each other, which enables us to map the evolution of the bound states to the rectilinear movement of a representative point for given parameters in a rescaled phase diagram. This dynamic picture provides quantitative interpretations of the observed features.
Submitted 29 August, 2024;
originally announced August 2024.
-
In situ mixer calibration for superconducting quantum circuits
Authors:
Nan Wu,
Jing Lin,
Changrong Xie,
Zechen Guo,
Wenhui Huang,
Libo Zhang,
Yuxuan Zhou,
Xuandong Sun,
Jiawei Zhang,
Weijie Guo,
Xiayu Linpeng,
Song Liu,
Yang Liu,
Wenhui Ren,
Ziyu Tao,
Ji Jiang,
Ji Chu,
Jingjing Niu,
Youpeng Zhong,
Dapeng Yu
Abstract:
Mixers play a crucial role in superconducting quantum computing, primarily by facilitating frequency conversion of signals to enable precise control and readout of quantum states. However, imperfections, particularly carrier leakage and unwanted sideband signal, can significantly compromise control fidelity. To mitigate these defects, regular and precise mixer calibrations are indispensable, yet they pose a formidable challenge in large-scale quantum control. Here, we introduce an in situ calibration technique and outcome-focused mixer calibration scheme using superconducting qubits. Our method leverages the qubit's response to imperfect signals, allowing for calibration without modifying the wiring configuration. We experimentally validate the efficacy of this technique by benchmarking single-qubit gate fidelity and qubit coherence time.
Submitted 21 August, 2024;
originally announced August 2024.
-
Relaxing towards generalized one-body Boltzmann states
Authors:
Sheng-Wen Li,
Ning Wu
Abstract:
Isolated quantum systems follow reversible unitary evolution, yet the dynamics of their local states and observables exhibit irreversible relaxation behavior. Here we study the local relaxation process in an isolated chain consisting of N three-level systems. Though the entropy of the full many-body state remains constant, it turns out that the total correlation of this system approximately exhibits a monotonically increasing behavior. More importantly, a variation analysis shows that the total correlation entropy would achieve its theoretical maximum when each site stays in a generalized one-body Boltzmann state, which is not solely determined by the energy but also depends on the spin value of each onsite level. It turns out that such a theoretical correlation maximum is highly coincident with the result obtained from the exact time-dependent evolution. In this sense, the total correlation entropy serves well as an indicator for the dynamical irreversibility of the nonequilibrium relaxation in this isolated system.
Submitted 19 August, 2024;
originally announced August 2024.
-
Visual Agents as Fast and Slow Thinkers
Authors:
Guangyan Sun,
Mingyu Jin,
Zhenting Wang,
Cheng-Long Wang,
Siqi Ma,
Qifan Wang,
Ying Nian Wu,
Yongfeng Zhang,
Dongfang Liu
Abstract:
Achieving human-level intelligence requires refining cognitive distinctions between System 1 and System 2 thinking. While contemporary AI, driven by large language models, demonstrates human-like traits, it falls short of genuine cognition. Transitioning from structured benchmarks to real-world scenarios presents challenges for visual agents, often leading to inaccurate and overly confident responses. To address the challenge, we introduce FaST, which incorporates the Fast and Slow Thinking mechanism into visual agents. FaST employs a switch adapter to dynamically select between System 1/2 modes, tailoring the problem-solving approach to different task complexity. It tackles uncertain and unseen objects by adjusting model confidence and integrating new contextual data. With this novel design, we advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which contribute to its ability to emulate human-like cognitive processes in visual intelligence. Empirical results demonstrate that FaST outperforms various well-known baselines, achieving 80.8% accuracy on VQA v2 for visual question answering and a 48.7% GIoU score on ReasonSeg for reasoning segmentation. Extensive testing validates the efficacy and robustness of FaST's core components, showcasing its potential to advance the development of cognitive visual agents in AI systems. The code is available at https://github.com/GuangyanS/Sys2-LLaVA.
Submitted 6 September, 2024; v1 submitted 16 August, 2024;
originally announced August 2024.
-
Predicting Lung Cancer Patient Prognosis with Large Language Models
Authors:
Danqing Hu,
Bing Liu,
Xiang Li,
Xiaofeng Zhu,
Nan Wu
Abstract:
Prognosis prediction is crucial for determining optimal treatment plans for lung cancer patients. Traditionally, such predictions relied on models developed from retrospective patient data. Recently, large language models (LLMs) have gained attention for their ability to process and generate text based on extensive learned knowledge. In this study, we evaluate the potential of GPT-4o mini and GPT-3.5 in predicting the prognosis of lung cancer patients. We collected two prognosis datasets, i.e., survival and post-operative complication datasets, and designed multiple tasks to assess the models' performance comprehensively. Logistic regression models were also developed as baselines for comparison. The experimental results demonstrate that LLMs can achieve competitive, and in some tasks superior, performance in lung cancer prognosis prediction compared to data-driven logistic regression models despite not using additional patient data. These findings suggest that LLMs can be effective tools for prognosis prediction in lung cancer, particularly when patient data is limited or unavailable.
Submitted 15 August, 2024;
originally announced August 2024.
-
Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models
Authors:
Chuan Liu,
Chunshu Wu,
Shihui Cao,
Mingkai Chen,
James Chenhao Liang,
Ang Li,
Michael Huang,
Chuang Ren,
Dongfang Liu,
Ying Nian Wu,
Tong Geng
Abstract:
The rapid development of AI highlights the pressing need for sustainable energy, a critical global challenge for decades. Nuclear fusion, generally seen as an ultimate solution, has been the focus of intensive research for nearly a century, with investments reaching hundreds of billions of dollars. Recent advancements in Inertial Confinement Fusion have drawn significant attention to fusion research, in which Laser-Plasma Interaction (LPI) is critical for ensuring fusion stability and efficiency. However, the complexity of LPI upon fusion ignition makes analytical approaches impractical, leaving researchers depending on extremely computation-demanding Particle-in-Cell (PIC) simulations to generate data, presenting a significant bottleneck to advancing fusion research. In response, this work introduces Diff-PIC, a novel framework that leverages conditional diffusion models as a computationally efficient alternative to PIC simulations for generating high-fidelity scientific LPI data. In this work, physical patterns captured by PIC simulations are distilled into diffusion models associated with two tailored enhancements: (1) To effectively capture the complex relationships between physical parameters and corresponding outcomes, the parameters are encoded in a physically-informed manner. (2) To further enhance efficiency while maintaining high fidelity and physical validity, the rectified flow technique is employed to transform our model into a one-step conditional diffusion model. Experimental results show that Diff-PIC achieves 16,200$\times$ speedup compared to traditional PIC on a 100 picosecond simulation, with an average reduction in MAE / RMSE / FID of 59.21% / 57.15% / 39.46% with respect to two other SOTA data generation approaches.
Submitted 5 October, 2024; v1 submitted 3 August, 2024;
originally announced August 2024.
-
The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer
Authors:
Danqing Hu,
Bing Liu,
Xiaofeng Zhu,
Nan Wu
Abstract:
Lymph node metastasis (LNM) is a crucial factor in determining the initial treatment for patients with lung cancer, yet accurate preoperative diagnosis of LNM remains challenging. Recently, large language models (LLMs) have garnered significant attention due to their remarkable text generation capabilities. Leveraging the extensive medical knowledge learned from vast corpora, LLMs can estimate probabilities for clinical problems, though their performance has historically been inferior to data-driven machine learning models. In this paper, we propose a novel ensemble method that combines the medical knowledge acquired by LLMs with the latent patterns identified by machine learning models to enhance LNM prediction performance. Initially, we developed machine learning models using patient data. We then designed a prompt template to integrate the patient data with the predicted probability from the machine learning model. Subsequently, we instructed GPT-4o, the most advanced LLM developed by OpenAI, to estimate the likelihood of LNM based on patient data and then adjust the estimate using the machine learning output. Finally, we collected three outputs from the GPT-4o using the same prompt and ensembled these results as the final prediction. Using the proposed method, our models achieved an AUC value of 0.778 and an AP value of 0.426 for LNM prediction, significantly improving predictive performance compared to baseline machine learning models. The experimental results indicate that GPT-4o can effectively leverage its medical knowledge and the probabilities predicted by machine learning models to achieve more accurate LNM predictions. These findings demonstrate that LLMs can perform well in clinical risk prediction tasks, offering a new paradigm for integrating medical knowledge and patient data in clinical predictions.
Submitted 14 August, 2024; v1 submitted 25 July, 2024;
originally announced July 2024.
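As a rough illustration of the final ensembling step described in the abstract above, the sketch below averages three repeated probability estimates. The abstract does not say how the three GPT-4o outputs are combined, so the arithmetic mean, the function name `ensemble_probability`, and the sample values are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import fmean


def ensemble_probability(estimates):
    """Combine repeated probability estimates by arithmetic mean (an assumed rule)."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(not 0.0 <= p <= 1.0 for p in estimates):
        raise ValueError("probabilities must lie in [0, 1]")
    return fmean(estimates)


# Hypothetical outputs from three runs of the same prompt.
runs = [0.30, 0.40, 0.35]
final = ensemble_probability(runs)
print(f"ensembled LNM probability: {final:.3f}")
```

Averaging repeated runs is one simple way to damp run-to-run variation in a sampled model's answers; a median or a weighted vote would be equally plausible readings of "ensembled".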
-
TLRN: Temporal Latent Residual Networks For Large Deformation Image Registration
Authors:
Nian Wu,
Jiarui Xing,
Miaomiao Zhang
Abstract:
This paper presents a novel approach, termed Temporal Latent Residual Network (TLRN), to predict a sequence of deformation fields in time-series image registration. The challenge of registering time-series images often lies in the occurrence of large motions, especially when images differ significantly from a reference (e.g., the start of a cardiac cycle compared to the peak stretching phase). To achieve accurate and robust registration results, we leverage the nature of motion continuity and exploit the temporal smoothness in consecutive image frames. Our proposed TLRN highlights a temporal residual network with residual blocks carefully designed in latent deformation spaces, which are parameterized by time-sequential initial velocity fields. We treat a sequence of residual blocks over time as a dynamic training system, where each block is designed to learn the residual function between desired deformation features and current input accumulated from previous time frames. We validate the effectiveness of TLRN on both synthetic data and real-world cine cardiac magnetic resonance (CMR) image videos. Our experimental results show that TLRN is able to achieve substantially improved registration accuracy compared to the state-of-the-art. Our code is publicly available at https://github.com/nellie689/TLRN.
Submitted 23 July, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Inertial Confinement Fusion Forecasting via Large Language Models
Authors:
Mingkai Chen,
Taowen Wang,
Shihui Cao,
James Chenhao Liang,
Chuan Liu,
Chunshu Wu,
Qifan Wang,
Ying Nian Wu,
Michael Huang,
Chuang Ren,
Ang Li,
Tong Geng,
Dongfang Liu
Abstract:
Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce $\textbf{LPI-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address a critical challenge, Laser-Plasma Instabilities ($\texttt{LPI}$), in Inertial Confinement Fusion ($\texttt{ICF}$). Our approach offers several key contributions: Firstly, we propose the $\textit{LLM-anchored Reservoir}$, augmented with a $\textit{Fusion-specific Prompt}$, enabling accurate forecasting of $\texttt{LPI}$-generated-hot electron dynamics during implosion. Secondly, we develop $\textit{Signal-Digesting Channels}$ to temporally and spatially describe the driver laser intensity across time, capturing the unique characteristics of $\texttt{ICF}$ inputs. Lastly, we design the $\textit{Confidence Scanner}$ to quantify the confidence level in forecasting, providing valuable insights for domain experts to design the $\texttt{ICF}$ process. Extensive experiments demonstrate the superior performance of our method, achieving 1.90 CAE, 0.14 $\texttt{top-1}$ MAE, and 0.11 $\texttt{top-5}$ MAE in predicting Hard X-ray ($\texttt{HXR}$) energies emitted by the hot electrons in $\texttt{ICF}$ implosions, which presents state-of-the-art comparisons against concurrent best systems. Additionally, we present $\textbf{LPI4AI}$, the first $\texttt{LPI}$ benchmark based on physical experiments, aimed at fostering novel ideas in $\texttt{LPI}$ research and enhancing the utility of LLMs in scientific exploration. Overall, our work strives to forge an innovative synergy between AI and $\texttt{ICF}$ for advancing fusion energy.
Submitted 14 October, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Systematic Literature Review of AI-enabled Spectrum Management in 6G and Future Networks
Authors:
Bushra Sabir,
Shuiqiao Yang,
David Nguyen,
Nan Wu,
Alsharif Abuadbba,
Hajime Suzuki,
Shangqi Lai,
Wei Ni,
Ding Ming,
Surya Nepal
Abstract:
Artificial Intelligence (AI) has advanced significantly in various domains like healthcare, finance, and cybersecurity, with successes such as DeepMind's medical imaging and Tesla's autonomous vehicles. As telecommunications transition from 5G to 6G, integrating AI is crucial for complex demands like data processing, network optimization, and security. Despite ongoing research, there's a gap in consolidating AI-enabled Spectrum Management (AISM) advancements. Traditional spectrum management methods are inadequate for 6G due to its dynamic and complex demands, making AI essential for spectrum optimization, security, and network efficiency. This study aims to address this gap by: (i) Conducting a systematic review of AISM methodologies, focusing on learning models, data handling techniques, and performance metrics. (ii) Examining security and privacy concerns related to AI and traditional network threats within AISM contexts. Using the Systematic Literature Review (SLR) methodology, we meticulously analyzed 110 primary studies to: (a) Identify AI's utility in spectrum management. (b) Develop a taxonomy of AI approaches. (c) Classify datasets and performance metrics used. (d) Detail security and privacy threats and countermeasures. Our findings reveal challenges such as under-explored AI usage in critical AISM systems, computational resource demands, transparency issues, the need for real-world datasets, imbalances in security and privacy research, and the absence of testbeds, benchmarks, and security analysis tools. Addressing these challenges is vital for maximizing AI's potential in advancing 6G technology.
Submitted 12 June, 2024;
originally announced July 2024.
-
Adaptive Bayesian Regression on Data with Low Intrinsic Dimensionality
Authors:
Tao Tang,
Nan Wu,
Xiuyuan Cheng,
David Dunson
Abstract:
We study how the posterior contraction rate under a Gaussian process (GP) prior depends on the intrinsic dimension of the predictors and smoothness of the regression function. An open question is whether a generic GP prior that does not incorporate knowledge of the intrinsic lower-dimensional structure of the predictors can attain an adaptive rate for a broad class of such structures. We show that this is indeed the case, establishing conditions under which the posterior contraction rates become adaptive to the intrinsic dimension $\varrho$ in terms of the covering number of the data domain $X$ (the Minkowski dimension), and prove the optimal posterior contraction rate $O(n^{-s/(2s +\varrho)})$, up to a logarithmic factor, assuming an approximation order $s$ of the reproducing kernel Hilbert space (RKHS) on ${X}$. When ${X}$ is a $\varrho$-dimensional compact smooth manifold, we study RKHS approximations to intrinsically defined $s$-order Hölder functions on the manifold for any positive $s$ by a novel analysis of kernel approximations on manifolds, leading to the optimal adaptive posterior contraction rate. We propose an empirical Bayes prior on the kernel bandwidth using kernel affinity and $k$-nearest neighbor statistics, eliminating the need for prior knowledge of the intrinsic dimension. The efficiency of the proposed Bayesian regression approach is demonstrated on various numerical experiments.
Submitted 5 September, 2024; v1 submitted 12 July, 2024;
originally announced July 2024.
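To make the rate $O(n^{-s/(2s+\varrho)})$ from the abstract above concrete, the sketch below evaluates the exponent for a few settings, ignoring the logarithmic factor. The function names and the numbers are hypothetical, and the comparison with an exponent that uses the ambient dimension is added only for contrast; it is not taken from the abstract.

```python
def contraction_exponent(s: float, dim: float) -> float:
    """Exponent a in the rate n**(-a), with a = s / (2*s + dim)."""
    if s <= 0 or dim <= 0:
        raise ValueError("s and dim must be positive")
    return s / (2.0 * s + dim)


def contraction_rate(n: int, s: float, dim: float) -> float:
    """Rate n**(-s / (2*s + dim)), with the logarithmic factor dropped."""
    return n ** (-contraction_exponent(s, dim))


# Illustrative values only: smoothness s, intrinsic dimension, ambient dimension.
s = 2.0
intrinsic_dim = 2
ambient_dim = 100
n = 10_000

adaptive = contraction_rate(n, s, intrinsic_dim)
ambient = contraction_rate(n, s, ambient_dim)
print(f"exponent with intrinsic dimension: {contraction_exponent(s, intrinsic_dim):.4f}")
print(f"exponent with ambient dimension:   {contraction_exponent(s, ambient_dim):.4f}")
print(f"rate at n={n}: {adaptive:.4f} (intrinsic) vs {ambient:.4f} (ambient)")
```

The smaller the dimension in the denominator, the larger the exponent and the faster the rate shrinks with $n$, which is why adapting to $\varrho$ matters when the data lie on a low-dimensional structure.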
-
GazeFusion: Saliency-guided Image Generation
Authors:
Yunxiang Zhang,
Nan Wu,
Connor Z. Lin,
Gordon Wetzstein,
Qi Sun
Abstract:
Diffusion models offer unprecedented image generation capabilities given just a text prompt. While emerging control mechanisms have enabled users to specify the desired spatial arrangements of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the critical necessity of attention-controllable image generation in practical applications, we present a saliency-guided framework to incorporate the data priors of human visual attention into the generation process. Given a desired viewer attention distribution, our control module conditions a diffusion model to generate images that attract viewers' attention toward desired areas. To assess the efficacy of our approach, we performed an eye-tracked user study and a large-scale model-based saliency analysis. The results evidence that both the cross-user eye gaze distributions and the saliency model predictions align with the desired attention distributions. Lastly, we outline several applications, including interactive design of saliency guidance, attention suppression in unwanted regions, and adaptive generation for varied display/viewing conditions.
Submitted 16 March, 2024;
originally announced July 2024.
-
FedIA: Federated Medical Image Segmentation with Heterogeneous Annotation Completeness
Authors:
Yangyang Xiang,
Nannan Wu,
Li Yu,
Xin Yang,
Kwang-Ting Cheng,
Zengqiang Yan
Abstract:
Federated learning has emerged as a compelling paradigm for medical image segmentation, particularly in light of increasing privacy concerns. However, most of the existing research relies on relatively stringent assumptions regarding the uniformity and completeness of annotations across clients. Contrary to this, this paper highlights a prevalent challenge in medical practice: incomplete annotations. Such annotations can introduce incorrectly labeled pixels, potentially undermining the performance of neural networks in supervised learning. To tackle this issue, we introduce a novel solution, named FedIA. Our insight is to conceptualize incomplete annotations as noisy data (i.e., low-quality data), with a focus on mitigating their adverse effects. We begin by evaluating the completeness of annotations at the client level using a designed indicator. Subsequently, we enhance the influence of clients with more comprehensive annotations and implement corrections for incomplete ones, thereby ensuring that models are trained on accurate data. Our method's effectiveness is validated through its superior performance on two extensively used medical image segmentation datasets, outperforming existing solutions. The code is available at https://github.com/HUSTxyy/FedIA.
Submitted 3 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
LaMoD: Latent Motion Diffusion Model For Myocardial Strain Generation
Authors:
Jiarui Xing,
Nivetha Jayakumar,
Nian Wu,
Yu Wang,
Frederick H. Epstein,
Miaomiao Zhang
Abstract:
Motion and deformation analysis of cardiac magnetic resonance (CMR) imaging videos is crucial for assessing myocardial strain of patients with abnormal heart functions. Recent advances in deep learning-based image registration algorithms have shown promising results in predicting motion fields from routinely acquired CMR sequences. However, their accuracy often diminishes in regions with subtle appearance change, with errors propagating over time. Advanced imaging techniques, such as displacement encoding with stimulated echoes (DENSE) CMR, offer highly accurate and reproducible motion data but require additional image acquisition, which poses challenges in busy clinical flows. In this paper, we introduce a novel Latent Motion Diffusion model (LaMoD) to predict highly accurate DENSE motions from standard CMR videos. More specifically, our method first employs an encoder from a pre-trained registration network that learns latent motion features (also considered as deformation-based shape features) from image sequences. Supervised by the ground-truth motion provided by DENSE, LaMoD then leverages a probabilistic latent diffusion model to reconstruct accurate motion from these extracted features. Experimental results demonstrate that our proposed method, LaMoD, significantly improves the accuracy of motion analysis in standard CMR images, thereby improving myocardial strain analysis in clinical settings for cardiac patients. Our code will be publicly available upon acceptance.
Submitted 2 July, 2024;
originally announced July 2024.
-
FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity
Authors:
Zhaobin Sun,
Nannan Wu,
Junjie Shi,
Li Yu,
Xin Yang,
Kwang-Ting Cheng,
Zengqiang Yan
Abstract:
Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, constrained by the level of medical knowledge and the prevalence of diseases, each institution may diagnose only partial categories, resulting in task heterogeneity. How to pursue effective multi-label medical image classification under task heterogeneity is under-explored. In this paper, we first formulate such a realistic label missing setting in the multi-label FL domain and propose a two-stage method FedMLP to combat class missing from two aspects: pseudo label tagging and global knowledge learning. The former utilizes a warmed-up model to generate class prototypes and select samples with high confidence to supplement missing labels, while the latter uses a global model as a teacher for consistency regularization to prevent forgetting missing class knowledge. Experiments on two publicly available medical datasets validate the superiority of FedMLP over state-of-the-art federated semi-supervised and noisy label learning approaches under task heterogeneity. Code is available at https://github.com/szbonaldo/FedMLP.
Submitted 27 June, 2024;
originally announced June 2024.
-
Boundary Detection Algorithm Inspired by Locally Linear Embedding
Authors:
Pei-Cheng Kuo,
Nan Wu
Abstract:
In the study of high-dimensional data, it is often assumed that the data set possesses an underlying lower-dimensional structure. A practical model for this structure is an embedded compact manifold with boundary. Since the underlying manifold structure is typically unknown, identifying boundary points from the data distributed on the manifold is crucial for various applications. In this work, we propose a method for detecting boundary points inspired by the widely used locally linear embedding algorithm. We implement this method using two nearest neighborhood search schemes: the $ε$-radius ball scheme and the $K$-nearest neighbor scheme. This algorithm incorporates the geometric information of the data structure, particularly through its close relation with the local covariance matrix. We discuss the selection of the key parameter and analyze the algorithm through our exploration of the spectral properties of the local covariance matrix in both neighborhood search schemes. Furthermore, we demonstrate the algorithm's performance with simulated examples.
Submitted 26 June, 2024;
originally announced June 2024.
-
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning
Authors:
Nemin Wu,
Qian Cao,
Zhangyu Wang,
Zeping Liu,
Yanlin Qi,
Jielu Zhang,
Joshua Ni,
Xiaobai Yao,
Hongxu Ma,
Lan Mu,
Stefano Ermon,
Tanuja Ganu,
Akshay Nambi,
Ni Lao,
Gengchen Mai
Abstract:
Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generation, geographic question answering, etc. Even though SRL has become the foundation of almost all geospatial artificial intelligence (GeoAI) research, we have not yet seen significant efforts to develop an extensive deep learning framework and benchmark to support SRL model development and evaluation. To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. TorchSpatial contains three key components: 1) a unified location encoding framework that consolidates 15 commonly recognized location encoders, ensuring scalability and reproducibility of the implementations; 2) the LocBench benchmark tasks encompassing 7 geo-aware image classification and 4 geo-aware image regression datasets; 3) a comprehensive suite of evaluation metrics to quantify geo-aware models' overall performance as well as their geographic bias, with a novel Geo-Bias Score metric. Finally, we provide a detailed analysis and insights into the model performance and geographic bias of different location encoders. We believe TorchSpatial will foster future advancement of spatial representation learning and spatial fairness in GeoAI research. The TorchSpatial model framework, LocBench, and Geo-Bias Score evaluation framework are available at https://github.com/seai-lab/TorchSpatial.
Submitted 21 June, 2024;
originally announced June 2024.
-
Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies
Authors:
Weihao Liu,
Ning Wu,
Wenbiao Ding,
Shining Liang,
Ming Gong,
Dongmei Zhang
Abstract:
In the era of large language models (LLMs), building multilingual large language models (MLLMs) that can serve users worldwide holds great significance. However, existing research seldom focuses on the truthfulness of MLLMs. Meanwhile, contemporary multilingual aligning technologies struggle to balance massive languages and often exhibit serious truthfulness gaps across different languages, especially those that differ greatly from English. In our work, we construct a benchmark for truthfulness evaluation in multilingual scenarios and explore the ways to align facts across languages to enhance the truthfulness of MLLMs. Furthermore, we propose Fact-aware Multilingual Selective Synergy (FaMSS) to optimize the data allocation across a large number of languages and different data types. Experimental results demonstrate that our approach can effectively reduce the multilingual representation disparity and enhance the multilingual capabilities of LLMs.
Submitted 20 June, 2024;
originally announced June 2024.
-
On the Bures metric, C*-norm, and the quantum metric
Authors:
Konrad Aguilar,
Karina Behera,
Tron Omland,
Nicole Wu
Abstract:
We prove that the topology on the density space with respect to a unital C*-algebra and a faithful induced by the C*-norm is finer than the Bures metric topology. We also provide an example when this containment is strict. Next, we provide a metric on the density space induced by a quantum metric in the sense of Rieffel and prove that the induced topology is the same as the topology induced by the Bures metric and C*-norm when the C*-algebra is assumed to be finite dimensional. Finally, we provide an example of when the Bures metric and induced quantum metric are not metric equivalent. Thus, we provide a bridge between these aspects of quantum information theory and noncommutative metric geometry.
Submitted 11 June, 2024;
originally announced June 2024.
-
Active Islanding Detection Using Pulse Compression Probing
Authors:
Nicholas Piaquadio,
N. Eva Wu,
Morteza Sarailoo
Abstract:
An islanding detection scheme is developed using pulse compression probing (PCP). A state space system realization is taken from the probing output. The nu-gap metric is applied to compare the measured system to the fully intact system and classify it as islanded or grid-connected. The designed detector displays fast operation and accurate islanding detection results under varying grid conditions, and is physically implementable at the terminals of an inverter. The method is verified via electro-magnetic transient (EMT) simulation on a modified IEEE 34 bus test system with randomized loads and simultaneous probing at three independent solar plants, with the probing signal directly implemented into the logic of a switching inverter model.
Submitted 18 July, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning
Authors:
Muzhi Han,
Yifeng Zhu,
Song-Chun Zhu,
Ying Nian Wu,
Yuke Zhu
Abstract:
Learning abstract state representations and knowledge is crucial for long-horizon robot planning. We present InterPreT, an LLM-powered framework for robots to learn symbolic predicates from language feedback of human non-experts during embodied interaction. The learned predicates provide relational abstractions of the environment state, facilitating the learning of symbolic operators that capture action preconditions and effects. By compiling the learned predicates and operators into a PDDL domain on-the-fly, InterPreT allows effective planning toward arbitrary in-domain goals using a PDDL planner. In both simulated and real-world robot manipulation domains, we demonstrate that InterPreT reliably uncovers the key predicates and operators governing the environment dynamics. Although learned from simple training tasks, these predicates and operators exhibit strong generalization to novel tasks with significantly higher complexity. In the most challenging generalization setting, InterPreT attains success rates of 73% in simulation and 40% in the real world, substantially outperforming baseline methods.
Submitted 30 May, 2024;
originally announced May 2024.
-
Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching
Authors:
Yasi Zhang,
Peiyu Yu,
Yaxuan Zhu,
Yingshan Chang,
Feng Gao,
Ying Nian Wu,
Oscar Leong
Abstract:
Generative models based on flow matching have attracted significant attention for their simplicity and superior performance in high-resolution image synthesis. By leveraging the instantaneous change-of-variables formula, one can directly compute image likelihoods from a learned flow, making them enticing candidates as priors for downstream tasks such as inverse problems. In particular, a natural approach would be to incorporate such image probabilities in a maximum-a-posteriori (MAP) estimation problem. A major obstacle, however, lies in the slow computation of the log-likelihood, as it requires backpropagating through an ODE solver, which can be prohibitively slow for high-dimensional problems. In this work, we propose an iterative algorithm to approximate the MAP estimator efficiently to solve a variety of linear inverse problems. Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ ``local MAP'' objectives, where $N$ is the number of function evaluations. By leveraging Tweedie's formula, we show that we can perform gradient steps to sequentially optimize these objectives. We validate our approach for various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing, and demonstrate that we can outperform other methods based on flow matching.
Submitted 30 September, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication
Authors:
Yunuo Chen,
Tianyi Xie,
Zeshun Zong,
Xuan Li,
Feng Gao,
Yin Yang,
Ying Nian Wu,
Chenfanfu Jiang
Abstract:
Existing diffusion-based text-to-3D generation methods primarily focus on producing visually realistic shapes and appearances, often neglecting the physical constraints necessary for downstream tasks. Generated models frequently fail to maintain balance when placed in physics-based simulations or 3D printed. This balance is crucial for satisfying user design intentions in interactive gaming, embodied AI, and robotics, where stable models are needed for reliable interaction. Additionally, stable models ensure that 3D-printed objects, such as figurines for home decoration, can stand on their own without requiring additional supports. To fill this gap, we introduce Atlas3D, an automatic and easy-to-implement method that enhances existing Score Distillation Sampling (SDS)-based text-to-3D tools. Atlas3D ensures the generation of self-supporting 3D models that adhere to physical laws of stability under gravity, contact, and friction. Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization, serving as either a refinement or a post-processing module for existing frameworks. We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.
Submitted 28 May, 2024;
originally announced May 2024.
-
An Investigation of Conformal Isometry Hypothesis for Grid Cells
Authors:
Dehong Xu,
Ruiqi Gao,
Wen-Hao Zhang,
Xue-Xin Wei,
Ying Nian Wu
Abstract:
This paper investigates the conformal isometry hypothesis as a potential explanation for hexagonal periodic patterns in grid cell response maps. The hypothesis posits that grid cell activity forms a high-dimensional vector in neural space, encoding the agent's position in 2D physical space. As the agent moves, this vector rotates within a 2D manifold in the neural space, driven by a recurrent neural network. The conformal hypothesis suggests that this neural manifold is a conformally isometric embedding of physical space, where local displacements in neural space are proportional to those in physical space. In this paper, we conduct numerical experiments to show that this hypothesis leads to the hexagon periodic patterns of grid cells, agnostic to the choice of transformation models. Furthermore, we present a theoretical understanding that hexagon patterns emerge by minimizing our loss function because hexagon flat torus exhibits minimal deviation from local conformal isometry. In addition, we propose a conformal modulation of the agent's input velocity, enabling the recurrent neural network of grid cells to satisfy the conformal isometry hypothesis automatically.
Submitted 10 October, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
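The conformal isometry condition described in this abstract can be written as a short formula. The notation below is an assumption made for illustration (the abstract introduces no symbols): $v(x)$ is the neural activity vector at physical position $x$, $\Delta x$ is a small displacement, and $s$ is a constant scaling factor.

```latex
% Local conformal isometry: a small step in physical space produces a
% proportional step in neural space, independent of the step's direction.
\[
  \lVert v(x + \Delta x) - v(x) \rVert \;=\; s \,\lVert \Delta x \rVert \;+\; o(\lVert \Delta x \rVert)
\]
```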
-
EM Distillation for One-step Diffusion Models
Authors:
Sirui Xie,
Zhisheng Xiao,
Diederik P Kingma,
Tingbo Hou,
Ying Nian Wu,
Kevin Patrick Murphy,
Tim Salimans,
Ben Poole,
Ruiqi Gao
Abstract:
While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of perceptual quality. Our approach is derived through the lens of Expectation-Maximization (EM), where the generator parameters are updated using samples from the joint distribution of the diffusion teacher prior and inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process. We further reveal an interesting connection of our method with existing methods that minimize mode-seeking KL. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models.
Submitted 27 May, 2024;
originally announced May 2024.
-
Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space
Authors:
Peiyu Yu,
Dinghuai Zhang,
Hengzhi He,
Xiaojian Ma,
Ruiyao Miao,
Yifan Lu,
Yasi Zhang,
Deqian Kong,
Ruiqi Gao,
Jianwen Xie,
Guang Cheng,
Ying Nian Wu
Abstract:
Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues include but are not limited to high sample complexity, which relates to inaccurate approximation of the black-box function; and insufficient coverage and exploration of input design modes, which leads to suboptimal proposal of new input designs. In this work, we consider finding a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes. To this end, we formulate a learnable energy-based latent space, and propose a Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo. The optimization process is then exploration of high-value designs guided by the learned energy-based model in the latent space, formulated as gradient-based sampling from a latent-variable-parameterized inverse model. We show that our particular parameterization encourages expanded exploration around high-value design modes, motivated by inversion thinking of a fundamental result of conditional covariance matrix typically used for variance reduction. We observe that our method, backed by an accurately learned informative latent space and an expanding-exploration model design, yields significant improvements over strong previous methods on both synthetic and real-world datasets such as the design-bench suite.
Submitted 26 May, 2024;
originally announced May 2024.
-
Finetuning Large Language Model for Personalized Ranking
Authors:
Zhuoxi Bai,
Ning Wu,
Fengyu Cai,
Xinyi Zhu,
Yun Xiong
Abstract:
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, motivating researchers to investigate their potential use in recommendation systems. However, directly applying LLMs to recommendation tasks has proven challenging due to the significant disparity between the data used for pre-training LLMs and the specific requirements of recommendation tasks. In this study, we introduce Direct Multi-Preference Optimization (DMPO), a streamlined framework designed to bridge the gap and enhance the alignment of LLMs for recommendation tasks. DMPO enhances the performance of LLM-based recommenders by simultaneously maximizing the probability of positive samples and minimizing the probability of multiple negative samples. We conducted experimental evaluations to compare DMPO against traditional recommendation methods and other LLM-based recommendation approaches. The results demonstrate that DMPO significantly improves the recommendation capabilities of LLMs across three real-world public datasets in few-shot scenarios. Additionally, the experiments indicate that DMPO exhibits superior generalization ability in cross-domain recommendations. A case study elucidates the reasons behind these consistent improvements and also underscores DMPO's potential as an explainable recommendation system.
Submitted 20 June, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
-
Watermarking Generative Tabular Data
Authors:
Hengzhi He,
Peiyu Yu,
Junpeng Ren,
Ying Nian Wu,
Guang Cheng
Abstract:
In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected while faithfully preserving data fidelity, and that it demonstrates appealing robustness against additive noise attacks. The general idea is to achieve the watermarking through a strategic embedding based on simple data binning. Specifically, it divides the feature's value range into finely segmented intervals and embeds watermarks into selected "green list" intervals. To detect the watermarks, we develop a principled statistical hypothesis-testing framework with minimal assumptions: it remains valid as long as the underlying data distribution has a continuous density function. The watermarking efficacy is demonstrated through rigorous theoretical analysis and empirical validation, highlighting its utility in enhancing the security of synthetic and real-world datasets.
Submitted 22 May, 2024;
originally announced May 2024.
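The binning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the keyed choice of half the bins as "green", the rule of moving a value to the centre of the nearest green bin, and the one-proportion z-test are all assumptions made for illustration, for a single feature scaled to [0, 1).

```python
import math
import random


def green_bins(num_bins, key):
    """Pick half of the bins as the 'green list', seeded by a secret key (assumed scheme)."""
    rng = random.Random(key)
    bins = list(range(num_bins))
    rng.shuffle(bins)
    return set(bins[: num_bins // 2])


def bin_index(value, num_bins):
    """Index of the equal-width bin on [0, 1) that contains value."""
    return min(int(value * num_bins), num_bins - 1)


def embed(values, num_bins, key):
    """Move every value that is not in a green bin to the centre of the nearest green bin."""
    green = green_bins(num_bins, key)
    out = []
    for v in values:
        b = bin_index(v, num_bins)
        if b not in green:
            nearest = min(green, key=lambda g: abs(g - b))
            v = (nearest + 0.5) / num_bins
        out.append(v)
    return out


def z_score(values, num_bins, key):
    """One-proportion z-statistic for the count of values that fall in green bins.

    Null hypothesis (assumed): without a watermark, each value lands in a
    green bin with probability about 1/2 when the bins are fine enough.
    """
    green = green_bins(num_bins, key)
    n = len(values)
    hits = sum(bin_index(v, num_bins) in green for v in values)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)
```

A large positive z-score on a column is evidence of the watermark. Finer bins move each value less, which is the fidelity side of the trade-off the abstract mentions.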
-
Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI
Authors:
Yirong Zhou,
Chengyan Wang,
Mengtian Lu,
Kunyuan Guo,
Zi Wang,
Dan Ruan,
Rui Guo,
Peijun Zhao,
Jianhua Wang,
Naiming Wu,
Jianzhong Lin,
Yinyin Chen,
Hang Jin,
Lianxin Xie,
Lilan Wu,
Liuhong Zhu,
Jianjun Zhou,
Congbo Cai,
He Wang,
Xiaobo Qu
Abstract:
In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.
Submitted 29 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
A First Look at Immersive Telepresence on Apple Vision Pro
Authors:
Ruizhi Cheng,
Nan Wu,
Matteo Varvello,
Eugene Chai,
Songqing Chen,
Bo Han
Abstract:
Due to the widespread adoption of "work-from-home" policies, videoconferencing applications (e.g., Zoom) have become indispensable for remote communication. However, they often lack immersiveness, leading to the so-called "Zoom fatigue" and degrading communication efficiency. The recent debut of Apple Vision Pro, a mobile headset that supports "spatial persona", aims to offer an immersive telepresence experience. In this paper, we conduct a first-of-its-kind in-depth and empirical study to analyze the performance of immersive telepresence with Apple FaceTime, Cisco Webex, Microsoft Teams, and Zoom on Vision Pro. We find that only FaceTime provides a truly immersive experience with spatial personas, whereas others still operate 2D personas. Our measurement results reveal that (1) FaceTime delivers semantic data to optimize bandwidth consumption, which is even lower than that of 2D persona for other applications, and (2) it employs visibility-aware optimizations to reduce rendering overhead. However, the scalability of FaceTime remains limited, with a simple server-allocation strategy that potentially leads to high network delay for users.
Submitted 11 September, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
From Optimization to Generalization: Fair Federated Learning against Quality Shift via Inter-Client Sharpness Matching
Authors:
Nannan Wu,
Zhuo Kuang,
Zengqiang Yan,
Li Yu
Abstract:
Due to escalating privacy concerns, federated learning has been recognized as a vital approach for training deep neural networks with decentralized medical data. In practice, it is challenging to ensure consistent imaging quality across various institutions, often attributed to equipment malfunctions affecting a minority of clients. This imbalance in image quality can cause the federated model to develop an inherent bias towards higher-quality images, thus posing a severe fairness issue. In this study, we pioneer the identification and formulation of this new fairness challenge within the context of the imaging quality shift. Traditional methods for promoting fairness in federated learning predominantly focus on balancing empirical risks across diverse client distributions. This strategy primarily facilitates fair optimization across different training data distributions, yet neglects the crucial aspect of generalization. To address this, we introduce a solution termed Federated learning with Inter-client Sharpness Matching (FedISM). FedISM enhances both local training and global aggregation by incorporating sharpness-awareness, aiming to harmonize the sharpness levels across clients for fair generalization. Our empirical evaluations, conducted using the widely-used ICH and ISIC 2019 datasets, establish FedISM's superiority over current state-of-the-art federated learning methods in promoting fairness. Code is available at https://github.com/wnn2000/FFL4MIA.
△ Less
Submitted 27 April, 2024;
originally announced April 2024.
-
Maximal Procurement under a Budget
Authors:
Nicole Immorlica,
Nicholas Wu,
Brendan Lucier
Abstract:
We study the problem of a principal who wants to influence an agent's observable action, subject to an ex-post budget. The agent has a private type determining their cost function. This paper endogenizes the value of the resource driving incentives, which holds no inherent value but is restricted by finite availability. We characterize the optimal mechanism, showing the emergence of a pooling region where the budget constraint binds for low-cost types. We then introduce a linear value for the transferable resource; as the principal's value increases, the mechanism demands more from agents with binding budget constraint but less from others.
Submitted 23 April, 2024;
originally announced April 2024.
-
Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
Authors:
Yasi Zhang,
Peiyu Yu,
Ying Nian Wu
Abstract:
Text-to-image diffusion models have shown great success in generating high-quality text-guided images. Yet, these models may still fail to semantically align generated images with the provided text prompts, leading to problems like incorrect attribute binding and/or catastrophic object neglect. Given the pervasive object-oriented structure underlying text prompts, we introduce a novel object-conditioned Energy-Based Attention Map Alignment (EBAMA) method to address the aforementioned problems. We show that an object-centric attribute binding loss naturally emerges by approximately maximizing the log-likelihood of a $z$-parameterized energy-based model with the help of the negative sampling technique. We further propose an object-centric intensity regularizer to prevent excessive shifts of objects' attention towards their attributes. Extensive qualitative and quantitative experiments, including human evaluation, on several challenging benchmarks demonstrate the superior performance of our method over previous strong counterparts. With better aligned attention maps, our approach shows great promise in further enhancing the text-controlled image editing ability of diffusion models.
Submitted 1 October, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Machine-learning-inspired quantum control in many-body dynamics
Authors:
Meng-Yun Mao,
Zheng Cheng,
Liangsheng Li,
Ning Wu,
Wen-Long You
Abstract:
Achieving precise preparation of quantum many-body states is crucial for the practical implementation of quantum computation and quantum simulation. However, the inherent challenges posed by unavoidable excitations at critical points during quench processes necessitate careful design of control fields. In this work, we introduce a promising and versatile dynamic control neural network tailored to optimize control fields. We address the problem of suppressing defect density and enhancing cat-state fidelity during the passage across the critical point in the quantum Ising model. Our method facilitates seamless transitions between different objective functions by adjusting the optimization strategy. In comparison to gradient-based power-law quench methods, our approach demonstrates significant advantages for both small system sizes and long-term evolutions. We provide a detailed analysis of the specific forms of control fields and summarize common features for experimental implementation. Furthermore, numerical simulations demonstrate the robustness of our proposal against random noise and spin number fluctuations. The optimized defect density and cat-state fidelity exhibit a transition at a critical ratio of the quench duration to the system size, coinciding with the quantum speed limit for quantum evolution.
Submitted 8 April, 2024;
originally announced April 2024.
-
Adaptive Line-Of-Sight guidance law based on vector fields path following for underactuated unmanned surface vehicle
Authors:
Jie Qi,
Ronghua Wanga,
Nailong Wu
Abstract:
The focus of this paper is to develop a methodology that enables an unmanned surface vehicle (USV) to efficiently track a planned path. We introduce a vector field-based adaptive line-of-sight guidance law (VFALOS) that improves the overall line-of-sight (LOS) guidance method by providing accurate trajectory tracking and minimizing the overshoot response time when the USV tracks curved paths. These improvements contribute to faster convergence to the desired path, reduce oscillations, and can mitigate the effects of persistent external disturbances. It is shown that the proposed guidance law exhibits k-exponential stability when converging to a desired path consisting of straight and curved lines. The results in the paper show that the proposed method effectively improves the accuracy with which the USV tracks the desired path while ensuring safe USV operation.
Submitted 5 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
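For context, the baseline that this abstract builds on can be sketched in a few lines. The code below is the classical lookahead-based LOS steering law for a straight path segment, swapped in as a familiar reference point. It is not the proposed VFALOS law, whose details the abstract does not give. The function name, argument layout, and fixed lookahead distance are illustrative assumptions.

```python
import math


def los_heading(pos, wp_start, wp_end, lookahead):
    """Desired heading (radians) from classical lookahead-based LOS guidance.

    pos, wp_start, wp_end are (x, y) tuples; lookahead is a positive distance.
    """
    x, y = pos
    x0, y0 = wp_start
    x1, y1 = wp_end
    # Direction of the straight path segment.
    path_angle = math.atan2(y1 - y0, x1 - x0)
    # Signed cross-track error: perpendicular offset of the vehicle from the path.
    cross_track = -(x - x0) * math.sin(path_angle) + (y - y0) * math.cos(path_angle)
    # Steer back towards a point `lookahead` ahead on the path.
    return path_angle - math.atan(cross_track / lookahead)
```

With `lookahead` fixed, a small value steers aggressively and tends to overshoot, while a large value converges slowly. Adaptive LOS variants adjust this quantity online.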
-
A Non-Terminating Game of Beggar-My-Neighbor
Authors:
Brayden Casella,
Philip M. Anderson,
Michael Kleber,
Richard P. Mann,
Reed Nessler,
William Rucklidge,
Samuel G. Williams,
Nicolas Wu
Abstract:
We demonstrate the existence of a non-terminating game of Beggar-My-Neighbor, discovered by lead author Brayden Casella. We detail the method for constructing this game and identify a cyclical structure of 62 tricks that is reached by 30 distinct starting hands. We further present a short history of the search for this solution since the problem was posed, and a record of previously found longest terminating games. The existence of this non-terminating game provides a solution to a long-standing question which John H. Conway called an `anti-Hilbert problem.'
Submitted 19 March, 2024;
originally announced March 2024.
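The game in this abstract is fully deterministic once the cards are dealt, so checking a deal for termination is a matter of simulation with cycle detection. The sketch below is a minimal simulator, not the authors' search code. The hand encoding (`-` for any non-court card, leftmost character on top), the function name, and the convention that a captured pile goes to the bottom of the winner's hand in the order played are assumptions.

```python
from collections import deque

# Number of cards the opponent must pay after each court card.
PENALTY = {"J": 1, "Q": 2, "K": 3, "A": 4}


def play(hand_a, hand_b, max_cards=100_000):
    """Simulate Beggar-My-Neighbor.

    Returns (outcome, cards_played), where outcome is 'A', 'B', 'loop'
    (a trick-start position repeated, so the game never ends), or
    'unknown' if max_cards was reached first.
    """
    hands = [deque(hand_a), deque(hand_b)]
    pile = []
    player = 0   # index of the player who must play the next card
    owed = 0     # penalty cards the current player still owes (0 = free play)
    seen = set()
    played = 0
    while played < max_cards:
        if not pile:
            # At the start of a trick the position determines the whole future.
            state = (player, tuple(hands[0]), tuple(hands[1]))
            if state in seen:
                return "loop", played
            seen.add(state)
        if not hands[player]:
            return "AB"[1 - player], played
        card = hands[player].popleft()
        pile.append(card)
        played += 1
        if card in PENALTY:
            # A court card passes the obligation to the other player.
            owed = PENALTY[card]
            player = 1 - player
        elif owed:
            owed -= 1
            if owed == 0:
                # Penalty paid in full: the other player takes the pile and leads.
                winner = 1 - player
                hands[winner].extend(pile)
                pile.clear()
                player = winner
        else:
            player = 1 - player
    return "unknown", played
```

For example, `play("J", "-")` lets player A capture B's only card, so A wins. A full 52-card deal is passed the same way, as two 26-character strings.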
-
LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning
Authors:
Shu Wang,
Muzhi Han,
Ziyuan Jiao,
Zeyu Zhang,
Ying Nian Wu,
Song-Chun Zhu,
Hangxin Liu
Abstract:
Conventional Task and Motion Planning (TAMP) approaches rely on manually crafted interfaces connecting symbolic task planning with continuous motion generation. These domain-specific and labor-intensive modules are limited in addressing emerging tasks in real-world settings. Here, we present LLM^3, a novel Large Language Model (LLM)-based TAMP framework featuring a domain-independent interface. Specifically, we leverage the powerful reasoning and planning capabilities of pre-trained LLMs to propose symbolic action sequences and select continuous action parameters for motion planning. Crucially, LLM^3 incorporates motion planning feedback through prompting, allowing the LLM to iteratively refine its proposals by reasoning about motion failure. Consequently, LLM^3 interfaces between task planning and motion planning, alleviating the intricate design process of handling domain-specific messages between them. Through a series of simulations in a box-packing domain, we quantitatively demonstrate the effectiveness of LLM^3 in solving TAMP problems and the efficiency in selecting action parameters. Ablation studies underscore the significant contribution of motion failure reasoning to the success of LLM^3. Furthermore, we conduct qualitative experiments on a physical manipulator, demonstrating the practical applicability of our approach in real-world settings.
Submitted 21 August, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.