-
Spatial Cascaded Clustering and Weighted Memory for Unsupervised Person Re-identification
Authors:
Jiahao Hong,
Jialong Zuo,
Chuchu Han,
Ruochen Zheng,
Ming Tian,
Changxin Gao,
Nong Sang
Abstract:
Recent unsupervised person re-identification (re-ID) methods achieve high performance by leveraging fine-grained local context. These methods are referred to as part-based methods. However, most part-based methods obtain local contexts through horizontal division, which suffers from misalignment due to various human poses. Additionally, the misalignment of semantic information in part features restricts the use of metric learning, thus affecting the effectiveness of part-based methods. The two issues mentioned above result in the under-utilization of part features in part-based methods. We introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method to address these challenges. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the foreground omission and spatial confusion issues in the previous method. Then, we propose foreground and space corrections to enhance the completeness and reasonableness of the human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, which enables better utilization of both global and part features. Extensive experiments on Market-1501 and MSMT17 validate the proposed method's effectiveness over many state-of-the-art methods.
Submitted 29 February, 2024;
originally announced March 2024.
-
WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset
Authors:
Jiantao Qiu,
Haijun Lv,
Zhenjiang Jin,
Rui Wang,
Wenchang Ning,
Jia Yu,
ChaoBin Zhang,
Zhenxiang Li,
Pei Chu,
Yuan Qu,
Jin Shi,
Lindong Lu,
Runyu Peng,
Zhiyuan Zeng,
Huanze Tang,
Zhikai Lei,
Jiawei Hong,
Keyu Chen,
Zhaoye Fei,
Ruiliang Xu,
Wei Li,
Zhongying Tu,
Lin Dahua,
Yu Qiao,
Hang Yan
, et al. (1 additional author not shown)
Abstract:
This paper presents WanJuan-CC, a safe and high-quality open-sourced English webtext dataset derived from Common Crawl data. The study addresses the challenges of constructing large-scale pre-training datasets for language models, which require vast amounts of high-quality data. A comprehensive process was designed to handle Common Crawl data, including extraction, heuristic rule filtering, fuzzy deduplication, content safety filtering, and data quality filtering. From approximately 68 billion original English documents, we obtained 2.22T Tokens of safe data and selected 1.0T Tokens of high-quality data as part of WanJuan-CC. We have open-sourced 100B Tokens from this dataset. The paper also provides statistical information related to data quality, enabling users to select appropriate data according to their needs. To evaluate the quality and utility of the dataset, we trained 1B-parameter and 3B-parameter models using WanJuan-CC and another dataset, RefinedWeb. Results show that WanJuan-CC performs better on validation datasets and downstream tasks.
Submitted 17 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Online Ecological Gearshift Strategy via Neural Network with Soft-Argmax Operator
Authors:
Xi Luo,
Shiying Dong,
Jinlong Hong,
Bingzhao Gao,
Hong Chen
Abstract:
This paper presents a neural network optimizer with a soft-argmax operator to achieve an ecological gearshift strategy in real time. The strategy is reformulated as a mixed-integer model predictive control (MIMPC) problem to minimize energy consumption. Outer convexification is then introduced to transform integer variables into relaxed binary controls. To approximate binary solutions properly during training, the soft-argmax operator is applied to the neural network, exploiting the fact that all operations in this scheme are differentiable. Moreover, this operator helps push the relaxed binary variables close to 0 or 1. To evaluate the strategy, we deployed it on a 2-speed electric vehicle (EV). Compared with the mature solver Bonmin, our proposed method not only achieves similar energy-saving effects but also significantly reduces the solution time to meet real-time requirements. This results in a notable energy saving of 6.02% compared to the rule-based method.
Submitted 28 February, 2024;
originally announced February 2024.
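The soft-argmax idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' network or code: the function name `soft_argmax`, the temperature parameter `beta`, and the two-gear score vector are assumptions made here for illustration. The sketch only shows the generic operator, a temperature-scaled softmax whose weights approach 0 or 1 as `beta` grows while every operation stays differentiable.

```python
import math

def soft_argmax(scores, beta=10.0):
    """Generic soft-argmax (illustrative; `beta` is a hypothetical temperature).

    Returns softmax weights over `scores` and the weighted average index.
    Larger `beta` pushes the weights toward a one-hot (binary) vector.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    index = sum(i * w for i, w in enumerate(weights))
    return weights, index

# Hypothetical scores for a 2-speed gearbox: gear 0 vs gear 1.
soft_weights, soft_index = soft_argmax([0.2, 0.9], beta=1.0)
sharp_weights, sharp_index = soft_argmax([0.2, 0.9], beta=50.0)
```

With a small `beta` the weights stay clearly fractional, and with a large `beta` they are nearly binary, which is the behaviour the abstract attributes to the operator.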
-
Local Manipulation of Skyrmion Lattice in Fe3GaTe2 at Room Temperature
Authors:
Shuaizhao Jin,
Zhan Wang,
Shouzhe Dong,
Yiting Wang,
Kun Han,
Guangcheng Wang,
Zunyi Deng,
Xingan Jiang,
Ying Zhang,
Houbing Huang,
Jiawang Hong,
Xiaolei Wang,
Tianlong Xia,
Sang-Wook Cheong,
Xueyun Wang
Abstract:
Motivated by advances in spintronic devices, an extensive exploration is underway to uncover materials that host topologically protected spin textures, exemplified by skyrmions. One critical challenge involved in the potential application of skyrmions in van der Waals (vdW) materials is the attainment and manipulation of skyrmions at room temperature. In this study, we report the creation of an intrinsic skyrmion state in the van der Waals ferromagnet Fe3GaTe2. By employing variable temperature magnetic force microscopy, the skyrmion lattice can be locally manipulated on an Fe3GaTe2 flake. The ordering of the skyrmion state is further analyzed. Our results suggest that Fe3GaTe2 emerges as a highly promising contender for the realization of skyrmion-based layered spintronic memory devices.
Submitted 22 February, 2024;
originally announced February 2024.
-
Developing an Automated Detection, Tracking and Analysis Method for Solar Filaments Observed by CHASE via Machine Learning
Authors:
Z. Zheng,
Q. Hao,
Y. Qiu,
J. Hong,
C. Li,
M. D. Ding
Abstract:
Studies on the dynamics of solar filaments have significant implications for understanding their formation, evolution, and eruption, which are of great importance for space weather warning and forecasting. The H$α$ Imaging Spectrograph (HIS) onboard the recently launched Chinese H$α$ Solar Explorer (CHASE) can provide full-disk solar H$α$ spectroscopic observations, which bring us an opportunity to systematically explore and analyze the plasma dynamics of filaments. The dramatically increased volume of observation data requires automated processing and analysis, which would be impossible to carry out manually. In this paper, we utilize the U-Net model to identify filaments and implement the Channel and Spatial Reliability Tracking (CSRT) algorithm for automated filament tracking. In addition, we use the cloud model to invert the line-of-sight velocity of filaments and employ a graph theory algorithm to extract the filament spine, which can advance our understanding of the dynamics of filaments. The favorable test performance confirms the validity of our method, which will be implemented in the following statistical analyses of filament features and dynamics of CHASE/HIS observations.
Submitted 21 February, 2024;
originally announced February 2024.
-
Fast Discrete-Event Simulation of Markovian Queueing Networks through Euler Approximation
Authors:
L. Jeff Hong,
Yingda Song,
Tan Wang
Abstract:
The efficient management of large-scale queueing networks is critical for a variety of sectors, including healthcare, logistics, and customer service, where system performance has profound implications for operational effectiveness and cost management. To address this key challenge, our paper introduces simulation techniques tailored for complex, large-scale Markovian queueing networks. We develop two simulation schemes based on Euler approximation, namely the backward and forward schemes. These schemes can accommodate time-varying dynamics and are optimized for efficient implementation using vectorization. Assuming a feedforward queueing network structure, we establish that the two schemes provide stochastic upper and lower bounds for the system state, while the approximation error remains bounded over the simulation horizon. With the recommended choice of time step, we show that our approximation schemes exhibit diminishing asymptotic relative error as the system scales up, while maintaining much lower computational complexity compared to traditional discrete-event simulation and achieving speedups of up to tens of thousands of times. This study highlights the substantial potential of Euler approximation in simulating large-scale discrete systems.
Submitted 2 February, 2024;
originally announced February 2024.
-
Reliable long timescale decision-directed channel estimation for OFDM system
Authors:
Xun Wang,
Xin Xie,
Cunqing Hua,
Jianan Hong,
Pengwenlong Gu
Abstract:
Decision-directed channel estimation (DDCE) is a kind of blind channel estimation method that tracks the channel with an iterative algorithm without relying on pilots, which can increase the utilization of wireless resources. However, one major problem of DDCE is the performance degradation caused by error accumulation during the tracking process. In this paper, we propose a reliable DDCE (RDDCE) scheme for an OFDM-based communication system in a time-varying deep fading environment. By combining the conventional DDCE and the discrete Fourier transform (DFT) channel estimation method, the proposed RDDCE scheme selects the reliable estimated channels on the subcarriers that are less affected by deep fading, and then estimates the channel based on the selected subcarriers by an extended DFT channel estimation in which the indices of the selected subcarriers are not distributed evenly. Simulation results show that RDDCE can alleviate the performance degradation effectively, track the channel with high accuracy on a long time scale, and perform well under time-varying and noisy channel conditions.
Submitted 18 February, 2024;
originally announced February 2024.
-
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
Authors:
Yihua Zhang,
Pingzhi Li,
Junyuan Hong,
Jiaxiang Li,
Yimeng Zhang,
Wenqing Zheng,
Pin-Yu Chen,
Jason D. Lee,
Wotao Yin,
Mingyi Hong,
Zhangyang Wang,
Sijia Liu,
Tianlong Chen
Abstract:
In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (Roberta, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Codes to reproduce all our experiments are at https://github.com/ZO-Bench/ZO-LLM .
Submitted 27 May, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
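To make the BP-free idea in the abstract above concrete, here is a minimal sketch of a randomized two-point zeroth-order gradient estimate driving an SGD-style update. It is a textbook form of ZO-SGD on a toy quadratic, not the benchmark's code and not an LLM fine-tuning recipe; the names `zo_gradient`, `zo_sgd`, and `quadratic`, and the values of `mu`, `lr`, and `steps`, are assumptions chosen for illustration.

```python
import random

def zo_gradient(f, x, mu=1e-3, rng=random):
    """Two-point zeroth-order estimate of the gradient of f at x.

    Uses only two function evaluations (no back-propagation):
    g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u, with u a Gaussian direction.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]

def zo_sgd(f, x0, lr=0.02, steps=500, mu=1e-3, seed=0):
    """Plain SGD where the gradient is replaced by the ZO estimate."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    x = list(x0)
    for _ in range(steps):
        g = zo_gradient(f, x, mu, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def quadratic(x):
    """Toy objective standing in for a training loss (hypothetical)."""
    return sum(xi * xi for xi in x)

x_start = [1.0, -2.0, 3.0]
x_final = zo_sgd(quadratic, x_start)
```

Because each step needs only forward evaluations of `f`, no activations have to be stored for a backward pass, which is the memory argument the abstract makes; the price is a noisier gradient estimate.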
-
Many-body localization properties of fully frustrated Heisenberg spin-1/2 ladder model with next-nearest-neighbor interaction
Authors:
Jiameng Hong,
Taotao Hu
Abstract:
Many-body localization (MBL) is an intriguing physical phenomenon that arises from the interplay of interaction and disorder, allowing quantum systems to avoid thermalization. In this study, we investigate the MBL properties of the fully frustrated Heisenberg spin-1/2 ladder model with next-nearest-neighbor hopping interaction along the leg direction and compare it with the Heisenberg spin-1/2 single-chain model with next-nearest-neighbor hopping interaction. We explore the MBL transition using random matrix theory and study the characteristics of entanglement entropy and its variance. Our results show that for the single-chain model, the critical point is $w_{1} \sim 7.5 \pm 0.5$, whereas for the frustrated ladder model, $w_{2} \sim 10.5 \pm 0.5$. Moreover, we observe the existence of a many-body mobility edge in the frustrated ladder model. We also investigate the dynamical properties of the frustrated ladder model and identify the logarithmic growth of entanglement entropy, high fidelity of initial information, and magnetic localization phenomenon in the localized phase. Finally, we explore the finite-size scaling of the two models. Our findings suggest that interpreting the MBL transition as a continuous second-order phase transition yields a better scaling solution than the Kosterlitz-Thouless type transition for our two models, and this difference is more pronounced in the frustrated ladder model than in the single-chain model.
Submitted 17 February, 2024;
originally announced February 2024.
-
Two-sided Loop Solar Jet Driven by the Eruption of a Small Filament in a Big Filament Channel
Authors:
Jiayan Yang,
Hechao Chen,
Junchao Hong,
Bo Yang,
Yi Bi
Abstract:
Similar to the cases of anemone jets, two-sided loop solar jets could also be produced by either flux emergence from the solar interior or small-scale filament eruptions. Using the high-quality data from the Solar Dynamics Observatory (SDO), we analyzed a two-sided loop solar jet triggered by the eruption of a small filament. The jet occurred in a pre-existing big filament channel. The detailed processes involved in the small filament eruption, the interaction between the erupted filament and the big filament channel, and the launch of the two-sided loop jet are presented. The observations further revealed notable asymmetry between the two branches of the jet spire: the northeastern branch is narrow and short, while the southern branch is wide and long and accompanied by discernible untwisting motions. We explored the unique appearance of the jet by employing the local potential field extrapolation to calculate the coronal magnetic field configuration around the jet. The photospheric magnetic flux below the small filament underwent cancellation for approximately 7 hours before the filament eruption, and the negative flux near the southern foot-point of the filament decreased by about 56 percent during this interval. Therefore, we propose that the primary photospheric driver of the filament eruption and the associated two-sided loop jet in this event is flux cancellation rather than flux emergence.
Submitted 16 February, 2024;
originally announced February 2024.
-
H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields
Authors:
Minyoung Park,
Mirae Do,
YeonJae Shin,
Jaeseok Yoo,
Jongkwang Hong,
Joongrock Kim,
Chul Lee
Abstract:
Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction. We introduce a novel two-phase learning approach, H2O-SDF, that discriminates between object and non-object regions within indoor environments. This method achieves a nuanced balance, carefully preserving the geometric integrity of room layouts while also capturing intricate surface details of specific objects. A cornerstone of our two-phase learning framework is the introduction of the Object Surface Field (OSF), a novel concept designed to mitigate the persistent vanishing gradient problem that has previously hindered the capture of high-frequency details in other methods. Our proposed approach is validated through several experiments that include ablation studies.
Submitted 8 March, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Capturing Cancer as Music: Cancer Mechanisms Expressed through Musification
Authors:
Rostyslav Hnatyshyn,
Jiayi Hong,
Ross Maciejewski,
Christopher Norby,
Carlo C. Maley
Abstract:
The development of cancer is difficult to express on a simple and intuitive level due to its complexity. Since cancer is so widespread, raising public awareness about its mechanisms can help those affected cope with its realities, as well as inspire others to make lifestyle adjustments and screen for the disease. Unfortunately, studies have shown that cancer literature is too technical for the general public to understand. We found that musification, the process of turning data into music, remains an unexplored avenue for conveying this information. We explore the pedagogical effectiveness of musification through the use of an algorithm that manipulates a piece of music in a manner analogous to the development of cancer. We conducted two lab studies and found that our approach is marginally more effective at promoting cancer literacy when accompanied by a text-based article than text-based articles alone.
Submitted 9 February, 2024;
originally announced February 2024.
-
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Authors:
Huaiyuan Ying,
Shuo Zhang,
Linyang Li,
Zhejian Zhou,
Yunfan Shao,
Zhaoye Fei,
Yichuan Ma,
Jiawei Hong,
Kuikun Liu,
Ziyi Wang,
Yudong Wang,
Zijian Wu,
Shuaibin Li,
Fengzhe Zhou,
Hongwei Liu,
Songyang Zhang,
Wenwei Zhang,
Hang Yan,
Xipeng Qiu,
Jiayu Wang,
Kai Chen,
Dahua Lin
Abstract:
The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next math LLMs or self-iteration. InternLM-Math obtains open-sourced state-of-the-art performance under the setting of in-context learning, supervised fine-tuning, and code-assisted reasoning in various informal and formal benchmarks including GSM8K, MATH, Hungary math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math. Our models, codes, and data are released at https://github.com/InternLM/InternLM-Math.
Submitted 24 May, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Optimal transport in the frame of abstract Lax-Oleinik operator revisited
Authors:
Wei Cheng,
Jiahui Hong,
Tianqi Shi
Abstract:
This is our first paper on the extension of our recent work on the Lax-Oleinik commutators and their applications to the intrinsic approach of propagation of singularities of the viscosity solutions of Hamilton-Jacobi equations. We reformulate the Kantorovich-Rubinstein duality theorem in the theory of optimal transport in terms of abstract Lax-Oleinik operators, and analyze the relevant optimal transport problem in the case where the cost function $c(x,y)=h(t_1,t_2,x,y)$ is the fundamental solution of the Hamilton-Jacobi equation. For further applications to the problem of cut locus and propagation of singularities in optimal transport, we introduce corresponding random Lax-Oleinik operators. We also study the problem of singularities for $c$-concave functions and its dynamical implication when $c$ is the fundamental solution with $t_2-t_1\ll1$ and $t_2-t_1<\infty$, and $c$ is the Peierls' barrier respectively.
Submitted 6 February, 2024;
originally announced February 2024.
-
Matcha: An IDE Plugin for Creating Accurate Privacy Nutrition Labels
Authors:
Tianshi Li,
Lorrie Faith Cranor,
Yuvraj Agarwal,
Jason I. Hong
Abstract:
Apple and Google introduced their versions of privacy nutrition labels to the mobile app stores to better inform users of the apps' data practices. However, these labels are self-reported by developers and have been found to contain many inaccuracies due to misunderstandings of the label taxonomy. In this work, we present Matcha, an IDE plugin that uses automated code analysis to help developers create accurate Google Play data safety labels. Developers can benefit from Matcha's ability to detect user data accesses and transmissions while staying in control of the generated label by adding custom Java annotations and modifying an auto-generated XML specification. Our evaluation with 12 developers showed that Matcha helped our participants improve the accuracy of a label they created with Google's official tool for a real-world app they developed. We found that participants preferred Matcha for its accuracy benefits. Drawing on Matcha, we discuss general design recommendations for developer tools used to create accurate standardized privacy notices.
Submitted 5 February, 2024;
originally announced February 2024.
-
Long-time dynamics of stochastic wave equation with dissipative damping and its full discretization: exponential ergodicity and strong law of large numbers
Authors:
Meng Cai,
Chuchu Chen,
Jialin Hong,
Tau Zhou
Abstract:
For the stochastic wave equation, when the dissipative damping is a non-globally Lipschitz function of the velocity, there are, to our knowledge, few results on the long-time dynamics, in particular the exponential ergodicity and strong law of large numbers, for the equation and its numerical discretization. Focusing on this issue, the main contributions of this paper are as follows. First, by constructing novel Lyapunov functionals, we show the unique invariant measure and exponential ergodicity of the underlying equation and its full discretization. Second, the error estimates of invariant measures both in Wasserstein distance and in the weak sense are obtained. Third, the strong laws of large numbers of the equation and the full discretization are obtained, which state that the time averages of the exact and numerical solutions converge to the ergodic limit almost surely.
Submitted 1 February, 2024;
originally announced February 2024.
-
AlphaRank: An Artificial Intelligence Approach for Ranking and Selection Problems
Authors:
Ruihan Zhou,
L. Jeff Hong,
Yijie Peng
Abstract:
We introduce AlphaRank, an artificial intelligence approach to address the fixed-budget ranking and selection (R&S) problems. We formulate the sequential sampling decision as a Markov decision process and propose a Monte Carlo simulation-based rollout policy that utilizes classic R&S procedures as base policies for efficiently learning the value function of stochastic dynamic programming. We accelerate online sample-allocation by using deep reinforcement learning to pre-train a neural network model offline based on a given prior. We also propose a parallelizable computing framework for large-scale problems, effectively combining "divide and conquer" and "recursion" for enhanced scalability and efficiency. Numerical experiments demonstrate that the performance of AlphaRank is significantly improved over the base policies, which could be attributed to AlphaRank's superior capability on the trade-off among mean, variance, and induced correlation overlooked by many existing policies.
Submitted 31 January, 2024;
originally announced February 2024.
-
Accelerating Multilingual Language Model for Excessively Tokenized Languages
Authors:
Jimin Hong,
Gibbeum Lee,
Jaewoong Cho
Abstract:
Recent advancements in large language models (LLMs) have remarkably enhanced performances on a variety of tasks in multiple languages. However, tokenizers in LLMs trained primarily on English-centric corpora often overly fragment a text into character or Unicode-level tokens in non-Roman alphabetic languages, leading to inefficient text generation. We introduce a simple yet effective framework to accelerate text generation in such languages. Our approach involves employing a new language model head with a vocabulary set tailored to a specific target language for a pre-trained LLM. This is followed by fine-tuning the new head while incorporating a verification step to ensure the model's performance is preserved. We show that this targeted fine-tuning, while freezing other model parameters, effectively reduces token fragmentation for the target language. Our extensive experiments demonstrate that the proposed framework increases the generation speed by a factor of 1.7 while maintaining the performance of pre-trained multilingual models on target monolingual tasks.
Submitted 6 August, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
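The fragmentation problem described in this abstract can be illustrated with a small sketch. It assumes the worst case in which a tokenizer falls back to one token per UTF-8 byte; this is an illustration of the problem only, not the authors' framework, and the function names are hypothetical.

```python
import unicodedata


def byte_fallback_length(text: str) -> int:
    """Token count if every character falls back to one token per UTF-8 byte."""
    normalized = unicodedata.normalize("NFC", text)
    return len(normalized.encode("utf-8"))


def fragmentation_ratio(text: str) -> float:
    """Byte-level tokens per (NFC-normalized) character."""
    normalized = unicodedata.normalize("NFC", text)
    return byte_fallback_length(normalized) / len(normalized)


if __name__ == "__main__":
    # ASCII letters take one byte each; precomposed Hangul syllables take three.
    for sample in ("hello", "안녕하세요"):
        print(sample, byte_fallback_length(sample), fragmentation_ratio(sample))
```

Under this assumption a Korean sentence needs roughly three times as many decoding steps per character as an English one, which is the kind of overhead a target-language vocabulary is meant to reduce.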
-
Monostatic imaging of an extended target with MCMC sampling
Authors:
Jiho Hong,
Sangwoo Kang,
Mikyoung Lim
Abstract:
We consider the imaging of a planar extended target from far-field data under a monostatic measurement configuration, in which the data is measured by a single moving transducer, as frequently encountered in practical application. In this paper, we develop a Bayesian approach to recover the shape of the extended target with MCMC sampling, where a new shape basis selection is proposed based on the shape derivative analysis for the measurement data. In order to optimize the center and radius of the initial disk, we use the monostatic sampling method for the center and the explicit scattered field expression for disks for the radius. Numerical simulations are presented to validate the proposed method.
Submitted 8 December, 2023;
originally announced January 2024.
-
Stochastic modelling of the instantaneous velocity profile in rough-wall turbulent boundary layers
Authors:
Roozbeh Ehsani,
Michael Heisel,
Jiaqi Li,
Vaughan Voller,
Jiarong Hong,
Michele Guala
Abstract:
The statistical properties of Uniform Momentum Zones (UMZs) are extracted from laboratory and field measurements in rough-wall turbulent boundary layers to formulate a set of stochastic models for the simulation of instantaneous velocity profiles. A spatio-temporally resolved velocity dataset, covering a field of view of $8 \times 9$ m$^2$, was obtained in the atmospheric surface layer using super-large-scale particle image velocimetry (SLPIV), as part of the Grand-scale Atmospheric Imaging Apparatus (GAIA). Wind tunnel data from a previous study are included for comparison \citep{heisel2020mixing}. The probability density functions of UMZ attributes such as their thickness, modal velocity, and averaged vertical velocity are built at varying elevations and modeled using log-normal and Gaussian distributions. Inverse transform sampling of the distributions is used to generate synthetic step-like velocity profiles that are spatially and temporally uncorrelated. Results show that in the wide range of wall-normal distances and $Re_\tau$ up to $\sim O(10^6)$ investigated here, shear velocity scaling is manifested in the velocity jump across shear interfaces between adjacent UMZs, and attached eddy behavior is observed in the linear proportionality between UMZ thickness and wall-normal location. These same characteristics are recovered in the generated instantaneous profiles, using both a fully stochastic model and a data-driven hybrid stochastic model, which address, in different ways, the coupling between modal velocities and UMZ thickness. Our method provides a stochastic approach for generating an ensemble of instantaneous velocity profiles, consistent with the structural organization of UMZs, where the ensemble reproduces the logarithmic mean velocity profile and recovers significant portions of the Reynolds stresses and thus of the streamwise and vertical velocity variability.
Submitted 11 January, 2024;
originally announced January 2024.
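The inverse transform sampling step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the distribution parameters are made-up placeholders, they do not vary with elevation, and the function names are hypothetical. Zone thicknesses are drawn from a log-normal distribution and modal velocities from a Gaussian, each by pushing a uniform random number through the inverse CDF.

```python
import math
import random
from statistics import NormalDist


def uniform_open(rng: random.Random) -> float:
    """Draw u in the open interval (0, 1), as required by the inverse CDF."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def sample_gaussian(u: float, mean: float, std: float) -> float:
    """Inverse transform sample from a Gaussian distribution."""
    return NormalDist(mean, std).inv_cdf(u)


def sample_lognormal(u: float, mu: float, sigma: float) -> float:
    """Inverse transform sample from a log-normal: exp of a Gaussian quantile."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(u))


def synthetic_profile(height, rng, thick_mu=-1.0, thick_sigma=0.5,
                      vel_mean=1.0, vel_std=0.1):
    """Build a step-like profile as (z_bottom, z_top, modal_velocity) zones.

    All parameter values are illustrative placeholders, not fitted data.
    """
    zones = []
    z = 0.0
    while z < height:
        thickness = sample_lognormal(uniform_open(rng), thick_mu, thick_sigma)
        top = min(z + thickness, height)  # clip the last zone at the top
        velocity = sample_gaussian(uniform_open(rng), vel_mean, vel_std)
        zones.append((z, top, velocity))
        z = top
    return zones


if __name__ == "__main__":
    for bottom, top, velocity in synthetic_profile(5.0, random.Random(0)):
        print(f"{bottom:6.3f} to {top:6.3f}: U = {velocity:.3f}")
```

Because each zone is drawn independently, the resulting profile is uncorrelated from zone to zone; coupling modal velocity to thickness or elevation, as the abstract describes, would need additional modeling that this sketch leaves out.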
-
A Survey of Designs for Combined 2D+3D Visual Representations
Authors:
Jiayi Hong,
Rostyslav Hnatyshyn,
Ebrar A. D. Santos,
Ross Maciejewski,
Tobias Isenberg
Abstract:
We examine visual representations of data that make use of combinations of both 2D and 3D data mappings. Combining 2D and 3D representations is a common technique that allows viewers to understand multiple facets of the data with which they are interacting. While 3D representations focus on the spatial character of the data or the dedicated 3D data mapping, 2D representations often show abstract data properties and take advantage of the unique benefits of mapping to a plane. Many systems have used unique combinations of both types of data mappings effectively. Yet there are no systematic reviews of the methods in linking 2D and 3D representations. We systematically survey the relationships between 2D and 3D visual representations in major visualization publications -- IEEE VIS, IEEE TVCG, and EuroVis -- from 2012 to 2022. We closely examined 105 papers where 2D and 3D representations are connected visually, interactively, or through animation. These approaches are designed based on their visual environment, the relationships between their visual representations, and their possible layouts. Through our analysis, we introduce a design space as well as provide design guidelines for effectively linking 2D and 3D visual representations.
Submitted 12 January, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Flexomagnetoelectric effect in Sr2IrO4 thin films
Authors:
Xin Liu,
Ting Hu,
Yujun Zhang,
Xueli Xu,
Biao Wu,
Zongwei Ma,
Peng Lv,
Yuelin Zhang,
Shih-Wen Huang,
Jialu Wu,
Jing Ma,
Jiawang Hong,
Zhigao Sheng,
Chenglong Jia,
Erjun Kan,
Ce-Wen Nan,
Jinxing Zhang
Abstract:
Symmetry engineering is an effective way to manipulate and even create phases and orderings in strongly correlated materials. Flexural stress is a broadly practical means of breaking space-inversion or time-reversal symmetry. Here, by introducing a strain gradient in the centrosymmetric antiferromagnet Sr2IrO4, the space-inversion symmetry is broken, accompanied by a non-equivalent O p-Ir d orbital hybridization along the z axis. Thus, an emergent polar phase and an out-of-plane magnetic moment have been simultaneously observed in these asymmetric Sr2IrO4 thin films, both of which are absent in its ground state. Furthermore, upon the application of a magnetic field, this polarization can be controlled by modifying the occupied d orbitals through spin-orbit interaction, giving rise to a flexomagnetoelectric effect. This work provides a general strategy to artificially design multiple symmetries and ferroic orderings in strongly correlated systems.
Submitted 9 January, 2024;
originally announced January 2024.
-
GTA: Guided Transfer of Spatial Attention from Object-Centric Representations
Authors:
SeokHyun Seo,
Jinwoo Hong,
JungWoo Chae,
Kyungyul Kim,
Sangheum Hwang
Abstract:
Utilizing well-trained representations in transfer learning often results in superior performance and faster convergence compared to training from scratch. However, even if such good representations are transferred, a model can easily overfit the limited training dataset and lose the valuable properties of the transferred representations. This phenomenon is more severe in ViT due to its low inductive bias. Through experimental analysis using attention maps in ViT, we observe that the rich representations deteriorate when trained on a small dataset. Motivated by this finding, we propose a novel and simple regularization method for ViT called Guided Transfer of spatial Attention (GTA). Our proposed method regularizes the self-attention maps between the source and target models. A target model can fully exploit the knowledge related to object localization properties through this explicit regularization. Our experimental results show that the proposed GTA consistently improves the accuracy across five benchmark datasets especially when the number of training data is small.
Submitted 5 January, 2024;
originally announced January 2024.
-
Randomly Weighted Neuromodulation in Neural Networks Facilitates Learning of Manifolds Common Across Tasks
Authors:
Jinyung Hong,
Theodore P. Pavlic
Abstract:
Geometric Sensitive Hashing functions, a family of Local Sensitive Hashing functions, are neural network models that learn class-specific manifold geometry in supervised learning. However, given a set of supervised learning tasks, understanding the manifold geometries that can represent each task and the kinds of relationships between the tasks based on them has received little attention. We explore a formalization of this question by considering a generative process where each task is associated with a high-dimensional manifold, which can be done in brain-like models with neuromodulatory systems. Following this formulation, we define Task-specific Geometric Sensitive Hashing (T-GSH) and show that a randomly weighted neural network with a neuromodulation system can realize this function.
Submitted 17 November, 2023;
originally announced January 2024.
-
A Split-and-Privatize Framework for Large Language Model Fine-Tuning
Authors:
Xicong Shen,
Yang Liu,
Huiqi Liu,
Jue Hong,
Bing Duan,
Zirui Huang,
Yunlong Mao,
Ye Wu,
Di Wu
Abstract:
Fine-tuning is a prominent technique to adapt a pre-trained language model to downstream scenarios. In parameter-efficient fine-tuning, only a small subset of modules is trained over the downstream datasets, while the rest of the pre-trained model is left frozen to save computation resources. In recent years, a popular productization form has emerged, Model-as-a-Service (MaaS), in which vendors provide abundant pre-trained language models, server resources and core functions, and customers can fine-tune, deploy and invoke their customized model by accessing the one-stop MaaS with their own private dataset. In this paper, we identify the model and data privacy leakage risks in MaaS fine-tuning, and propose a Split-and-Privatize (SAP) framework, which mitigates the privacy issues by adapting the existing split learning architecture. The proposed SAP framework is evaluated experimentally, and the results indicate that it can enhance the empirical privacy by 62% at the cost of 1% model performance degradation on the Stanford Sentiment Treebank dataset.
Submitted 24 December, 2023;
originally announced December 2023.
-
Towards Better Visualizing the Decision Basis of Networks via Unfold and Conquer Attribution Guidance
Authors:
Jung-Ho Hong,
Woo-Jeoung Nam,
Kyu-Sung Jeon,
Seong-Whan Lee
Abstract:
Revealing the transparency of Deep Neural Networks (DNNs) has been widely studied to describe the decision mechanisms of network inner structures. In this paper, we propose a novel post-hoc framework, Unfold and Conquer Attribution Guidance (UCAG), which enhances the explainability of the network decision by spatially scrutinizing the input features with respect to the model confidence. Addressing the phenomenon of missing detailed descriptions, UCAG sequentially complies with the confidence of slices of the image, leading to providing an abundant and clear interpretation. Therefore, it is possible to enhance the representation ability of explanation by preserving the detailed descriptions of assistant input features, which are commonly overwhelmed by the main meaningful regions. We conduct numerous evaluations to validate the performance in several metrics: i) deletion and insertion, ii) (energy-based) pointing games, and iii) positive and negative density maps. Experimental results, including qualitative comparisons, demonstrate that our method outperforms the existing methods with the nature of clear and detailed explanations and applicability.
Submitted 20 December, 2023;
originally announced December 2023.
-
Learning Subject-Aware Cropping by Outpainting Professional Photos
Authors:
James Hong,
Lu Yuan,
Michaël Gharbi,
Matthew Fisher,
Kayvon Fatahalian
Abstract:
How to frame (or crop) a photo often depends on the image subject and its context; e.g., a human portrait. Recent works have defined the subject-aware image cropping task as a nuanced and practical version of image cropping. We propose a weakly-supervised approach (GenCrop) to learn what makes a high-quality, subject-aware crop from professional stock images. Unlike supervised prior work, GenCrop requires no new manual annotations beyond the existing stock image collection. The key challenge in learning from this data, however, is that the images are already cropped and we do not know what regions were removed. Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model. The stock image collection provides diversity and its images serve as pseudo-labels for a good crop, while the text-image diffusion model is used to out-paint (i.e., outward inpainting) realistic uncropped images. Using this procedure, we are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model. Despite being weakly-supervised, GenCrop is competitive with state-of-the-art supervised methods and significantly better than comparable weakly-supervised baselines on quantitative and qualitative evaluation metrics.
Submitted 4 April, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Networking for the Metaverse: The Standardization Landscape
Authors:
Cedric Westphal,
Jungha Hong,
Shin-Gak Kang,
Leonardo Chiariglione,
Tianji Jiang
Abstract:
New applications are being supported by current and future networks. In particular, it is expected that Metaverse applications will be deployed in the near future, as 5G and 6G networks provide sufficient bandwidth and sufficiently low latency to provide a satisfying end-user experience. However, networks still need to evolve to better support this type of application. We present here a basic taxonomy of the metaverse, which allows us to identify some of the networking requirements for such an application; we also provide an overview of the current state of the standardization efforts in different standardization organizations, including ITU-T, 3GPP, IETF and MPAI.
Submitted 14 December, 2023;
originally announced December 2023.
-
A compendium of logarithmic corrections in AdS/CFT
Authors:
Nikolay Bobev,
Marina David,
Junho Hong,
Valentin Reys,
Xuao Zhang
Abstract:
We study the logarithmic corrections to various CFT partition functions in the context of the AdS$_4$/CFT$_3$ correspondence for theories arising on the worldvolume of M2-branes. We utilize four-dimensional gauged supergravity and heat kernel methods and present general expressions for the logarithmic corrections to the gravitational on-shell action and black hole entropy for a number of different supergravity backgrounds. We outline several subtle features of these calculations and contrast them with a similar analysis of logarithmic corrections performed directly in the eleven-dimensional uplift of a given four-dimensional supergravity background. We find results consistent with AdS/CFT provided that the infinite sum over KK modes on the internal space is regularized in a specific manner. This analysis leads to an explicit expression for the logarithmic correction to the Bekenstein-Hawking entropy of large Kerr-Newman and Reissner-Nordström black holes in AdS$_4$. Our results also have important implications for effective field theory coupled to gravity in AdS$_4$ and for the existence of scale-separated AdS$_4$ vacua in string theory, which come in the form of new constraints on the field content and mass spectrum of matter fields.
Submitted 5 April, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Hilbert Coefficients and Sally Modules: A Survey of Vasconcelos' Contributions
Authors:
Jooyoun Hong,
Susan Morey
Abstract:
This paper surveys and summarizes Wolmer Vasconcelos' results surrounding multiplicities, Hilbert coefficients, and their extensions. We particularly focus on Vasconcelos' results regarding multiplicities and Chern coefficients, and other invariants which they bound. The Sally module is an important instrument introduced by Vasconcelos for this study, which naturally relates Hilbert coefficients to reduction numbers.
Submitted 11 December, 2023;
originally announced December 2023.
-
Neutral Editing Framework for Diffusion-based Video Editing
Authors:
Sunjae Yoon,
Gwanhyeong Koo,
Ji Woo Hong,
Chang D. Yoo
Abstract:
Text-conditioned image editing has succeeded in various types of editing based on a diffusion framework. Unfortunately, this success has not carried over to video, which continues to be challenging. Existing video editing systems are still limited to rigid-type editing such as style transfer and object overlay. To this end, this paper proposes the Neutral Editing (NeuEdit) framework to enable complex non-rigid editing by changing the motion of a person/object in a video, which has never been attempted before. NeuEdit introduces a concept of 'neutralization' that enhances the tuning-editing process of diffusion-based editing systems in a model-agnostic manner by leveraging the input video and text without any other auxiliary aids (e.g., visual masks, video captions). Extensive experiments on numerous videos demonstrate the adaptability and effectiveness of the NeuEdit framework. The website of our work is available here: https://neuedit.github.io
Submitted 10 December, 2023;
originally announced December 2023.
-
High Absorptivity Nanotextured Powders for Additive Manufacturing
Authors:
Ottman A. Tertuliano,
Philip J. DePond,
Andrew C. Lee,
Jiho Hong,
David Doan,
Luc Capaldi,
Mark Brongersma,
X. Wendy Gu,
Manyalibo J. Matthews,
Wei Cai,
Adrian J. Lew
Abstract:
The widespread application of metal additive manufacturing (AM) is limited by the ability to control the complex interactions between the energy source and the feedstock material. Here we develop a generalizable process to introduce nanoscale grooves to the surface of metal powders, which increases the powder absorptivity by up to 70% during laser powder bed fusion. Absorptivity enhancements in copper, copper-silver, and tungsten enable energy-efficient manufacturing, with printing of pure copper at relative densities up to 92% using laser energy densities as low as 82 J/mm^3. Simulations show that the enhanced powder absorptivity results from plasmon-enabled light concentration in nanoscale grooves combined with multiple scattering events. The approach taken here demonstrates a general method to enhance the absorptivity and printability of reflective and refractory metal powders by changing the surface morphology of the feedstock without altering its composition.
Submitted 8 December, 2023;
originally announced December 2023.
-
FedGeo: Privacy-Preserving User Next Location Prediction with Federated Learning
Authors:
Chung Park,
Taekyoon Choi,
Taesan Kim,
Mincheol Cho,
Junui Hong,
Minsung Choi,
Jaegul Choo
Abstract:
A User Next Location Prediction (UNLP) task, which predicts the next location that a user will move to given his/her trajectory, is an indispensable task for a wide range of applications. Previous studies using large-scale trajectory datasets in a single server have achieved remarkable performance in UNLP task. However, in real-world applications, legal and ethical issues have been raised regarding privacy concerns leading to restrictions against sharing human trajectory datasets to any other server. In response, Federated Learning (FL) has emerged to address the personal privacy issue by collaboratively training multiple clients (i.e., users) and then aggregating them. While previous studies employed FL for UNLP, they are still unable to achieve reliable performance because of the heterogeneity of clients' mobility. To tackle this problem, we propose the Federated Learning for Geographic Information (FedGeo), a FL framework specialized for UNLP, which alleviates the heterogeneity of clients' mobility and guarantees personal privacy protection. Firstly, we incorporate prior global geographic adjacency information to the local client model, since the spatial correlation between locations is trained partially in each client who has only a heterogeneous subset of the overall trajectories in FL. We also introduce a novel aggregation method that minimizes the gap between client models to solve the problem of client drift caused by differences between client models when learning with their heterogeneous data. Lastly, we probabilistically exclude clients with extremely heterogeneous data from the FL process by focusing on clients who visit relatively diverse locations. We show that FedGeo is superior to other FL methods for model performance in UNLP task. We also validated our model in a real-world application using our own customers' mobile phones and the FL agent system.
Submitted 5 December, 2023;
originally announced December 2023.
-
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
Authors:
Junyuan Hong,
Jiachen T. Wang,
Chenhui Zhang,
Zhangheng Li,
Bo Li,
Zhangyang Wang
Abstract:
Large Language Models (LLMs) have emerged as dominant tools for various tasks, particularly when tailored for a specific target by prompt tuning. Nevertheless, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information. A practical solution is to host a local LLM and optimize a soft prompt privately using data. Yet, hosting a local model becomes problematic when model ownership is protected. Alternative methods, like sending data to the model's provider for training, intensify these privacy issues facing an untrusted provider. In this paper, we present a novel solution called Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge. Our approach involves tuning a discrete prompt on the client side and then applying it to the desired cloud models. We demonstrate that prompts suggested by LLMs themselves can be transferred without compromising performance significantly. To ensure that the prompts do not leak private information, we introduce the first private prompt generation mechanism, by a differentially-private (DP) ensemble of in-context learning with private demonstrations. With DP-OPT, generating privacy-preserving prompts by Vicuna-7b can yield competitive performance compared to non-private in-context learning on GPT3.5 or local private prompt tuning. Codes are available at https://github.com/VITA-Group/DP-OPT .
Submitted 17 March, 2024; v1 submitted 26 November, 2023;
originally announced December 2023.
-
Who Leaked the Model? Tracking IP Infringers in Accountable Federated Learning
Authors:
Shuyang Yu,
Junyuan Hong,
Yi Zeng,
Fei Wang,
Ruoxi Jia,
Jiayu Zhou
Abstract:
Federated learning (FL) emerges as an effective collaborative learning framework to coordinate data and computation resources from massive and distributed clients in training. Such collaboration results in non-trivial intellectual property (IP) represented by the model parameters that should be protected and shared by the whole party rather than an individual user. Meanwhile, the distributed nature of FL endorses a malicious client the convenience to compromise IP through illegal model leakage to unauthorized third parties. To block such IP leakage, it is essential to make the IP identifiable in the shared model and locate the anonymous infringer who first leaks it. The collective challenges call for accountable federated learning, which requires verifiable ownership of the model and is capable of revealing the infringer's identity upon leakage. In this paper, we propose Decodable Unique Watermarking (DUW) for complying with the requirements of accountable FL. Specifically, before a global model is sent to a client in an FL round, DUW encodes a client-unique key into the model by leveraging a backdoor-based watermark injection. To identify the infringer of a leaked model, DUW examines the model and checks if the triggers can be decoded as the corresponding keys. Extensive empirical results show that DUW is highly effective and robust, achieving over 99% watermark success rate for Digits, CIFAR-10, and CIFAR-100 datasets under heterogeneous FL settings, and identifying the IP infringer with 100% accuracy even after common watermark removal attempts.
Submitted 5 December, 2023;
originally announced December 2023.
-
Deep-learning-driven end-to-end metalens imaging
Authors:
Joonhyuk Seo,
Jaegang Jo,
Joohoon Kim,
Joonho Kang,
Chanik Kang,
Seongwon Moon,
Eunji Lee,
Jehyeong Hong,
Junsuk Rho,
Haejun Chung
Abstract:
Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic aberration, angular aberration, and a relatively low efficiency. In this study, a deep-learning-based image restoration framework is proposed to overcome these limitations and realize end-to-end metalens imaging, thereby achieving aberration-free full-color imaging for mass-produced metalenses with 10-mm diameter. Neural-network-assisted metalens imaging achieved a high resolution comparable to that of the ground truth image.
Submitted 10 May, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Visualization and Characterization of Agricultural Sprays Using Machine Learning based Digital Inline Holography
Authors:
Shyam Kumar M,
Christopher J. Hogan,
Steven A. Fredericks,
Jiarong Hong
Abstract:
Accurate characterization of agricultural sprays is crucial to predict the in-field performance of liquid-applied crop protection products. Here we introduce a robust and efficient machine learning (ML) based Digital In-line Holography (DIH) method to accurately characterize the droplet field for a wide range of agricultural spray nozzles. Compared to non-ML methods, our method enhances accuracy, generalizability, and processing speed. Our approach employs two neural networks: a modified U-Net to obtain the 3D droplet field from the numerically reconstructed optical field, followed by a VGG16 classifier to reduce false positives from the U-Net prediction. The modified U-Net is trained using holograms generated using a single spray nozzle at three spray locations (center, half-span, and the spray edge) to create training data with various number densities and droplet size ranges. VGG16 is trained via the minimum intensity projection of the droplet 3D point spread function. Data augmentation is used to increase the efficiency of classification and make the algorithm generalizable to different measurement settings. The model is validated via NIST-traceable glass beads and six agricultural spray nozzles representing various spray characteristics. The results demonstrate a high accuracy rate, with over 90% droplet extraction and less than 5% false positives. Compared to traditional spray measurement techniques, our method offers a significant leap forward in spatial resolution and generalizability. In particular, our method can extract the real cumulative volume distribution of the NIST beads, where the laser diffraction is biased towards droplets moving at slower speeds. Additionally, the ML-based DIH enables the estimation of mass and momentum flux at different locations and the calculation of relative velocities of droplet pairs, which are difficult to obtain via conventional techniques.
Submitted 13 November, 2023;
originally announced December 2023.
-
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Authors:
Kai Lv,
Shuo Zhang,
Tianle Gu,
Shuhao Xing,
Jiawei Hong,
Keyu Chen,
Xiaoran Liu,
Yuqing Yang,
Honglin Guo,
Tengxiao Liu,
Yu Sun,
Qipeng Guo,
Hang Yan,
Xipeng Qiu
Abstract:
Large language models (LLMs) are increasingly pivotal in a wide range of natural language processing tasks. Access to pre-trained models, courtesy of the open-source community, has made it possible to adapt these models to specific applications for enhanced performance. However, the substantial resources required for training these models necessitate efficient solutions. This paper introduces CoLLiE, an efficient library that facilitates collaborative training of large language models using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, LOMO and AdaLomo. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization. CoLLiE has demonstrated superior training efficiency in comparison with prevalent solutions in pre-training and fine-tuning scenarios. Furthermore, we provide an empirical evaluation of the correlation between model size and GPU memory consumption under different optimization methods, as well as an analysis of the throughput. Lastly, we carry out a comprehensive comparison of various optimizers and PEFT methods within the instruction-tuning context. CoLLiE is available at https://github.com/OpenLMLab/collie.
Submitted 1 December, 2023;
originally announced December 2023.
-
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Authors:
Marwa Abdulhai,
Isadora White,
Charlie Snell,
Charles Sun,
Joey Hong,
Yuexiang Zhai,
Kelvin Xu,
Sergey Levine
Abstract:
Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necessitate considerable prompt tuning. This becomes particularly apparent in multi-turn conversations: even the best current LLMs rarely ask clarifying questions, engage in explicit information gathering, or take actions now that lead to better decisions after multiple turns. Reinforcement learning has the potential to leverage the powerful modeling capabilities of LLMs, as well as their internal representation of textual interactions, to create capable goal-directed language agents. This can enable intentional and temporally extended interactions, such as with humans, through coordinated persuasion and carefully crafted questions, or in goal-directed play through text games to bring about desired final outcomes. However, enabling this requires the community to develop stable and reliable reinforcement learning algorithms that can effectively train LLMs. Developing such algorithms requires tasks that can gauge progress on algorithm design, provide accessible and reproducible evaluations for multi-turn interactions, and cover a range of task properties and challenges in improving reinforcement learning algorithms. Our paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for LLMs, together with an open-source research framework containing a basic toolkit for getting started on multi-turn RL with offline value-based and policy-based RL methods. Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
Submitted 29 November, 2023;
originally announced November 2023.
-
Learning to Simulate: Generative Metamodeling via Quantile Regression
Authors:
L. Jeff Hong,
Yanxi Hou,
Qingkai Zhang,
Xiaowei Zhang
Abstract:
Stochastic simulation models, while effective in capturing the dynamics of complex systems, are often too slow to run for real-time decision-making. Metamodeling techniques are widely used to learn the relationship between a summary statistic of the outputs (e.g., the mean or quantile) and the inputs of the simulator, so that it can be used in real time. However, this methodology requires the knowledge of an appropriate summary statistic in advance, making it inflexible for many practical situations. In this paper, we propose a new metamodeling concept, called generative metamodeling, which aims to construct a "fast simulator of the simulator". This technique can generate random outputs substantially faster than the original simulation model, while retaining an approximately equal conditional distribution given the same inputs. Once constructed, a generative metamodel can instantaneously generate a large amount of random outputs as soon as the inputs are specified, thereby facilitating the immediate computation of any summary statistic for real-time decision-making. Furthermore, we propose a new algorithm -- quantile-regression-based generative metamodeling (QRGMM) -- and study its convergence and rate of convergence. Extensive numerical experiments are conducted to investigate the empirical performance of QRGMM, compare it with other state-of-the-art generative algorithms, and demonstrate its usefulness in practical real-time decision-making.
Submitted 29 November, 2023;
originally announced November 2023.
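The generation step of a generative metamodel can be pictured with a minimal sketch, assuming the conditional quantiles of the simulator output at a given input have already been estimated on a grid of quantile levels (for example by quantile regression). The grid, the quantile values, and the function names below are hypothetical illustrations; this is inverse-transform sampling with linear interpolation, not a claim about the authors' exact QRGMM algorithm.

```python
import bisect
import random


def interpolate_quantile(u, levels, quantiles):
    """Piecewise-linear interpolation of a quantile curve at level u.

    levels    -- increasing quantile levels in (0, 1) (hypothetical grid)
    quantiles -- estimated conditional quantiles at those levels
    Levels outside the grid are clamped to the end values.
    """
    if u <= levels[0]:
        return quantiles[0]
    if u >= levels[-1]:
        return quantiles[-1]
    j = bisect.bisect_right(levels, u)  # levels[j-1] <= u < levels[j]
    lo, hi = levels[j - 1], levels[j]
    w = (u - lo) / (hi - lo)
    return quantiles[j - 1] + w * (quantiles[j] - quantiles[j - 1])


def generate_outputs(levels, quantiles, n, seed=0):
    """Draw n random outputs by inverse-transform sampling."""
    rng = random.Random(seed)
    return [interpolate_quantile(rng.random(), levels, quantiles)
            for _ in range(n)]


# Assumed quantile estimates for one fixed input (illustrative values only).
levels = [0.1, 0.5, 0.9]
quantiles = [1.0, 2.0, 4.0]
samples = generate_outputs(levels, quantiles, n=1000)
sample_mean = sum(samples) / len(samples)  # any summary statistic can follow
```

Once the samples exist, any summary statistic (mean, tail probability, quantile) can be computed from them without rerunning the original simulator, which is the use case the abstract describes.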
-
Three-dimensional internal flow evolution of an evaporating droplet and its role in particle deposition pattern
Authors:
Jiaqi Li,
Jiarong Hong
Abstract:
The internal flow within an evaporating sessile droplet is one of the driving mechanisms that lead to the variety of particle deposition patterns seen in applications such as inkjet printing, surface patterning, and blood stain analysis. Despite decades of research, the causal link between droplet internal flow and particle deposition patterns has not been fully established. In this study, we employ a 3D imaging technique based on digital inline holography to quantitatively assess the evolution of internal flow fields and particle migration in three distinct types of wetting droplets: water, sucrose aqueous solution, and SDS aqueous solution droplets, throughout their entire evaporation process. Our imaging reveals the three-stage evolution of the 3D internal flow regimes driven by changes in the relative importance of capillary flow, Marangoni flow, and droplet boundary movement during evaporation, each exhibiting unique dynamics. The migration of particles from their initial locations to deposition can be divided into five categories, with particles depositing either at the contact line or inside the droplet. We observe the changing migration directions of particles due to competing Marangoni and capillary flows during droplet evaporation. We further develop an analytical model that predicts the droplet internal flow and deposition patterns and determines the dependence of the deposition mechanisms of particles on their initial locations and the evolving internal flow field. The model, validated using different types of droplets from our experiment and the literature, can be further expanded to other Newtonian and non-Newtonian droplets, which can potentially serve as a real-time assessment tool for particle deposition in various applications.
Submitted 28 November, 2023;
originally announced November 2023.
-
Stellar Loci. VII. Photometric Metallicities of 5 Million FGK Stars Based on GALEX GR6+7 AIS and Gaia EDR3
Authors:
Xue Lu,
Haibo Yuan,
Shuai Xu,
Ruoyi Zhang,
Kai Xiao,
Yang Huang,
Timothy C. Beers,
Jihye Hong
Abstract:
We combine photometric data from GALEX GR6+7 AIS and Gaia EDR3 with stellar parameters from the SAGA and PASTEL catalogs to construct high-quality training samples for dwarfs ($\rm 0.4< BP-RP<1.6$) and giants ($\rm 0.6< BP-RP <1.6$). We apply careful reddening corrections using empirical temperature- and extinction-dependent extinction coefficients. Using the two samples, we establish a relationship between stellar loci (NUV$-$BP vs. BP$-$RP colors), metallicity, and $\rm M_G$. For a given BP$-$RP color, a 1 dex change in [Fe/H] corresponds to an approximately 1 magnitude change in NUV$-$BP color for solar-type stars. These relationships are employed to estimate metallicities based on NUV$-$BP, BP$-$RP, and $\rm M_G$. Thanks to the strong metallicity dependence in the GALEX NUV-band, our models enable a typical photometric-metallicity precision of approximately $σ_{\rm [Fe/H]}$ = 0.11 dex for dwarfs and $σ_{\rm [Fe/H]}$ = 0.17 dex for giants, with an effective metallicity range extending down to [Fe/H] $= -3.0$ for dwarfs and [Fe/H] $= -4.0$ for giants. We also find that the NUV-band based photometric-metallicity estimate is not as strongly affected by carbon enhancement as previous photometric techniques. With the Gaia and GALEX data, we have estimated metallicities for about 5 million stars across almost the entire sky, including approximately 4.5 million dwarfs and 0.5 million giants. This work demonstrates the potential of the NUV-band for estimating photometric metallicities, and sets the groundwork for utilizing the NUV data from space telescopes such as the upcoming Chinese Space Station Telescope.
Submitted 11 January, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Exceptional times for the instantaneous propagation of superprocess
Authors:
Jieliang Hong,
Leonid Mytnik
Abstract:
For a Dawson-Watanabe superprocess $X$ on $\mathbb{R}^d$, it is shown in Perkins (1990) that if the underlying spatial motion belongs to a certain class of Lévy processes that admit jumps, then with probability one the closed support of $X_t$ is the whole space for almost all $t>0$ before extinction, the so-called ``instantaneous propagation'' property. In this paper, for superprocesses on $\mathbb{R}^1$ whose spatial motion is the symmetric stable process of index $α\in (0,2/3)$, we prove that there exist exceptional times at which the support is compact and nonempty. Moreover, we show that the set of exceptional times is dense with full Hausdorff dimension. In addition, we prove that near extinction, the support of the superprocess is concentrated arbitrarily close to the extinction point, thus upgrading the corresponding results in Tribe (1992) from $α\in (0,1/2)$ to $α\in (0,2/3)$, and we further show that the set of such exceptional times also has full Hausdorff dimension.
Submitted 6 December, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Machine Learning based Post Event Analysis for Cybersecurity of Cyber-Physical System
Authors:
Kuchan Park,
Junho Hong,
Wencong Su,
HyoJong Lee
Abstract:
As Information and Communication Technology (ICT) equipment continues to be integrated into power systems, issues related to cybersecurity are increasingly emerging. Particularly noteworthy is the transition to digital substations, which is shifting operations from traditional hardwired systems to communication-based Supervisory Control and Data Acquisition (SCADA) system operations. These changes in the power system have increased the vulnerability of the system to cyber-attacks and emphasized the importance of cybersecurity. This paper proposes a machine learning (ML) based post-event analysis of the power system in order to respond to these cybersecurity issues. An artificial neural network (ANN) and other ML models are trained using transient fault measurements and cyber-attack data on substations. The trained models can successfully distinguish between power system faults and cyber-attacks. Furthermore, the proposed ML-based methods can also identify 10 different fault types and the location where the event occurred.
Submitted 7 March, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Cracking the Code of Negative Transfer: A Cooperative Game Theoretic Approach for Cross-Domain Sequential Recommendation
Authors:
Chung Park,
Taesan Kim,
Taekyoon Choi,
Junui Hong,
Yelim Yu,
Mincheol Cho,
Kyunam Lee,
Sungil Ryu,
Hyungjun Yoon,
Minsung Choi,
Jaegul Choo
Abstract:
This paper investigates Cross-Domain Sequential Recommendation (CDSR), a promising method that uses information from multiple domains (more than three) to generate accurate and diverse recommendations, and takes into account the sequential nature of user interactions. The effectiveness of these systems often depends on the complex interplay among the multiple domains. In this dynamic landscape, the problem of negative transfer arises, where heterogeneous knowledge between dissimilar domains leads to performance degradation due to differences in user preferences across these domains. As a remedy, we propose a new CDSR framework that addresses the problem of negative transfer by assessing the extent of negative transfer from one domain to another and adaptively assigning low weight values to the corresponding prediction losses. To this end, the amount of negative transfer is estimated by measuring the marginal contribution of each domain to model performance based on cooperative game theory. In addition, to mitigate negative transfer, we develop a hierarchical contrastive learning approach that incorporates information from the sequence of coarse-level categories into that of fine-level categories (e.g., item level) when implementing contrastive learning. Despite the potentially low relevance between domains at the fine level, there may be higher relevance at the category level, where preferences are broader and more general. We show that our model is superior to prior works in terms of model performance on two real-world datasets across ten different domains.
Submitted 22 November, 2023;
originally announced November 2023.
-
SDN-Based Dynamic Cybersecurity Framework of IEC-61850 Communications in Smart Grid
Authors:
Mansi Girdhar,
Junho Hong,
Wencong Su,
Akila Herath,
Chen-Ching Liu
Abstract:
In recent years, critical infrastructure and power grids have experienced a series of cyber-attacks, leading to temporary, widespread blackouts of considerable magnitude. Since most substations are unmanned and have limited physical security protection, cyber breaches into power grid substations present a risk. Nowadays, software-defined network (SDN), a popular virtual network technology based on the OpenFlow protocol, is being widely used in the substation automation system. However, research findings indicate that the susceptibility of SDN architecture to cyber-attacks has increased notably in recent years, raising concern about the potential for cybersecurity breaches within the SDN framework. In this paper, we propose a hybrid intrusion detection system (IDS)-integrated SDN architecture for detecting and preventing the injection of malicious IEC 61850-based generic object-oriented substation event (GOOSE) messages in a digital substation. Additionally, the system locates the fault and, as a form of mitigation, disables a certain port. Furthermore, implementation examples are demonstrated and verified using a hardware-in-the-loop (HIL) testbed that mimics the functioning of a digital substation.
Submitted 7 March, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Staffing under Taylor's Law: A Unifying Framework for Bridging Square-root and Linear Safety Rules
Authors:
L. Jeff Hong,
Weihuan Huang,
Jiheng Zhang,
Xiaowei Zhang
Abstract:
Staffing rules serve as an essential management tool in service industries to attain target service levels. Traditionally, the square-root safety rule, based on the Poisson arrival assumption, has been commonly used. However, empirical findings suggest that arrival processes often exhibit an ``over-dispersion'' phenomenon, in which the variance of the arrival exceeds the mean. In this paper, we develop a new doubly stochastic Poisson process model to capture a significant dispersion scaling law, known as Taylor's law, showing that the variance is a power function of the mean. We further examine how over-dispersion affects staffing, providing a closed-form staffing formula to ensure a desired service level. Interestingly, the additional staffing level beyond the nominal load is a power function of the nominal load, with the power exponent lying between $1/2$ (the square-root safety rule) and $1$ (the linear safety rule), depending on the degree of over-dispersion. Simulation studies and a large-scale call center case study indicate that our staffing rule outperforms classical alternatives.
Submitted 19 November, 2023;
originally announced November 2023.
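The shape of the staffing rule described in this abstract can be illustrated with a minimal sketch: staffing equals the nominal load plus a safety term that is a power function of the nominal load, with exponent between 1/2 and 1. The safety coefficient `beta`, the function name, and the rounding-up step are assumptions made for illustration; the paper's closed-form formula is not reproduced here.

```python
import math


def staffing_level(nominal_load, beta, exponent):
    """Nominal load plus a power-law safety margin, rounded up.

    nominal_load -- offered load R (assumed positive)
    beta         -- hypothetical safety coefficient (not from the paper)
    exponent     -- power exponent; 1/2 gives the square-root safety rule,
                    1 gives the linear safety rule
    """
    if not 0.5 <= exponent <= 1.0:
        raise ValueError("exponent must lie between 1/2 and 1")
    return math.ceil(nominal_load + beta * nominal_load ** exponent)


# Square-root rule, an intermediate degree of over-dispersion, linear rule.
for exponent in (0.5, 0.75, 1.0):
    print(exponent, staffing_level(100, beta=1.0, exponent=exponent))
```

With the same coefficient, a larger exponent (stronger over-dispersion) calls for a larger safety margin at the same nominal load.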
-
Bayesian Neural Networks: A Min-Max Game Framework
Authors:
Junping Hong,
Ercan Engin Kuruoglu
Abstract:
This paper is a preliminary study of the robustness and noise analysis of deep neural networks via a game-theoretic formulation of Bayesian Neural Networks (BNN) and the maximal coding rate distortion loss. BNN has been shown to provide some robustness to deep learning, and the minimax method has been a natural conservative way to assist the Bayesian method. Inspired by the recent closed-loop transcription neural network, we formulate the BNN via game theory between the deterministic neural network $f$ and the sampling network $f + ξ$ or $f + r*ξ$. Compared with previous BNN, BNN via game theory learns a solution space within a certain gap between the center $f$ and the sampling point $f + r*ξ$, and is a conservative choice with a meaningful prior setting. Furthermore, the minimum points between $f$ and $f + r*ξ$ become stable when the subspace dimension is large enough with a well-trained model $f$. With these, the model $f$ can have a high chance of recognizing out-of-distribution or noisy data in the subspace rather than at the prediction level, even if $f$ is in online training after a few iterations of true data. So far, our experiments are limited to the MNIST and Fashion MNIST data sets; more experiments with realistic data sets and complicated neural network models are needed to validate the above arguments.
Submitted 29 May, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Stable Differentiable Causal Discovery
Authors:
Achille Nazaret,
Justin Hong,
Elham Azizi,
David Blei
Abstract:
Inferring causal relationships as directed acyclic graphs (DAGs) is an important but challenging problem. Differentiable Causal Discovery (DCD) is a promising approach to this problem, framing the search as a continuous optimization. But existing DCD methods are numerically unstable, with poor performance beyond tens of variables. In this paper, we propose Stable Differentiable Causal Discovery (SDCD), a new method that improves previous DCD methods in two ways: (1) It employs an alternative constraint for acyclicity; this constraint is more stable, both theoretically and empirically, and fast to compute. (2) It uses a training procedure tailored for sparse causal graphs, which are common in real-world scenarios. We first derive SDCD and prove its stability and correctness. We then evaluate it with both observational and interventional data and on both small-scale and large-scale settings. We find that SDCD outperforms existing methods in both convergence speed and accuracy and can scale to thousands of variables. We provide code at https://github.com/azizilab/sdcd.
Submitted 27 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Topological and control theoretic properties of Hamilton-Jacobi equations via Lax-Oleinik commutators
Authors:
Piermarco Cannarsa,
Wei Cheng,
Jiahui Hong
Abstract:
In the context of weak KAM theory, we discuss the commutators $\{T^-_t\circ T^+_t\}_{t\geqslant0}$ and $\{T^+_t\circ T^-_t\}_{t\geqslant0}$ of Lax-Oleinik operators. We characterize the relation $T^-_t\circ T^+_t=Id$ for both small time and arbitrary time $t$. We show that this relation characterizes controllability for the evolutionary Hamilton-Jacobi equation. Based on our previous work on the cut locus of viscosity solutions, we refine our analysis of the cut time function $τ$ in terms of the commutators $T^-_t\circ T^+_t-T^+_t\circ T^-_t$ and clarify the structure of the super/sub-level sets of the cut time function $τ$.
Submitted 12 November, 2023;
originally announced November 2023.