-
FoMo: A Foundation Model for Mobile Traffic Forecasting with Diffusion Model
Authors:
Haoye Chai,
Shiyuan Zhang,
Xiaoqian Qi,
Yong Li
Abstract:
Mobile traffic forecasting allows operators to anticipate network dynamics and performance in advance, offering substantial potential for enhancing service quality and improving user experience. However, existing models are often task-oriented and are trained with tailored data, which limits their effectiveness in diverse mobile network tasks of Base Station (BS) deployment, resource allocation, energy optimization, etc. and hinders generalization across different urban environments. Foundation models have made remarkable strides across various domains of NLP and CV due to their multi-tasking adaption and zero/few-shot learning capabilities. In this paper, we propose an innovative Foundation model for Mobile traffic forecasting (FoMo), aiming to handle diverse forecasting tasks of short/long-term predictions and distribution generation across multiple cities to support network planning and optimization. FoMo combines diffusion models and transformers, where various spatio-temporal masks are proposed to enable FoMo to learn intrinsic features of different tasks, and a contrastive learning strategy is developed to capture the correlations between mobile traffic and urban contexts, thereby improving its transfer learning capability. Extensive experiments on 9 real-world datasets demonstrate that FoMo outperforms current models concerning diverse forecasting tasks and zero/few-shot learning, showcasing a strong universality. We further deploy the FoMo on the JiuTian optimization platform of China Mobile, where we use the predicted mobile data to formulate network planning and optimization applications, including BS deployment, resource block scheduling, and BS sleep control.
Submitted 20 October, 2024;
originally announced October 2024.
-
Fault-tolerant embedding of quantum circuits on hardware architectures via swap gates
Authors:
Shao-Hen Chiew,
Ezequiel Ignacio Rodriguez Chiacchio,
Vishal Sharma,
Jing Hao Chai,
Hui Khoon Ng
Abstract:
In near-term quantum computing devices, connectivity between qubits remains limited by architectural constraints. A computational circuit with given connectivity requirements necessary for multi-qubit gates has to be embedded within physical hardware with fixed connectivity. Long-distance gates have to be done by first routing the relevant qubits together. The simplest routing strategy involves the use of swap gates to swap the information carried by two unconnected qubits to connected ones. Ideal swap gates just permute the qubits; real swap gates, however, have the added possibilities of causing simultaneous errors on the qubits involved and spreading errors across the circuit. A general swap scheme thus changes the error-propagation properties of a circuit, including those necessary for fault-tolerant functioning of a circuit. Here, we present a simple strategy to design the swap scheme needed to embed an abstract circuit onto physical hardware with constrained connectivity, in a manner that preserves the fault-tolerant properties of the abstract circuit. The embedded circuit will, of course, be noisier than a native implementation of the abstract circuit, but we show in the examples of embedding surface codes on heavy-hexagonal and hexagonal lattices that the deterioration is not severe. This then offers a straightforward solution to implementing circuits with fault-tolerance properties on current hardware.
Submitted 24 June, 2024;
originally announced June 2024.
-
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Authors:
Weixiang Yan,
Haitian Liu,
Tengxiao Wu,
Qian Chen,
Wen Wang,
Haoyuan Chai,
Jiayi Wang,
Weishan Zhao,
Yixin Zhang,
Renjun Zhang,
Li Zhu,
Xuandong Zhao
Abstract:
LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with the real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These limitations in benchmarks in turn obstruct advancements of LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.
Submitted 9 October, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
PowerPeeler: A Precise and General Dynamic Deobfuscation Method for PowerShell Scripts
Authors:
Ruijie Li,
Chenyang Zhang,
Huajun Chai,
Lingyun Ying,
Haixin Duan,
Jun Tao
Abstract:
PowerShell is a powerful and versatile task automation tool. Unfortunately, it is also widely abused by cyber attackers. To bypass malware detection and hinder threat analysis, attackers often employ diverse techniques to obfuscate malicious PowerShell scripts. Existing deobfuscation tools suffer from the limitation of static analysis, which fails to simulate the real deobfuscation process accurately.
In this paper, we propose PowerPeeler. To the best of our knowledge, it is the first dynamic PowerShell script deobfuscation approach at the instruction level. It utilizes expression-related Abstract Syntax Tree (AST) nodes to identify potential obfuscated script pieces. Then, PowerPeeler correlates the AST nodes with their corresponding instructions and monitors the script's entire execution process. Subsequently, PowerPeeler dynamically tracks the execution of these instructions and records their execution results. Finally, PowerPeeler stringifies these results to replace the corresponding obfuscated script pieces and reconstruct the deobfuscated script.
To evaluate the effectiveness of PowerPeeler, we collect 1,736,669 real-world malicious PowerShell samples with diverse obfuscation methods. We compare PowerPeeler with five state-of-the-art deobfuscation tools and GPT-4. The evaluation results demonstrate that PowerPeeler can effectively handle all well-known obfuscation methods. Additionally, the deobfuscation correctness rate of PowerPeeler reaches 95%, significantly surpassing that of other tools. PowerPeeler not only recovers the highest amount of sensitive data but also maintains a semantic consistency over 97%, which is also the best. Moreover, PowerPeeler effectively obtains the largest quantity of valid deobfuscated results within a limited time frame. Furthermore, PowerPeeler is extendable and can be used as a helpful tool for other cyber security solutions.
Submitted 19 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning
Authors:
Liuzhi Zhou,
Yu He,
Kun Zhai,
Xiang Liu,
Sen Liu,
Xingjun Ma,
Guangnan Ye,
Yu-Gang Jiang,
Hongfeng Chai
Abstract:
Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients while preserving data privacy. However, the quest to balance acceleration and stability becomes a significant challenge in FL, especially on the client-side. In this paper, we introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge. FedCAda leverages the Adam algorithm to adjust the correction process of the first moment estimate $m$ and the second moment estimate $v$ on the client-side and aggregate adaptive algorithm parameters on the server-side, aiming to accelerate convergence speed and communication efficiency while ensuring stability and performance. Additionally, we investigate several algorithms incorporating different adjustment functions. This comparative analysis revealed that due to the limited information contained within client models from other clients during the initial stages of federated learning, more substantial constraints need to be imposed on the parameters of the adaptive algorithm. As federated learning progresses and clients gather more global information, FedCAda gradually diminishes the impact on adaptive parameters. These findings provide insights for enhancing the robustness and efficiency of algorithmic improvements. Through extensive experiments on computer vision (CV) and natural language processing (NLP) datasets, we demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance. This work contributes to adaptive algorithms for federated learning, encouraging further exploration.
Submitted 20 May, 2024;
originally announced May 2024.
-
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation
Authors:
Kounianhua Du,
Jizheng Chen,
Renting Rui,
Huacan Chai,
Lingyue Fu,
Wei Xia,
Yasheng Wang,
Ruiming Tang,
Yong Yu,
Weinan Zhang
Abstract:
Using large language models to generate code has shown promise for transforming software development. Despite the intelligence shown by general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary between natural language and different programming languages. In this paper, we propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs. CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to fill the gap between programming languages and natural language, which can help natural language based LLMs better understand code syntax and serve as a bridge among different programming languages. To bring the extracted structural knowledge into the foundation models, we propose 1) a hard meta-graph prompt template to transform the challenging graphical representation into informative knowledge for tuning-free models and 2) a soft prompting technique that injects the domain knowledge of programming languages into the model parameters via finetuning the models with the help of a pretrained GNN expert model. Various experiments and ablations are done on four datasets including both the C++ and Python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert. CodeGRAG improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation. The implementation is available at https://anonymous.4open.science/r/Code-5970/.
Submitted 2 October, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning
Authors:
Yuhang Zhou,
Zeping Li,
Siyu Tian,
Yuchen Ni,
Sen Liu,
Guangnan Ye,
Hongfeng Chai
Abstract:
Large language models (LLMs) are increasingly being applied across various specialized fields, leveraging their extensive knowledge to empower a multitude of scenarios within these domains. However, each field encompasses a variety of specific tasks that require learning, and the diverse, heterogeneous data across these domains can lead to conflicts during model task transfer. In response to this challenge, our study introduces an Adaptive Semantic Space Learning (ASSL) framework, which utilizes the adaptive reorganization of data distributions within the semantic space to enhance the performance and selection efficacy of multi-expert models. Utilizing this framework, we trained a financial multi-task LLM named "SilverSight". Our research findings demonstrate that our framework can achieve results close to those obtained with full data training using only 10% of the data, while also exhibiting strong generalization capabilities.
Submitted 7 April, 2024;
originally announced April 2024.
-
LDTR: Transformer-based Lane Detection with Anchor-chain Representation
Authors:
Zhongyu Yang,
Chen Shen,
Wei Shao,
Tengfei Xing,
Runbo Hu,
Pengfei Xu,
Hua Chai,
Ruini Xue
Abstract:
Despite recent advances in lane detection methods, scenarios with limited or no visual clues of lanes due to factors such as lighting conditions and occlusion remain challenging and crucial for automated driving. Moreover, current lane representations require complex post-processing and struggle with specific instances. Inspired by the DETR architecture, we propose LDTR, a transformer-based model to address these issues. Lanes are modeled with a novel anchor-chain, regarding a lane as a whole from the beginning, which enables LDTR to handle special lanes inherently. To enhance lane instance perception, LDTR incorporates a novel multi-referenced deformable attention module to distribute attention around the object. Additionally, LDTR incorporates two line IoU algorithms to improve convergence efficiency and employs a Gaussian heatmap auxiliary branch to enhance model representation capability during training. To evaluate lane detection models, we rely on Frechet distance, parameterized F1-score, and additional synthetic metrics. Experimental results demonstrate that LDTR achieves state-of-the-art performance on well-known datasets.
Submitted 21 March, 2024;
originally announced March 2024.
-
RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection
Authors:
Haolin Li,
Shuyang Jiang,
Lifeng Zhang,
Siyuan Du,
Guangnan Ye,
Hongfeng Chai
Abstract:
Fraud detection remains a challenging task due to the complex and deceptive nature of fraudulent activities. Current approaches primarily concentrate on learning only one perspective of the graph: either the topological structure of the graph or the attributes of individual nodes. However, we conduct empirical studies to reveal that these two types of features, while nearly orthogonal, are each independently effective. As a result, previous methods cannot fully capture the comprehensive characteristics of the fraud graph. To address this dilemma, we present a novel framework called Relation-Aware GNN with transFormer (RAGFormer) which simultaneously embeds both semantic and topological features into a target node. The simple yet effective network consists of a semantic encoder, a topology encoder, and an attention fusion module. The semantic encoder utilizes Transformer to learn semantic features and node interactions across different relations. We introduce Relation-Aware GNN as the topology encoder to learn topological features and node interactions within each relation. These two complementary features are interleaved through an attention fusion module to support prediction by both orthogonal features. Extensive experiments on two popular public datasets demonstrate that RAGFormer achieves state-of-the-art performance. The significant improvement of RAGFormer in an industrial credit card fraud detection dataset further validates the applicability of our method in real-world business scenarios.
Submitted 18 May, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Are LLMs Rational Investors? A Study on Detecting and Reducing the Financial Bias in LLMs
Authors:
Yuhang Zhou,
Yuchen Ni,
Yunhui Gan,
Zhangyue Yin,
Xiang Liu,
Jian Zhang,
Sen Liu,
Xipeng Qiu,
Guangnan Ye,
Hongfeng Chai
Abstract:
Large Language Models (LLMs) are increasingly adopted in financial analysis for interpreting complex market data and trends. However, their use is challenged by intrinsic biases (e.g., risk-preference bias) and a superficial understanding of market intricacies, necessitating a thorough assessment of their financial insight. To address these issues, we introduce Financial Bias Indicators (FBI), a framework with components like Bias Unveiler, Bias Detective, Bias Tracker, and Bias Antidote to identify, detect, analyze, and eliminate irrational biases in LLMs. By combining behavioral finance principles with bias examination, we evaluate 23 leading LLMs and propose a de-biasing method based on financial causal knowledge. Results show varying degrees of financial irrationality among models, influenced by their design and training. Models trained specifically on financial datasets may exhibit more irrationality, and even larger financial language models (FinLLMs) can show more bias than smaller, general models. We utilize four prompt-based methods incorporating causal debiasing, effectively reducing financial biases in these models. This work enhances the understanding of LLMs' bias in financial applications, laying the foundation for developing more reliable and rational financial analysis tools.
Submitted 1 July, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Tab-Attention: Self-Attention-based Stacked Generalization for Imbalanced Credit Default Prediction
Authors:
Yandan Tan,
Hongbin Zhu,
Jie Wu,
Hongfeng Chai
Abstract:
Accurate credit default prediction faces challenges due to imbalanced data and low correlation between features and labels. Existing default prediction studies on the basis of gradient boosting decision trees (GBDT), deep learning techniques, and feature selection strategies can have varying degrees of success depending on the specific task. Motivated by this, we propose Tab-Attention, a novel self-attention-based stacked generalization method for credit default prediction. This approach ensembles the potential proprietary knowledge contributions from multi-view feature spaces, to cope with low feature correlation and imbalance. We organize multi-view feature spaces according to the latent linear or nonlinear strengths between features and labels. Meanwhile, the f1 score assists the model in imbalance training to find the optimal state for identifying minority default samples. Tab-Attention improves Recall_1 and f1_1 for default intention recognition over existing GBDT-based models and advanced deep learning methods by about 32.92% and 16.05% on average, respectively, while maintaining outstanding overall performance and prediction performance for non-default samples. The proposed method could ensemble essential knowledge through the self-attention mechanism, which is of great significance for a more robust future prediction system.
Submitted 4 December, 2023;
originally announced December 2023.
-
$R^3$-NL2GQL: A Model Coordination and Knowledge Graph Alignment Approach for NL2GQL
Authors:
Yuhang Zhou,
Yu He,
Siyu Tian,
Yuchen Ni,
Zhangyue Yin,
Xiang Liu,
Chuanjun Ji,
Sen Liu,
Xipeng Qiu,
Guangnan Ye,
Hongfeng Chai
Abstract:
While current tasks of converting natural language to SQL (NL2SQL) using Foundation Models have shown impressive achievements, adapting these approaches for converting natural language to Graph Query Language (NL2GQL) encounters hurdles due to the distinct nature of GQL compared to SQL, alongside the diverse forms of GQL. Moving away from traditional rule-based and slot-filling methodologies, we introduce a novel approach, $R^3$-NL2GQL, integrating both small and large Foundation Models for ranking, rewriting, and refining tasks. This method leverages the interpretative strengths of smaller models for initial ranking and rewriting stages, while capitalizing on the superior generalization and query generation prowess of larger models for the final transformation of natural language queries into GQL formats. Addressing the scarcity of datasets in this emerging field, we have developed a bilingual dataset, sourced from graph database manuals and selected open-source Knowledge Graphs (KGs). Our evaluation of this methodology on this dataset demonstrates its promising efficacy and robustness.
Submitted 1 July, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Investigating Multilingual Coreference Resolution by Universal Annotations
Authors:
Haixia Chai,
Michael Strube
Abstract:
Multilingual coreference resolution (MCR) has been a long-standing and challenging task. With the newly proposed multilingual coreference dataset, CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by using its harmonized universal morphosyntactic and coreference annotations. First, we study coreference by examining the ground truth data at different linguistic levels, namely mention, entity and document levels, and across different genres, to gain insights into the characteristics of coreference across multiple languages. Second, we perform an error analysis of the most challenging cases that the SotA system fails to resolve in the CRAC 2022 shared task using the universal annotations. Last, based on this analysis, we extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits for the MCR task. Our results show that our best configuration of features improves the baseline by 0.9% F1 score.
Submitted 26 October, 2023;
originally announced October 2023.
-
Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
Authors:
Lihang Liu,
Shanzhuo Zhang,
Donglong He,
Xianbin Ye,
Jingbo Zhou,
Xiaonan Zhang,
Yaoyao Jiang,
Weiming Diao,
Hang Yin,
Hua Chai,
Fan Wang,
Jingzhou He,
Liang Zheng,
Yonghui Li,
Xiaomin Fang
Abstract:
Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Recent advances have incorporated deep learning techniques to improve the accuracy of protein-ligand structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, which raises concerns regarding the generalizability of these deep learning-based methods due to the limited training data. In this work, we show that by pre-training on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning with a limited set of experimentally validated receptor-ligand complexes, we can obtain a protein-ligand structure prediction model with outstanding performance. Specifically, this process involved the generation of 100 million docking conformations for protein-ligand pairings, an endeavor consuming roughly 1 million CPU core days. The proposed model, HelixDock, aims to acquire the physical knowledge encapsulated by the physics-based docking tools during the pre-training phase. HelixDock has been rigorously benchmarked against both physics-based and deep learning-based baselines, demonstrating its exceptional precision and robust transferability in predicting binding conformation. In addition, our investigation reveals the scaling laws governing pre-trained protein-ligand structure prediction models, indicating a consistent enhancement in performance with increases in model parameters and the volume of pre-training data. Moreover, we applied HelixDock to several drug discovery-related tasks to validate its practical utility. HelixDock demonstrates outstanding capabilities on both cross-docking and structure-based virtual screening benchmarks.
Submitted 22 May, 2024; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Single-shot deterministic complex amplitude imaging with a single-layer metalens
Authors:
Liu Li,
Shuai Wang,
Feng Zhao,
Yixin Zhang,
Shun Wen,
Huichao Chai,
Yunhui Gao,
Wenhui Wang,
Liangcai Cao,
Yuanmu Yang
Abstract:
Conventional imaging systems can only capture light intensity. Meanwhile, the lost phase information may be critical for a variety of applications such as label-free microscopy and optical metrology. Existing phase retrieval techniques typically require a bulky setup, multi-frame measurements, or prior information of the target scene. Here, we proposed an extremely compact system for complex amplitude imaging, leveraging the extreme versatility of a single-layer metalens to generate spatially-multiplexed and polarization-phase-shifted point spread functions. Combining the metalens with a polarization camera, the system can simultaneously record four polarization shearing interference patterns along both in-plane directions, thus allowing the deterministic reconstruction of the complex amplitude light field in a single shot. Using an incoherent light-emitting diode as the illumination, we experimentally demonstrated speckle-noise-free complex amplitude imaging for both static and moving objects with tailored magnification ratio and field-of-view. The miniaturized and robust system may open the door for complex amplitude imaging in portable devices for point-of-care applications.
Submitted 27 September, 2023;
originally announced September 2023.
-
GRASS: Unified Generation Model for Speech-to-Semantic Tasks
Authors:
Aobo Xia,
Shuyu Lei,
Yushu Yang,
Xiang Guo,
Hua Chai
Abstract:
This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.
Submitted 11 September, 2023; v1 submitted 6 September, 2023;
originally announced September 2023.
-
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Authors:
Lingyue Fu,
Huacan Chai,
Shuang Luo,
Kounianhua Du,
Weiming Zhang,
Longteng Fan,
Jiayi Lei,
Renting Rui,
Jianghao Lin,
Yuchen Fang,
Yifan Liu,
Jingkuan Wang,
Siyuan Qi,
Kangning Zhang,
Weinan Zhang,
Yong Yu
Abstract:
With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is crucial as it reflects the multifaceted abilities of LLMs, and it has numerous downstream applications. In this paper, we propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs. The programming comprehension task tests LLMs on multiple-choice exam questions covering conceptual understanding, commonsense reasoning, and multi-hop reasoning. The code generation task evaluates LLMs through completing C++ functions based on provided descriptions and prototypes. The code correction task asks LLMs to fix real-world erroneous code segments with different error messages. We evaluate 12 widely used LLMs, including both general-purpose and specialized models. GPT-4 exhibits the best programming capabilities, achieving approximate accuracies of 69%, 54%, and 66% on the three tasks, respectively. Compared to human performance, there is still significant room for improvement in LLM programming. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development and growth.
Submitted 11 March, 2024; v1 submitted 5 September, 2023;
originally announced September 2023.
-
CANet: Curved Guide Line Network with Adaptive Decoder for Lane Detection
Authors:
Zhongyu Yang,
Chen Shen,
Wei Shao,
Tengfei Xing,
Runbo Hu,
Pengfei Xu,
Hua Chai,
Ruini Xue
Abstract:
Lane detection is challenging due to complicated on-road scenarios and line deformation from different camera perspectives. Many solutions have been proposed, but they cannot handle corner lanes well. To address this problem, this paper proposes a new top-down deep learning lane detection approach, CANet. A lane instance first produces a response on the heat-map of the U-shaped curved guide line at the global semantic level, so that the corresponding features of each lane are aggregated at the response point. CANet then obtains the heat-map response of the entire lane through conditional convolution, and finally decodes the point set that describes the lanes via an adaptive decoder. The experimental results show that CANet achieves state-of-the-art (SOTA) results on different metrics. Our code will be released soon.
Submitted 23 April, 2023;
originally announced April 2023.
-
Universal Adversarial Backdoor Attacks to Fool Vertical Federated Learning in Cloud-Edge Collaboration
Authors:
Peng Chen,
Xin Du,
Zhihui Lu,
Hongfeng Chai
Abstract:
Vertical federated learning (VFL) is a cloud-edge collaboration paradigm that enables edge nodes, comprising resource-constrained Internet of Things (IoT) devices, to cooperatively train artificial intelligence (AI) models while retaining their data locally. This paradigm facilitates improved privacy and security for edges and IoT devices, making VFL an essential component of Artificial Intelligence of Things (AIoT) systems. Nevertheless, the partitioned structure of VFL can be exploited by adversaries to inject a backdoor, enabling them to manipulate the VFL predictions. In this paper, we aim to investigate the vulnerability of VFL in the context of binary classification tasks. To this end, we define a threat model for backdoor attacks in VFL and introduce a universal adversarial backdoor (UAB) attack to poison the predictions of VFL. The UAB attack, consisting of universal trigger generation and clean-label backdoor injection, is incorporated during the VFL training at specific iterations. This is achieved by alternately optimizing the universal trigger and model parameters of VFL sub-problems. Our work distinguishes itself from existing studies on designing backdoor attacks for VFL, as those require the knowledge of auxiliary information not accessible within the split VFL architecture. In contrast, our approach does not necessitate any additional data to execute the attack. On the LendingClub and Zhongyuan datasets, our approach surpasses existing state-of-the-art methods, achieving up to 100% backdoor task performance while maintaining the main task performance. Our results in this paper make a major advance in revealing the hidden backdoor risks of VFL, hence paving the way for the future development of secure AIoT.
Submitted 22 April, 2023;
originally announced April 2023.
-
Graph Signal Sampling for Inductive One-Bit Matrix Completion: a Closed-form Solution
Authors:
Chao Chen,
Haoyu Geng,
Gang Zeng,
Zhaobing Han,
Hua Chai,
Xiaokang Yang,
Junchi Yan
Abstract:
Inductive one-bit matrix completion is motivated by modern applications such as recommender systems, where new users appear at the test stage with ratings consisting of only ones and no zeros. We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing. The key idea is to transform each user's ratings on the items to a function (signal) on the vertices of an item-item graph, then learn structural graph properties to recover the function from its values on certain vertices -- the problem of graph signal sampling. We propose a class of regularization functionals that takes into account discrete random label noise in the graph vertex domain, then develop the GS-IMC approach which biases the reconstruction towards functions that vary little between adjacent vertices for noise reduction. Theoretical results show that accurate reconstructions can be achieved under mild conditions. For the online setting, we develop a Bayesian extension, i.e., BGS-IMC, which considers continuous random Gaussian noise in the graph Fourier domain and builds upon a prediction-correction update algorithm to obtain the unbiased and minimum-variance reconstruction. Both GS-IMC and BGS-IMC have closed-form solutions and thus are highly scalable on large data. Experiments show that our methods achieve state-of-the-art performance on public benchmarks.
Submitted 8 February, 2023;
originally announced February 2023.
-
Flying Trot Control Method for Quadruped Robot Based on Trajectory Planning
Authors:
Hongge Wang,
Hui Chai,
Bin Chen,
Aizhen Xie,
Rui Song,
Bo Su
Abstract:
An intuitive control method for the flying trot, which combines offline trajectory planning with real-time balance control, is presented. The motion features of running animals in the vertical direction were analysed using the spring-loaded inverted pendulum (SLIP) model, and the foot trajectory of the robot was planned, so the robot could run similar to an animal capable of vertical flight, according to the given height and speed of the trunk. To improve the robustness of running, a posture control method based on a foot acceleration adjustment is proposed. Novel kinematics-based CoM observation and CoM regulation methods are presented to enhance the stability of locomotion. To reduce the impact force when the robot interacts with the environment, the virtual model control method is used in the control of the foot trajectory to achieve active compliance. By selecting the proper parameters for the virtual model, the oscillation motion of the virtual model and the planning motion of the support foot are synchronized to avoid the large disturbance caused by the oscillation motion of the virtual model in relation to the robot motion. Simulations and experiments using the quadruped robot Billy are reported. In the experiment, the maximum speed of the robot could reach 4.73 times the body length per second, which verified the feasibility of the control method.
Submitted 24 October, 2022;
originally announced October 2022.
-
Optimizing resource efficiencies for scalable full-stack quantum computers
Authors:
Marco Fellous-Asiani,
Jing Hao Chai,
Yvain Thonnart,
Hui Khoon Ng,
Robert S. Whitney,
Alexia Auffèves
Abstract:
In the race to build scalable quantum computers, minimizing the resource consumption of their full stack to achieve a target performance becomes crucial. It mandates a synergy of fundamental physics and engineering: the former for the microscopic aspects of computing performance, and the latter for the macroscopic resource consumption. For this, we propose a holistic methodology dubbed Metric-Noise-Resource (MNR) able to quantify and optimize all aspects of the full-stack quantum computer, bringing together concepts from quantum physics (e.g., noise on the qubits), quantum information (e.g., computing architecture and type of error correction), and enabling technologies (e.g., cryogenics, control electronics, and wiring). This holistic approach allows us to define and study resource efficiencies as ratios between performance and resource cost. As a proof of concept, we use MNR to minimize the power consumption of a full-stack quantum computer, performing noisy or fault-tolerant computing with a target performance for the task of interest. Comparing this with a classical processor performing the same task, we identify a quantum energy advantage in regimes of parameters distinct from the commonly considered quantum computational advantage. This provides a previously overlooked practical argument for building quantum computers. While our illustration uses highly idealized parameters inspired by superconducting qubits with concatenated error correction, the methodology is universal -- it applies to other qubits and error-correcting codes -- and provides experimenters with guidelines to build energy-efficient quantum processors. In some regimes of high energy consumption, it can reduce this consumption by orders of magnitude. Overall, our methodology lays the theoretical foundation for resource-efficient quantum technologies.
Submitted 16 October, 2023; v1 submitted 12 September, 2022;
originally announced September 2022.
-
On the fault-tolerance threshold for surface codes with general noise
Authors:
Jing Hao Chai,
Hui Khoon Ng
Abstract:
Fault-tolerant quantum computing based on surface codes has emerged as a popular route to large-scale quantum computers capable of accurate computation even in the presence of noise. Its popularity is, in part, because the fault-tolerance or accuracy threshold for surface codes is believed to be less stringent than competing schemes. This threshold is the noise level below which computational accuracy can be increased by increasing physical resources for noise removal, and is an important engineering target for realising quantum devices. The current conclusions about surface code thresholds are, however, drawn largely from studies of probabilistic noise. While a natural assumption, current devices experience noise beyond such a model, raising the question of whether conventional statements about the thresholds apply. Here, we attempt to extend past proof techniques to derive the fault-tolerance threshold for surface codes subjected to general noise with no particular structure. Surprisingly, we found no nontrivial threshold, i.e., there is no guarantee the surface code prescription works for general noise. While this is not a proof that the scheme fails, we argue that current proof techniques are likely unable to provide an answer. A genuinely new idea is needed, to reaffirm the feasibility of surface code quantum computing.
Submitted 1 July, 2022;
originally announced July 2022.
-
S4OD: Semi-Supervised learning for Single-Stage Object Detection
Authors:
Yueming Zhang,
Xingxu Yao,
Chao Liu,
Feng Chen,
Xiaolin Song,
Tengfei Xing,
Runbo Hu,
Hua Chai,
Pengfei Xu,
Guoshan Zhang
Abstract:
Single-stage detectors suffer from extreme foreground-background class imbalance, while two-stage detectors do not. Therefore, in semi-supervised object detection, two-stage detectors can deliver remarkable performance by only selecting high-quality pseudo labels based on classification scores. However, directly applying this strategy to single-stage detectors would aggravate the class imbalance with fewer positive samples. Thus, single-stage detectors have to consider both quality and quantity of pseudo labels simultaneously. In this paper, we design a dynamic self-adaptive threshold (DSAT) strategy in the classification branch, which can automatically select pseudo labels to achieve an optimal trade-off between quality and quantity. Besides, to assess the regression quality of pseudo labels in single-stage detectors, we propose a module to compute the regression uncertainty of boxes based on Non-Maximum Suppression. By leveraging only 10% labeled data from COCO, our method achieves 35.0% AP on an anchor-free detector (FCOS) and 32.9% on an anchor-based detector (RetinaNet).
Submitted 9 April, 2022;
originally announced April 2022.
-
Intelligent Sensing Scheduling for Mobile Target Tracking Wireless Sensor Networks
Authors:
Longyu Zhou,
Supeng Leng,
Qiang Liu,
Haoye Chai,
Jihua Zhou
Abstract:
Edge computing has emerged as a prospective paradigm to meet ever-increasing computation demands in Mobile Target Tracking Wireless Sensor Networks (MTT-WSN). This paradigm can offload time-sensitive tasks to sink nodes to improve computing efficiency. Nevertheless, it is difficult to execute dynamic and critical tasks in the MTT-WSN network. Besides, the network cannot ensure consecutive tracking due to the limited energy. To address these problems, this paper proposes a new hierarchical target tracking structure based on Edge Intelligence (EI) technology. The structure integrates the computing resources of both mobile nodes and edge servers to provide efficient computation capability for real-time target tracking. Based on the proposed structure, we formulate an energy optimization model with the constraints of system execution latency and trajectory prediction accuracy. Moreover, we propose a long-term dynamic resource allocation algorithm to obtain the optimal resource allocation solution for accurate and consecutive tracking. Simulation results demonstrate that our algorithm outperforms deep Q-learning by more than 14.5% in terms of system energy consumption. It can also obtain a significant enhancement in tracking accuracy compared with the non-cooperative scheme.
Submitted 4 August, 2021;
originally announced August 2021.
-
Secure and Efficient Blockchain based Knowledge Sharing for Intelligent Connected Vehicles
Authors:
Haoye Chai,
Supeng Leng,
Fan Wu,
Jianhua He
Abstract:
The emergence of Intelligent Connected Vehicles (ICVs) shows great potential for future intelligent traffic systems, enhancing both traffic safety and road efficiency. However, ICVs relying on data-driven perception and driving models face many challenges, including the lack of comprehensive knowledge to deal with complicated driving contexts. In this paper, we are motivated to investigate cooperative knowledge sharing for ICVs. We propose a secure and efficient directed acyclic graph (DAG) blockchain-based knowledge sharing framework, aiming to cater for micro-transaction-based vehicular networks. The framework can realize both local and cross-regional knowledge sharing. Then, the framework is applied to autonomous driving applications, wherein machine learning-based models for autonomous driving control can be shared. A lightweight tip selection algorithm (TSA) is proposed for the DAG-based knowledge sharing framework to achieve consensus and identity verification for cross-regional vehicles. To enhance model accuracy while minimizing bandwidth consumption, an adaptive asynchronous distributed learning (ADL) based scheme is proposed for model uploading and downloading. Experiment results show that the blockchain-based knowledge sharing is secure, and it can resist attacks from malicious users. In addition, the proposed adaptive ADL scheme can enhance driving safety related performance compared to several existing algorithms.
Submitted 2 November, 2021; v1 submitted 3 August, 2021;
originally announced August 2021.
-
Guided Data Discovery in Interactive Visualizations via Active Search
Authors:
Shayan Monadjemi,
Sunwoo Ha,
Quan Nguyen,
Henry Chai,
Roman Garnett,
Alvitta Ottley
Abstract:
Recent advances in visual analytics have enabled us to learn from user interactions and uncover analytic goals. These innovations set the foundation for actively guiding users during data exploration. Providing such guidance will become more critical as datasets grow in size and complexity, precluding exhaustive investigation. Meanwhile, the machine learning community also struggles with datasets growing in size and complexity, precluding exhaustive labeling. Active learning is a broad family of algorithms developed for actively guiding models during training. We will consider the intersection of these analogous research thrusts. First, we discuss the nuances of matching the choice of an active learning algorithm to the task at hand. This is critical for performance, a fact we demonstrate in a simulation study. We then present results of a user study for the particular task of data discovery guided by an active learning algorithm specifically designed for this task.
Submitted 15 July, 2022; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Limitations in quantum computing from resource constraints
Authors:
Marco Fellous-Asiani,
Jing Hao Chai,
Robert S. Whitney,
Alexia Auffèves,
Hui Khoon Ng
Abstract:
Fault-tolerant schemes can use error correction to make a quantum computation arbitrarily accurate, provided that errors per physical component are smaller than a certain threshold and independent of the computer size. However, in current experiments, physical resource limitations like energy, volume or available bandwidth induce error rates that typically grow as the computer grows. Taking into account these constraints, we show that the amount of error correction can be optimized, leading to a maximum attainable computational accuracy. We find this maximum for generic situations where noise is scale-dependent. By inverting the logic, we provide experimenters with a tool for finding the minimum resources required to run an algorithm with a given computational accuracy. When combined with a full-stack quantum computing model, this provides the basis for energetic estimates of future large-scale quantum computers.
Submitted 8 August, 2021; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Robust Identification of Gene-Environment Interactions under High-Dimensional Accelerated Failure Time Models
Authors:
Qingzhao Zhang,
Hao Chai,
Shuangge Ma
Abstract:
For complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many of the existing G-E interaction methods conduct marginal analysis, which may not appropriately describe disease biology. Joint analysis methods have been developed, with most of the existing loss functions constructed based on likelihood. In practice, data contamination is not uncommon. Development of robust methods for interaction analysis that can accommodate data contamination is very limited. In this study, we consider censored survival data and adopt an accelerated failure time (AFT) model. An exponential squared loss is adopted to achieve robustness. A sparse group penalization approach, which respects the "main effects, interactions" hierarchy, is adopted for estimation and identification. Consistency properties are rigorously established. Simulation shows that the proposed method outperforms direct competitors. In data analysis, the proposed method makes biologically sensible findings.
Submitted 5 March, 2020;
originally announced March 2020.
-
An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos
Authors:
Sicheng Zhao,
Yunsheng Ma,
Yang Gu,
Jufeng Yang,
Tengfei Xing,
Pengfei Xu,
Runbo Hu,
Hua Chai,
Kurt Keutzer
Abstract:
Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e., polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
Submitted 12 February, 2020;
originally announced March 2020.
-
Multi-source Distilling Domain Adaptation
Authors:
Sicheng Zhao,
Guangzhi Wang,
Shanghang Zhang,
Yang Gu,
Yaxian Li,
Zhichao Song,
Pengfei Xu,
Runbo Hu,
Hua Chai,
Kurt Keutzer
Abstract:
Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, while naive application of the single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source respectively by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature by corresponding source classifier, and aggregate different predictions using respective domain weight, which corresponds to the discrepancy between each source and target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.
Submitted 7 February, 2020; v1 submitted 22 November, 2019;
originally announced November 2019.
-
Multi-source Domain Adaptation for Semantic Segmentation
Authors:
Sicheng Zhao,
Bo Li,
Xiangyu Yue,
Yang Gu,
Pengfei Xu,
Runbo Hu,
Hua Chai,
Kurt Keutzer
Abstract:
Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle a more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel-level cycle-consistently towards the target. Second, we propose sub-domain aggregation discriminator and cross-domain cycle discriminator to make different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and target domain while training the segmentation network. Extensive experiments from synthetic GTA and SYNTHIA to real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.
Submitted 27 October, 2019;
originally announced October 2019.
-
BINOCULARS for Efficient, Nonmyopic Sequential Experimental Design
Authors:
Shali Jiang,
Henry Chai,
Javier Gonzalez,
Roman Garnett
Abstract:
Finite-horizon sequential experimental design (SED) arises naturally in many contexts, including hyperparameter tuning in machine learning among more traditional settings. Computing the optimal policy for such problems requires solving Bellman equations, which are generally intractable. Most existing work resorts to severely myopic approximations by limiting the decision horizon to only a single time-step, which can underweight exploration in favor of exploitation. We present BINOCULARS: Batch-Informed NOnmyopic Choices, Using Long-horizons for Adaptive, Rapid SED, a general framework for deriving efficient, nonmyopic approximations to the optimal experimental policy. Our key idea is simple and surprisingly effective: we first compute a one-step optimal batch of experiments, then select a single point from this batch to evaluate. We realize BINOCULARS for Bayesian optimization and Bayesian quadrature -- two notable SED problems with radically different objectives -- and demonstrate that BINOCULARS significantly outperforms myopic alternatives in real-world scenarios.
Submitted 9 February, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
ROAM: Recurrently Optimizing Tracking Model
Authors:
Tianyu Yang,
Pengfei Xu,
Runbo Hu,
Hua Chai,
Antoni B. Chan
Abstract:
In this paper, we design a tracking model consisting of response generation and bounding box regression, where the first component produces a heat map to indicate the presence of the object at different positions and the second part regresses the relative bounding box shifts to anchors mounted on sliding-window locations. Thanks to the resizable convolutional filters used in both components to adapt to the shape changes of objects, our tracking model does not need to enumerate different sized anchors, thus saving model parameters. To effectively adapt the model to appearance variations, we propose to train a recurrent neural optimizer offline to update the tracking model in a meta-learning setting, which can converge the model in a few gradient steps. This improves the convergence speed of updating the tracking model while achieving better performance. We extensively evaluate our trackers, ROAM and ROAM++, on the OTB, VOT, LaSOT, GOT-10K and TrackingNet benchmarks, and our methods perform favorably against state-of-the-art algorithms.
Submitted 24 March, 2020; v1 submitted 27 July, 2019;
originally announced July 2019.
-
POI Semantic Model with a Deep Convolutional Structure
Authors:
Ji Zhao,
Meiyu Yu,
Huan Chen,
Boning Li,
Lingyu Zhang,
Qi Song,
Li Ma,
Hua Chai,
Jieping Ye
Abstract:
When using an electronic map, POI retrieval is the initial and important step, whose quality directly affects the user experience. Similarity between user query and POI information is the most critical feature in POI retrieval. An accurate similarity calculation is challenging since the mismatch between a query and a retrieval text may exist in the case of a mistyped query or an alias inquiry. In this paper, we propose a POI latent semantic model based on deep networks, which can effectively extract query features and POI information features for the similarity calculation. Our model describes the semantic information of complex texts at multiple layers, and achieves multi-field matches by modeling a POI's name and detailed address separately. Our model is evaluated on POI retrieval ranking datasets, including labeled relevance data and real-world user click data in POI retrieval. Results show that our model significantly outperforms our competitors in POI retrieval ranking tasks. The proposed algorithm has become a critical component of an online system serving millions of people every day.
Submitted 18 March, 2019;
originally announced March 2019.
-
Automated Model Selection with Bayesian Quadrature
Authors:
Henry Chai,
Jean-Francois Ton,
Roman Garnett,
Michael A. Osborne
Abstract:
We present a novel technique for tailoring Bayesian quadrature (BQ) to model selection. The state-of-the-art for comparing the evidence of multiple models relies on Monte Carlo methods, which converge slowly and are unreliable for computationally expensive models. Previous research has shown that BQ offers sample efficiency superior to Monte Carlo in computing the evidence of an individual model. However, applying BQ directly to model comparison may waste computation producing an overly-accurate estimate for the evidence of a clearly poor model. We propose an automated and efficient algorithm for computing the most-relevant quantity for model selection: the posterior probability of a model. Our technique maximizes the mutual information between this quantity and observations of the models' likelihoods, yielding efficient acquisition of samples across disparate model spaces when likelihood observations are limited. Our method produces more-accurate model posterior estimates using fewer model likelihood evaluations than standard Bayesian quadrature and Monte Carlo estimators, as we demonstrate on synthetic and real-world examples.
Submitted 1 March, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Traffic-aware Threshold Adjustment for NFV Scaling using DDPG
Authors:
Hua Chai
Abstract:
Current solutions mostly focus on how to predict traffic, rather than observing traffic characteristics in a specific NFV scenario. As a result, most of them use a uniform threshold to scale in/out. In a real NFV scenario, each VNF may serve one or more flows, and the characteristics of these flows can be completely different. A uniform threshold is not suitable in this scenario, because each VNF has a distinct processing logic depending on incident network traffic and events. Even if certain VNFs share packet processing functionality such as packet header analysis, the differences in upper-layer processing and implementation can exhibit unique resource usage patterns.
We propose a dynamic threshold scaling mechanism that can tailor thresholds according to each VNF's characteristics. Because setting thresholds is a per-VNF task that requires a deep understanding of workload trends and the diversity of each VNF, we add tailor-made features to the traditional dynamic mechanism. In addition, we reserve resources by predicting workload and add an emergency module to cope with anomalous traffic; that is, we develop a hybrid scaling policy that combines proactive and reactive scaling. Moreover, a sharp rise in network traffic can be caused not only by a large number of new incoming flows but also by the growth of existing flows. If the traffic rises mainly because existing flows grow, then rerouting only new flows cannot alleviate the overload quickly, and SLAs may be violated \cite{zhang2016co}. The only way to avoid SLA violations is to migrate flows and their associated NF internal states quickly and safely from existing instances to newly scaled instances, so state migration is an important part of the scaling procedure. We implement flow migration during scaling on OpenNF to guarantee the accuracy and timeliness of scaling.
Submitted 20 November, 2018;
originally announced November 2018.
-
Improving Quadrature for Constrained Integrands
Authors:
Henry Chai,
Roman Garnett
Abstract:
We present an improved Bayesian framework for performing inference of affine transformations of constrained functions. We focus on quadrature with nonnegative functions, a common task in Bayesian inference. We consider constraints on the range of the function of interest, such as nonnegativity or boundedness. Although our framework is general, we derive explicit approximation schemes for these constraints, and argue for the use of a log transformation for functions with high dynamic range such as likelihood surfaces. We propose a novel method for optimizing hyperparameters in this framework: we optimize the marginal likelihood in the original space, as opposed to in the transformed space. The result is a model that better explains the actual data. Experiments on synthetic and real-world data demonstrate our framework achieves superior estimates using less wall-clock time than existing Bayesian quadrature procedures.
Submitted 27 February, 2019; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Leveraging Long and Short-term Information in Content-aware Movie Recommendation
Authors:
Wei Zhao,
Haixia Chai,
Benyou Wang,
Jianbo Ye,
Min Yang,
Zhou Zhao,
Xiaojun Chen
Abstract:
Movie recommendation systems provide users with ranked lists of movies based on individuals' preferences and constraints. Two types of models are commonly used to generate ranking results: long-term models and session-based models. While long-term models represent the interactions between users and movies that are supposed to change slowly across time, session-based models encode the information of users' interests and changing dynamics of movies' attributes in short terms. In this paper, we propose an LSIC model, leveraging Long and Short-term Information in Content-aware movie recommendation using adversarial training. In the adversarial process, we train a generator as an agent of reinforcement learning which recommends the next movie to a user sequentially. We also train a discriminator which attempts to distinguish the generated list of movies from the real records. The poster information of movies is integrated to further improve the performance of movie recommendation, which is especially essential when few ratings are available. The experiments demonstrate that the proposed model has robust superiority over competitors and sets the state-of-the-art. We will release the source code of this work after publication.
Submitted 26 June, 2018; v1 submitted 25 December, 2017;
originally announced December 2017.
-
Identifying Gene-Environment Interactions with A Least Relative Error Approach
Authors:
Yangguang Zang,
Yinjun Zhao,
Qingzhao Zhang,
Hao Chai,
Sanguo Zhang,
Shuangge Ma
Abstract:
For complex diseases, the interactions between genetic and environmental risk factors can have important implications beyond the main effects. Many of the existing interaction analyses conduct marginal analysis and cannot accommodate the joint effects of multiple main effects and interactions. In this study, we conduct joint analysis which can simultaneously accommodate a large number of effects. Significantly different from the existing studies, we adopt loss functions based on relative errors, which offer a useful alternative to the "classic" methods such as the least squares and least absolute deviation. Further, to accommodate censoring in the response variable, we adopt a weighted approach. Penalization is used for identification and regularized estimation. Computationally, we develop an effective algorithm which combines the majorize-minimization and coordinate descent. Simulation shows that the proposed approach has satisfactory performance. We also analyze lung cancer prognosis data with gene expression measurements.
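To illustrate what a relative-error loss looks like, here is a minimal sketch. The abstract does not state the exact criterion, so the symmetric form below (error relative to the observation plus error relative to the prediction) is an assumption chosen for illustration, and the function name is hypothetical.

```python
def relative_error_loss(y, y_hat):
    """Symmetric relative error for a positive response.

    Assumed form: |y - y_hat| / y + |y - y_hat| / y_hat.
    Both y and y_hat must be strictly positive (e.g. survival times).
    """
    abs_err = abs(y - y_hat)
    return abs_err / y + abs_err / y_hat


# The loss is scale-free: multiplying both values by 10 leaves it unchanged,
# whereas the squared error would grow by a factor of 100.
small_scale = relative_error_loss(2.0, 1.0)
large_scale = relative_error_loss(20.0, 10.0)
```

The scale invariance is what distinguishes this family from least squares and least absolute deviation, which both depend on the units of the response.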
Submitted 29 May, 2016;
originally announced May 2016.
-
Least-bias state estimation with incomplete unbiased measurements
Authors:
J. Rehacek,
Z. Hradil,
Y. S. Teo,
L. L. Sanchez-Soto,
H. K. Ng,
J. H. Chai,
B. -G. Englert
Abstract:
Measuring incomplete sets of mutually unbiased bases constitutes a sensible approach to the tomography of high-dimensional quantum systems. The unbiased nature of these bases optimizes the uncertainty hypervolume. However, imposing unbiasedness on the probabilities for the unmeasured bases does not generally yield the estimator with the largest von Neumann entropy, a popular figure of merit in this context. Furthermore, this imposition typically leads to mock density matrices that are not even positive definite. This provides a strong argument against perfunctory applications of linear estimation strategies. We propose to use instead the physical state estimators that maximize the Shannon entropy of the unmeasured outcomes, which quantifies our lack of knowledge fittingly and gives physically meaningful statistical predictions.
Submitted 25 September, 2015;
originally announced September 2015.
-
A Robust Approach for Identifying Gene-Environment Interactions for Prognosis
Authors:
Hao Chai,
Qingzhao Zhang,
Yu Jiang,
Guohua Wang,
Sanguo Zhang,
Shuangge Ma
Abstract:
For many complex diseases, prognosis is of essential importance. It has been shown that, beyond the main effects of genetic (G) and environmental (E) risk factors, the gene-environment (G$\times$E) interactions also play a critical role. In practice, the prognosis outcome data can be contaminated, and most of the existing methods are not robust to data contamination. In the literature, it has been shown that even a single contaminated observation can lead to severely biased model estimation. In this study, we describe prognosis using an accelerated failure time (AFT) model. An exponential squared loss is proposed to accommodate possible data contamination. A penalization approach is adopted for regularized estimation and marker selection. The proposed method is realized using an effective coordinate descent (CD) and minorization maximization (MM) algorithm. Simulation shows that without contamination, the proposed method has performance comparable to or better than the unrobust alternative. With contamination, it outperforms the unrobust alternative and, under certain scenarios, can be superior to the robust method based on quantile regression. The proposed method is applied to the analysis of TCGA (The Cancer Genome Atlas) lung cancer data. It identifies interactions different from those using the alternatives. The identified markers have important implications and satisfactory stability.
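The robustness claim can be seen in a small sketch. The abstract does not write out the loss, so this assumes one common form of the exponential squared loss, 1 - exp(-r^2 / gamma), where r is a residual and gamma is a tuning parameter; the function names and the residual values are hypothetical.

```python
import math


def exponential_squared_loss(residual, gamma=1.0):
    """Assumed form: 1 - exp(-residual**2 / gamma), bounded in [0, 1]."""
    return 1.0 - math.exp(-(residual ** 2) / gamma)


def squared_loss(residual):
    """Ordinary squared error, unbounded in the residual."""
    return residual ** 2


# Three ordinary residuals and one grossly contaminated observation.
residuals = [0.1, -0.2, 0.05, 50.0]
robust_total = sum(exponential_squared_loss(r) for r in residuals)
classic_total = sum(squared_loss(r) for r in residuals)
```

Because each term is capped at 1, the contaminated observation contributes at most 1 to `robust_total`, while it dominates `classic_total`. A single outlier therefore cannot pull the fit arbitrarily far under the bounded loss.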
Submitted 13 May, 2015;
originally announced May 2015.
-
Studies on Point Estimators for Incomplete Tomography of Qutrits
Authors:
Jing Hao Chai
Abstract:
This is a Bachelor's thesis on point estimators for incomplete tomography of qutrits as of 2014, submitted to the National University of Singapore. The main content of the thesis focuses on various methods of estimation, such as the maximum entropy and average estimators, and shows that they are quite different. Numerical simulations of these methods show, however, that these estimators perform very close to one another. Therefore, on this basis, there is no reason to favor one method over another.
Submitted 23 March, 2015;
originally announced March 2015.
-
Integrability of the Gross-Pitaevskii Equation with Feshbach Resonance management
Authors:
Dun Zhao,
Hua-Yue Chai,
Hong-Gang Luo
Abstract:
In this paper we study the integrability of a class of Gross-Pitaevskii equations managed by Feshbach resonance in an expulsive parabolic external potential. Using the WTC test, we find a condition under which the Gross-Pitaevskii equation is completely integrable. Under the present model, this integrability condition is completely consistent with that proposed by Serkin, Hasegawa, and Belyaeva [V. N. Serkin et al., Phys. Rev. Lett. 98, 074102 (2007)]. Furthermore, this integrability can also be shown explicitly by a transformation that converts the Gross-Pitaevskii equation into the well-known standard nonlinear Schrödinger equation. By this transformation, each exact solution of the standard nonlinear Schrödinger equation can be converted into one of the Gross-Pitaevskii equation, which builds a systematic connection between the canonical solitons and the so-called nonautonomous ones. This transformation contributes significantly to understanding the essential properties of the nonautonomous solitons and the dynamics of Bose-Einstein condensates under the Feshbach resonance technique.
Submitted 28 August, 2008;
originally announced August 2008.