Skip to main content

Showing 1–50 of 82 results for author: Nie, X

  1. arXiv:2410.14142  [pdf, ps, other

    cs.IT

    Secure Collaborative Computation Offloading and Resource Allocation in Cache-Assisted Ultra-Dense IoT Networks With Multi-Slope Channels

    Authors: Tianqing Zhou, Bobo Wang, Dong Qin, Xuefang Nie, Nan Jiang, Chunguo Li

    Abstract: Cache-assisted ultra-dense mobile edge computing (MEC) networks are a promising solution for meeting the increasing demands of numerous Internet-of-Things mobile devices (IMDs). To address the complex interferences caused by small base stations (SBSs) deployed densely in such networks, this paper explores the combination of orthogonal frequency division multiple access (OFDMA), non-orthogonal mult… ▽ More

    Submitted 21 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

  2. arXiv:2410.13333  [pdf, other

    cs.DC

    Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization

    Authors: Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Yujie Wang, Hailin Zhang, Xiaonan Nie, Bin Cui

    Abstract: As the scale of models and training data continues to grow, there is an expanding reliance on more GPUs to train large-scale models, which inevitably increases the likelihood of encountering dynamic stragglers that some devices lag behind in performance occasionally. However, hybrid parallel training, one of the de facto paradigms to train large models, is typically sensitive to the stragglers.… ▽ More

    Submitted 17 October, 2024; originally announced October 2024.

  3. arXiv:2409.06135  [pdf, other

    cs.SD cs.CV cs.MM eess.AS

    Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

    Authors: Qi Yang, Binjie Mao, Zili Wang, Xing Nie, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Foley is a term commonly used in filmmaking, referring to the addition of daily sound effects to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a particular type of automatic foley task, presents inherent challenges related to audio-visual synchronization. These challenges encompass maintaining the content consistency between the input video and the generated a… ▽ More

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 14 pages, 11 figures

  4. arXiv:2409.01327  [pdf, other

    cs.CV

    SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation

    Authors: Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li

    Abstract: Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features a… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  5. arXiv:2409.01143  [pdf, other

    cs.DC

    FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment

    Authors: Ran Yan, Youhe Jiang, Wangcheng Tao, Xiaonan Nie, Bin Cui, Binhang Yuan

    Abstract: Training large language model (LLM) is a computationally intensive task, which is typically conducted in data centers with homogeneous high-performance GPUs. This paper explores an alternative approach by deploying the training computation across heterogeneous GPUs to enable better flexibility and efficiency for heterogeneous resource utilization. To achieve this goal, we propose a novel system, F… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.00997  [pdf, other

    cs.CL

    DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning

    Authors: Keer Lu, Xiaonan Nie, Zheng Liang, Da Pan, Shusen Zhang, Keshi Zhao, Weipeng Chen, Zenan Zhou, Guosheng Dong, Bin Cui, Wentao Zhang

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated significant improvements across a variety of tasks, one of which is the long-context capability. The key to improving long-context performance lies in effective data organization and management strategies that integrate data from multiple domains and optimize the context window during training. Through extensive experimental analysis,… ▽ More

    Submitted 2 October, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

  7. arXiv:2408.15079  [pdf, other

    cs.CL cs.AI

    BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

    Authors: Guosheng Dong, Da Pan, Yiding Sun, Shusen Zhang, Zheng Liang, Xin Wu, Yanjun Shen, Fan Yang, Haoze Sun, Tianpeng Li, Mingan Lin, Jianhua Xu, Yufan Zhang, Xiaonan Nie, Lei Su, Bingning Wang, Wentao Zhang, Jiaxin Mao, Zenan Zhou, Weipeng Chen

    Abstract: The general capabilities of Large Language Models (LLM) highly rely on the composition and selection on extensive pretraining datasets, treated as commercial secrets by several institutions. To mitigate this issue, we open-source the details of a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 6 figures

  8. arXiv:2408.14158  [pdf, other

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei , et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic… ▽ More

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). \c{opyright} 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  9. arXiv:2407.14532  [pdf, other

    cs.DC cs.LG

    A Scenario-Oriented Benchmark for Assessing AIOps Algorithms in Microservice Management

    Authors: Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Hengmao Chen, Yongnan Luo, Dan Pei

    Abstract: AIOps algorithms play a crucial role in the maintenance of microservice systems. Many previous benchmarks' performance leaderboard provides valuable guidance for selecting appropriate algorithms. However, existing AIOps benchmarks mainly utilize offline datasets to evaluate algorithms. They cannot consistently evaluate the performance of algorithms using real-time datasets, and the operation scena… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/MicroServo/microservo, datasets are available at https://github.com/MicroServo/hot-plugging

  10. arXiv:2407.12820  [pdf, other

    cs.CL cs.AI cs.LG

    PQCache: Product Quantization-based KVCache for Long Context LLM Inference

    Authors: Hailin Zhang, Xiaodong Ji, Yilin Chen, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Weipeng Chen, Bin Cui

    Abstract: As the field of Large Language Models (LLMs) continues to evolve, the context length in inference is steadily growing. Key-Value Cache (KVCache), a crucial component in LLM inference, has now become the primary memory bottleneck due to limited GPU memory. Current methods selectively determine suitable keys and values for self-attention computation in LLMs to address the issue. However, they either… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.12117  [pdf, other

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.05389  [pdf, other

    cs.CV cs.AI

    Image-Conditional Diffusion Transformer for Underwater Image Enhancement

    Authors: Xingyang Nie, Su Pan, Xiaoyu Zhai, Shifei Tao, Fengzhong Qu, Biao Wang, Huilin Ge, Guojie Xiao

    Abstract: Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by the recent advance in generative models, we propose a novel UIE method based on image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is ap… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  13. arXiv:2406.11739  [pdf, other

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou , et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.09961  [pdf, other

    cs.SE cs.CL cs.CV

    ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

    Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang

    Abstract: We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which repres… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Data and code are available at https://github.com/ChartMimic/ChartMimic

  15. arXiv:2406.09201  [pdf, other

    cs.CV

    Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

    Authors: Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Yansong Peng, Hebei Li

    Abstract: In this technical report, we present our findings from the research conducted on the Vast Vocabulary Visual Detection (V3Det) dataset for Supervised Vast Vocabulary Visual Detection task. How to deal with complex categories and detection boxes has become a difficulty in this track. The original supervised detector is not suitable for this task. We have designed a series of improvements, including… ▽ More

    Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Journal ref: Second Place in CVPR 2024 Vast Vocabulary Visual Detection Challenge

  16. arXiv:2405.17188  [pdf, other

    cs.CV

    The SkatingVerse Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Lei Jin, Jianshu Li, Zheng Zhu, Yinglei Teng, Jiaojiao Zhao, Sadaf Gulshad, Zheng Wang, Bo Zhao, Xiangbo Shu, Yunchao Wei, Xuecheng Nie, Xiaojie Jin, Xiaodan Liang, Shin'ichi Satoh, Yandong Guo, Cewu Lu, Junliang Xing, Jane Shen Shengmei

    Abstract: The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and testing subset. The training subsets consists of 19,993 RGB video sequences, and the testing subsets cons… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  17. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  18. arXiv:2405.00263  [pdf, other

    cs.CL cs.AI cs.LG

    Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

    Authors: Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Bin Cui

    Abstract: Large language models (LLMs) suffer from low efficiency as the mismatch between the requirement of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its ti… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  19. arXiv:2403.14910  [pdf, other

    cs.CV

    Defying Imbalanced Forgetting in Class Incremental Learning

    Authors: Shixiong Xu, Gaofeng Meng, Xing Nie, Bolin Ni, Bin Fan, Shiming Xiang

    Abstract: We observe a high level of imbalance in the accuracy of different classes in the same old task for the first time. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced forgetting of learned classes, as their accuracy is similar before the occurrence of catastrophic forgetting. This discovery remains previously unidentified due to the re… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: AAAI2024

  20. arXiv:2403.12768  [pdf, other

    cs.HC

    ContextVis: Envision Contextual Learning and Interaction with Generative Models

    Authors: Bo Shui, Chufan Shi, Yujiu Yang, Xiaomei Nie

    Abstract: ContextVis introduces a workflow by integrating generative models to create contextual learning materials. It aims to boost knowledge acquisition through the creation of resources with contextual cues. A case study on vocabulary learning demonstrates the effectiveness of generative models in developing educational resources that enrich language understanding and aid memory retention. The system co… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by HCII 2024

  21. arXiv:2403.06243  [pdf, other

    cs.CV

    BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering

    Authors: Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie

    Abstract: Developing blind video deflickering (BVD) algorithms to enhance video temporal consistency, is gaining importance amid the flourish of image processing and video generation. However, the intricate nature of video data complicates the training of deep learning methods, leading to high resource consumption and instability, notably under severe lighting flicker. This underscores the critical need for… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  22. arXiv:2402.02045  [pdf, other

    cs.CV

    MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

    Authors: Zhe Li, Laurence T. Yang, Bocheng Ren, Xin Nie, Zhangyang Gao, Cheng Tan, Stan Z. Li

    Abstract: The scarcity of annotated data has sparked significant interest in unsupervised pre-training methods that leverage medical reports as auxiliary signals for medical visual representation learning. However, existing research overlooks the multi-granularity nature of medical visual representation and lacks suitable contrastive learning techniques to improve the models' generalizability across differe… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  23. arXiv:2401.02954  [pdf, other

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, :, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li , et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

  24. arXiv:2312.06462  [pdf, other

    cs.CV cs.AI cs.SD eess.AS

    Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

    Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral… ▽ More

    Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight. 13 pages, 10 figures

  25. arXiv:2312.01663  [pdf, other

    cs.CV cs.AI

    Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

    Authors: Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

    Abstract: In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining desired editing results conformed with the editing prompt is nontrivial since there exist two significant challenges, including accurate editing of only foreground regions and multi-view consistency… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 14 pages, 13 figures, project website: https://customnerf.github.io/

  26. arXiv:2310.18698  [pdf, other

    cs.CV cs.LG

    Triplet Attention Transformer for Spatiotemporal Predictive Learning

    Authors: Xuesong Nie, Xi Chen, Haoyuan Jin, Zhihang Zhu, Yunfeng Yan, Donglian Qi

    Abstract: Spatiotemporal predictive learning offers a self-supervised learning paradigm that enables models to learn both spatial and temporal patterns by predicting future sequences based on historical sequences. Mainstream methods are dominated by recurrent units, yet they are limited by their lack of parallelization and often underperform in real-world scenarios. To improve prediction quality while maint… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted to WACV 2024

  27. arXiv:2310.07637  [pdf, other

    cs.AI cs.NI

    OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models

    Authors: Yuhe Liu, Changhua Pei, Longlong Xu, Bohan Chen, Mingze Sun, Zhirui Zhang, Yongqian Sun, Shenglin Zhang, Kun Wang, Haiming Zhang, Jianhui Li, Gaogang Xie, Xidao Wen, Xiaohui Nie, Minghua Ma, Dan Pei

    Abstract: Information Technology (IT) Operations (Ops), particularly Artificial Intelligence for IT Operations (AIOps), is the guarantee for maintaining the orderly and stable operation of existing information systems. According to Gartner's prediction, the use of AI technology for automated IT operations has become a new trend. Large language models (LLMs) that have exhibited remarkable capabilities in NLP… ▽ More

    Submitted 23 August, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  28. Improving Automatic Parallel Training via Balanced Memory Workload Optimization

    Authors: Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui

    Abstract: Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts… ▽ More

    Submitted 24 February, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.13878

  29. arXiv:2306.08460  [pdf, ps, other

    cs.LG cs.AI

    Improving Generalization in Meta-Learning via Meta-Gradient Augmentation

    Authors: Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin

    Abstract: Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting… ▽ More

    Submitted 14 June, 2023; originally announced June 2023.

  30. arXiv:2305.17431  [pdf, other

    cs.CV cs.AI

    Towards Consistent Video Editing with Text-to-Image Diffusion Models

    Authors: Zicheng Zhang, Bonan Li, Xuecheng Nie, Congying Han, Tiande Guo, Luoqi Liu

    Abstract: Existing works have advanced Text-to-Image (TTI) diffusion models for video editing in a one-shot learning manner. Despite their low requirements of data and computation, these methods might produce results of unsatisfied consistency with text prompt as well as temporal sequence, limiting their applications in the real world. In this paper, we propose to address the above issues with a novel EI… ▽ More

    Submitted 27 May, 2023; originally announced May 2023.

  31. arXiv:2305.09940   

    cs.DC

    OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

    Authors: Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

    Abstract: Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architecture… ▽ More

    Submitted 17 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: An older version is in existence, and the article has been updated there. The URL for the updated version is arXiv:2209.13258

  32. arXiv:2305.04517  [pdf, other

    cs.CV

    DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration

    Authors: Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Xuecheng Nie

    Abstract: Blind face restoration (BFR) is important while challenging. Prior works prefer to exploit GAN-based frameworks to tackle this task due to the balance of quality and efficiency. However, these methods suffer from poor stability and adaptability to long-tail distribution, failing to simultaneously retain source identity and restore detail. We propose DiffBFR to introduce Diffusion Probabilistic Mod… ▽ More

    Submitted 8 August, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

  33. arXiv:2304.03946  [pdf, other

    cs.DC cs.LG

    FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

    Authors: Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

    Abstract: With the increasing data volume, there is a trend of using large-scale pre-trained models to store the knowledge into an enormous number of model parameters. The training of these models is composed of lots of dense algebras, requiring a huge amount of hardware resources. Recently, sparsely-gated Mixture-of-Experts (MoEs) are becoming more popular and have demonstrated impressive pretraining scala… ▽ More

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted by SIGMOD 2023

    Journal ref: Proc. ACM Manag. Data, Vol. 1, No. 1, Article 110. Publication date: May 2023

  34. arXiv:2303.06353  [pdf, ps, other

    cs.IT

    Secure and Multi-Step Computation Offloading and Resource Allocation in Ultra-Dense Multi-Task NOMA-Enabled IoT Networks

    Authors: Tianqing Zhou, Yanyan Fu, Dong Qin, Xuefang Nie, Nan Jiang, Chunguo Li

    Abstract: Ultra-dense networks are widely regarded as a promising solution to explosively growing applications of Internet-of-Things (IoT) mobile devices (IMDs). However, complicated and severe interferences need to be tackled properly in such networks. To this end, both orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA) are utilized at first. Then, in order to attain a goal of green… ▽ More

    Submitted 11 March, 2023; originally announced March 2023.

  35. arXiv:2303.03231  [pdf, other

    cs.CV

    StyO: Stylize Your Face in Only One-Shot

    Authors: Bonan Li, Zicheng Zhang, Xuecheng Nie, Congying Han, Yinhan Hu, Tiande Guo

    Abstract: This paper focuses on face stylization with a single artistic target. Existing works for this task often fail to retain the source content while achieving geometry variation. Here, we present a novel StyO model, ie. Stylize the face in only One-shot, to solve the above problem. In particular, StyO exploits a disentanglement and recombination strategy. It first disentangles the content and style of… ▽ More

    Submitted 6 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  36. arXiv:2303.02868  [pdf, other

    cs.LG cs.DC

    Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

    Authors: Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui

    Abstract: Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models. Many products and services in Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have been opted in to gain the power of pre-trained models. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transfor… ▽ More

    Submitted 5 March, 2023; originally announced March 2023.

  37. arXiv:2212.02222  [pdf, other

    cs.GT cs.AI cs.LG

    Real-time Bidding Strategy in Display Advertising: An Empirical Analysis

    Authors: Mengjuan Liu, Zhengning Hu, Zhi Lai, Daiwei Zheng, Xuyun Nie

    Abstract: Bidding strategies that help advertisers determine bidding prices are receiving increasing attention as more and more ad impressions are sold through real-time bidding systems. This paper first describes the problem and challenges of optimizing bidding strategies for individual advertisers in real-time bidding display advertising. Then, several representative bidding strategies are introduced, esp… ▽ More

    Submitted 30 November, 2022; originally announced December 2022.

  38. arXiv:2211.13878  [pdf, other

    cs.LG cs.DB cs.DC

    Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

    Authors: Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui

    Abstract: Transformer models have achieved state-of-the-art performance on various domains of applications and gradually becomes the foundations of the advanced large deep learning (DL) models. However, how to train these models over multiple GPUs efficiently is still challenging due to a large number of parallelism choices. Existing DL systems either rely on manual efforts to make distributed training plan… ▽ More

    Submitted 24 November, 2022; originally announced November 2022.

    Journal ref: VLDB 2023

  39. arXiv:2211.09013  [pdf, other

    cs.CV cs.IT

    Masked Reconstruction Contrastive Learning with Information Bottleneck Principle

    Authors: Ziwen Liu, Bonan Li, Congying Han, Tiande Guo, Xuecheng Nie

    Abstract: Contrastive learning (CL) has shown great power in self-supervised learning due to its ability to capture insight correlations among large-scale data. Current CL models are biased to learn only the ability to discriminate positive and negative pairs due to the discriminative task setting. However, this bias would lead to ignoring its sufficiency for other downstream tasks, which we call the discri… ▽ More

    Submitted 15 November, 2022; originally announced November 2022.

  40. arXiv:2210.01886  [pdf, other

    cs.CV cs.AI

    Multi-view Human Body Mesh Translator

    Authors: Xiangjian Jiang, Xuecheng Nie, Zitian Wang, Luoqi Liu, Si Liu

    Abstract: Existing methods for human mesh recovery mainly focus on single-view frameworks, but they often fail to produce accurate results due to the ill-posed setup. Considering the maturity of the multi-view motion capture system, in this paper, we propose to solve the prior ill-posed problem by leveraging multiple images from different views, thus significantly enhancing the quality of recovered meshes.… ▽ More

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: 9 pages

  41. arXiv:2209.13258  [pdf, other

    cs.DC

    OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

    Authors: Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

    Abstract: Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architecture… ▽ More

    Submitted 18 May, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by IJCAI 2023

  42. arXiv:2208.02646  [pdf, other

    cs.CV

    DropKey

    Authors: Bonan Li, Yinhan Hu, Xuecheng Nie, Congying Han, Xiangjian Jiang, Tiande Guo, Luoqi Liu

    Abstract: In this paper, we focus on analyzing and improving the dropout technique for self-attention layers of Vision Transformer, which is important while surprisingly ignored by prior works. In particular, we conduct researches on three core questions: First, what to drop in self-attention layers? Different from dropping attention weights in literature, we propose to move dropout operations forward ahead… ▽ More

    Submitted 11 April, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: Accepted by CVPR2023

  43. arXiv:2207.14381  [pdf, other

    cs.CV

    Pro-tuning: Unified Prompt Tuning for Vision Tasks

    Authors: Xing Nie, Bolin Ni, Jianlong Chang, Gaomeng Meng, Chunlei Huo, Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan

    Abstract: In computer vision, fine-tuning is the de-facto approach to leverage pre-trained vision models to perform downstream tasks. However, deploying it in practice is quite challenging, due to adopting parameter inefficient global update and heavily relying on high-quality downstream data. Recently, prompt-based learning, which adds a task-relevant prompt to adapt the downstream tasks to pre-trained mod… ▽ More

    Submitted 22 August, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

  44. Actionable and Interpretable Fault Localization for Recurring Failures in Online Service Systems

    Authors: Zeyan Li, Nengwen Zhao, Mingjie Li, Xianglin Lu, Lixin Wang, Dongdong Chang, Xiaohui Nie, Li Cao, Wenzhi Zhang, Kaixin Sui, Yanhua Wang, Xu Du, Guoqiang Duan, Dan Pei

    Abstract: Fault localization is challenging in an online service system due to its monitoring data's large volume and variety and complex dependencies across or within its components (e.g., services or databases). Furthermore, engineers require fault localization solutions to be actionable and interpretable, which existing research approaches cannot satisfy. Therefore, the common industry practice is that,… ▽ More

    Submitted 4 September, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted by ESEC/FSE 2022

  45. Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition

    Authors: Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei

    Abstract: Fault diagnosis is critical in many domains, as faults may lead to safety threats or economic losses. In the field of online service systems, operators rely on enormous monitoring data to detect and mitigate failures. Quickly recognizing a small set of root cause indicators for the underlying fault can save much time for failure mitigation. In this paper, we formulate the root cause analysis probl… ▽ More

    Submitted 12 June, 2022; originally announced June 2022.

    Comments: Accepted at KDD 2022 Applied Data Science Track

  46. arXiv:2204.02509  [pdf, other

    cs.CV

    Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows

    Authors: Sheng Liu, Xiaohan Nie, Raffay Hamid

    Abstract: Existing approaches for Structure from Motion (SfM) produce impressive 3-D reconstruction results especially when using imagery captured with large parallax. However, to create engaging video-content in movies and TV shows, the amount by which a camera can be moved while filming a particular shot is often limited. The resulting small-motion parallax between video frames makes standard geometry-bas… ▽ More

    Submitted 5 April, 2022; originally announced April 2022.

  47. arXiv:2203.14685  [pdf, other

    cs.DC

    HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System

    Authors: Xiaonan Nie, Pinxue Zhao, Xupeng Miao, Tong Zhao, Bin Cui

    Abstract: As giant dense models advance quality but require large amounts of GPU budgets for training, the sparsely gated Mixture-of-Experts (MoE), a kind of conditional computation architecture, is proposed to scale models while keeping their computation constant. Specifically, the input tokens are routed by the gate network and only activates part of the expert network. Existing MoE training systems only… ▽ More

    Submitted 17 November, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

  48. arXiv:2203.11700  [pdf, other

    cs.LG cs.AI

    Exploring Linear Feature Disentanglement For Neural Networks

    Authors: Tiantian He, Zhibin Li, Yongshun Gong, Yazhou Yao, Xiushan Nie, Yilong Yin

    Abstract: Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs). Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linear separable feature space. This phenomenon ignites our interest in exploring whether all features need to be transf… ▽ More

    Submitted 22 March, 2022; originally announced March 2022.

  49. arXiv:2203.09736  [pdf, other

    cs.CV

    Series Photo Selection via Multi-view Graph Learning

    Authors: Jin Huang, Lu Zhang, Yongshun Gong, Jian Zhang, Xiushan Nie, Yilong Yin

    Abstract: Series photo selection (SPS) is an important branch of the image aesthetics quality assessment, which focuses on finding the best one from a series of nearly identical photos. While a great progress has been observed, most of the existing SPS approaches concentrate solely on extracting features from the original image, neglecting that multiple views, e.g, saturation level, color histogram and dept… ▽ More

    Submitted 18 March, 2022; originally announced March 2022.

  50. arXiv:2203.07697  [pdf, other

    cs.CV

    Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

    Authors: Zitian Wang, Xuecheng Nie, Xiaochao Qu, Yunpeng Chen, Si Liu

    Abstract: In this paper, we present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem. Different from existing top-down and bottom-up methods, the proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner. This leads to a simplified pipeline with enhanced… ▽ More

    Submitted 22 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: To appear in CVPR 2022. Code will be released