
Showing 1–50 of 84 results for author: Ji, L

  1. arXiv:2410.10573 [pdf, other]

    cs.CV

    Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

    Authors: Jiaxiang Gou, Luping Ji, Pei Liu, Mao Ye

    Abstract: Whole Slide Image (WSI) classification has very significant applications in clinical pathology, e.g., tumor identification and cancer diagnosis. Currently, most research attention is focused on Multiple Instance Learning (MIL) using static datasets. One of the most obvious weaknesses of these methods is that they cannot efficiently preserve and utilize previously learned knowledge. With any new da…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 16 pages, 10 tables, 11 figures

  2. arXiv:2410.03439 [pdf, other]

    cs.CL

    ToolGen: Unified Tool Retrieval and Calling via Generation

    Authors: Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li

    Abstract: As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowled…

    Submitted 8 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    ACM Class: I.2.7

  3. arXiv:2409.12059 [pdf, other]

    cs.CL cs.AI cs.LG

    Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking

    Authors: Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Large Language Models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Recently there have been several studies which enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model archi…

    Submitted 27 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: 9 pages, 5 figures

  4. arXiv:2409.09369 [pdf, other]

    cs.CV

    Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

    Authors: Pei Liu, Luping Ji, Jiaxiang Gou, Bo Fu, Mao Ye

    Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly-expressive architectures and only coarse-grained patient-level labels to learn prognostic visual representations from gigapixel WSIs. Such learnin…

    Submitted 26 September, 2024; v1 submitted 14 September, 2024; originally announced September 2024.

    Comments: 24 pages, 11 tables, 6 figures

  5. arXiv:2409.07416 [pdf, other]

    cs.IR cs.AI cs.LG

    Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

    Authors: Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang, Jingren Zhou

    Abstract: Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts. Reinforcement learning can be applied on recommendation to study such a problem but is also subject to large search space, sparse user feedback and long interactive latency. Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called m…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 18 pages, 4 figures

  6. arXiv:2409.07341 [pdf, other]

    cs.LG cs.AI cs.RO

    Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence

    Authors: Luo Ji, Runji Lin

    Abstract: Interactive artificial intelligence in the motion control field is an interesting topic, especially when universal knowledge is adaptive to multiple tasks and universal environments. Despite there being increasing efforts in the field of Reinforcement Learning (RL) with the aid of transformers, most of them might be limited by the offline training pipeline, which prohibits exploration and generali…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 12 pages, 6 figures

  7. arXiv:2409.06624 [pdf, other]

    cs.CL cs.AI cs.LG

    A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

    Authors: Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Large Language Models (LLMs) often need to be Continual Pre-Trained (CPT) to obtain unfamiliar language skills or adapt to new domains. The huge training cost of CPT often calls for a cautious choice of key hyper-parameters such as the mixture ratio of extra language or domain corpus. However, there is no systematic study which bridges the gap between the optimal mixture ratio and the actual mode…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures

  8. arXiv:2409.06601 [pdf, other]

    cs.CL cs.LG

    Alleviating Hallucinations in Large Language Models with Scepticism Modeling

    Authors: Yetao Wu, Yihong Wang, Teng Chen, Chenxi Liu, Ningyuan Xi, Qingqing Gu, Hongyang Lei, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Hallucinations are a major challenge for large language models (LLMs) and prevent their adoption in diverse fields. Uncertainty estimation could be used for alleviating the damage of hallucinations. The skeptical emotion of humans could be useful for enhancing the ability of self-estimation. Inspired by this observation, we proposed a new approach called Skepticism Modeling (SM). This approach is formali…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 6 figures

  9. arXiv:2409.05929 [pdf, other]

    cs.LG cs.AI

    Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models

    Authors: Hongyang Lei, Xiaolong Cheng, Dan Wang, Qi Qin, Huazhen Huang, Yetao Wu, Qingqing Gu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Recent Large Multi-Modal Models (LMMs) have made significant advancements in multi-modal alignment by employing lightweight connection modules to facilitate the representation and fusion of knowledge from existing pre-trained uni-modal models. However, these methods still rely on modality-specific and direction-specific connectors, leading to compartmentalized knowledge representations and reduced…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: work in progress

  10. arXiv:2409.01908 [pdf, other]

    stat.ME cs.LG q-fin.ST stat.AP stat.ML

    Bayesian CART models for aggregate claim modeling

    Authors: Yaojun Zhang, Lanpeng Ji, Georgios Aivaliotis, Charles C. Taylor

    Abstract: This paper proposes three types of Bayesian CART (or BCART) models for aggregate claim amount, namely, frequency-severity models, sequential models and joint models. We propose a general framework for the BCART models applicable to data with multivariate responses, which is particularly useful for the joint BCART models with a bivariate response: the number of claims and aggregate claim amount. To…

    Submitted 3 September, 2024; originally announced September 2024.

  11. arXiv:2408.03060 [pdf]

    cs.CV cs.GR

    MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images

    Authors: Tengfei Wang, Zongqian Zhan, Rui Xia, Linxia Ji, Xin Wang

    Abstract: Over the last few decades, image-based building surface reconstruction has garnered substantial research interest and has been applied across various fields, such as heritage preservation, architectural planning, etc. Compared to the traditional photogrammetric and NeRF-based solutions, recently, Gaussian fields-based methods have exhibited significant potential in generating surface meshes due to…

    Submitted 6 August, 2024; originally announced August 2024.

  12. arXiv:2407.07289 [pdf, other]

    cs.CV

    Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

    Authors: Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

    Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensa…

    Submitted 9 July, 2024; originally announced July 2024.

  13. Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye

    Abstract: As a sub-field of object detection, moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Currently-existing methods primarily rely on the features extracted only from spatio-temporal domain. Frequency domain has hardly been concerned yet, although it has been widely applied in image processing. To extend feature sourc…

    Submitted 5 September, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IEEE TGRS

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing 2024

  14. arXiv:2405.15343 [pdf, other]

    cs.CV

    Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

    Authors: Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu

    Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video…

    Submitted 24 May, 2024; originally announced May 2024.

  15. arXiv:2405.07652 [pdf, other]

    cs.HC cs.AI

    G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

    Authors: Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu

    Abstract: Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual fie…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  16. arXiv:2405.04405 [pdf, other]

    cs.LG

    Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation

    Authors: Pei Liu, Luping Ji

    Abstract: Uncertainty estimation (UE), as an effective means of quantifying predictive uncertainty, is crucial for safe and reliable decision-making, especially in high-risk scenarios. Existing UE schemes usually assume that there are completely-labeled samples to support fully-supervised learning. In practice, however, many UE tasks often have no sufficiently-labeled data to use, such as the Multiple Insta…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  17. arXiv:2401.11430 [pdf, other]

    cs.CV

    Exploring Diffusion Time-steps for Unsupervised Representation Learning

    Authors: Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang

    Abstract: Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised l…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  18. arXiv:2401.09454 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Voila-A: Aligning Vision-Language Models with User's Gaze Attention

    Authors: Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

    Abstract: In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper…

    Submitted 22 December, 2023; originally announced January 2024.

  19. arXiv:2312.17072 [pdf, other]

    cs.IR cs.LG

    An Adaptive Framework of Geographical Group-Specific Network on O2O Recommendation

    Authors: Luo Ji, Jiayu Mao, Hailong Shi, Qian Li, Yunfei Chu, Hongxia Yang

    Abstract: Online to offline recommendation strongly correlates with the user and service's spatiotemporal information, therefore calling for a higher degree of model personalization. The traditional methodology is based on a uniform model structure trained by collected centralized data, which is unlikely to capture all user patterns over different geographical areas or time periods. To tackle this challenge…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 7 pages, 4 figures, Accepted by ECIR 2024

  20. arXiv:2312.13108 [pdf, other]

    cs.CV

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Authors: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Model (LLM) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper…

    Submitted 1 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project Page: https://showlab.github.io/assistgui/

  21. arXiv:2310.18652 [pdf, other]

    cs.CL cs.AI cs.CV

    EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

    Authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi

    Abstract: Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop o…

    Submitted 25 December, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023 Datasets and Benchmarks Track (10 pages for main text, 4 pages for references, 39 pages for supplementary materials)

  22. arXiv:2310.11285 [pdf, ps, other]

    cs.DM

    Construction of optimal flag codes by MRD codes

    Authors: Shuangqing Liu, Shuhui Yu, Lijun Ji

    Abstract: Flag codes have received a lot of attention due to their application in random network coding. In 2021, Alonso-González et al. constructed optimal $(n,\mathcal{A})$-Optimum distance flag codes (ODFCs) for $\mathcal{A}\subseteq \{1,2,\ldots,k,n-k,\ldots,n-1\}$ with $k\in \mathcal{A}$ and $k\mid n$. In this paper, we introduce a new construction of $(n,\mathcal{A})_q$-ODFCs by maximum rank-metric codes,…

    Submitted 11 October, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 23 pages

    MSC Class: 94B99

  23. arXiv:2309.16609 [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  24. arXiv:2309.07141 [pdf]

    eess.SP cs.AI cs.LG

    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence

    Authors: Zhuo-yong Shi, Ye-tao Jia, Ke-xin Zhang, Ding-han Wang, Long-meng Ji, Yong Wu

    Abstract: With the rapid development of electronic science and technology, the research on wearable devices is constantly updated, but for now, it is not comprehensive for wearable devices to recognize and analyze the movement of specific sports. Based on this, this paper improves wearable devices of table tennis sport, and realizes the pattern recognition and evaluation of table tennis players' motor skill…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 34 pages, 16 figures

    MSC Class: 93-01

    ACM Class: G.1; H.4

  25. arXiv:2308.15016 [pdf, other]

    cs.CV

    C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

    Authors: Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin

    Abstract: Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures, 7 tables

  26. arXiv:2307.07893 [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Anomaly Detection in Automated Fibre Placement: Learning with Data Limitations

    Authors: Assef Ghamisi, Todd Charter, Li Ji, Maxime Rivard, Gil Lund, Homayoun Najjaran

    Abstract: Conventional defect detection systems in Automated Fibre Placement (AFP) typically rely on end-to-end supervised learning, necessitating a substantial number of labelled defective samples for effective training. However, the scarcity of such labelled data poses a challenge. To overcome this limitation, we present a comprehensive framework for defect detection and localization in Automated Fibre Pl…

    Submitted 14 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Journal ref: Frontiers in Manufacturing Technology, 2024, 4, 1277152

  27. arXiv:2307.07409 [pdf, other]

    cs.CL cs.AI eess.IV

    KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

    Authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang

    Abstract: In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published at BioNLP workshop @ ACL 2023

  28. Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification

    Authors: Pei Liu, Luping Ji, Xinyu Zhang, Feng Ye

    Abstract: Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data and ii) the sample memorization inclination inherent in neural networks. These problems may hinder MIL m…

    Submitted 2 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 12 pages, 6 figures, 10 tables

  29. arXiv:2306.15255 [pdf, other]

    cs.CV cs.CL

    GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

    Authors: Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou

    Abstract: In this report, we present our champion solution for Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2023. Essentially, to accurately ground in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 4 tables, the champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023

  30. arXiv:2306.08640 [pdf, other]

    cs.CV

    AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

    Authors: Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou

    Abstract: Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks still remain challenging due to the diverse nature of visual tasks. This diversity is reflected…

    Submitted 28 June, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Project page: https://showlab.github.io/assistgpt/

  31. arXiv:2305.15627 [pdf, ps, other]

    cs.DM math.CO

    New constructions of cyclic subspace codes

    Authors: Shuhui Yu, Lijun Ji

    Abstract: A subspace of a finite field is called a Sidon space if the product of any two of its nonzero elements is unique up to a scalar multiplier from the base field. Sidon spaces, introduced by Roth et al. (IEEE Trans Inf Theory 64(6): 4412-4422, 2018), have a close connection with optimal full-length orbit codes. In this paper, we present two constructions of Sidon spaces. The union of Sidon spaces fro…

    Submitted 24 May, 2023; originally announced May 2023.

  32. ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification

    Authors: Rui Yang, Pei Liu, Luping Ji

    Abstract: Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, using a bag prototype to g…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 12 pages, 5 figures, and 3 tables

    Journal ref: Computer Methods and Programs in Biomedicine, 108161 (2024)

  33. arXiv:2303.16434 [pdf, other]

    cs.AI cs.CL

    TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    Authors: Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

    Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still…

    Submitted 28 March, 2023; originally announced March 2023.

  34. arXiv:2303.01923 [pdf, other]

    stat.ML cs.LG q-fin.ST stat.AP

    Bayesian CART models for insurance claims frequency

    Authors: Yaojun Zhang, Lanpeng Ji, Georgios Aivaliotis, Charles Taylor

    Abstract: Accuracy and interpretability of a (non-life) insurance pricing model are essential qualities to ensure fair and transparent premiums for policy-holders, that reflect their risk. In recent years, the classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good prediction performance and are relatively easily interpretable…

    Submitted 1 December, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: 46 pages

    MSC Class: 62P05

  35. arXiv:2302.04438 [pdf, other]

    stat.ML cs.LG

    An information-theoretic learning model based on importance sampling

    Authors: Jiangshe Zhang, Lizhen Ji, Fei Gao, Mengyao Li

    Abstract: A crucial assumption underlying the most current theory of machine learning is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we develop a learning model based on principles of information theory by minimizing the worst-case loss at prescribed levels of uncertainty. We reformulate the empir…

    Submitted 22 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 7 pages, 4 figures

  36. arXiv:2302.04421 [pdf, other]

    stat.ML cs.LG

    Information Theoretical Importance Sampling Clustering

    Authors: Jiangshe Zhang, Lizhen Ji, Meng Wang

    Abstract: A current assumption of most clustering methods is that the training data and future data are taken from the same distribution. However, this assumption may not hold in most real-world scenarios. In this paper, we propose an information theoretical importance sampling based approach for clustering problems (ITISC) which minimizes the worst case of expected distortions under the constraint of distr…

    Submitted 30 May, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages, 9 figures

  37. arXiv:2212.09522 [pdf, other]

    cs.CV

    MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

    Authors: Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou

    Abstract: To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must. Existing multi-modal VQA models achieve promising performance on images or short video clips, especially with the recent success of large-scale multi-modal pre-training. However, when extending these methods to long-fo…

    Submitted 19 December, 2022; originally announced December 2022.

  38. arXiv:2212.07047 [pdf, other]

    cs.CV

    Shared Coupling-bridge for Weakly Supervised Local Feature Learning

    Authors: Jiayuan Sun, Jiewen Zhu, Luping Ji

    Abstract: Sparse local feature extraction is usually believed to be of important significance in typical vision tasks such as simultaneous localization and mapping, image matching and 3D reconstruction. At present, it still has some deficiencies needing further improvement, mainly including the discrimination power of extracted local descriptors, the localization accuracy of detected keypoints, and the effi…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 15 pages

  39. AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images

    Authors: Pei Liu, Luping Ji, Feng Ye, Bo Fu

    Abstract: The survival analysis on histological whole-slide images (WSIs) is one of the most important means to estimate patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a c…

    Submitted 5 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 15 pages, 10 figures, 8 tables

    Journal ref: Medical Image Analysis, 103020 (2023)

  40. arXiv:2211.08776 [pdf, other]

    cs.CV cs.IR

    An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022

    Authors: Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

    Abstract: This technical report describes the CONE approach for Ego4D Natural Language Queries (NLQ) Challenge in ECCV 2022. We leverage our model CONE, an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices the long video into candidate windows via a sliding window approach. Centering at windows, CONE (1) learns the inter-window (coarse-grained) semantic varia…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Technical report for ECCV 2022 Ego4D workshop, 4 pages, 2 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2209.10918

  41. arXiv:2210.07815 [pdf, other]

    cs.IR cs.LG

    Intra-session Context-aware Feed Recommendation in Live Systems

    Authors: Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang

    Abstract: Feed recommendation allows users to constantly browse items until they feel uninterested and leave the session, which differs from traditional recommendation scenarios. Within a session, a user's decision to continue browsing or not substantially affects the occurrence of later clicks. However, this type of exposure bias is generally ignored or not explicitly modeled in most feed recommendation studies. In…

    Submitted 11 January, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, CIKM 2022 short paper

  42. arXiv:2210.04522  [pdf, other]

    cs.CV

    HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

    Authors: Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

    Abstract: Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct appl…

    Submitted 27 January, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: AAAI 2024 main conference

  43. arXiv:2209.10918  [pdf, other]

    cs.CV cs.CL cs.IR

    CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

    Authors: Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

    Abstract: This paper tackles an emerging and challenging problem of long video temporal grounding (VTG) that localizes video moments related to a natural language (NL) query. Compared with short videos, long videos are also in high demand but less explored, which brings new challenges in higher inference computation cost and weaker multi-modal alignment. To address these challenges, we propose CONE, an eff…

    Submitted 29 May, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: ACL 2023 Camera Ready. 14 pages, 7 figures, 4 tables

  44. arXiv:2206.05782  [pdf, other]

    eess.IV cs.CV cs.LG

    DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis

    Authors: Pei Liu, Bo Fu, Feng Ye, Rui Yang, Bin Xu, Luping Ji

    Abstract: The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and the unnoticed semantic gap in multi-resolution feature fusion. To tackle these p…

    Submitted 28 March, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: 12 pages, 6 figures, 7 tables

    Journal ref: Expert Systems with Applications, 120280 (2023)

  45. arXiv:2204.03828  [pdf, other]

    cs.IT cs.MM cs.PF

    From PHY to QoE: A Parameterized Framework Design

    Authors: Hao Wang, Lei Ji, Zhenxing Gao

    Abstract: The rapid development of 5G communication technology has given birth to various real-time broadband communication services, such as augmented reality (AR), virtual reality (VR) and cloud games. Compared with traditional services, consumers tend to focus more on their subjective experience when utilizing these services. In the meantime, the problem of power consumption is particularly prominent in…

    Submitted 7 April, 2022; originally announced April 2022.

  46. Deep Unified Representation for Heterogeneous Recommendation

    Authors: Chengqiang Lu, Mingyang Yin, Shuheng Shen, Luo Ji, Qi Liu, Hongxia Yang

    Abstract: Recommender systems have been widely studied in both academia and industry. Previous works mainly focus on homogeneous recommendation and little progress has been made for heterogeneous recommender systems. However, heterogeneous recommendations, e.g., recommending different types of items including products, videos, celebrity shopping notes, among many others, are dominant nowadays. State…

    Submitted 26 January, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: 12 pages, 4 figures, accepted by the ACM Web Conference 2022 (WWW '22)

  47. arXiv:2112.01368  [pdf, other]

    cs.CL cs.AI cs.LG

    ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors

    Authors: Huaishao Luo, Lei Ji, Yanyong Huang, Bin Wang, Shenggong Ji, Tianrui Li

    Abstract: Fusion technique is a key research topic in multimodal sentiment analysis. The recent attention-based fusion demonstrates advances over simple operation-based fusion. However, these fusion works adopt single-scale, i.e., token-level or utterance-level, unimodal representation. Such single-scale fusion is suboptimal because different modalities should be aligned at different granularities. Thi…

    Submitted 2 December, 2021; originally announced December 2021.

  48. arXiv:2111.12417  [pdf, other]

    cs.CV cs.AI

    NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

    Authors: Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan

    Abstract: This paper presents a unified multimodal pre-trained model called NÜWA that can generate new or manipulate existing visual data (i.e., images and videos) for various visual synthesis tasks. To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and i…

    Submitted 24 November, 2021; originally announced November 2021.

  49. arXiv:2111.06061  [pdf, other]

    cs.LG cs.AI

    Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI

    Authors: Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, Jianwei Zhang, Yunfei Chu, Luo Ji, Kunyang Jia, Tao Shen, Anpeng Wu, Fengda Zhang, Ziqi Tan, Kun Kuang, Chao Wu, Fei Wu, Jingren Zhou, Hongxia Yang

    Abstract: Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both computing paradigms, i.e., cloud computing and edge computing. In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models ow…

    Submitted 23 May, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: 20 pages, Transactions on Knowledge and Data Engineering

  50. arXiv:2110.00335  [pdf, other]

    cs.CV

    Geometry Attention Transformer with Position-aware LSTMs for Image Captioning

    Authors: Chi Wang, Yulin Shen, Luping Ji

    Abstract: In recent years, transformer structures have been widely applied in image captioning with impressive performance. For good captioning results, the geometry and position relations of different visual objects are often thought of as crucial information. Aiming to further promote image captioning by transformers, this paper proposes an improved Geometry Attention Transformer (GAT) model. In order to…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: To be submitted