Towards understanding evolution of science through language model series

Authors: Junjie Dong, Zhuoqi Lyu, Qing Ke

Abstract: We introduce AnnualBERT, a series of language models designed specifically to capture the temporal evolution of scientific text. Deviating from the prevailing paradigms of subword tokenizations and "one model to rule them all", AnnualBERT adopts whole words as tokens and is composed of a base RoBERTa model pretrained from scratch on the full-text of 1.7 million arXiv papers published until 2008 and a collection of progressively trained models on arXiv papers at an annual basis. We demonstrate the effectiveness of AnnualBERT models by showing that they not only have comparable performances in standard tasks but also achieve state-of-the-art performances on domain-specific NLP tasks as well as link prediction tasks in the arXiv citation network. We then utilize probing tasks to quantify the models' behavior in terms of representation learning and forgetting as time progresses. Our approach enables the pretrained models to not only improve performances on scientific text processing tasks but also to provide insights into the development of scientific discourse over time. The series of the models is available at https://huggingface.co/jd445/AnnualBERTs.

Submitted 15 September, 2024; originally announced September 2024.

arXiv:2409.04004 [pdf, other]

One-Shot Diffusion Mimicker for Handwritten Text Generation

Authors: Gang Dai, Yifan Zhang, Quhui Ke, Qiangya Guo, Shuangping Huang

Abstract: Existing handwritten text generation methods often require more than ten handwriting samples as style references. However, in practical applications, users tend to prefer a handwriting generation model that operates with just a single reference sample for its convenience and efficiency. This approach, known as "one-shot generation", significantly simplifies the process but poses a significant challenge due to the difficulty of accurately capturing a writer's style from a single sample, especially when extracting fine details from the characters' edges amidst sparse foreground and undesired background noise. To address this problem, we propose a One-shot Diffusion Mimicker (One-DM) to generate handwritten text that can mimic any calligraphic style with only one reference sample. Inspired by the fact that high-frequency information of the individual sample often contains distinct style patterns (e.g., character slant and letter joining), we develop a novel style-enhanced module to improve the style extraction by incorporating high-frequency components from a single sample. We then fuse the style features with the text content as a merged condition for guiding the diffusion model to produce high-quality handwritten text images. Extensive experiments demonstrate that our method can successfully generate handwriting scripts with just one sample reference in multiple languages, even outperforming previous methods using over ten samples. Our source code is available at https://github.com/dailenson/One-DM.

Submitted 11 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: To appear in ECCV 2024

arXiv:2408.03627 [pdf]

doi 10.1109/TGRS.2021.3066195

Weakly Contrastive Learning via Batch Instance Discrimination and Feature Clustering for Small Sample SAR ATR

Authors: Yikui Zhai, Wenlve Zhou, Bing Sun, Jingwen Li, Qirui Ke, Zilu Ying, Junying Gan, Chaoyun Mai, Ruggero Donida Labati, Vincenzo Piuri, Fabio Scotti

Abstract: In recent years, impressive performance of deep learning technology has been recognized in Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). Since a large amount of annotated data is required in this technique, it poses a trenchant challenge to the issue of obtaining a high recognition rate through less labeled data. To overcome this problem, inspired by the contrastive learning, we proposed a novel framework named Batch Instance Discrimination and Feature Clustering (BIDFC). In this framework, different from that of the objective of general contrastive learning methods, embedding distance between samples should be moderate because of the high similarity between samples in the SAR images. Consequently, our flexible framework is equipped with adjustable distance between embedding, which we term as weakly contrastive learning. Technically, instance labels are assigned to the unlabeled data in per batch and random augmentation and training are performed few times on these augmented data. Meanwhile, a novel Dynamic-Weighted Variance loss (DWV loss) function is also posed to cluster the embedding of enhanced versions for each sample. Experimental results on the moving and stationary target acquisition and recognition (MSTAR) database indicate a 91.25% classification accuracy of our method fine-tuned on only 3.13% training data. Even though a linear evaluation is performed on the same training data, the accuracy can still reach 90.13%. We also verified the effectiveness of BIDFC in OpenSarShip database, indicating that our method can be generalized to other datasets. Our code is available at: https://github.com/Wenlve-Zhou/BIDFC-master.

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2407.06064 [pdf, other]

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed pan-denoising. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, the proper modeling of this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign different weights of TV regularization for each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers. Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV.

Submitted 9 September, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, art. no. 5528714, 2024
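
Illustration (not from the paper): the abstract above says PWRCTV uses the gradient map of the PAN image to give each pixel a TV weight that is larger in smooth areas and smaller at edges. The sketch below shows one simple way such weights could be derived. The inverse-gradient form, the `eps` parameter, and the function names `gradient_magnitude` and `tv_weights` are assumptions made for illustration; the authors' actual weighting formula is in their paper and repository.

```python
def gradient_magnitude(img):
    """Forward-difference gradient magnitude of a 2D list of numbers.

    The last row and last column use a zero difference (assumed boundary rule).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            out[i][j] = (dx * dx + dy * dy) ** 0.5
    return out


def tv_weights(pan, eps=0.1):
    """Hypothetical per-pixel TV weights: large where the PAN image is smooth,
    small where its gradient is large (edges). eps avoids division by zero."""
    grad = gradient_magnitude(pan)
    return [[1.0 / (g + eps) for g in row] for row in grad]


# Toy PAN image with a vertical edge between columns 1 and 2.
pan = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
weights = tv_weights(pan)
```

In this toy image the pixel just left of the edge receives a smaller weight than the pixels in the flat regions, so a weighted TV penalty would smooth flat areas more strongly while leaving the edge comparatively untouched.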

arXiv:2406.13327 [pdf, other]

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

Abstract: While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains. The source codes can be accessed at https://github.com/azzh1/PURLS.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2405.20633 [pdf, other]

Skeleton-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection

Authors: Jing Xu, Anqi Zhu, Jingyu Lin, Qiuhong Ke, Cunjian Chen

Abstract: Human action recognition is crucial in computer vision systems. However, in real-world scenarios, human actions often fall outside the distribution of training data, requiring a model to both recognize in-distribution (ID) actions and reject out-of-distribution (OOD) ones. Despite its importance, there has been limited research on OOD detection in human actions. Existing works on OOD detection mainly focus on image data with RGB structure, and many methods are post-hoc in nature. While these methods are convenient and computationally efficient, they often lack sufficient accuracy, fail to consider the exposure of OOD samples, and ignore the application in skeleton structure data. To address these challenges, we propose a novel end-to-end skeleton-based model called Skeleton-OOD, which is committed to improving the effectiveness of OOD tasks while ensuring the accuracy of ID recognition. Through extensive experiments conducted on NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-400 datasets, Skeleton-OOD demonstrates the superior performance of our proposed approach compared to state-of-the-art methods. Our findings underscore the effectiveness of classic OOD detection techniques in the context of skeleton-based action recognition tasks, offering promising avenues for future research in this field. Code is available at https://github.com/YilliaJing/Skeleton-OOD.git.

Submitted 10 October, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.11336 [pdf, other]

UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers

Authors: Duo Peng, Qiuhong Ke, Jun Liu

Abstract: Text-to-Image (T2I) models have raised security concerns due to their potential to generate inappropriate or harmful images. In this paper, we propose UPAM, a novel framework that investigates the robustness of T2I models from the attack perspective. Unlike most existing attack methods that focus on deceiving textual defenses, UPAM aims to deceive both textual and visual defenses in T2I models. UPAM enables gradient-based optimization, offering greater effectiveness and efficiency than previous methods. Given that T2I models might not return results due to defense mechanisms, we introduce a Sphere-Probing Learning (SPL) scheme to support gradient optimization even when no results are returned. Additionally, we devise a Semantic-Enhancing Learning (SEL) scheme to finetune UPAM for generating target-aligned images. Our framework also ensures attack stealthiness. Extensive experiments demonstrate UPAM's effectiveness and efficiency.

Submitted 25 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

ACM Class: I.2.6

arXiv:2405.05791 [pdf, other]

Sequential Amodal Segmentation via Cumulative Occlusion Learning

Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Abstract: To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. Ideally, the system should be able to handle any object and not be restricted to segmenting a limited set of object classes, especially in robotic applications. Addressing this need, we introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions and adeptly reproducing the complex distribution of shapes and occlusion orders of occluded objects. It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes. Experimental results across three amodal datasets show that our method outperforms established baselines.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2401.06445 [pdf, other]

Directed network comparison using motifs

Authors: Chenwei Xie, Qiao Ke, Haoyu Chen, Chuang Liu, Xiu-Xiu Zhan

Abstract: Analyzing and characterizing the differences between networks is a fundamental and challenging problem in network science. Previously, most network comparison methods that rely on topological properties have been restricted to measuring differences between two undirected networks. However, many networks, such as biological networks, social networks, and transportation networks, exhibit inherent directionality and higher-order attributes that should not be ignored when comparing networks. Therefore, we propose a motif-based directed network comparison method that captures local, global, and higher-order differences between two directed networks. Specifically, we first construct a motif distribution vector for each node, which captures the information of a node's involvement in different directed motifs. Then, the dissimilarity between two directed networks is defined on the basis of a matrix which is composed of the motif distribution vector of every node and Jensen-Shannon divergence. The performance of our method is evaluated via the comparison of six real directed networks with their null models as well as their perturbed networks based on edge perturbation. Our method is superior to the state-of-the-art baselines and is robust with different parameter settings.

Submitted 12 January, 2024; originally announced January 2024.
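
Illustration (not from the paper): the abstract above builds its network dissimilarity from per-node motif distribution vectors and the Jensen-Shannon divergence. The sketch below shows only the standard Jensen-Shannon divergence between two such vectors, using base-2 logarithms so the value lies in [0, 1]. The motif counts and the helper names `normalize`, `kl_divergence`, and `js_divergence` are hypothetical; how the paper aggregates the per-node vectors into a matrix and a final network-level score is not reproduced here.

```python
import math


def normalize(counts):
    """Turn raw motif counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical counts of how often two nodes take part in four directed motifs.
node_a = normalize([6, 2, 1, 1])
node_b = normalize([1, 1, 2, 6])
dissimilarity = js_divergence(node_a, node_b)
```

Because the mixture `m` is strictly positive wherever `p` or `q` is, the divergence stays finite even when one node never appears in a motif that the other does, which is one reason it is a common choice for comparing sparse count distributions.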

arXiv:2401.01510 [pdf, other]

Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond

Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been largely overlooked in existing research. This paper seeks to bridge that gap by incorporating VideoQA into a curriculum learning (CL) framework that progressively trains models from simpler to more complex data. Recognizing that conventional self-paced CL methods rely on training loss for difficulty measurement, which might not accurately reflect the intricacies of video-question pairs, we introduce the concept of uncertainty-aware CL. Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty. Furthermore, we address the challenge posed by uncertainty by presenting a probabilistic modeling approach for VideoQA. Specifically, we conceptualize VideoQA as a stochastic computation graph, where the hidden representations are treated as stochastic variables. This yields two distinct types of uncertainty: one related to the inherent uncertainty in the data and another pertaining to the model's confidence. In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments. The findings affirm that our approach not only achieves enhanced performance but also effectively quantifies uncertainty in the context of VideoQA.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.01505 [pdf, other]

Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

Authors: Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen

Abstract: Reasoning over sports videos for question answering is an important task with numerous applications, such as player training and information retrieval. However, this task has not been explored due to the lack of relevant datasets and the challenging nature it presents. Most datasets for video question answering (VideoQA) focus mainly on general and coarse-grained understanding of daily-life videos, which is not applicable to sports scenarios requiring professional action understanding and fine-grained motion analysis. In this paper, we introduce the first dataset, named Sports-QA, specifically designed for the sports VideoQA task. The Sports-QA dataset includes various types of questions, such as descriptions, chronologies, causalities, and counterfactual conditions, covering multiple sports. Furthermore, to address the characteristics of the sports VideoQA task, we propose a new Auto-Focus Transformer (AFT) capable of automatically focusing on particular scales of temporal information for question answering. We conduct extensive experiments on Sports-QA, including baseline studies and the evaluation of different methods. The results demonstrate that our AFT achieves state-of-the-art performance.

Submitted 14 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2311.03943 [pdf, other]

CLIP Guided Image-perceptive Prompt Learning for Image Enhancement

Authors: Weiwen Chen, Qiuhong Ke, Zinuo Li

Abstract: Image enhancement is a significant research area in the fields of computer vision and image processing. In recent years, many learning-based methods for image enhancement have been developed, where the Look-up-table (LUT) has proven to be an effective tool. In this paper, we delve into the potential of Contrastive Language-Image Pre-Training (CLIP) Guided Prompt Learning, proposing a simple structure called CLIP-LUT for image enhancement. We found that the prior knowledge of CLIP can effectively discern the quality of degraded images, which can provide reliable guidance. To be specific, we initially learn image-perceptive prompts to distinguish between original and target images using the CLIP model; meanwhile, we introduce a very simple network by incorporating a simple baseline to predict the weights of three different LUTs as the enhancement network. The obtained prompts are used to steer the enhancement network like a loss function and improve the performance of the model. We demonstrate that by simply combining a straightforward method with CLIP, we can obtain satisfactory results.

Submitted 22 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: A trial work to the image enhancement

arXiv:2308.14413 [pdf]

doi 10.1039/D2NH00573E

Abnormal behavior of preferred formation of cationic vacancy from the interior in γ-GeSe monolayer with the stereo-chemical antibonding lone-pair state

Authors: Changmeng Huan, Yongqing Cai, Devesh R. Kripalani, Kun Zhou, Qingqing Ke

Abstract: Two-dimensional (2D) materials tend to preferentially form vacancies at the outer surface. Here, contrary to the normal notion, we reveal a type of vacancy that thermodynamically initiates from the interior part of the 2D backbone of germanium selenide (γ-GeSe). Interestingly, the Ge-vacancy (VGe) in the interior part of γ-GeSe possesses the lowest formation energy amongst the various types of defects considered. We also find a low diffusion barrier (1.04 eV) of VGe, which is half that of the sulfur vacancy in MoS2. The facile formation of mobile VGe is rooted in the antibonding coupling of the lone-pair Ge 4s and Se 4p states near the valence band maximum, which also exists in other gamma-phase MX (M=Sn, Ge; X=S, Te). The VGe is accompanied by a shallow acceptor level in the band gap and induces strong infrared light absorption and p-type conductivity. The VGe located in the middle cationic Ge sublattice is well protected by the surface Se layers, a feature that is absent in other atomically thin materials. Our work suggests that the unique well-buried inner VGe, with the potential of forming structurally protected ultrathin conducting filaments, may render the GeSe layer an ideal platform for quantum emitting, memristive, and neuromorphic applications.

Submitted 28 August, 2023; originally announced August 2023.

Journal ref: Nanoscale Horiz. 8, 404-411 (2023)

arXiv:2308.13893 [pdf, other]

Unsupervised Domain Adaptation via Domain-Adaptive Diffusion

Authors: Duo Peng, Qiuhong Ke, Yinjie Lei, Jun Liu

Abstract: Unsupervised Domain Adaptation (UDA) is quite challenging due to the large distribution discrepancy between the source domain and the target domain. Inspired by diffusion models, which have a strong capability to gradually convert data distributions across a large gap, we explore the diffusion technique to handle the challenging UDA task. However, using diffusion models to convert data distribution across different domains is a non-trivial problem, as standard diffusion models generally perform conversion from the Gaussian distribution instead of from a specific domain distribution. Besides, during the conversion, the semantics of the source-domain data needs to be preserved for classification in the target domain. To tackle these problems, we propose a novel Domain-Adaptive Diffusion (DAD) module accompanied by a Mutual Learning Strategy (MLS), which can gradually convert data distribution from the source domain to the target domain while enabling the classification model to learn along the domain transition process. Consequently, our method successfully eases the challenge of UDA by decomposing the large domain gap into small ones and gradually enhancing the capacity of the classification model to finally adapt to the target domain. Our method outperforms the current state of the art by a large margin on three widely used UDA datasets.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures

arXiv:2308.12350 [pdf, other]

Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation

Authors: Duo Peng, Ping Hu, Qiuhong Ke, Jun Liu

Abstract: Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS). However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation. Concretely, we formulate cross-domain image translation as a denoising diffusion process and utilize a novel Semantic Gradient Guidance (SGG) method to constrain the translation process, conditioning it on the pixel-wise source labels. Additionally, a Progressive Translation Learning (PTL) strategy is devised to enable the SGG method to work reliably across domains with large gaps. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods.

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV2023

arXiv:2306.16643 [pdf]

Cautious explorers generate more future academic impact

Authors: Xingsheng Yang, Zhaoru Ke, Qing Ke, Haipeng Zhang, Fengnan Gao

Abstract: Some scientists are more likely to explore unfamiliar research topics while others tend to exploit existing ones. In previous work, correlations have been found between scientists' topic choices and their career performances. However, literature has yet to untangle the intricate interplay between scientific impact and research topic choices, where scientific exploration and exploitation intertwine. Here we study two metrics that gauge how frequently scientists switch topic areas and how large those jumps are, and discover that 'cautious explorers' who switch topics frequently but do so to 'close' domains have notably better future performance and can be identified at a remarkably early career stage. Cautious explorers who balance exploration and exploitation in their first four career years have up to 19% more citations per future paper. Our results suggest that the proposed metrics depict the scholarly traits of scientists throughout their careers and provide fresh insight, especially for nurturing junior scientists.

Submitted 29 June, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 16 pages of main text and 94 pages of supplementary information. v2: Added page number and fixed typo in author list

arXiv:2306.13897 [pdf, other]

ICN: Interactive Convolutional Network for Forecasting Travel Demand of Shared Micromobility

Authors: Yiming Xu, Qian Ke, Xiaojian Zhang, Xilei Zhao

Abstract: Accurate shared micromobility demand predictions are essential for transportation planning and management. Although deep learning models provide powerful tools to deal with demand prediction problems, studies on forecasting highly-accurate spatiotemporal shared micromobility demand are still lacking. This paper proposes a deep learning model named Interactive Convolutional Network (ICN) to forecast spatiotemporal travel demand for shared micromobility. The proposed model develops a novel channel dilation method by utilizing multi-dimensional spatial information (i.e., demographics, functionality, and transportation supply) based on travel behavior knowledge for building the deep learning model. We use the convolution operation to process the dilated tensor to simultaneously capture temporal and spatial dependencies. Based on a binary-tree-structured architecture and interactive convolution, the ICN model extracts features at different temporal resolutions, and then generates predictions using a fully-connected layer. The proposed model is evaluated for two real-world case studies in Chicago, IL, and Austin, TX. The results show that the ICN model significantly outperforms all the selected benchmark models. The model predictions can help the micromobility operators develop optimal vehicle rebalancing schemes and guide cities to better manage the shared micromobility system.

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2304.06724 [pdf, other]

GradMDM: Adversarial Attack on Dynamic Networks

Authors: Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Jun Liu

Abstract: Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input that will activate more computational units of dynamic models during inference. We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.

Submitted 1 April, 2023; originally announced April 2023.

Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2304.00280 [pdf, other]

Progressive Channel-Shrinking Network

Authors: Jianhong Pan, Siyuan Yang, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Zhipeng Fan, Jun Liu

Abstract: Currently, salience-based channel pruning makes continuous breakthroughs in network compression. In the realization, the salience mechanism is used as a metric of channel salience to guide pruning. Therefore, salience-based channel pruning can dynamically adjust the channel width at run-time, which provides a flexible pruning scheme. However, there are two problems emerging: a gating function is often needed to truncate the specific salience entries to zero, which destabilizes the forward propagation; dynamic architecture brings more cost for indexing in inference which bottlenecks the inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method to compress the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy to provide a testing-static pruning scheme that can reduce the memory access cost for filter indexing. We evaluate our method on ImageNet and CIFAR10 datasets over two prevalent networks: ResNet and VGG, and demonstrate that our PCS outperforms all baselines and achieves state-of-the-art in terms of compression-performance tradeoff. Moreover, we observe a significant and practical acceleration of inference.

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.06596 [pdf, other]

Amodal Intra-class Instance Segmentation: Synthetic Datasets and Benchmark

Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Abstract: Images of realistic scenes often contain intra-class objects that are heavily occluded from each other, making the amodal perception task that requires parsing the occluded parts of the objects challenging. Although important for downstream tasks such as robotic grasping systems, the lack of large-scale amodal datasets with detailed annotations makes it difficult to model intra-class occlusions explicitly. This paper introduces two new amodal datasets for image amodal completion tasks, which contain a total of over 267K images of intra-class occlusion scenarios, annotated with multiple masks, amodal bounding boxes, dual order relations and full appearance for instances and background. We also present a point-supervised scheme with layer priors for amodal instance segmentation specifically designed for intra-class occlusion scenarios. Experiments show that our weakly supervised approach outperforms the SOTA fully supervised methods, while our layer priors design exhibits remarkable performance improvements in the case of intra-class occlusion in both synthetic and real images.

Submitted 7 November, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted at WACV 2024. Datasets are available at https://github.com/saraao/amodal-dataset

arXiv:2303.01692 [pdf, other]

Travel Demand Forecasting: A Fair AI Approach

Authors: Xiaojian Zhang, Qian Ke, Xilei Zhao

Abstract: Artificial Intelligence (AI) and machine learning have been increasingly adopted for travel demand forecasting. AI-based travel demand forecasting models, though they generate accurate predictions, may produce prediction biases and raise fairness issues. Using such biased models for decision-making may lead to transportation policies that exacerbate social inequalities. However, limited studies have focused on addressing the fairness issues of these models. Therefore, in this study, we propose a novel methodology to develop fairness-aware, highly-accurate travel demand forecasting models. Particularly, the proposed methodology can enhance the fairness of AI models for multiple protected attributes (such as race and income) simultaneously. Specifically, we introduce a new fairness regularization term, which is explicitly designed to measure the correlation between prediction accuracy and multiple protected attributes, into the loss function of the travel demand forecasting model. We conduct two case studies to evaluate the performance of the proposed methodology using real-world ridesourcing-trip data in Chicago, IL and Austin, TX, respectively. Results highlight that our proposed methodology can effectively enhance fairness for multiple protected attributes while preserving prediction accuracy. Additionally, we have compared our methodology with three state-of-the-art methods that adopt the regularization term approach, and the results demonstrate that our approach significantly outperforms them in both preserving prediction accuracy and enhancing fairness. This study can provide transportation professionals with a new tool to achieve fair and accurate travel demand forecasting.

Submitted 25 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: improved the methodology; updated new contents

arXiv:2211.16940 [pdf, other]

DiffPose: Toward More Reliable 3D Pose Estimation

Authors: Jia Gong, Lin Geng Foo, Zhipeng Fan, Qiuhong Ke, Hossein Rahmani, Jun Liu

Abstract: Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose to facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP. Project page: https://gongjia0208.github.io/Diffpose/.

Submitted 9 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: Accepted to CVPR 2023

arXiv:2211.02883 [pdf, other]

Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework

Authors: Liangchen Liu, Qiuhong Ke, Chaojie Li, Feiping Nie, Yingying Zhu

Abstract: Spectral clustering is an effective methodology for unsupervised learning. Most traditional spectral clustering algorithms involve a separate two-step procedure and apply the transformed new representations for the final clustering results. Recently, much progress has been made to utilize the non-negative feature property in real-world data and to jointly learn the representation and clustering results. However, to our knowledge, no previous work considers a unified model that incorporates the important multi-view information with those properties, which severely limits the performance of existing methods. In this paper, we formulate a novel clustering model, which exploits the non-negative feature property and, more importantly, incorporates the multi-view information into a unified joint learning framework: the unified multi-view orthonormal non-negative graph based clustering framework (Umv-ONGC). Then, we derive an effective three-stage iterative solution for the proposed model and provide analytic solutions for the three sub-problems from the three stages. We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features. Extensive experiments on three benchmark data sets demonstrate the effectiveness of the proposed method.

Submitted 1 December, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

arXiv:2210.09991 [pdf]

doi 10.1039/D2TC02105F

Versatile van der Waals Heterostructures of Gamma-GeSe with h-BN/Graphene/MoS2

Authors: Changmeng Huan, Pu Wang, Bingtao Liu, Binghan He, Yongqing Cai, Qingqing Ke

Abstract: The recent discovery of a novel hexagonal phase of GeSe (Gamma-GeSe) has triggered great interest in nanoelectronics applications, owing to the electrical conductivity of its bulk phase being even higher than that of graphite while its monolayer is a semiconductor. For potential applications, construction of functional two-dimensional (2D) contacts is indispensable. Herein, via first-principles calculations, we propose the design of van der Waals heterostructures (vdWHs) of Gamma-GeSe contacting respectively with graphene, 2D h-BN and MoS2, as representatives of metallic, insulator, and semiconductor partners. Our work shows that the h-BN or graphene layer donates electrons to the Gamma-GeSe layer, resulting in n doping in Gamma-GeSe, while the MoS2 layer accepts electrons from the Gamma-GeSe layer leading to p doping of the latter. The Gamma-GeSe/BN heterostructure has a type-I band alignment with large band offsets, indicating that BN can be used as an effective passivating layer to protect Gamma-GeSe from environmental disturbance while maintaining its major electronic and optical characteristics. The Gamma-GeSe/graphene heterostructure is prone to have a very low Schottky barrier, down to tens of meV and easily overcome by thermal excitation, which can be tuned by strain and external electric field. The Gamma-GeSe/MoS2 vdWH forms a Z-scheme interface, which is beneficial for carrier splitting and photon utilization. Our work indicates that Gamma-GeSe can be well passivated by BN, and can form intimate contact with graphene for high charge injection efficiency and with MoS2 for efficient carrier splitting for redox reactions.

Submitted 16 October, 2022; originally announced October 2022.

Journal ref: J. Mater. Chem. C, 2022, 10, 10995-11004

arXiv:2210.08704 [pdf]

doi 10.1088/2053-1583/ac83d5

Highly Modulated Dual Semimetal and Semiconducting Gamma-GeSe with Strain Engineering

Authors: Changmeng Huan, Pu Wang, Binghan He, Yongqing Cai, Qingqing Ke

Abstract: Layered hexagonal Gamma-GeSe, a new polymorph of GeSe synthesized recently, shows strikingly high electronic conductivity in its bulk form (even higher than graphite) while being semiconducting in the case of the monolayer (1L). In this work, by using first-principles calculations, we demonstrate that, different from the orthorhombic phases of GeSe, Gamma-GeSe shows a small spatial anisotropic dependence and a strikingly thickness-dependent behavior with a transition from semimetal (bulk, 0.04 eV) to semiconductor (1L, 0.99 eV), and this dual conducting characteristic realized simply with thickness control in Gamma-GeSe has not been found in other 2D materials before. The lack of d-orbitals allows charge carriers with small effective mass (0.16 m0 for electrons and 0.23 m0 for holes), which is comparable to phosphorene. Meanwhile, 1L Gamma-GeSe shows a superior flexibility with a Young's modulus of 86.59 N/m, only one-quarter of that of graphene and three-quarters of that of MoS2, and a Poisson's ratio of 0.26, suggesting a highly flexible lattice. Interestingly, 1L Gamma-GeSe shows an in-plane isotropic elastic modulus inherent with hexagonal symmetry but an anisotropic in-plane effective mass owing to shifted valleys around the band edges. We demonstrate the feasibility of strain engineering in inducing indirect-direct and semiconductor-metal transitions resulting from competing bands at the band edges. Our work shows that free 1L Gamma-GeSe has a strong light absorption (~10^6 cm^-1) and an indirect bandgap with rich valleys at the band edges, enabling high carrier concentration and a low rate of direct electron-hole recombination, which would be promising for nanoelectronics and solar cell applications.

Submitted 16 October, 2022; originally announced October 2022.

Journal ref: 2D Mater. 9 045014 (2022)

arXiv:2210.08700 [pdf]

doi 10.1007/s11433-022-1949-9

Phonon anharmonicity and thermal conductivity of two-dimensional van der Waals materials: A review

Authors: Xuefei Yan, Bowen Wang, Yulong Hai, Devesh R. Kripalani, Qingqing Ke, Yongqing Cai

Abstract: Two-dimensional (2D) van der Waals (vdW) materials have extraordinary thermal properties due to the effect of quantum confinement, making them promising for thermoelectric energy conversion and thermal management in microelectronic devices. In this review, the mechanism of phonon anharmonicity originating from three- and four-phonon interactions is derived. The phonon anharmonicity of 2D vdW materials, involving the Grüneisen parameter, phonon lifetime, and thermal conductivity, is summarized and derived in detail. The size-dependent thermal conductivity of representative 2D vdW materials is discussed experimentally and theoretically. This review will present fundamental and advanced knowledge on how to evaluate the phonon anharmonicity in 2D vdW materials, which will aid the design of new structures and materials for applications related to energy transfer and conversion.

Submitted 16 October, 2022; originally announced October 2022.

Journal ref: Science China Physics, Mechanics & Astronomy 65 (11), 1-10 (2022)

arXiv:2210.02537 [pdf, ps, other]

Multiphoton states engineering by heralded interference via six-port Mach-Zehnder interferometer

Authors: Qiang Ke, Xue-feng Zhan, Min-xiang Li, Xue-xiang Xu

Abstract: Based on heralded interference on a six-port Mach-Zehnder interferometer, we propose protocols to generate a series of multiphoton states in the primary output port, by injecting a coherent state in the primary input port and two Fock states in the two ancillary input ports, and measuring two Fock states in the two ancillary output ports. Manipulating only at the single-photon level (i.e., |0> or |1>) in all ancillary ports, we generate sixteen types (six categories) of multiphoton nonclassical states, whose state vectors are unified as a superposition of a new coherent state, a single-photon added coherent state, and a two-photon added coherent state. Indeed, a wide range of nonclassical phenomena can be created by modulating the interaction parameters (including coherent field strength and shift phase). We mainly analyze quadrature-squeezing effects for all our considered states. Of particular interest is maximum squeezing of up to 2.57 dB, with success probability 6.7%, at least in our present cases.

Submitted 5 October, 2022; originally announced October 2022.

Comments: 7 pages, 6 figures, comments are welcome

arXiv:2209.13204 [pdf, other]

NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System

Authors: Weiqiang Wang, Xuefei Zhe, Qiuhong Ke, Di Kang, Tingguang Li, Ruizhi Chen, Linchao Bao

Abstract: We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed as NEURAL MARIONETTE, can produce high-quality and meaningful motions with smooth transitions from simple user input, including a sequence of action tags with expected action duration, and optionally a hand-drawn moving trajectory if the user specifies. The core of our system is a novel Transformer-based motion generation model, namely MARIONET, which can generate diverse motions given action tags. Different from existing motion generation models, MARIONET utilizes contextual information from the past motion clip and future action tag, dedicated to generating actions that can smoothly blend historical and future actions. Specifically, MARIONET first encodes target action tag and contextual information into an action-level latent code. The code is unfolded into frame-level control signals via a time unrolling module, which could be then combined with other frame-level control signals like the target trajectory. Motion frames are then generated in an auto-regressive way. By sequentially applying MARIONET, the system NEURAL MARIONETTE can robustly generate long-term, multi-action motions with the help of two simple schemes, namely "Shadow Start" and "Action Revision". Along with the novel system, we also present a new dataset dedicated to the multi-action motion synthesis task, which contains both action tags and their contextual information. Extensive experiments are conducted to study the action accuracy, naturalism, and transition smoothness of the motions generated by our system.

Submitted 27 November, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.10073 [pdf, other]

Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition

Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

Abstract: Skeleton-based action recognition receives increasing attention because the skeleton representations reduce the amount of training data by eliminating visual information irrelevant to actions. To further improve the sample efficiency, meta-learning-based one-shot learning solutions were developed for skeleton-based action recognition. These methods find the nearest neighbor according to the similarity between instance-level global average embedding. However, such measurement holds unstable representativity due to inadequate generalized learning on local invariant and noisy features, while intuitively, more fine-grained recognition usually relies on determining key local body movements. To address this limitation, we present the Adaptive Local-Component-aware Graph Convolutional Network, which replaces the comparison metric with a focused sum of similarity measurements on aligned local embedding of action-critical spatial/temporal segments. Comprehensive one-shot experiments on the public benchmark of NTU-RGB+D 120 indicate that our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.01425 [pdf, other]

Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Authors: Tianjiao Li, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Anran Wang, Jinghua Wang, Jun Liu

Abstract: The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.

Submitted 3 September, 2022; originally announced September 2022.

Comments: Accepted to ECCV 2022

arXiv:2207.12100 [pdf, other]

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

Authors: Yunsheng Pang, Qiuhong Ke, Hossein Rahmani, James Bailey, Jun Liu

Abstract: Human interaction recognition is very important in many applications. One crucial cue in recognizing an interaction is the interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition via modeling the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art with a significant margin.

Submitted 25 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.10452 [pdf, ps, other]

doi 10.1142/S0217732322502327

Multi-photon-addition amplified coherent state

Authors: Xue-feng Zhan, Qiang Ke, Min-xiang Li, Xue-xiang Xu

Abstract: The states $g^{\hat{n}}\hat{a}^{\dagger m}\left\vert \alpha\right\rangle$ and $\hat{a}^{\dagger m}g^{\hat{n}}\left\vert \alpha\right\rangle$ are the same as the state $\hat{a}^{\dagger m}\left\vert g\alpha\right\rangle$, which we call the multi-photon-addition amplified coherent state (MPAACS). Here, $\hat{n}$, $\hat{a}^{\dagger}$, $\left\vert \alpha\right\rangle$, $g$ ($\geq 1$), and $m$ are the photon number operator, creation operator, coherent state, gain factor, and an integer, respectively. We study mathematical and physical properties of these MPAACSs, including normalization, photon component analysis, Wigner function, effective gain, quadrature squeezing, and equivalent input noise. Actually, the MPAACS, which contains more nonclassicality, is an amplified version of the photon-added coherent state (PACS) introduced by Agarwal and Tara [Phys. Rev. A 43, 492 (1991)]. Our work provides theoretical references for implementing amplifiers for light fields.

Submitted 21 July, 2022; originally announced July 2022.

Comments: 10 pages, 5 figures

arXiv:2207.09675 [pdf, other]

ERA: Expert Retrieval and Assembly for Early Action Prediction

Authors: Lin Geng Foo, Tianjiao Li, Hossein Rahmani, Qiuhong Ke, Jun Liu

Abstract: Early action prediction aims to successfully predict the class label of an action before it is completely performed. This is a challenging task because the beginning stages of different actions can be very similar, with only minor subtle differences for discrimination. In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using discriminative subtle differences, to distinguish an input sample from other highly similar samples. To encourage our model to effectively use subtle differences for early action prediction, we push experts to discriminate exclusively between samples that are highly similar, forcing these experts to learn to use subtle differences that exist between those samples. Additionally, we design an effective Expert Learning Rate Optimization method that balances the experts' optimization and leads to better performance. We evaluate our ERA module on four public action datasets and achieve state-of-the-art performance.

Submitted 22 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted to ECCV 2022

arXiv:2207.02062 [pdf, other]

doi 10.1016/j.cviu.2023.103661

Image Amodal Completion: A Survey

Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Abstract: Existing computer vision systems can compete with humans in understanding the visible parts of objects, but still fall far short of humans when it comes to depicting the invisible parts of partially occluded objects. Image amodal completion aims to equip computers with human-like amodal completion functions to understand an intact object despite it being partially occluded. The main purpose of this survey is to provide an intuitive understanding of the research hotspots, key technologies and future trends in the field of image amodal completion. Firstly, we present a comprehensive review of the latest literature in this emerging field, exploring three key tasks in image amodal completion, including amodal shape completion, amodal appearance completion, and order perception. Then we examine popular datasets related to image amodal completion along with their common data collection methods and evaluation metrics. Finally, we discuss real-world applications and future research directions for image amodal completion, facilitating the reader's understanding of the challenges of existing technologies and upcoming research trends.

Submitted 7 November, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted at Computer Vision and Image Understanding. See https://doi.org/10.1016/j.cviu.2023.103661 for the final version

arXiv:2206.06544 [pdf, ps, other]

A Survey of Automated Data Augmentation Algorithms for Deep Learning-based Image Classification Tasks

Authors: Zihan Yang, Richard O. Sinnott, James Bailey, Qiuhong Ke

Abstract: In recent years, deep learning has been one of the most popular techniques in the computer vision community. As a data-driven technique, a deep model requires enormous amounts of accurately labelled training data, which is often inaccessible in many real-world applications. A data-space solution is Data Augmentation (DA), which can artificially generate new images out of original samples. Image augmentation strategies can vary by dataset, as different data types might require different augmentations to facilitate model training. However, the design of DA policies has been largely decided by human experts with domain knowledge, which is considered to be highly subjective and error-prone. To mitigate this problem, a novel direction is to automatically learn the image augmentation policies from the given dataset using Automated Data Augmentation (AutoDA) techniques. The goal of AutoDA models is to find the optimal DA policies that can maximize the model performance gains. This survey discusses the underlying reasons for the emergence of AutoDA technology from the perspective of image classification. We identify three key components of a standard AutoDA model: a search space, a search algorithm and an evaluation function. Based on their architecture, we provide a systematic taxonomy of existing image AutoDA approaches. This paper presents the major works in the AutoDA field, discussing their pros and cons, and proposing several potential directions for future improvements.

Submitted 6 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 68 pages, 9 figures. Submitted to Knowledge and Information Systems (KAIS)

MSC Class: A.1; I.4.3; I.5.2

arXiv:2205.03825 [pdf, other]

Iterative Geometry-Aware Cross Guidance Network for Stereo Image Inpainting

Authors: Ang Li, Shanshan Zhao, Qingjie Zhang, Qiuhong Ke

Abstract: Currently, single image inpainting has achieved promising results based on deep convolutional neural networks. However, inpainting on stereo images with missing regions has not been explored thoroughly, which is also a significant but different problem. One crucial requirement for stereo image inpainting is stereo consistency. To achieve it, we propose an Iterative Geometry-Aware Cross Guidance Network (IGGNet). The IGGNet contains two key ingredients, i.e., a Geometry-Aware Attention (GAA) module and an Iterative Cross Guidance (ICG) strategy. The GAA module relies on the epipolar geometry cues and learns the geometry-aware guidance from one view to another, which is beneficial to make the corresponding regions in two views consistent. However, learning guidance from co-existing missing regions is challenging. To address this issue, the ICG strategy is proposed, which can alternately narrow down the missing regions of the two views in an iterative manner. Experimental results demonstrate that our proposed network outperforms the latest stereo image inpainting model and state-of-the-art single image inpainting models.

Submitted 10 May, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

Comments: Accepted by IJCAI 2022

arXiv:2202.01579 [pdf, ps, other]

doi 10.1103/PhysRevD.105.034016

Excited doubly heavy baryon production via $W^+$ boson decays

Authors: Peng-Hui Zhang, Lei Guo, Xu-Chang Zheng, Qi-Wei Ke

Abstract: In this paper, decay widths for the production of the doubly heavy baryons ($Ξ_{cc} ~\text{and} ~Ξ_{bc}$) are theoretically calculated in the whole phase space through $W^+ \to Ξ_{cc}+ \bar{c}+\bar{s}$ and $W^+ \to Ξ_{bc}+ \bar{b}+\bar{s}$, within the framework of nonrelativistic QCD (NRQCD). Differential widths $dΓ/ds_{12}$, $dΓ/ds_{23}$, $dΓ/dcosθ_{12}$, and $dΓ/dcosθ_{13}$ are also given. In addition to the ordinary S wave contributions for the baryons, we specifically calculate P wave contributions as a comparison, namely the highly excited states of the intermediate diquark, including $[^1P_1]$ and $[^3P_J]$ (with $J=0,1,2$) in both color anti-triplet state $\overline{\mathbf{3}}$ and color sextuplet state $\mathbf{6}$. It shows that the contribution from P wave is about one order of magnitude lower than that from S wave. According to the results, we can expect plentiful events produced at the LHC, i.e., $3.69\times10^5$ $Ξ_{cc}$ events and $4.91\times10^4$ $Ξ_{bc}$ events per year.

Submitted 12 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: 11 pages, 5 figures

Journal ref: Physical Review D 105, 034016 (2022)

arXiv:2201.11900 [pdf]

doi 10.1039/D1TC05150D

Oxygen Deficient α-MoO3 with Promoted Adsorption and State-Quenching of H2O for Gas Sensor: A DFT Study

Authors: Changmeng Huan, Pu Wang, Binghan He, Yongqing Cai, Qingqing Ke

Abstract: Semiconducting oxides with reducible cations are ideal platforms for various functional applications in nanoelectronics and catalysts. Here we report an ultrathin monolayer alpha-MoO3 with tunable electronic properties and different gas adsorbing behaviors upon introducing oxygen vacancies (VO). The unique property of alpha-MoO3 is that it contains three different types of oxygen atoms occupying three Wyckoff sites that are absent in other low-dimensional oxides and provides rich electronic hybridized states. The presence of VO triggers an intermediate state in the gap at ~0.59 eV below the conduction band minimum and reduces the work function dramatically, together with new excitations in the near infrared. The realigned Fermi level associated with the dangling state of VO reduces the neighboring Mo atoms and affects the gas adsorption thereafter. The binding energy of H2O molecules above VO is 2.5 times that at a perfect lattice site, reaching -0.75 eV, and the trend of electron transfer also reverses. The latter is related to the shallow localized state in the band gap due to H2O adsorbed above perfect MoO3, which becomes quenched upon adsorbing at the VO site. Those rich in-gap defective states in oxygen deficient MoO3, broadening the light absorption and promoting the uptake of water, are conducive to the application of alpha-MoO3 for optoelectronics, photothermal therapy, and moisture sensing.

Submitted 27 January, 2022; originally announced January 2022.

Journal ref: J. Mater. Chem. C, 2022, 10, 1839-1849

arXiv:2110.09783 [pdf, other]

Spatial-Temporal Transformer for 3D Point Cloud Sequences

Authors: Yimin Wei, Hao Liu, Tingting Xie, Qiuhong Ke, Yulan Guo

Abstract: Effective learning of spatial-temporal information within a point cloud sequence is highly important for many downstream tasks such as 4D semantic segmentation and 3D action recognition. In this paper, we propose a novel framework named Point Spatial-Temporal Transformer (PST2) to learn spatial-temporal representations from dynamic 3D point cloud sequences. Our PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module. Our STSA module is introduced to capture the spatial-temporal context information across adjacent frames, while the RE module is proposed to aggregate features across neighbors to enhance the resolution of feature maps. We test the effectiveness of our PST2 with two different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D action recognition. Extensive experiments on three benchmarks show that our PST2 outperforms existing methods on all datasets. The effectiveness of our STSA and RE modules has also been validated with ablation experiments.

Submitted 19 October, 2021; originally announced October 2021.

Journal ref: WACV2022

arXiv:2108.08344 [pdf, other]

The Multi-Modal Video Reasoning and Analyzing Competition

Authors: Haoran Peng, He Huang, Li Xu, Tianjiao Li, Jun Liu, Hossein Rahmani, Qiuhong Ke, Zhicheng Guo, Cong Wu, Rongchang Li, Mang Ye, Jiahao Wang, Jiaxu Zhang, Yuanzhong Liu, Tao He, Fuwei Zhang, Xianbin Liu, Tao Lin

Abstract: In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.

Submitted 18 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV 2021 Workshops

ACM Class: I.2.10; I.2.6

arXiv:2107.09176 [pdf]

Temporal search in the scientific space predicts breakthrough inventions

Authors: Chao Min, Qing Ke

Abstract: The development of inventions is theorized as a process of searching and recombining existing knowledge components. Previous studies under this theory have examined myriad characteristics of recombined knowledge and their performance implications. One feature that has received much attention is technological knowledge age. Yet, little is known about how the age of scientific knowledge influences the impact of inventions, despite the widely known catalyzing role of science in the creation of new technologies. Here we use a large corpus of patents and derive features characterizing how patents temporally search in the scientific space. We find that patents that cite scientific papers receive more citations and are substantially more likely to become breakthroughs. Conditional on searching in the scientific space, referencing more recent papers increases the impact of patents and the likelihood of being breakthroughs. However, this positive effect can be offset if patents cite papers whose ages exhibit a low variance. These effects are consistent across technological fields.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2106.06487 [pdf, other]

A dataset of mentorship in science with semantic and demographic estimations

Authors: Qing Ke, Lizhen Liang, Ying Ding, Stephen V. David, Daniel E. Acuna

Abstract: Mentorship in science is crucial for topic choice, career decisions, and the success of mentees and mentors. Typically, researchers who study mentorship use article co-authorship and doctoral dissertation datasets. However, available datasets of this type focus on narrow selections of fields and miss out on early career and non-publication-related interactions. Here, we describe MENTORSHIP, a crowdsourced dataset of 743176 mentorship relationships among 738989 scientists across 112 fields that avoids these shortcomings. We enrich the scientists' profiles with publication data from the Microsoft Academic Graph and "semantic" representations of research using deep learning content analysis. Because gender and race have become critical dimensions when analyzing mentorship and disparities in science, we also provide estimations of these factors. We perform extensive validations of the profile--publication matching, semantic content, and demographic inferences. We anticipate this dataset will spur the study of mentorship in science and deepen our understanding of its role in scientists' career outcomes.

Submitted 11 June, 2021; originally announced June 2021.

Comments: Data can be found at https://doi.org/10.5281/zenodo.4917086

arXiv:2106.01532 [pdf, other]

Noise Doesn't Lie: Towards Universal Detection of Deep Inpainting

Authors: Ang Li, Qiuhong Ke, Xingjun Ma, Haiqin Weng, Zhiyuan Zong, Feng Xue, Rui Zhang

Abstract: Deep image inpainting aims to restore damaged or missing regions in an image with realistic contents. While having a wide range of applications such as object removal and image recovery, deep inpainting techniques also have the risk of being manipulated for image forgery. A promising countermeasure against such forgeries is deep inpainting detection, which aims to locate the inpainted regions in an image. In this paper, we make the first attempt towards universal detection of deep inpainting, where the detection network can generalize well when detecting different deep inpainting methods. To this end, we first propose a novel data generation approach to generate a universal training dataset, which imitates the noise discrepancies that exist between real and inpainted image contents to train universal detectors. We then design a Noise-Image Cross-fusion Network (NIX-Net) to effectively exploit the discriminative information contained in both the images and their noise patterns. We empirically show, on multiple benchmark datasets, that our approach outperforms existing detection methods by a large margin and generalizes well to unseen deep inpainting techniques. Our universal training dataset can also significantly boost the generalizability of existing detection methods.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted by IJCAI 2021

arXiv:2105.11537 [pdf, other]

Graph Neural Network Based VC Investment Success Prediction

Authors: Shiwei Lyu, Shuai Ling, Kaihao Guo, Haipeng Zhang, Kunpeng Zhang, Suting Hong, Qing Ke, Jinjie Gu

Abstract: Predicting the start-ups that will eventually succeed is essentially important for the venture capital business and worldwide policy makers, especially at an early stage such that rewards can possibly be exponential. Though various empirical studies and data-driven modeling work have been done, the predictive power of the complex networks of stakeholders including venture capital investors, start-ups, and start-ups' managing members has not been thoroughly explored. We design an incremental representation learning mechanism and a sequential learning model, utilizing the network structure together with the rich attributes of the nodes. In general, our method achieves state-of-the-art prediction performance on a comprehensive dataset of global venture capital investments and surpasses human investors by large margins. Specifically, it excels at predicting the outcomes for start-ups in industries such as healthcare and IT. Meanwhile, we shed light on impacts on start-up success from observable factors including gender, education, and networking, which can be of value for practitioners as well as policy makers when they screen ventures of high growth potential.

Submitted 25 May, 2021; originally announced May 2021.

Comments: 11 pages, 5 figures

arXiv:2103.04778 [pdf, other]

Bridging the Distribution Gap of Visible-Infrared Person Re-identification with Modality Batch Normalization

Authors: Wenkang Li, Qi Ke, Wenbin Chen, Yicong Zhou

Abstract: Visible-infrared cross-modality person re-identification (VI-ReID), whose aim is to match person images between visible and infrared modality, is a challenging cross-modality image retrieval task. Most existing works integrate batch normalization layers into their neural network, but we found that batch normalization layers would lead to two types of distribution gaps: 1) inter-mini-batch distribution gap -- the distribution gap of the same modality between each mini-batch; 2) intra-mini-batch modality distribution gap -- the distribution gap of different modalities within the same mini-batch. To address these problems, we propose a new batch normalization layer called Modality Batch Normalization (MBN), which normalizes each modality sub-mini-batch separately instead of the whole mini-batch, and can reduce these distribution gaps significantly. Extensive experiments show that our MBN is able to boost the performance of VI-ReID models, even with different datasets, backbones and losses.

Submitted 8 March, 2021; originally announced March 2021.
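
Note: the core idea in this abstract, normalizing each modality sub-mini-batch separately rather than the whole mini-batch, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `batch_norm` and `modality_batch_norm`, the list-of-lists feature layout, and the modality labels are assumptions made for illustration, and the learnable scale/shift parameters and running statistics of a real batch normalization layer are omitted.

```python
import math

def batch_norm(rows, eps=1e-5):
    """Normalize each feature column of `rows` to zero mean, unit variance."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    variances = [sum((r[d] - means[d]) ** 2 for r in rows) / n for d in range(dims)]
    return [
        [(r[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for r in rows
    ]

def modality_batch_norm(features, modalities, eps=1e-5):
    """Apply batch_norm separately to the samples of each modality.

    features:   list of feature vectors (one per sample in the mini-batch)
    modalities: list of labels, e.g. "visible" / "infrared" (hypothetical names)
    """
    out = [None] * len(features)
    for m in set(modalities):
        # Indices of the sub-mini-batch belonging to modality m.
        idx = [i for i, mod in enumerate(modalities) if mod == m]
        normed = batch_norm([features[i] for i in idx], eps)
        for i, row in zip(idx, normed):
            out[i] = row
    return out

# Toy mini-batch: the two modalities have very different feature scales.
features = [[1.0], [3.0], [10.0], [14.0]]
modalities = ["visible", "visible", "infrared", "infrared"]
normalized = modality_batch_norm(features, modalities)
```

After this per-modality normalization, the visible and infrared samples each have zero mean and roughly unit variance on their own, whereas normalizing the whole mini-batch at once would leave the two modalities clustered at opposite ends of the normalized range.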

arXiv:2101.10897 [pdf, other]

HexCNN: A Framework for Native Hexagonal Convolutional Neural Networks

Authors: Yunxiang Zhao, Qiuhong Ke, Flip Korn, Jianzhong Qi, Rui Zhang

Abstract: Hexagonal CNN models have shown superior performance in applications such as IACT data analysis and aerial scene classification due to their better rotation symmetry and reduced anisotropy. In order to realize hexagonal processing, existing studies mainly use the ZeroOut method to imitate hexagonal processing, which causes substantial memory and computation overheads. We address this deficiency with a novel native hexagonal CNN framework named HexCNN. HexCNN takes hexagon-shaped input and performs forward and backward propagation on the original form of the input based on hexagon-shaped filters, hence avoiding computation and memory overheads caused by imitation. For applications that have rectangle-shaped input but require hexagonal processing, HexCNN can be applied by padding the input into a hexagonal shape as preprocessing. In this case, we show that the time and space efficiency of HexCNN still outperforms existing hexagonal CNN methods substantially. Experimental results show that compared with the state-of-the-art models, which imitate hexagonal processing but use rectangle-shaped filters, HexCNN reduces the training time by up to 42.2%. Meanwhile, HexCNN saves the memory space cost by up to 25% and 41.7% for loading the input and performing convolution, respectively.

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2101.06704 [pdf, other]

Adversarial Interaction Attack: Fooling AI to Misinterpret Human Intentions

Authors: Nodens Koren, Qiuhong Ke, Yisen Wang, James Bailey, Xingjun Ma

Abstract: Understanding the actions of both humans and artificial intelligence (AI) agents is important before modern AI systems can be fully integrated into our daily life. In this paper, we show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise to misinterpret the intention of an action in interaction scenarios. Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions, and demonstrate how DNN-based interaction models can be tricked to predict the participants' reactions in unexpected ways. From a broader perspective, the scope of our proposed attack method is not confined to problems related to skeleton data but can also be extended to any type of problems involving sequential regressions. Our study highlights potential risks in the interaction loop with AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.

Submitted 17 January, 2021; originally announced January 2021.

Comments: Preprint

arXiv:2012.11866 [pdf, other]

doi 10.1109/TPAMI.2022.3183112

Human Action Recognition from Various Data Modalities: A Review

Authors: Zehua Sun, Qiuhong Ke, Hossein Rahmani, Mohammed Bennamoun, Gang Wang, Jun Liu

Abstract: Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications, and therefore has been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenarios. Consequently, lots of existing works have attempted to investigate different types of approaches for HAR using various modalities. In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.

Submitted 21 June, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

arXiv:2010.09925 [pdf, other]

doi 10.1109/TIP.2020.3031173

Hierarchical Paired Channel Fusion Network for Street Scene Change Detection

Authors: Yinjie Lei, Duo Peng, Pingping Zhang, Qiuhong Ke, Haifeng Li

Abstract: Street Scene Change Detection (SSCD) aims to locate the changed regions between a given street-view image pair captured at different times, which is an important yet challenging task in the computer vision community. The intuitive way to solve the SSCD task is to fuse the extracted image feature pairs, and then directly measure the dissimilar parts to produce a change map. Therefore, the key for the SSCD task is to design an effective feature fusion method that can improve the accuracy of the corresponding change maps. To this end, we present a novel Hierarchical Paired Channel Fusion Network (HPCFNet), which utilizes the adaptive fusion of paired feature channels. Specifically, the features of a given image pair are jointly extracted by a Siamese Convolutional Neural Network (SCNN) and hierarchically combined by exploring the fusion of channel pairs at multiple feature levels. In addition, based on the observation that the distribution of scene changes is diverse, we further propose a Multi-Part Feature Learning (MPFL) strategy to detect diverse changes. Based on the MPFL strategy, our framework achieves a novel approach to adapt to the scale and location diversities of the scene change regions. Extensive experiments on three public datasets (i.e., PCD, VL-CMU-CD and CDnet2014) demonstrate that the proposed framework achieves superior performance, outperforming other state-of-the-art methods by a considerable margin.

Submitted 19 October, 2020; originally announced October 2020.

Comments: To appear in Transactions on Image Processing, including 13 pages, 13 figures, 9 tables

arXiv:2009.01142 [pdf, other]

Long-Term Anticipation of Activities with Cycle Consistency

Authors: Yazan Abu Farha, Qiuhong Ke, Bernt Schiele, Juergen Gall

Abstract: With the success of deep learning methods in analyzing activities in videos, more attention has recently been focused towards anticipating future activities. However, most of the work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed that extend the prediction horizon up to several minutes into the future and anticipate a sequence of future activities including their durations. While these works decouple the semantic interpretation of the observed sequence from the anticipation task, we propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion. Furthermore, we introduce a cycle consistency loss over time by predicting the past activities given the predicted future. Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.

Submitted 2 September, 2020; originally announced September 2020.

Comments: GCPR 2020
