Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Authors: Nan Fang, Guiliang Liu, Wei Gong

Abstract: Reinforcement Learning (RL) applied in healthcare can lead to unsafe medical decisions and treatment, such as excessive dosages or abrupt changes, often due to agents overlooking common-sense constraints. Consequently, Constrained Reinforcement Learning (CRL) is a natural choice for safe decisions. However, specifying the exact cost function is inherently difficult in healthcare. Recent Inverse Constrained Reinforcement Learning (ICRL) is a promising approach that infers constraints from expert demonstrations. ICRL algorithms model Markovian decisions in an interactive environment. These settings do not align with the practical requirement of a decision-making system in healthcare, where decisions rely on historical treatment recorded in an offline dataset. To tackle these issues, we propose the Constraint Transformer (CT). Specifically, 1) we utilize a causal attention mechanism to incorporate historical decisions and observations into the constraint modeling, while employing a Non-Markovian layer for weighted constraints to capture critical states. 2) A generative world model is used to perform exploratory data augmentation, enabling offline RL methods to simulate unsafe decision sequences. In multiple medical scenarios, empirical results demonstrate that CT can capture unsafe states and achieve strategies that approximate lower mortality rates, reducing the occurrence probability of unsafe behaviors.

Submitted 14 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

arXiv:2407.10862

R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection

Authors: Zheyuan Zhou, Le Wang, Naiyu Fang, Zili Wang, Lemiao Qiu, Shuyou Zhang

Abstract: 3D anomaly detection plays a crucial role in monitoring parts for localized inherent defects in precision manufacturing. Embedding-based and reconstruction-based approaches are among the most popular and successful methods. However, there are two major challenges to the practical application of the current approaches: 1) the embedded models suffer the prohibitive computational and storage due to the memory bank structure; 2) the reconstructive models based on the MAE mechanism fail to detect anomalies in the unmasked regions. In this paper, we propose R3D-AD, reconstructing anomalous point clouds by diffusion model for precise 3D anomaly detection. Our approach capitalizes on the data distribution conversion of the diffusion process to entirely obscure the input's anomalous geometry. It step-wisely learns a strict point-level displacement behavior, which methodically corrects the aberrant points. To increase the generalization of the model, we further present a novel 3D anomaly simulation strategy named Patch-Gen to generate realistic and diverse defect shapes, which narrows the domain gap between training and testing. Our R3D-AD ensures a uniform spatial transformation, which allows straightforwardly generating anomaly results by distance comparison. Extensive experiments show that our R3D-AD outperforms previous state-of-the-art methods, achieving 73.4% Image-level AUROC on the Real3D-AD dataset and 74.9% Image-level AUROC on the Anomaly-ShapeNet dataset with an exceptional efficiency.

Submitted 15 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2401.12433

A Novel Garment Transfer Method Supervised by Distilled Knowledge of Virtual Try-on Model

Authors: Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Kerui Hu, Jianrong Tan

Abstract: This paper proposes a novel garment transfer method supervised with knowledge distillation from virtual try-on. Our method first reasons the transfer parsing to provide shape prior to downstream tasks. We employ a multi-phase teaching strategy to supervise the training of the transfer parsing reasoning model, learning the response and feature knowledge from the try-on parsing reasoning model. To correct the teaching error, it transfers the garment back to its owner to absorb the hard knowledge in the self-study phase. Guided by the transfer parsing, we adjust the position of the transferred garment via STN to prevent distortion. Afterward, we estimate a progressive flow to precisely warp the garment with shape and content correspondences. To ensure warping rationality, we supervise the training of the garment warping model using target shape and warping knowledge from virtual try-on. To better preserve body features in the transfer result, we propose a well-designed training strategy for the arm regrowth task to infer new exposure skin. Experiments demonstrate that our method has state-of-the-art performance compared with other virtual try-on and garment transfer methods in garment transfer, especially for preserving garment texture and body features.

Submitted 4 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2309.17059

GSDC Transformer: An Efficient and Effective Cue Fusion for Monocular Multi-Frame Depth Estimation

Authors: Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Zheyuan Zhou, Kerui Hu

Abstract: Depth estimation provides an alternative approach for perceiving 3D information in autonomous driving. Monocular depth estimation, whether with single-frame or multi-frame inputs, has achieved significant success by learning various types of cues and specializing in either static or dynamic scenes. Recently, these cues fusion becomes an attractive topic, aiming to enable the combined cues to perform well in both types of scenes. However, adaptive cue fusion relies on attention mechanisms, where the quadratic complexity limits the granularity of cue representation. Additionally, explicit cue fusion depends on precise segmentation, which imposes a heavy burden on mask prediction. To address these issues, we propose the GSDC Transformer, an efficient and effective component for cue fusion in monocular multi-frame depth estimation. We utilize deformable attention to learn cue relationships at a fine scale, while sparse attention reduces computational requirements when granularity increases. To compensate for the precision drop in dynamic scenes, we represent scene attributes in the form of super tokens without relying on precise shapes. Within each super token attributed to dynamic scenes, we gather its relevant cues and learn local dense relationships to enhance cue fusion. Our method achieves state-of-the-art performance on the KITTI dataset with efficient fusion speed.

Submitted 4 December, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

arXiv:2304.08956

PG-VTON: A Novel Image-Based Virtual Try-On Method via Progressive Inference Paradigm

Authors: Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Kerui Hu

Abstract: Virtual try-on is a promising computer vision topic with a high commercial value wherein a new garment is visually worn on a person with a photo-realistic effect. Previous studies conduct their shape and content inference at one stage, employing a single-scale warping mechanism and a relatively unsophisticated content inference mechanism. These approaches have led to suboptimal results in terms of garment warping and skin reservation under challenging try-on scenarios. To address these limitations, we propose a novel virtual try-on method via progressive inference paradigm (PGVTON) that leverages a top-down inference pipeline and a general garment try-on strategy. Specifically, we propose a robust try-on parsing inference method by disentangling semantic categories and introducing consistency. Exploiting the try-on parsing as the shape guidance, we implement the garment try-on via warping-mapping-composition. To facilitate adaptation to a wide range of try-on scenarios, we adopt a covering more and selecting one warping strategy and explicitly distinguish tasks based on alignment. Additionally, we regulate StyleGAN2 to implement re-naked skin inpainting, conditioned on the target skin shape and spatial-agnostic skin features. Experiments demonstrate that our method has state-of-the-art performance under two challenging scenarios. The code will be available at https://github.com/NerdFNY/PGVTON.

Submitted 4 December, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.03650

A Cross-Scale Hierarchical Transformer with Correspondence-Augmented Attention for inferring Bird's-Eye-View Semantic Segmentation

Authors: Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Kerui Hu, Kang Wang

Abstract: As bird's-eye-view (BEV) semantic segmentation is simple-to-visualize and easy-to-handle, it has been applied in autonomous driving to provide the surrounding information to downstream tasks. Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community as cheap devices and real-time processing. The recent work implemented this task by learning the content and position relationship via the vision Transformer (ViT). However, the quadratic complexity of ViT confines the relationship learning only in the latent layer, leaving the scale gap to impede the representation of fine-grained objects. And their plain fusion method of multi-view features does not conform to the information absorption intention in representing BEV features. To tackle these issues, we propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inferring. Specifically, we devise a hierarchical framework to refine the BEV feature representation, where the last size is only half of the final segmentation. To save the computation increase caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way, and leverage the residual connection of BEV features to facilitate information transmission between scales. We propose correspondence-augmented attention to distinguish conducive and inconducive correspondences. It is implemented in a simple yet effective way, amplifying attention scores before the Softmax operation, so that the position-view-related and the position-view-disrelated attention scores are highlighted and suppressed. Extensive experiments demonstrate that our method has state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.

Submitted 17 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

arXiv:2004.06355

doi: 10.1364/OE.395204

On the interplay between physical and content priors in deep learning for computational imaging

Authors: Mo Deng, Shuai Li, Iksung Kang, Nicholas X. Fang, George Barbastathis

Abstract: Deep learning (DL) has been applied extensively in many computational imaging problems, often leading to superior performance over traditional iterative approaches. However, two important questions remain largely unanswered: first, how well can the trained neural network generalize to objects very different from the ones in training? This is particularly important in practice, since large-scale annotated examples similar to those of interest are often not available during training. Second, has the trained neural network learnt the underlying (inverse) physics model, or has it merely done something trivial, such as memorizing the examples or point-wise pattern matching? This pertains to the interpretability of machine-learning based algorithms. In this work, we use the Phase Extraction Neural Network (PhENN), a deep neural network (DNN) for quantitative phase retrieval in a lensless phase imaging system as the standard platform and show that the two questions are related and share a common crux: the choice of the training examples. Moreover, we connect the strength of the regularization effect imposed by a training set to the training process with the Shannon entropy of images in the dataset. That is, the higher the entropy of the training images, the weaker the regularization effect can be imposed. We also discover that weaker regularization effect leads to better learning of the underlying propagation model, i.e. the weak object transfer function, applicable for weakly scattering objects under the weak object approximation. Finally, simulation and experimental results show that better cross-domain generalization performance can be achieved if DNN is trained on a higher-entropy database, e.g. the ImageNet, than if the same DNN is trained on a lower-entropy database, e.g. MNIST, as the former allows the underlying physics model be learned better than the latter.

Submitted 14 April, 2020; originally announced April 2020.

arXiv:1810.05372

Granularity of wagers in games and the possibility of savings

Authors: George Barmpalias, Nan Fang

Abstract: In a casino where arbitrarily small bets are admissible, any betting strategy M can be modified into a savings strategy that, not only is successful on each casino sequence where M is (thus accumulating unbounded wealth inside the casino) but also saves an unbounded capital, by permanently and gradually withdrawing it from the game. Teutsch showed that this is no longer the case when a fixed minimum wager is imposed by the casino, thus exemplifying a savings paradox where a player can win unbounded wealth inside the casino, but upon withdrawing a sufficiently large amount out of the game, he is forced into bankruptcy. We study the potential for saving under a shrinking minimum wager rule (granularity) and its dependence on the rate of decrease (inflation) as well as timid versus bold play.

Submitted 12 June, 2020; v1 submitted 12 October, 2018; originally announced October 2018.

arXiv:1807.04635

Monotonous betting strategies in warped casinos

Authors: George Barmpalias, Nan Fang, Andrew Lewis-Pye

Abstract: Suppose that the outcomes of a roulette table are not entirely random, in the sense that there exists a successful betting strategy. Is there a successful `separable' strategy, in the sense that it does not use the winnings from betting on red in order to bet on black, and vice-versa? We study this question from an algorithmic point of view and observe that every strategy $M$ can be replaced by a separable strategy which is computable from $M$ and successful on any outcome-sequence where $M$ is successful. We then consider the case of mixtures and show: (a) there exists an effective mixture of separable strategies which succeeds on every casino sequence with effective Hausdorff dimension less than 1/2; (b) there exists a casino sequence of effective Hausdorff dimension 1/2 on which no effective mixture of separable strategies succeeds. Finally we extend (b) to a more general class of strategies.

Submitted 17 April, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

arXiv:1801.02566

Equivalences between learning of data and probability distributions, and their applications

Authors: George Barmpalias, Nan Fang, Frank Stephan

Abstract: Algorithmic learning theory traditionally studies the learnability of effective infinite binary sequences (reals), while recent work by [Vitanyi and Chater, 2017] and [Bienvenu et al., 2014] has adapted this framework to the study of learnability of effective probability distributions from random data. We prove that for certain families of probability measures that are parametrized by reals, learnability of a subclass of probability measures is equivalent to learnability of the class of the corresponding real parameters. This equivalence allows to transfer results from classical algorithmic theory to learning theory of probability measures. We present a number of such applications, providing many new results regarding EX and BC learnability of classes of measures, thus drawing parallels between the two learning theories.

Submitted 14 July, 2018; v1 submitted 5 January, 2018; originally announced January 2018.

arXiv:1602.03208

Optimal asymptotic bounds on the oracle use in computations from Chaitin's Omega

Authors: George Barmpalias, Nan Fang, Andrew Lewis-Pye

Abstract: Chaitin's number Omega is the halting probability of a universal prefix-free machine, and although it depends on the underlying enumeration of prefix-free machines, it is always Turing-complete. It can be observed, in fact, that for every computably enumerable (c.e.) real, there exists a Turing functional via which Omega computes it, and such that the number of bits of omega that are needed for the computation of the first n bits of the given number (i.e. the use on argument n) is bounded above by a computable function h(n) = n+o(n). We characterise the asymptotic upper bounds on the use of Chaitin's omega in oracle computations of halting probabilities (i.e. c.e. reals). We show that the following two conditions are equivalent for any computable function h such that h(n)-n is non-decreasing: (1) h(n)-n is an information content measure, (2) for every c.e. real there exists a Turing functional via which omega computes the real with use bounded by h. We also give a similar characterisation with respect to computations of c.e. sets from Omega, by showing that the following are equivalent for any computable non-decreasing function g: (1) g is an information-content measure, (2) for every c.e. set A, Omega computes A with use bounded by g. Further results and some connections with Solovay functions are given.

Submitted 3 May, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

Showing 1–11 of 11 results for author: Fang, N