-
Decomposable Sparse Tensor on Tensor Regression
Authors:
Haiyi Mao,
Jason Xiaotian Dou
Abstract:
Most regularized tensor regression research focuses on tensors predictors with scalars responses or vectors predictors to tensors responses. We consider the sparse low rank tensor on tensor regression where predictors $\mathcal{X}$ and responses $\mathcal{Y}$ are both high-dimensional tensors. By demonstrating that the general inner product or the contracted product on a unit rank tensor can be de…
▽ More
Most regularized tensor regression research focuses on tensors predictors with scalars responses or vectors predictors to tensors responses. We consider the sparse low rank tensor on tensor regression where predictors $\mathcal{X}$ and responses $\mathcal{Y}$ are both high-dimensional tensors. By demonstrating that the general inner product or the contracted product on a unit rank tensor can be decomposed into standard inner products and outer products, the problem can be simply transformed into a tensor to scalar regression followed by a tensor decomposition. So we propose a fast solution based on stagewise search composed by contraction part and generation part which are optimized alternatively. We successfully demonstrate our method can out perform current methods in terms of accuracy and predictors selection by effectively incorporating the structural information.
△ Less
Submitted 14 December, 2022; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Sampling Through the Lens of Sequential Decision Making
Authors:
Jason Xiaotian Dou,
Alvin Qingkai Pan,
Runxue Bao,
Haiyi Harry Mao,
Lei Luo,
Zhi-Hong Mao
Abstract:
Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics.…
▽ More
Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics. They cannot choose the best sample for model training in different stages. Inspired by "Think, Fast and Slow" (System 1 and System 2) in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work utilizing reinforcement learning (RL) to address the sampling problem in representation learning. Our approach optimally adjusts the sampling process to achieve optimal performance. We explore geographical relationships among samples by distance-based sampling to maximize overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets. We also discuss an engrossing phenomenon which we name as "ASR gravity well" in experiments.
△ Less
Submitted 13 December, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
COEM: Cross-Modal Embedding for MetaCell Identification
Authors:
Haiyi Mao,
Minxue Jia,
Jason Xiaotian Dou,
Haotian Zhang,
Panayiotis V. Benos
Abstract:
Metacells are disjoint and homogeneous groups of single-cell profiles, representing discrete and highly granular cell states. Existing metacell algorithms tend to use only one modality to infer metacells, even though single-cell multi-omics datasets profile multiple molecular modalities within the same cell. Here, we present \textbf{C}ross-M\textbf{O}dal \textbf{E}mbedding for \textbf{M}etaCell Id…
▽ More
Metacells are disjoint and homogeneous groups of single-cell profiles, representing discrete and highly granular cell states. Existing metacell algorithms tend to use only one modality to infer metacells, even though single-cell multi-omics datasets profile multiple molecular modalities within the same cell. Here, we present \textbf{C}ross-M\textbf{O}dal \textbf{E}mbedding for \textbf{M}etaCell Identification (COEM), which utilizes an embedded space leveraging the information of both scATAC-seq and scRNA-seq to perform aggregation, balancing the trade-off between fine resolution and sufficient sequencing coverage. COEM outperforms the state-of-the-art method SEACells by efficiently identifying accurate and well-separated metacells across datasets with continuous and discrete cell types. Furthermore, COEM significantly improves peak-to-gene association analyses, and facilitates complex gene regulatory inference tasks.
△ Less
Submitted 24 July, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
What Words Do We Use to Lie?: Word Choice in Deceptive Messages
Authors:
Jason Xiaotian Dou,
Michelle Liu,
Haaris Muneer,
Adam Schlussel
Abstract:
Text messaging is the most widely used form of computer-mediated communication (CMC). Previous findings have shown that linguistic factors can reliably indicate messages as deceptive. For example, users take longer and use more words to craft deceptive messages than they do truthful messages. Existing research has also examined how factors, such as student status and gender, affect rates of decept…
▽ More
Text messaging is the most widely used form of computer-mediated communication (CMC). Previous findings have shown that linguistic factors can reliably indicate messages as deceptive. For example, users take longer and use more words to craft deceptive messages than they do truthful messages. Existing research has also examined how factors, such as student status and gender, affect rates of deception and word choice in deceptive messages. However, this research has been limited by small sample sizes and has returned contradicting findings. This paper aims to address these issues by using a dataset of text messages collected from a large and varied set of participants using an Android messaging application. The results of this paper show significant differences in word choice and frequency of deceptive messages between male and female participants, as well as between students and non-students.
△ Less
Submitted 1 August, 2022; v1 submitted 30 September, 2017;
originally announced October 2017.
-
Impartial Redistricting: A Markov Chain Approach
Authors:
Lucy Chenyun Wu,
Jason Xiaotian Dou,
Danny Sleator,
Alan Frieze,
David Miller
Abstract:
The gerrymandering problem is a worldwide problem which sets great threat to democracy and justice in district based elections. Thanks to partisan redistricting commissions, district boundaries are often manipulated to benefit incumbents. Since an independent commission is hard to come by, the possibility of impartially generating districts with a computer is explored in this thesis. We have devel…
▽ More
The gerrymandering problem is a worldwide problem which sets great threat to democracy and justice in district based elections. Thanks to partisan redistricting commissions, district boundaries are often manipulated to benefit incumbents. Since an independent commission is hard to come by, the possibility of impartially generating districts with a computer is explored in this thesis. We have developed an algorithm to randomly produce legal redistricting schemes for Pennsylvania.
△ Less
Submitted 13 October, 2015; v1 submitted 12 October, 2015;
originally announced October 2015.