×

Efficient block boundaries estimation in block-wise constant matrices: an application to HiC data. (English) Zbl 1362.62012

Summary: In this paper, we propose a novel modeling and a new methodology for estimating the location of block boundaries in a random matrix consisting of a block-wise constant matrix corrupted with white noise. Our method consists in rewriting this problem as a variable selection issue. A penalized least-squares criterion with an \(\ell_{1}\)-type penalty is used for dealing with this problem. Firstly, some theoretical results ensuring the consistency of our block boundaries estimators are provided. Secondly, we explain how to implement our approach in a very efficient way. This implementation is available in the R package blockseg which can be found in the Comprehensive R Archive Network. Thirdly, we provide some numerical experiments to illustrate the statistical and numerical performance of our package, as well as a thorough comparison with existing methods. Fourthly, an empirical procedure is proposed for estimating the number of blocks. Finally, our approach is applied to HiC data which are used in molecular biology for better understanding the influence of the chromosomal conformation on the cells functioning.

MSC:

62-07 Data analysis (statistics) (MSC2010)
62F30 Parametric inference under constraints
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F12 Asymptotic properties of parametric estimators
62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

blockseg; Armadillo; R