Fitting DNA sequences through log-linear modelling with linear constraints. (English) Zbl 1228.62144

Summary: For some discrete-state series, such as DNA sequences, it can often be postulated that their probabilistic behaviour is governed by a Markov chain. To decide whether an uncharacterized piece of DNA is part of the coding region of a gene, two statistical tools are essential under the Markovian assumption: hypothesis tests for the order of a Markov chain and estimators of the transition probabilities. To improve the traditional statistical procedures for both when stationarity can be assumed, a new interpretation of the homogeneity hypothesis is proposed, so that log-linear modelling is applied to conditional independence jointly with homogeneity restrictions on the expected means of transition counts in the sequence. In addition, a variety of test statistics and estimators can be considered by using \(\phi\)-divergence measures. The well-known likelihood ratio test statistics and maximum-likelihood estimators are obtained as special cases.
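
The two classical tools named in the summary can be illustrated with a minimal sketch. It is not the log-linear, \(\phi\)-divergence procedure of the paper: it only shows the textbook special case, namely maximum-likelihood estimation of first-order transition probabilities and the likelihood ratio statistic for testing order 0 (independence) against order 1. The function names are hypothetical, and the degrees of freedom assume that every state is observed and that the usual chi-square asymptotics apply.

```python
from collections import Counter
from math import log


def transition_counts(seq):
    """Count first-order transitions a -> b in a sequence of symbols."""
    return Counter(zip(seq, seq[1:]))


def mle_transition_probs(seq):
    """Maximum-likelihood estimates p(b | a) = n_ab / n_a+ (row-normalised counts)."""
    counts = transition_counts(seq)
    row_totals = Counter()
    for (a, _b), n in counts.items():
        row_totals[a] += n
    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}


def lrt_order0_vs_order1(seq):
    """Classical likelihood ratio statistic G^2 for H0: order 0 vs H1: order 1.

    Returns (G^2, degrees of freedom). The degrees of freedom (s - 1)^2, with s
    the number of observed states, are an assumption valid when no state is
    structurally absent from the transition table.
    """
    counts = transition_counts(seq)
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (a, b), n in counts.items():
        row[a] += n
        col[b] += n
    # Zero cells contribute nothing to G^2, so only observed transitions are summed.
    g2 = 2.0 * sum(
        n * log(n * total / (row[a] * col[b])) for (a, b), n in counts.items()
    )
    states = set(row) | set(col)
    return g2, (len(states) - 1) ** 2


if __name__ == "__main__":
    dna = "ACGTACGTACGT"  # toy sequence, not real data
    print(mle_transition_probs(dna))
    print(lrt_order0_vs_order1(dna))
```

In practice the statistic would be compared with a chi-square quantile at the stated degrees of freedom. Higher-order tests follow the same pattern with tuples of preceding symbols in place of a single preceding symbol.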

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M02 Markov processes: hypothesis testing
62H17 Contingency tables
92C40 Biochemistry, molecular biology
62M05 Markov processes: estimation; hidden Markov models
62B10 Statistical aspects of information-theoretic topics
65C60 Computational problems in statistics (MSC2010)
