
In-phase implies large likelihood for independent codon model: distinguishing coding from non-coding sequences. (English) Zbl 1464.92224

Summary: It is proven that, under the independent codon model, the likelihood of a DNA coding sequence read in the correct frame is asymptotically larger than its likelihood when read in an incorrect frame. It follows that a single set of codon-usage probabilities suffices to discriminate among the six reading frames of coding sequences under this model. The direct-strand coding sequences of the Escherichia coli genome are taken as an example to examine codon independence using mutual information and \(\chi^2\) analysis; the contrast between the coding frame and the two offset frames is evident. A self-learning approach for generating a training set is proposed to estimate the probability parameters.
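The frame-discrimination idea in the summary can be illustrated with a short Python sketch. It assumes plain DNA strings, estimates one codon-usage table from in-frame training sequences with an additive pseudocount, scores all six frames (three offsets on each strand) with that same table, and computes a plug-in estimate of the mutual information between adjacent codons. All function names, the pseudocount, and the division of the log-likelihood by the number of codons (so that frames of slightly different length can be compared) are illustrative assumptions, not the authors' procedure; the self-learning construction of the training set is not reproduced.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement, used to read the three frames of the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_in_frame(seq: str, offset: int) -> list[str]:
    """Consecutive non-overlapping codons starting at offset 0, 1 or 2,
    skipping any codon that contains a non-ACGT symbol."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return [c for c in codons if c in CODON_SET]


def estimate_codon_usage(training: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """One set of codon probabilities from in-frame (offset 0) training sequences.
    The additive pseudocount is an assumption that keeps unseen codons non-zero."""
    counts = Counter()
    for seq in training:
        counts.update(codons_in_frame(seq, 0))
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def mean_log_likelihood(seq: str, offset: int, usage: dict[str, float]) -> float:
    """Independent codon model: log L is the sum of log p(codon) over the frame.
    Dividing by the codon count (an assumption) lets frames of unequal length compete."""
    logs = [math.log(usage[c]) for c in codons_in_frame(seq, offset)]
    return sum(logs) / len(logs) if logs else float("-inf")


def best_frame(seq: str, usage: dict[str, float]):
    """Score all six frames (offsets 0-2 on each strand) with the same codon usage
    and return the highest-scoring frame together with all scores."""
    rc = reverse_complement(seq)
    scores = {}
    for k in range(3):
        scores[("+", k)] = mean_log_likelihood(seq, k, usage)
        scores[("-", k)] = mean_log_likelihood(rc, k, usage)
    return max(scores, key=scores.get), scores


def adjacent_codon_mutual_information(seq: str, offset: int) -> float:
    """Plug-in estimate, in bits, of the mutual information between consecutive
    codons in one frame; values near 0 are consistent with codon independence."""
    cs = codons_in_frame(seq, offset)
    pairs = list(zip(cs, cs[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((nab / n) * math.log2(nab * n / (left[a] * right[b]))
               for (a, b), nab in joint.items())


if __name__ == "__main__":
    demo = "ATGGCTAAAGGC" * 50  # toy periodic sequence, not real genomic data
    usage = estimate_codon_usage([demo])
    frame, _ = best_frame(demo, usage)
    print(frame)  # prints ('+', 0)
```

On the \(\chi^2\) side, the likelihood-ratio (G) statistic for independence of adjacent codons equals \(2n\ln 2\) times the plug-in mutual information in bits, where \(n\) is the number of codon pairs, so in this setting the two measures are rescalings of one another; the Pearson \(\chi^2\) statistic is a close approximation to G for large counts.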

MSC:

92D20 Protein sequences, DNA sequences
