Abstract
Detection of somatic mutations from tumor and matched normal sequencing data has become a standard approach in cancer research. Although a number of mutation callers have been developed, it is still difficult to detect mutations with low allele frequency, even in exome sequencing. We expect overlapping paired-end read information to be effective for this purpose, but no mutation caller has statistically modeled this overlapping information in a proper form for exome sequence data. Here, we develop a Bayesian hierarchical method, OVarCall, in which overlapping paired-end read information improves the accuracy of low allele frequency mutation detection. First, we construct two generative models: one for reads with somatic variants generated from tumor cells, and the other for reads that do not have somatic variants but potentially include sequence errors. Second, we calculate the marginal likelihood of each model using a variational Bayesian algorithm and compute the Bayes factor for the detection of somatic mutations. We empirically evaluated the performance of OVarCall and confirmed that it performs better than existing methods.
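The idea of comparing two generative models through a Bayes factor can be illustrated with a much simpler stand-in than the hierarchical model described in the abstract. The sketch below is not OVarCall: it ignores overlapping read pairs entirely, and it uses closed-form beta-binomial marginal likelihoods instead of a variational Bayesian approximation. The function names and both prior settings (`tumor_prior`, `error_prior`) are hypothetical choices made only for illustration.

```python
from math import lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k: int, n: int, a: float, b: float) -> float:
    """Log marginal likelihood of k variant reads among n reads when the
    per-read variant probability has a Beta(a, b) prior.

    The binomial coefficient is omitted because it is identical under both
    models and cancels in the Bayes factor.
    """
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def log_bayes_factor(
    k: int,
    n: int,
    tumor_prior: tuple = (1.0, 1.0),    # assumption: flat prior on allele frequency
    error_prior: tuple = (1.0, 100.0),  # assumption: variant reads are rare errors
) -> float:
    """Log Bayes factor of the 'somatic variant' model over the 'error only' model."""
    return log_marginal(k, n, *tumor_prior) - log_marginal(k, n, *error_prior)


if __name__ == "__main__":
    # Compare a site with no variant reads to one with 10 variant reads out of 100.
    for k in (0, 10):
        print(k, log_bayes_factor(k, 100))
```

With these illustrative priors, a site with no variant reads among 100 yields a negative log Bayes factor (the error-only model is favored), while 10 variant reads among 100 yields a positive one. A caller would compare this quantity against a threshold chosen on validation data.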
Acknowledgments
The supercomputing resource was provided by the Human Genome Center, the Institute of Medical Science, the University of Tokyo.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Moriyama, T., Shiraishi, Y., Chiba, K., Yamaguchi, R., Imoto, S., Miyano, S. (2016). OVarCall: Bayesian Mutation Calling Method Utilizing Overlapping Paired-End Reads. In: Bourgeois, A., Skums, P., Wan, X., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2016. Lecture Notes in Computer Science, vol 9683. Springer, Cham. https://doi.org/10.1007/978-3-319-38782-6_4
DOI: https://doi.org/10.1007/978-3-319-38782-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38781-9
Online ISBN: 978-3-319-38782-6
eBook Packages: Computer Science, Computer Science (R0)