
A sparse hierarchical Bayesian model for detecting relevant antigenic sites in virus evolution. (English) Zbl 1417.62307

Summary: Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, multiple serotypes co-circulate, and testing large numbers of vaccines can be infeasible. Developing an in silico predictor of cross-protection between strains is therefore important to help optimise vaccine choice. Here we present a sparse hierarchical Bayesian model for detecting relevant antigenic sites in virus evolution (SABRE), which can account for the experimental variability in the data and predict antigenic variability. The method uses spike and slab priors to identify sites in the viral protein that are important for the neutralisation of the virus. Using SABRE, we identify a number of key antigenic sites within several viruses and provide estimates of significant changes in the evolutionary history of the serotypes. We show that our method outperforms established alternative methods: standard mixed effects models, the mixed effects LASSO, and the mixed effects elastic net. We also propose new proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over the established component-wise Gibbs sampler.
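The summary contrasts SABRE's samplers with the established component-wise Gibbs sampler for spike and slab priors. The sketch below illustrates only that baseline idea, for a plain linear regression without the hierarchical random effects SABRE uses to model experimental variability, and it is not the paper's method or its new proposal mechanisms. It assumes a point-mass spike at zero and a Gaussian slab (George and McCulloch's original sampler instead uses a narrow Gaussian spike); the function name `spike_slab_gibbs`, the prior settings `prior_incl`, `tau2`, `a`, `b`, and the simulated binary "site changed" data are all hypothetical choices for illustration.

```python
import numpy as np


def spike_slab_gibbs(X, y, n_iter=2000, prior_incl=0.2, tau2=1.0,
                     a=1.0, b=1.0, seed=0):
    """Component-wise Gibbs sampler for y = X w + noise (illustrative only).

    Assumed prior (hypothetical, not the SABRE specification):
      gamma_j ~ Bernoulli(prior_incl)      inclusion indicator
      w_j | gamma_j = 1 ~ N(0, tau2)      slab
      w_j | gamma_j = 0 = 0               point-mass spike
      sigma2 ~ InvGamma(a, b)             noise variance
    Returns the estimated posterior inclusion probability of each column.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    sigma2 = 1.0
    xtx = np.sum(X ** 2, axis=0)          # x_j' x_j for every column
    resid = y - X @ w                     # full residual, kept up to date
    burn_in = n_iter // 2
    counts = np.zeros(p)

    for it in range(n_iter):
        for j in range(p):
            r_j = resid + X[:, j] * w[j]  # residual without column j
            # Gaussian conditional posterior of w_j under the slab.
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ r_j) / sigma2
            # Log posterior odds of gamma_j = 1 vs 0, with w_j integrated out.
            log_odds = (np.log(prior_incl / (1.0 - prior_incl))
                        + 0.5 * np.log(v / tau2) + 0.5 * m ** 2 / v)
            include = rng.random() < 1.0 / (1.0 + np.exp(-log_odds))
            # Draw w_j from the slab posterior, or set it to the spike at 0.
            w[j] = rng.normal(m, np.sqrt(v)) if include else 0.0
            resid = r_j - X[:, j] * w[j]
            if it >= burn_in:
                counts[j] += include
        # Conjugate inverse-gamma update for the noise variance.
        rss = resid @ resid
        sigma2 = 1.0 / rng.gamma(a + 0.5 * n, 1.0 / (b + 0.5 * rss))

    return counts / (n_iter - burn_in)


# Hypothetical data: 10 binary "site changed" indicators, 2 of them relevant.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
true_w = np.zeros(10)
true_w[[2, 7]] = [1.5, -2.0]
y = X @ true_w + rng.normal(0.0, 0.5, size=200)
pip = spike_slab_gibbs(X, y)
print(np.round(pip, 2))
```

In this simulated setting, columns 2 and 7 should receive posterior inclusion probabilities close to 1, while the remaining columns should typically stay low. Because each indicator and its coefficient are updated one site at a time with all other sites held fixed, strongly correlated sites can make such a chain mix slowly; this is the kind of behaviour the summary says the proposed MCMC moves improve on.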

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62J05 Linear regression; mixed models
62F15 Bayesian inference
62-08 Computational methods for problems pertaining to statistics

References:

[1] Aderhold A, Husmeier D, Grzegorczyk M (2014) Statistical inference of regulatory networks for circadian regulation. Stat Appl Genet Mol Biol 13(3):227-273 · Zbl 1296.92011 · doi:10.1515/sagmb-2013-0051
[2] Aktas S, Samuel AR (2000) Identification of antigenic epitopes on the foot and mouth disease virus isolate O-1/Manisa/Turkey/69 using monoclonal antibodies. Sci Tech Rev Office Int Epizoot 19(3):744-753 · doi:10.20506/rst.19.3.1244
[3] Andersen MR, Winther O, Hansen LK (2014) Bayesian inference for structured spike and slab priors. Adv Neural Inf Process Syst 27:1745-1753
[4] Andrieu C, Doucet A (1999) Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC. IEEE Trans Signal Process 47(10):2667-2676 · doi:10.1109/78.790649
[5] Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32(3):870-897 · Zbl 1092.62033 · doi:10.1214/009053604000000238
[6] Barnett P, Ouldridge E, Rowlands D, Brown F, Parry N (1989) Neutralizing epitopes of type O foot-and-mouth disease virus. I. Identification and characterization of three functionally independent, conformational sites. J Gen Virol 70(Pt 6):1483-1491 · doi:10.1099/0022-1317-70-6-1483
[7] Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67(1):1-48 · doi:10.18637/jss.v067.i01
[8] Baxt B, Vakharia V, Moore D, Franke A, Morgan D (1989) Analysis of neutralizing antigenic sites on the surface of type A12 foot-and-mouth disease virus. J Virol 63(5):2143-2151
[9] Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin · Zbl 1107.68072
[10] Bolwell C, Brown A, Barnett P, Campbell R, Clarke B, Parry N, Ouldridge E, Brown F, Rowlands D (1989) Host cell selection of antigenic variants of foot-and-mouth disease virus. J Gen Virol 70(Pt 1):45-57 · doi:10.1099/0022-1317-70-1-45
[11] Caton AJ, Brownlee GG, Yewdell JW, Gerhard W (1982) The antigenic structure of the influenza virus A/PR/8/34 hemagglutinin (H1 subtype). Cell 31(2 Pt 1):417-427 · doi:10.1016/0092-8674(82)90135-0
[12] Crowther J, Farias S, Carpenter W, Samuel A (1993a) Identification of a fifth neutralizable site on type O foot-and-mouth disease virus following characterization of single and quintuple monoclonal antibody escape mutants. J Gen Virol 74(Pt 8):1547-1553 · doi:10.1099/0022-1317-74-8-1547
[13] Crowther J, Rowe C, Butcher R (1993b) Characterization of monoclonal antibodies against a type SAT 2 foot-and-mouth disease virus. Epidemiol Infect 111(2):391-406 · doi:10.1017/S0950268800057083
[14] Dalton L, Dougherty E (2012) Exact sample conditioned MSE performance of the Bayesian MMSE estimator for classification error—part II: consistency and performance analysis. IEEE Trans Signal Process 60(5):2588-2603 · Zbl 1391.62043 · doi:10.1109/TSP.2012.2184102
[15] Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407-499 · Zbl 1091.62054 · doi:10.1214/009053604000000067
[16] Filippone M, Zhong M, Girolami M (2013) A comparative evaluation of stochastic-based inference methods for Gaussian process models. Mach Learn 93:93-114 · Zbl 1294.62048 · doi:10.1007/s10994-013-5388-x
[17] Gelman A (2004) Parameterization and Bayesian modeling. J Am Stat Assoc 99(466):537-545 · Zbl 1117.62343 · doi:10.1198/016214504000000458
[18] Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1(3):515-534 · Zbl 1331.62139 · doi:10.1214/06-BA117A
[19] Gelman A, Rubin D (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457-511 · Zbl 1386.65060 · doi:10.1214/ss/1177011136
[20] Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. Chapman & Hall, London
[21] Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721-741 · Zbl 0573.62030 · doi:10.1109/TPAMI.1984.4767596
[22] George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88(423):881-889 · doi:10.1080/01621459.1993.10476353
[23] George EI, McCulloch RE (1997) Approaches for Bayesian variable selection. Stat Sin 7:339-373 · Zbl 0884.62031
[24] Grazioli S, Moretti M, Barbieri I, Crosatti M, Brocchi E (2006) Use of monoclonal antibodies to identify and map new antigenic determinants involved in neutralisation on FMD viruses type SAT 1 and SAT 2. In: Report of the session of the research group of the standing technical committee of the European commission for the control of foot-and-mouth disease, pp 287-297, appendix 43
[25] Grazioli S, Fallacara F, Brocchi E (2013) Mapping of antigenic sites of foot-and-mouth disease virus serotype Asia 1 and relationships with sites described in other serotypes. J Gen Virol 94(3):559-569 · doi:10.1099/vir.0.048249-0
[26] Grzegorczyk M, Husmeier D (2013) Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Mach Learn 91:105-151 · Zbl 1273.68378 · doi:10.1007/s10994-012-5326-3
[27] Haario H, Laine M, Mira A, Saksman E (2006) DRAM: efficient adaptive MCMC. Stat Comput 16(4):339-354 · doi:10.1007/s11222-006-9438-0
[28] Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36 · doi:10.1148/radiology.143.1.7063747
[29] Harvey WT, Gregory V, Benton DJ, Hall JP, Daniels RS, Bedford T, Haydon DT, Hay AJ, McCauley JW, Reeve R (2016) Identifying the genetic basis of antigenic change in influenza A (H1N1). arXiv preprint arXiv:1404.4197
[30] Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, Berlin · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[31] Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97-109 · Zbl 0219.65008 · doi:10.1093/biomet/57.1.97
[32] Hernández-Lobato D, Hernández-Lobato JM, Dupont P (2013) Generalized spike-and-slab priors for Bayesian group feature selection using expectation propagation. J Mach Learn Res 14(1):1891-1945 · Zbl 1318.62229
[33] Heydari J, Lawless C, Lydall DA, Wilkinson DJ (2016) Bayesian hierarchical modelling for inferring genetic interactions in yeast. J R Stat Soc Ser C (Appl Stat) 65(3):367-393 · doi:10.1111/rssc.12126
[34] Hirst GK (1942) The quantitative determination of influenza virus and antibodies by means of red cell agglutination. J Exp Med 75(1):49-64 · doi:10.1084/jem.75.1.49
[35] Holland J, Spindler K, Horodyski F, Grabau E, Nichol S, VandePol S (1982) Rapid evolution of RNA genomes. Science 215:1577-1585 · doi:10.1126/science.7041255
[36] Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65-70 · Zbl 0402.62058
[37] Jow H, Boys RJ, Wilkinson DJ (2014) Bayesian identification of protein differential expression in multi-group isobaric labelled mass spectrometry data. Stat Appl Genet Mol Biol 13(5):531-551 · Zbl 1298.92041 · doi:10.1515/sagmb-2012-0066
[38] Kitson J, McCahon D, Belsham G (1990) Sequence analysis of monoclonal antibody resistant mutants of type O foot and mouth disease virus: evidence for the involvement of the three surface exposed capsid proteins in four antigenic sites. Virology 179(1):26-34 · doi:10.1016/0042-6822(90)90269-W
[39] Knowles N, Samuel A (2003) Molecular epidemiology of foot-and-mouth disease virus. Virus Res 91:65-80 · doi:10.1016/S0168-1702(02)00260-5
[40] Lea S, Hernandez J, Blakemore W, Brocchi E, Curry S, Domingo E, Fry E, Abu Ghazaleh R, King A, Newman J, Stuart D, Mateu M (1994) The structure and antigenicity of a type C foot-and-mouth disease virus. Structure 2(2):123-139 · doi:10.1016/S0969-2126(00)00014-9
[41] Leisch F, Weingessel A, Hornik K (1998) On the generation of correlated artificial binary data. Working paper series, Working paper no. 13. SFB “Adaptive information systems and modelling in economics and management science”. Vienna University of Economics and Business Administration, Wien, Austria. http://www.wu-wien.ac.at/am
[42] Mateu M (1995) Antibody recognition of picornaviruses and escape from neutralization: a structural view. Virus Res 38(1):1-24 · doi:10.1016/0168-1702(95)00048-U
[43] Mattion N, König G, Seki C, Smitsaart E, Maradei E, Robiolo B, Duffy S, León E, Piccone M, Sadir A, Bottini R, Cosentino B, Falczuk A, Maresca R, Periolo O, Bellinzoni R, Espinoza A, Torre J, Palma E (2004) Reintroduction of foot-and-mouth disease in Argentina: characterisation of the isolates and development of tools for the control and eradication of the disease. Vaccine 22:4149-4162 · doi:10.1016/j.vaccine.2004.06.040
[44] McDonald NJ, Smith CB, Cox NJ (2007) Antigenic drift in the evolution of H1N1 influenza A viruses resulting from deletion of a single amino acid in the haemagglutinin gene. J Gen Virol 88(Pt 12):3209-3213 · doi:10.1099/vir.0.83184-0
[45] Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equations of state calculations by fast computing machines. J Chem Phys 21(6):1087-1092 · Zbl 1431.65006 · doi:10.1063/1.1699114
[46] Minka TP (2001) Expectation propagation for approximate Bayesian inference. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp 362-369
[47] Mitchell T, Beauchamp J (1988) Bayesian variable selection in linear regression. J Am Stat Assoc 83(404):1023-1032 · Zbl 0673.62051 · doi:10.1080/01621459.1988.10478694
[48] Mohamed S, Heller K, Ghahramani Z (2012) Bayesian and L1 approaches for sparse unsupervised learning. In: Proceedings of the 29th international conference on machine learning (ICML-12), pp 751-758
[49] Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge · Zbl 1295.68003
[50] Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681-686 · Zbl 1330.62292 · doi:10.1198/016214508000000337
[51] Paton D, Valarcher J, Bergmann I, Matlho O, Zakharov V, Palma E, Thomson G (2005) Selection of foot and mouth disease vaccine strains—a review. Rev Sci Tech 24:981-993 · doi:10.20506/rst.24.3.1632
[52] Pinheiro JC, Bates D (2000) Mixed-effects models in S and S-PLUS. Springer, Berlin · Zbl 0953.62065 · doi:10.1007/978-1-4419-0318-1
[53] Plummer M, Best N, Cowles K, Vines K (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6(1):7-11
[54] R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
[55] Reeve R, Blignaut B, Esterhuysen JJ, Opperman P, Matthews L, Fry EE, de Beer TAP, Theron J, Rieder E, Vosloo W, O’Neill HG, Haydon DT, Maree FF (2010) Sequence-based prediction for vaccine strain selection and identification of antigenic variability in foot-and-mouth disease virus. PLoS Comput Biol 6(12):e1001027 · doi:10.1371/journal.pcbi.1001027
[56] Reeve R, Borley DW, Maree FF, Upadhyaya S, Lukhwareni A, Esterhuysen JJ, Harvey WT, Blignaut B, Fry EE, Parida S, Paton DJ, Mahapatra M (2016) Tracking the antigenic evolution of foot-and-mouth disease virus. PloS ONE 11(7):1-17 · doi:10.1371/journal.pone.0159360
[57] Ripley B (1979) Algorithm AS 137: simulating spatial patterns: dependent samples from a multivariate density. J R Stat Soc Ser C 28(1):109-112
[58] Ruyssinck J, Huynh-Thu V, Geurts P, Dhaene T, Demeester P, Saeys Y (2014) NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms. PLoS ONE 9(3):e92709 · doi:10.1371/journal.pone.0092709
[59] Sabatti C, James GM (2005) Bayesian sparse hidden components analysis for transcription networks. Bioinformatics 22(6):739-746 · doi:10.1093/bioinformatics/btk017
[60] Saiz JC, Gonzalez MJ, Borca MV, Sobrino F, Moore DM (1991) Identification of neutralizing antigenic sites on VP1 and VP2 of type A5 foot-and-mouth disease virus, defined by neutralization-resistant variants. J Virol 65(5):2518-2524
[61] Schelldorfer J, Bühlmann P, van de Geer S (2011) Estimation for high-dimensional linear mixed-effects models using ℓ1-penalization. Scand J Stat 38(2):197-214 · Zbl 1246.62161 · doi:10.1111/j.1467-9469.2011.00740.x
[62] Scott JG, Berger JO (2010) Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Ann Stat 38(5):2587-2619 · Zbl 1200.62020 · doi:10.1214/10-AOS792
[63] Skehel JJ, Wiley DC (2000) Receptor binding and membrane fusion in virus entry: the influenza hemagglutinin. Ann Rev Biochem 69(1):531-569 · doi:10.1146/annurev.biochem.69.1.531
[64] Thomas A, Woortmeijer R, Barteling S, Meloen R (1988a) Evidence for more than one important, neutralizing site on foot-and-mouth disease virus. Brief report. Arch Virol 99(3-4):237-242 · doi:10.1007/BF01311072
[65] Thomas A, Woortmeijer R, Puijk W, Barteling S (1988b) Antigenic sites on foot-and-mouth disease virus type A10. J Virol 62(8):2782-2789
[66] Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267-288 · Zbl 0850.62538
[67] Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective (with comments). J R Stat Soc Ser B 73(3):273-282 · Zbl 1411.62212 · doi:10.1111/j.1467-9868.2011.00771.x
[68] Titsias MK, Lázaro-Gredilla M (2011) Spike and slab variational inference for multi-task and multiple kernel learning. In: Advances in neural information processing systems, pp 2339-2347
[69] Watanabe S (2010) Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res 11:3571-3594 · Zbl 1242.62024
[70] WHO (2011) Manual for the laboratory diagnosis and virological surveillance of influenza. http://whqlibdoc.who.int/publications/2011/9789241548090_eng.pdf
[71] Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67(2):301-320 · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. This list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.