
Estimating Boltzmann averages for protein structural quantities using sequential Monte Carlo. (English) Zbl 07864348

Summary: Sequential Monte Carlo (SMC) methods are widely used to draw samples from intractable target distributions. Weight degeneracy can hinder the use of SMC when the target distribution is highly constrained. As a motivating application, we consider the problem of sampling protein structures from the Boltzmann distribution. This paper proposes a general SMC method that propagates multiple descendants for each particle, followed by resampling to maintain the desired number of particles. A simulation study demonstrates the efficacy of the method for tackling the protein sampling problem, compared to existing SMC methods. As a real data example, we estimate the number of atomic contacts for a key segment of the SARS-CoV-2 viral spike protein.
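The core idea in the summary, propagating several descendants per particle and then resampling back to a fixed number of particles, can be illustrated with a toy sketch. This is not the paper's algorithm or its protein model: the target here is an assumed stand-in (the uniform distribution over self-avoiding walks on a 2D square lattice, a classic highly constrained target), and the function name `smc_multi_descendant`, the choice of one descendant per lattice move, and the use of multinomial resampling are all illustrative assumptions.

```python
import random

# The four unit moves on the square lattice (assumed toy state space).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def smc_multi_descendant(n_steps, n_particles, rng):
    """Toy SMC sketch: estimate the number of self-avoiding walks.

    Each particle spawns one descendant per lattice move, descendants that
    revisit a site get weight zero, and multinomial resampling brings the
    population back to n_particles. Returns (estimate, final particles).
    """
    particles = [[(0, 0)] for _ in range(n_particles)]
    z_hat = 1.0
    for _ in range(n_steps):
        descendants, weights = [], []
        # Propagation: multiple descendants for every current particle.
        for walk in particles:
            x, y = walk[-1]
            visited = set(walk)
            for dx, dy in MOVES:
                nxt = (x + dx, y + dy)
                descendants.append(walk + [nxt])
                weights.append(0.0 if nxt in visited else 1.0)
        total = sum(weights)
        if total == 0.0:
            # Every descendant violated the constraint.
            return 0.0, []
        # Average number of valid extensions per particle estimates the
        # ratio of consecutive normalizing constants.
        z_hat *= total / n_particles
        # Resampling: keep the desired number of particles.
        particles = rng.choices(descendants, weights=weights, k=n_particles)
    return z_hat, particles


if __name__ == "__main__":
    estimate, walks = smc_multi_descendant(4, 2000, random.Random(0))
    print(estimate, len(walks))
```

For reference, the exact number of 4-step self-avoiding walks on the square lattice is 100, so the estimate should land close to that value. Enumerating every move is only feasible because the toy state space is tiny; in a continuous setting such as protein conformations, the descendants would have to be drawn from a proposal distribution instead.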

MSC:

62-XX Statistics

References:

[1] Adcock, S. A. and McCammon, J. A. (2006). Molecular dynamics: Survey of methods for simulating the activity of proteins. Chemical Reviews 106, 1589-1615.
[2] Ali, A. and Vijayan, R. (2020). Dynamics of the ACE2-SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms. Scientific Reports 10, 14214.
[3] Andrieu, C. and Doucet, A. (2002). Particle filtering for partially observed Gaussian state space models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 827-836. · Zbl 1067.62098
[4] Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 269-342. · Zbl 1411.65020
[5] Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science 181, 223-230.
[6] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H. et al. (2000). The protein data bank. Nucleic Acids Research 28, 235-242.
[7] Boltzmann, L. (1868). Studien über das Gleichgewicht der lebendigen Kraft. Wissenschaftliche Abhandlungen 1, 49-96.
[8] Bu, Z. and Callaway, D. J. (2011). Proteins move! Protein dynamics and long-range allostery in cell signaling. Advances in Protein Chemistry and Structural Biology 83, 163-221.
[9] Carpenter, J., Clifford, P. and Fearnhead, P. (1999). Improved particle filter for non-linear problems. IEE Proceedings - Radar, Sonar and Navigation 146, 2-7.
[10] Carvalho, C. M., Johannes, M. S., Lopes, H. F. and Polson, N. G. (2010). Particle learning and smoothing. Statistical Science 25, 88-106. · Zbl 1328.62541
[11] Casella, G. and Robert, C. P. (1996). Rao-Blackwellisation of sampling schemes. Biometrika 83, 81-94. · Zbl 0866.62024
[12] Chen, R. and Liu, J. S. (2000). Mixture Kalman filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62, 493-508. · Zbl 0953.62100
[13] Chen, T., Schon, T. B., Ohlsson, H. and Ljung, L. (2010). Decentralized particle filter with arbitrary state decomposition. IEEE Transactions on Signal Processing 59, 465-478. · Zbl 1391.93229
[14] Chen, Y., Liu, Q. and Guo, D. (2020). Emerging coronaviruses: Genome structure, replication, and pathogenesis. Journal of Medical Virology 92, 418-423.
[15] Chopin, N. and Singh, S. (2015). On particle Gibbs sampling. Bernoulli 21, 1855-1883. · Zbl 1333.60164
[16] Dai, C., Heng, J., Jacob, P. E. and Whiteley, N. (2022). An invitation to sequential Monte Carlo samplers. Journal of the American Statistical Association 117, 1587-1600. · Zbl 1506.65007
[17] Dehury, B., Raina, V., Misra, N. and Suar, M. (2021). Effect of mutation on structure, function and dynamics of receptor binding domain of human SARS-CoV-2 with host cell receptor ACE2: A molecular dynamics simulations study. Journal of Biomolecular Structure and Dynamics 39, 7231-7245.
[18] Del Moral, P., Doucet, A. and Jasra, A. (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 411-436. · Zbl 1105.62034
[19] Di Lena, P., Nagata, K. and Baldi, P. (2012). Deep architectures for protein contact map prediction. Bioinformatics 28, 2449-2457.
[20] Doucet, A., de Freitas, N. and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice (Edited by A. Doucet, N. de Freitas and N. Gordon), 3-14. Springer. · Zbl 1056.93576
[21] Esposito, L., De Simone, A., Zagari, A. and Vitagliano, L. (2005). Correlation between ω and ψ dihedral angles in protein structures. Journal of Molecular Biology 347, 483-487.
[22] Fearnhead, P. and Clifford, P. (2003). On-line inference for hidden Markov models via particle filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, 887-899. · Zbl 1059.62098
[23] Fraser, J. S., Clarkson, M. W., Degnan, S. C., Erion, R., Kern, D. and Alber, T. (2009). Hidden alternative structures of proline isomerase essential for catalysis. Nature 462, 669-673.
[24] Fraser, J. S., van den Bedem, H., Samelson, A. J., Lang, P. T., Holton, J. M., Echols, N. et al. (2011). Accessing protein conformational ensembles using room-temperature X-ray crystallography. Proceedings of the National Academy of Sciences 108, 16247-16252.
[25] Gordon, N. J., Salmond, D. J. and Smith, A. F. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F - Radar and Signal Processing 140, 107-113.
[26] Jacquier, E., Polson, N. G. and Rossi, P. E. (2002). Bayesian analysis of stochastic volatility models. Journal of Business & Economic Statistics 20, 69-87.
[27] Johansen, A. M., Whiteley, N. and Doucet, A. (2012). Exact approximation of Rao-Blackwellised particle filters. IFAC Proceedings Volumes 45, 488-493.
[28] Kantas, N., Doucet, A., Singh, S. S., Maciejowski, J. and Chopin, N. (2015). On particle methods for parameter estimation in state-space models. Statistical Science 30, 328-351. · Zbl 1332.62096
[29] Karlin, S., Zhu, Z.-Y. and Baud, F. (1999). Atom density in protein structures. Proceedings of the National Academy of Sciences 96, 12500-12505.
[30] Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5, 1-25.
[31] Lan, J., Ge, J., Yu, J., Shan, S., Zhou, H., Fan, S. et al. (2020). Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 581, 215-220.
[32] Landau, L. D. and Lifshitz, E. M. (2013). Statistical Physics. Elsevier.
[33] Li, Y., Wang, W., Deng, K. and Liu, J. S. (2022). Stratification and optimal resampling for sequential Monte Carlo. Biometrika 109, 181-194. · Zbl 07474109
[34] Lin, M., Chen, R. and Liu, J. S. (2013). Lookahead strategies for sequential Monte Carlo. Statistical Science 28, 69-94. · Zbl 1332.62144
[35] Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer. · Zbl 0991.65001
[36] Liu, J. S. and Chen, R. (1995). Blind deconvolution via sequential imputations. Journal of the American Statistical Association 90, 567-576. · Zbl 0826.62062
[37] Liu, J. S. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association 93, 1032-1044. · Zbl 1064.65500
[38] Liu, J. S., Chen, R. and Logvinenko, T. (2001). A theoretical framework for sequential importance sampling with resampling. In Sequential Monte Carlo Methods in Practice (Edited by A. Doucet, N. de Freitas and N. Gordon), 225-246. Springer. · Zbl 1056.93584
[39] Liu, J. S., Chen, R. and Wong, W. H. (1998). Rejection control and sequential importance sampling. Journal of the American Statistical Association 93, 1022-1031. · Zbl 1064.65501
[40] Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing 11, 125-139.
[41] Nguyen, H. L., Lan, P. D., Thai, N. Q., Nissley, D. A., O’Brien, E. P. and Li, M. S. (2020). Does SARS-CoV-2 bind to human ACE2 more strongly than does SARS-CoV? The Journal of Physical Chemistry B 124, 7336-7347.
[42] Onuchic, J. N., Luthey-Schulten, Z. and Wolynes, P. G. (1997). Theory of protein folding: The energy landscape perspective. Annual Review of Physical Chemistry 48, 545-600.
[43] Pintar, A., Carugo, O. and Pongor, S. (2002). CX, an algorithm that identifies protruding atoms in proteins. Bioinformatics 18, 980-984.
[44] Simons, K. T., Kooperberg, C., Huang, E. and Baker, D. (1997). Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. Journal of Molecular Biology 268, 209-225.
[45] Wang, L., Wang, S. and Bouchard-Côté, A. (2020). An annealed sequential Monte Carlo method for Bayesian phylogenetics. Systematic Biology 69, 155-183.
[46] Williams, J. K., Wang, B., Sam, A., Hoop, C. L., Case, D. A. and Baum, J. (2022). Molecular dynamics analysis of a flexible loop at the binding interface of the SARS-CoV-2 spike protein receptor-binding domain. Proteins: Structure, Function, and Bioinformatics 90, 1044-1053.
[47] Wong, S. W., Liu, J. S. and Kou, S. (2017). Fast de novo discovery of low-energy protein loop conformations. Proteins: Structure, Function, and Bioinformatics 85, 1402-1412.
[48] Wong, S. W., Liu, J. S. and Kou, S. (2018). Exploring the conformational space for protein folding with sequential Monte Carlo. Annals of Applied Statistics 12, 1628-1654. · Zbl 1405.62215
[49] Wrapp, D., Wang, N., Corbett, K. S., Goldsmith, J. A., Hsieh, C.-L., Abiona, O. et al. (2020). Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260-1263.
[50] Zhang, J., Lin, M., Chen, R., Liang, J. and Liu, J. S. (2007). Monte Carlo sampling of near-native structures of proteins with applications. Proteins: Structure, Function, and Bioinformatics 66, 61-68.
[51] Zhou, R. and Berne, B. (1997). Smart walking: A new method for Boltzmann sampling of protein conformations. The Journal of Chemical Physics 107, 9185-9196.
Samuel W. K. Wong, Department of Statistics and Actuarial Science, University of Waterloo, Ontario N2L 3G1, Canada. E-mail: samuel.wong@uwaterloo.ca (Received October 2022; accepted October 2023)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.