
Learning compositional representations of interacting systems with restricted Boltzmann machines: comparative study of lattice proteins. (English) Zbl 1429.92113

Summary: A restricted Boltzmann machine (RBM) is an unsupervised machine learning bipartite graphical model that jointly learns a probability distribution over data and extracts their relevant statistical features. RBMs were recently proposed for characterizing the patterns of coevolution between amino acids in protein sequences and for designing new sequences. Here, we study how the nature of the features learned by an RBM changes with its defining parameters, such as the dimensionality of the representations (size of the hidden layer) and the sparsity of the features. We show that for adequate values of these parameters, RBMs operate in a so-called compositional phase, in which visible configurations sampled from the RBM are obtained by recombining the learned features. We then compare the performance of RBMs with that of other standard representation learning algorithms, including principal and independent component analysis (PCA, ICA), autoencoders (AE), variational autoencoders (VAE), and their sparse variants. We show that RBMs, owing to the stochastic mapping between data configurations and representations, better capture the underlying interactions in the system and are significantly more robust with respect to sample size than deterministic methods such as PCA or ICA. In addition, this stochastic mapping is not prescribed a priori as in VAE, but learned from data, which allows RBMs to perform well even with shallow architectures. All numerical results are illustrated on synthetic lattice protein data that share similar statistical features with real protein sequences and for which ground-truth interactions are known.
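To make the bipartite structure and the stochastic visible-to-hidden mapping concrete, here is a minimal sketch of a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1). This is an illustrative toy, not the model of the paper: the authors use more general hidden-unit potentials (e.g. ReLU-like) and Potts-valued visible units for sequences; the class name, learning rate, and toy data below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Toy Bernoulli-Bernoulli RBM trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def sample_h(self, v):
        # Stochastic mapping from data configurations to representations:
        # each hidden unit is sampled from its conditional Bernoulli law.
        p = sigmoid(v @ self.W + self.c)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0):
        # Positive phase on the data, negative phase after one Gibbs step.
        ph0, h0 = self.sample_h(v0)
        pv1, _ = self.sample_v(h0)
        ph1, _ = self.sample_h(pv1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

# Toy data: two prototype patterns corrupted by 5% flip noise, loosely
# mimicking two "features" that the hidden units can pick up.
protos = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
data = protos[rng.integers(0, 2, size=500)]
data = np.abs(data - (rng.random(data.shape) < 0.05))

rbm = BinaryRBM(n_visible=6, n_hidden=2)
for _ in range(300):
    rbm.cd1_step(data)

print(rbm.W.shape)  # (6, 2): one learned weight vector ("feature") per hidden unit
```

After training, the columns of `W` play the role of the learned features; sampling visible configurations by alternating `sample_h` and `sample_v` recombines them, which is the mechanism the compositional phase refers to.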

MSC:

92D20 Protein sequences, DNA sequences
92-08 Computational methods for problems pertaining to biology

Software:

Scikit; darch; Keras

References:

[1] Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1987). A learning algorithm for Boltzmann machines. In M. Fischler & O. Firschein (Eds.), Readings in computer vision (pp. 522-533). Amsterdam: Elsevier.
[2] Agliari, E., Annibale, A., Barra, A., Coolen, A., & Tantari, D. (2013). Immune networks: Multitasking capabilities near saturation. Journal of Physics A: Mathematical and Theoretical, 46(41), 415003. , · Zbl 1300.92040
[3] Agliari, E., Barra, A., Galluzzi, A., Guerra, F., & Moauro, F. (2012). Multitasking associative networks. Phys. Rev. Lett., 109, 268101. ,
[4] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14), 1530. ,
[5] Baden, T., Berens, P., Franke, K., Rosón, M. R., Bethge, M., & Euler, T. (2016). The functional diversity of retinal ganglion cells in the mouse. Nature, 529(7586), 345. ,
[6] Barra, A., Bernacchia, A., Santucci, E., & Contucci, P. (2012). On the equivalence of Hopfield networks and Boltzmann machines. Neural Networks, 34, 1-9. , · Zbl 1258.68112
[7] Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159. ,
[8] Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., … Varoquaux, G. (2013). API design for machine learning software: Experiences from the Scikit-learn project. arXiv:1309.0238.
[9] Casari, G., Sander, C., & Valencia, A. (1995). A method to predict functional residues in proteins. Nature Structural and Molecular Biology, 2(2), 171. ,
[10] Cho, K., Raiko, T., & Ilin, A. (2010). Parallel tempering is efficient for learning restricted Boltzmann machines. In Proceedings of the 2010 International Joint Conference on Neural Networks (pp. 1-8). Piscataway, NJ: IEEE. ,
[11] Chollet, F. (2015). Keras. https://keras.io/
[12] Cocco, S., Feinauer, C., Figliuzzi, M., Monasson, R., & Weigt, M. (2018). Inverse statistical physics of protein sequences: A key issues review. Reports on Progress in Physics, 81(3), 032601. ,
[13] Courville, A., Bergstra, J., & Bengio, Y. (2011). A spike and slab restricted Boltzmann machine. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 233-241). AISTATS.
[14] Dahl, G., Ranzato, M., Mohamed, A.-r., & Hinton, G. E. (2010). Phone recognition with the mean-covariance restricted Boltzmann machine. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems, 23 (pp. 469-477). Red Hook, NY: Curran.
[15] De Juan, D., Pazos, F., & Valencia, A. (2013). Emerging methods in protein co-evolution. Nature Reviews Genetics, 14(4), 249. ,
[16] Decelle, A., Fissore, G., & Furtlehner, C. (2017). Spectral dynamics of learning in restricted Boltzmann machines. EPL (Europhysics Letters), 119(6), 60001. , · Zbl 1407.82041
[17] Desjardins, G., Courville, A., & Bengio, Y. (2010). Adaptive parallel tempering for stochastic maximum likelihood learning of RBMs. arXiv:1012.3476.
[18] Desjardins, G., Courville, A., Bengio, Y., Vincent, P., & Delalleau, O. (2010). Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 145-152). AISTATS.
[19] Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289-1306. , · Zbl 1288.94016
[20] Figliuzzi, M., Jacquier, H., Schug, A., Tenaillon, O., & Weigt, M. (2016). Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Molecular Biology and Evolution, 33(1), 268-280. ,
[21] Finn, R. D., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Mistry, J., Mitchell, A. L., … Salazar, G. A. (2015). The Pfam protein families database: Towards a more sustainable future. Nucleic Acids Research, 44(D1), D279-D285. ,
[22] Fischer, A., & Igel, C. (2012). An introduction to restricted Boltzmann machines. In Iberoamerican Congress on Pattern Recognition (pp. 14-36). Berlin: Springer. ,
[23] Fowler, D. M., Araya, C. L., Fleishman, S. J., Kellogg, E. H., Stephany, J. J., Baker, D., & Fields, S. (2010). High-resolution mapping of protein sequence-function relationships. Nature Methods, 7(9), 741. ,
[24] Gabrié, M., Tramel, E. W., & Krzakala, F. (2015). Training restricted Boltzmann machines via the Thouless-Anderson-Palmer free energy. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28 (pp. 640-648). Red Hook, NY: Curran.
[25] Greener, J. G., Moffat, L., & Jones, D. T. (2018). Design of metalloproteins and novel protein folds using variational autoencoders. Scientific Reports, 8(1), 16189. ,
[26] Halabi, N., Rivoire, O., Leibler, S., & Ranganathan, R. (2009). Protein sectors: Evolutionary units of three-dimensional structure. Cell, 138(4), 774-786. ,
[27] Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771-1800. , · Zbl 1010.68111
[28] Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554. , · Zbl 1106.68094
[29] Hopf, T. A., Ingraham, J. B., Poelwijk, F. J., Schärfe, C. P., Springer, M., Sander, C., & Marks, D. (2017). Mutation effects predicted from sequence co-variation. Nature Biotechnology, 35(2), 128-135. ,
[30] Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13(4-5), 411-430. ,
[31] Jacquin, H., Gilson, A., Shakhnovich, E., Cocco, S., & Monasson, R. (2016). Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models. PLoS Computational Biology, 12(5), e1004889. ,
[32] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv:1312.6114.
[33] Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C., & Teichmann, S. A. (2015). The technology and biology of single-cell RNA sequencing. Molecular Cell, 58(4), 610-620. ,
[34] Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 689-696). New York: ACM. , · Zbl 1242.62087
[35] McKeown, M. J., Makeig, S., Brown, G. G., Jung, T.-P., Kindermann, S. S., Bell, A. J., & Sejnowski, T. J. (1998). Analysis of FMRI data by blind separation into independent spatial components. Human Brain Mapping, 6(3), 160-188. ,
[36] McLaughlin Jr, R. N., Poelwijk, F. J., Raman, A., Gosal, W. S., & Ranganathan, R. (2012). The spatial architecture of protein function and adaptation. Nature, 491(7422), 138. ,
[37] Mescheder, L., Nowozin, S., & Geiger, A. (2017). Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks. arXiv:1701.04722.
[38] Miyazawa, S., & Jernigan, R. L. (1996). Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. Journal of Molecular Biology, 256(3), 623-644. ,
[39] Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D. S., Sander, C., … Weigt, M. (2011). Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences, 108(49), E1293-E1301. ,
[40] Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (pp. 807-814). Madison, WI: Omnipress.
[41] Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2), 125-139. ,
[42] Nguyen, H. C., Zecchina, R., & Berg, J. (2017). Inverse statistical problems: From the inverse Ising problem to data science. Advances in Physics, 66(3), 197-261. ,
[43] Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607. ,
[44] Pennington, J., Schoenholz, S., & Ganguli, S. (2017). Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanatan, & R. Garnett (Eds.), Advances in neural information processing systems, 30 (pp. 4785-4795).
[45] Posani, L., Cocco, S., Ježek, K., & Monasson, R. (2017). Functional connectivity models for decoding of spatial representations from hippocampal CA1 recordings. Journal of Computational Neuroscience, 43(1), 17-33. ,
[46] Quadeer, A. A., Morales-Jimenez, D., & McKay, M. R. (2018). Co-evolution networks of HIV/HCV are modular with direct association to structure and function. bioRxiv:307033.
[47] Rausell, A., Juan, D., Pazos, F., & Valencia, A. (2010). Protein interactions and ligand binding: From protein subfamilies to functional specificity. Proceedings of the National Academy of Sciences, 107(5), 1995-2000. ,
[48] Riesselman, A. J., Ingraham, J. B., & Marks, D. S. (2017). Deep generative models of genetic variation capture mutation effects. arXiv:1712.06527.
[49] Rivoire, O., Reynolds, K. A., & Ranganathan, R. (2016). Evolution-based functional decomposition of proteins. PLoS Computational Biology, 12(6), e1004817. ,
[50] Salakhutdinov, R. (2010). Learning deep Boltzmann machines using adaptive MCMC. In Proceedings of the 27th International Conference on Machine Learning (pp. 943-950). Madison, WI: Omnipress.
[51] Salakhutdinov, R., & Murray, I. (2008). On the quantitative analysis of deep belief networks. In Proceedings of the 25th International Conference on Machine Learning (pp. 872-879). New York: ACM. ,
[52] Schwarz, D. A., Lebedev, M. A., Hanson, T. L., Dimitrov, D. F., Lehew, G., Meloy, J., … Nicolelis, M. A. (2014). Chronic, wireless recordings of large-scale brain activity in freely moving rhesus monkeys. Nature Methods, 11(6), 670. ,
[53] Shakhnovich, E., & Gutin, A. (1990). Enumeration of all compact conformations of copolymers with random sequence of links. Journal of Chemical Physics, 93(8), 5967-5971. ,
[54] Sinai, S., Kelsic, E., Church, G. M., & Novak, M. A. (2017). Variational autoencoding of protein sequences. arXiv:1712.03346.
[55] Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning (pp. 1064-1071). New York: ACM. ,
[56] Tkacik, G., Prentice, J. S., Balasubramanian, V., & Schneidman, E. (2010). Optimal population coding by noisy spiking neurons. Proceedings of the National Academy of Sciences USA, 107(32), 14419-14424. ,
[57] Tramel, E. W., Gabrié, M., Manoel, A., Caltagirone, F., & Krzakala, F. (2017). A deterministic and generalized framework for unsupervised learning with restricted Boltzmann machines. arXiv:1702.03260.
[58] Tubiana, J., Cocco, S., & Monasson, R. (2018). Learning protein constitutive motifs from sequence data. arXiv:1803.08718. · Zbl 1429.92113
[59] Tubiana, J., Cocco, S., & Monasson, R. (2019). Efficient sampling and parameterization improve Boltzmann machines. Manuscript in preparation. · Zbl 1429.92113
[60] Tubiana, J., & Monasson, R. (2017). Emergence of compositional representations in restricted Boltzmann machines. Physical Review Letters, 118(13), 138301. , · Zbl 1429.92113
[61] Vu, H., Nguyen, T. D., Le, T., Luo, W., & Phung, D. (2018). Batch normalized deep Boltzmann machines. In Proceedings of the Asian Conference on Machine Learning (pp. 359-374).
[62] Weigt, M., White, R. A., Szurmant, H., Hoch, J. A., & Hwa, T. (2009). Identification of direct residue contacts in protein-protein interaction by message passing. Proceedings of the National Academy of Sciences, 106(1), 67-72. ,
[63] Wolf, S., Supatto, W., Debrégeas, G., Mahou, P., Kruglik, S. G., Sintes, J.-M., … Candelier, R. (2015). Whole-brain functional imaging with two-photon light-sheet microscopy. Nature Methods, 12(5), 379. ,
[64] Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286. ,