
Equi-energy sampler with applications in statistical inference and statistical mechanics. (English) Zbl 1246.82054

Summary: We introduce a new sampling algorithm, the equi-energy sampler, for efficient statistical sampling and estimation. Complementary to the widely used temperature-domain methods, the equi-energy sampler, utilizing the temperature-energy duality, targets the energy directly. The focus on the energy function not only facilitates efficient sampling, but also provides a powerful means for statistical estimation, for example, the calculation of the density of states and microcanonical averages in statistical mechanics. The equi-energy sampler is applied to a variety of problems, including exponential regression in statistics, motif sampling in computational biology and protein folding in biophysics.
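The summary describes the sampler only in words. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It uses a ladder of distributions pi_k(x) ∝ exp(-max(h(x), H_k)/T_k), with energy levels H_0 < ... < H_K and temperatures 1 = T_0 < ... < T_K. The samples of chain k+1 are grouped into energy rings [H_j, H_{j+1}). Chain k then mixes local Metropolis moves with "equi-energy jumps" to a stored state y from the ring that holds the current state x, and accepts each jump with probability min{1, pi_k(y) pi_{k+1}(x) / (pi_k(x) pi_{k+1}(y))}. Several choices here are assumptions made for illustration: the one-dimensional bimodal target, the levels and temperatures, the step size, and the jump probability `p_ee`. The function names (`energy`, `tempered`, `ring_index`, `ee_sampler`) are hypothetical. The sketch also simplifies the scheduling: each chain starts only after the next-hotter chain has finished. The published sampler is described as building the rings while the hotter chain keeps running.

```python
import bisect
import math
import random


def energy(x, s=0.5):
    """h(x) = -log pi(x) up to a constant, for a hypothetical bimodal target
    (assumption): an equal mixture of N(-4, s^2) and N(4, s^2), via log-sum-exp."""
    la = -((x + 4.0) ** 2) / (2 * s * s)
    lb = -((x - 4.0) ** 2) / (2 * s * s)
    m = max(la, lb)
    return -(m + math.log(math.exp(la - m) + math.exp(lb - m)))


def tempered(x, k, levels, temps):
    """Energy of pi_k(x) ∝ exp(-max(h(x), H_k) / T_k)."""
    return max(energy(x), levels[k]) / temps[k]


def ring_index(h, levels):
    """Index j of the energy ring [H_j, H_{j+1}) containing h (H_{K+1} = +inf)."""
    return max(bisect.bisect_right(levels, h) - 1, 0)


def accept(log_r, rng):
    """Metropolis-Hastings step: accept with probability min(1, exp(log_r))."""
    return rng.random() < math.exp(min(0.0, log_r))


def ee_sampler(levels, temps, n_iter=20000, burn=2000, p_ee=0.1, step=1.0, seed=0):
    """Simplified equi-energy sampler (sequential chains, an assumption): run the
    hottest chain k = K first, then k = K-1, ..., down to the target chain k = 0."""
    rng = random.Random(seed)
    rings_above = None  # energy rings built from the samples of chain k + 1
    chain = []
    for k in range(len(temps) - 1, -1, -1):
        x = rng.uniform(-6.0, 6.0)
        chain = []
        for t in range(n_iter):
            if rings_above is not None and rng.random() < p_ee:
                # Equi-energy jump: propose a stored state y from x's energy ring.
                ring = rings_above[ring_index(energy(x), levels)]
                if ring:
                    y = rng.choice(ring)
                    log_r = (tempered(x, k, levels, temps) - tempered(y, k, levels, temps)
                             + tempered(y, k + 1, levels, temps)
                             - tempered(x, k + 1, levels, temps))
                    if accept(log_r, rng):
                        x = y
            else:
                # Local move: Gaussian random walk with scale step * sqrt(T_k).
                y = x + rng.gauss(0.0, step * math.sqrt(temps[k]))
                if accept(tempered(x, k, levels, temps) - tempered(y, k, levels, temps), rng):
                    x = y
            if t >= burn:
                chain.append(x)
        # Group chain k's post-burn-in samples by energy ring for use by chain k - 1.
        rings_above = [[] for _ in levels]
        for z in chain:
            rings_above[ring_index(energy(z), levels)].append(z)
    return chain  # samples from chain 0, whose stationary law is the target pi
```

Usage, with assumed levels and temperatures: `ee_sampler([-1.0, 2.0, 6.0, 14.0, 30.0], [1.0, 2.0, 4.0, 8.0, 16.0])`. H_0 = -1 lies below the minimum energy (about 0 here), so pi_0 is the target itself. A plain random-walk chain at T = 1 would rarely cross the barrier between the modes at x = ±4. The equi-energy jumps let chain 0 move between modes because the hotter chains' rings hold states from both.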

MSC:

82B80 Numerical methods in equilibrium statistical mechanics (MSC2010)
65C05 Monte Carlo methods
65C40 Numerical analysis or methods applied to Markov chains
94A20 Sampling theory in information and communication theory
62F15 Bayesian inference
62D05 Sampling theory, sample surveys
