×

A knowledge gradient policy for sequencing experiments to identify the structure of RNA molecules using a sparse additive belief model. (English) Zbl 1519.92166

Summary: We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We perform a regularized, sparse linear model with a log link function where the marginal contribution to the thermodynamic cycle of each nucleotide is purely additive. The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist \(l_1\) regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before exhausting an experimental budget. We also develop two new algorithms: batch SpKG and batch SpKG-LM. The first algorithm generates more suggestions sequentially to run parallel experiments. The second one dynamically adds new alternatives, in the form of types of probes, which are created by inserting, deleting, or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Tetrahymena Group I intron (a midsize RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.

MSC:

92D20 Protein sequences, DNA sequences

Software:

EGO

References:

[1] Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2-3):235-256.Crossref, Google Scholar · Zbl 1012.68093 · doi:10.1023/A:1013689704352
[2] Bennett CF, Swayze EE (2010) RNA targeting therapeutics: Molecular mechanisms of antisense oligonucleotides as a therapeutic platform. Ann. Rev. Pharmacol. Toxicol. 50:259-293.Crossref, Google Scholar · doi:10.1146/annurev.pharmtox.010909.105654
[3] Brochu E, Brochu T, de Freitas N (2010) A Bayesian interactive optimization approach to procedural animation design. Otaduy M, Popovic Z, eds. Proc. 2010 ACM SIGGRAPH/Eurographics Sympos. Comput. Animation (Eurographics Association), 103-112.Google Scholar
[4] Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5:1-122.Crossref, Google Scholar · Zbl 1281.91051 · doi:10.1561/2200000024
[5] Cech TR, Zaug AJ, Grabowski PJ (1981) In vitro splicing of the ribosomal RNA precursor of Tetrahymena: Involvement of a guanosine nucleotide in the excision of the intervening sequence. Cell 27(3):487-496.Crossref, Google Scholar · doi:10.1016/0092-8674(81)90390-1
[6] Chan JH, Lim S, Wong WS (2006) Antisense oligonucleotides: From design to therapeutic application. Clin. Experiment. Pharmacol. Physiol. 33(5-6):533-540.Crossref, Google Scholar · doi:10.1111/j.1440-1681.2006.04403.x
[7] Chen C (2010) Stochastic Simulation Optimization: An Optimal Computing Budget Allocation, Vol. 1 (World Scientific, Singapore).Crossref, Google Scholar · doi:10.1142/7437
[8] Chen X, Lin Q, Pena J (2012) Optimal regularized dual averaging methods for stochastic optimization. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Process Systems, Vol. 25, 404-412.Google Scholar
[9] Chen Y, Hero AO (2012) Recursive group Lasso. IEEE Trans. Signal Processing 60(8):3978-3987.Crossref, Google Scholar · Zbl 1393.94195 · doi:10.1109/TSP.2012.2192924
[10] Cohn DA, Ghahramani Z, Jordan MI (1996) Active learning with statistical models. J. Artificial Intell. Res. 4:129-145.Crossref, Google Scholar · Zbl 0900.68366 · doi:10.1613/jair.295
[11] Cressie N (2015) Statistics for Spatial Data (John Wiley & Sons, Hoboken, NJ).Google Scholar · Zbl 1347.62005
[12] Currin C, Mitchell T, Morris M, Ylvisaker D (1991) Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Amer. Statist. Assoc. 86(416):953-963.Crossref, Google Scholar · doi:10.1080/01621459.1991.10475138
[13] De Jong H (2002) Modeling and simulation of genetic regulatory systems: A literature review. J. Comput. Biol. 9(1):67-103.Crossref, Google Scholar · doi:10.1089/10665270252833208
[14] DeVos SL, Miller TM (2013) Antisense oligonucleotides: Treating neurodegeneration at the level of RNA. Neurotherapeutics 10(3): 486-497.Crossref, Google Scholar · doi:10.1007/s13311-013-0194-5
[15] Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505(7485):696-700.Crossref, Google Scholar · doi:10.1038/nature12756
[16] Federer WTet al. (1955) Experimental design. Experiment. Design (Wiley, Oxford, UK).Google Scholar
[17] Frazier PI, Powell WB, Dayanik S (2008) A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47(5):2410-2439.Crossref, Google Scholar · Zbl 1274.62155 · doi:10.1137/070693424
[18] Frazier PI, Powell WB, Dayanik S (2009) The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput. 21(4):599-613.Link, Google Scholar · Zbl 1243.91014
[19] Garrigues P, El Ghaoui L (2008) An homotopy algorithm for the Lasso with online observations. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Adv. Neural Inform. Process Systems, Vol. 21, 489-496.Google Scholar
[20] Gelderman G, Contreras LM (2013) Discovery of post-transcriptional regulatory RNAs using next generation sequencing technologies. Gelderman G, Contreras LM, eds. Systems Metabolic Engineering (Humana Press, Totowa, NJ), 269-295.Crossref, Google Scholar · doi:10.1007/978-1-62703-299-5_14
[21] Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar · Zbl 1401.90257 · doi:10.1002/9780470980033
[22] Gold L, Polisky B, Uhlenbeck O, Yarus M (1995) Diversity of oligonucleotide functions. Annual Rev. Biochem. 64(1):763-797.Crossref, Google Scholar · doi:10.1146/annurev.bi.64.070195.003555
[23] Gupta SS, Miescke KJ (1996) Bayesian look ahead one-stage sampling allocations for selection of the best population. J. Statist. Planning Inference 54(2):229-244.Crossref, Google Scholar · Zbl 0854.62018 · doi:10.1016/0378-3758(95)00169-7
[24] Haning K, Cho SH, Contreras LM (2015) Strain engineering via regulatory noncoding RNAs: Not a one-blueprint fits all. Curr. Opin. Chem. Engrg. 10:25-34.Crossref, Google Scholar · doi:10.1016/j.coche.2015.07.008
[25] Huang D, Allen TT, Notz WI, Miller RA (2006b) Sequential kriging optimization using multiple-fidelity evaluations. Structural Multidisciplinary Optim. 32(5):369-382.Crossref, Google Scholar · doi:10.1007/s00158-005-0587-0
[26] Huang D, Allen TT, Notz WI, Zeng N (2006a) Global optimization of stochastic black-box systems via sequential kriging meta-models. J. Global Optim. 34(3):441-466.Crossref, Google Scholar · Zbl 1098.90097 · doi:10.1007/s10898-005-2454-3
[27] Jones DM, Gittins JC (1972) A Dynamic Allocation Index for the Sequential Design of Experiments (Department of Engineering, University of Cambridge, Cambridge, UK).Google Scholar
[28] Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4):455-492.Crossref, Google Scholar · Zbl 0917.90270 · doi:10.1023/A:1008306431147
[29] Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models. J. Roy. Statist. Soc. Ser. B-Statist. Methodol. 63(3):425-464.Crossref, Google Scholar · Zbl 1007.62021 · doi:10.1111/1467-9868.00294
[30] Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E (2007) The role of site accessibility in microRNA target recognition. Nature Genetics 39(10):1278-1284.Crossref, Google Scholar · doi:10.1038/ng2135
[31] Kertesz M, Wan Y, Mazor E, Rinn JL, Nutter RC, Chang HY, Segal E (2010) Genome-wide measurement of RNA secondary structure in yeast. Nature 467(7311):103-107.Crossref, Google Scholar · doi:10.1038/nature09322
[32] Laing C, Schlick T (2011) Computational approaches to RNA structure prediction, analysis, and design. Current Opinion Structural Biol. 21(3):306-318.Crossref, Google Scholar · doi:10.1016/j.sbi.2011.03.015
[33] Li Y, Liu H, Powell WB (2015) The knowledge gradient policy using a sparse additive belief model. Working paper, Princeton University, Princeton, NJ.Google Scholar
[34] Li Y, Liu H, Powell WB (2016) A Lasso-based sparse knowledge gradient policy for sequential optimal learning. Internat. Conf. Artificial Intelligence Statist., Vol. 51, 417-425.Google Scholar
[35] Negoescu DM, Frazier PI, Powell WB (2011) The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS J. Comput. 23(3):346-363.Link, Google Scholar · Zbl 1243.92023
[36] Pain A, Ott A, Amine H, Rochat T, Bouloc P, Gautheret D (2015) An assessment of bacterial small RNA target prediction programs. RNA Biol. 12(5):509-513.Crossref, Google Scholar · doi:10.1080/15476286.2015.1020269
[37] Powell WB, Ryzhov IO (2012) Optimal Learning (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar · doi:10.1002/9781118309858
[38] Rodrigo G, Landrain TE, Majer E, Daros JA, Jaramillo A (2013) Full design automation of multi-state RNA devices to program gene expression using energy-based optimization. PLoS Comput. Biol. 9(8):e1003172.Crossref, Google Scholar · doi:10.1371/journal.pcbi.1003172
[39] Russell R, Das R, Suh H, Travers KJ, Laederach A, Engelhardt MA, Herschlag D (2006) The paradoxical behavior of a highly structured misfolded intermediate in RNA folding. J. Mol. Biol. 363(2):531-544.Crossref, Google Scholar · doi:10.1016/j.jmb.2006.08.024
[40] Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180-195.Link, Google Scholar · Zbl 1241.90201
[41] Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Statist. Sci. 4(4):409-423.Crossref, Google Scholar · Zbl 0955.62619 · doi:10.1214/ss/1177012413
[42] SantaLucia J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA 95(4):1460-1465.Crossref, Google Scholar · doi:10.1073/pnas.95.4.1460
[43] Settles B (2010) Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, Madison.Google Scholar
[44] Sowa SW, Vazquez-Anderson J, Clark CA, De La Peña R, Dunn K, Fung EK, Khoury MJ, Contreras LM (2015) Exploiting post-transcriptional regulation to probe RNA structures in vivo via fluorescence. Nucleic Acids Res. 43(2):e13-e13.Crossref, Google Scholar · doi:10.1093/nar/gku1191
[45] Spall JC (2005) Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, Vol. 65 (John Wiley & Sons, Hoboken, NJ).Google Scholar
[46] Swisher JR, Jacobson SH, Yücesan E (2003) Discrete-event simulation optimization using ranking, selection, and multiple comparison procedures: A survey. ACM Trans. Model. Comput. Simul. 13(2):134-154.Crossref, Google Scholar · Zbl 1390.65025 · doi:10.1145/858481.858484
[47] Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B-Statist. Methodol. 58(1):267-288.Google Scholar · Zbl 0850.62538
[48] Vazquez E, Bect J (2010) Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Statist. Plan. Inference 140(11):3088-3095.Crossref, Google Scholar · Zbl 1419.62200 · doi:10.1016/j.jspi.2010.04.018
[49] Vazquez-Anderson J, Contreras LM (2013) Regulatory RNAs: Charming gene management styles for synthetic biology applications. RNA Biol. 10(12):1778-1797.Crossref, Google Scholar · doi:10.4161/rna.27102
[50] Vazquez-Anderson J, Mihailovic MK, Baldridge KC, Reyes KG, Haning K, Cho SH, Amador Pet al. (2017) Optimization of a novel biophysical model using large scale in vivo antisense hybridization data displays improved prediction capabilities of structurally accessible RNA regions. Nucleic Acids Res. 45(9):5523-5538.Crossref, Google Scholar · doi:10.1093/nar/gkx115
[51] Wang Y, Reyes KG, Brown KA, Mirkin CA, Powell WB (2015) Nested-batch-mode learning and stochastic optimization with an application to sequential multistage testing in materials science. SIAM J. Sci. Comput. 37(3):B361-B381.Crossref, Google Scholar · Zbl 1332.62270 · doi:10.1137/140971117
[52] Whittle P (1980) Multi-armed bandits and the gittins index. J. Roy. Statist. Soc. Ser. B-Statist. Methodol. 42(2):143-149.Google Scholar · Zbl 0439.90096
[53] Winer BJ, Brown DR, Michels KM (1971) Statistical Principles in Experimental Design, Vol. 2 (McGraw-Hill, New York).Google Scholar
[54] Xia T, · doi:10.1021/bi9809425
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.