Abstract
Recent years have seen revived interest in computer-assisted organic synthesis1,2. The use of reaction- and neural-network algorithms that can plan multistep synthetic pathways has revolutionized this field1,3,4,5,6,7, including examples leading to advanced natural products6,7. Such methods typically operate on full, literature-derived ‘substrate(s)-to-product’ reaction rules and cannot be easily extended to the analysis of reaction mechanisms. Here we show that computers equipped with a comprehensive knowledge-base of mechanistic steps augmented by physical-organic chemistry rules, as well as quantum mechanical and kinetic calculations, can use a reaction-network approach to analyse the mechanisms of some of the most complex organic transformations: namely, cationic rearrangements. Such rearrangements are a cornerstone of organic chemistry textbooks and entail notable changes in the molecule’s carbon skeleton8,9,10,11,12. The algorithm we describe and deploy at https://HopCat.allchemy.net/ generates, within minutes, networks of possible mechanistic steps, traces plausible step sequences and calculates expected product distributions. We validate this algorithm with three sets of experiments whose analysis would probably prove challenging even to highly trained chemists: (1) predicting the outcomes of tail-to-head terpene (THT) cyclizations in which substantially different outcomes are encoded in modular precursors differing in minute structural details; (2) comparing the outcome of THT cyclizations in solution or in a supramolecular capsule; and (3) analysing complex reaction mixtures. Our results support a vision in which computers no longer just manipulate known reaction types1,2,3,4,5,6,7 but will help rationalize and discover new, mechanistically complex transformations.
Data availability
Mechanistic reaction rules, physical-organic methods and the kinetic model are detailed in the main text, Methods and the Supplementary Information. All 715 atom-mapped mechanistic pathways from which the mechanistic steps were extracted are posted at https://HopCatResults.allchemy.net. Networks propagated from the literature substrates are also deposited there. Experimental details including spectroscopic data can be found in Supplementary Information sections 7 and 8. We intend to update HopCat based on new literature findings; these improvements will be made available to the software’s users.
Code availability
The interactive HopCat web application allowing for calculations starting from arbitrary carbocations is freely available to academic users at https://HopCat.allchemy.net/ (given server capacity, to five concurrent academic users on a rolling basis and 2-week slots). HopCat’s pseudocode is provided in Supplementary Information section 3. Code for the calculation of conformers under confinement is deposited at https://github.com/Nanotekton/ellipsoid_cavity.
References
Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).
Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).
Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem. 4, 522–532 (2018).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).
Mikulak-Klucznik, B. et al. Computational planning of the synthesis of complex natural products. Nature 588, 83–88 (2020).
Lin, Y., Zhang, R., Wang, D. & Cernak, T. Computer-aided key step generation in alkaloid total synthesis. Science 379, 453–457 (2023).
Tantillo, D. J. Biosynthesis via carbocations: theoretical studies on terpene formation. Nat. Prod. Rep. 28, 1035–1053 (2011).
Christianson, D. W. Structural biology and chemistry of the terpenoid cyclases. Chem. Rev. 106, 3412–3442 (2006).
Olah, G. My search for carbocations and their role in chemistry (Nobel Lecture). Angew. Chem. Int. Ed. Engl. 34, 1393–1405 (1995).
Reis, M. C., Lopez, C. S., Faza, O. N. & Tantillo, D. J. Pushing the limits of concertedness. A waltz of wandering carbocations. Chem. Sci. 10, 2159–2170 (2019).
Hare, S. R. & Tantillo, D. J. Post-transition state bifurcations gain momentum—current state of the field. Pure Appl. Chem. 89, 679–698 (2017).
Breitmaier, E. Terpenes (Wiley‐VCH, 2006).
Hong, Y. J. & Tantillo, D. J. The taxadiene-forming carbocation cascade. J. Am. Chem. Soc. 133, 18249–18256 (2011).
Surendra, K., Rajendar, G. & Corey, E. J. Useful catalytic enantioselective cationic double annulation reactions initiated at an internal π-bond: method and applications. J. Am. Chem. Soc. 136, 642–645 (2014).
Jørgensen, L. et al. 14-Step synthesis of (+)-ingenol from (+)-3-carene. Science 341, 878–882 (2013).
Pemberton, R. P., Hong, Y. J. & Tantillo, D. J. Inherent dynamical preferences in carbocation rearrangements leading to terpene natural products. Pure Appl. Chem. 85, 1949–1957 (2013).
Hare, S. R., Pemberton, R. P. & Tantillo, D. J. Navigating past a fork in the road: carbocation−π interactions can manipulate dynamic behavior of reactions facing post-transition-state bifurcations. J. Am. Chem. Soc. 139, 7485–7493 (2017).
Gutta, P. & Tantillo, D. J. Proton sandwiches: nonclassical carbocations with tetracoordinate protons. Angew. Chem. Int. Ed. 44, 2719–2723 (2005).
Gordeeva, E. V., Shcherbukhin, V. V. & Zefirov, N. S. The ICAR program: computer-assisted investigation of carbocationic rearrangements. Tetrah. Comp. Meth. 3, 429–443 (1990).
Gund, T. M., Schleyer, P. R., Gund, P. H. & Wipke, W. T. Computer assisted graph theoretical analysis of complex mechanistic problems in polycyclic hydrocarbons. The mechanism of diamantane formation from various pentacyclotetradecanes. J. Am. Chem. Soc. 97, 743–751 (1975).
Chen, J. H. & Baldi, P. No electron left behind: a rule-based expert system to predict chemical reactions and reaction mechanisms. J. Chem. Inf. Model. 49, 2034–2043 (2009).
Kayala, M. A. & Baldi, P. ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning. J. Chem. Inf. Model. 51, 2526–2540 (2012).
Tian, B., Poulter, C. D. & Jacobson, M. P. Defining the product chemical space of monoterpenoid synthases. PLoS Comput. Biol. 12, e1005053 (2016).
Chow, J. Y. et al. Computational-guided discovery and characterization of a sesquiterpene synthase from Streptomyces clavuligerus. Proc. Natl Acad. Sci. USA 112, 5661–5666 (2015).
Levy, D. E. Arrow-Pushing in Organic Chemistry: An Easy Approach to Understanding Reaction Mechanisms (Wiley, 2017).
Molga, K., Gajewska, E. P., Szymkuć, S. & Grzybowski, B. A. The logic of translating chemical knowledge into machine-processable forms: a modern playground for physical-organic chemistry. React. Chem. Eng. 4, 1506–1521 (2019).
Hare, S. R. & Tantillo, D. J. Dynamic behavior of rearranging carbocations–implications for terpene biosynthesis. Beilstein J. Org. Chem. 12, 377–390 (2016).
Wołos, A. et al. Computer-designed repurposing of chemical wastes into drugs. Nature 604, 668–676 (2022).
Jonathan, H. G., Baldwin, J. E. & Adlington, R. M. Enantiospecific, biosynthetically inspired formal total synthesis of (+)-liphagal. Org. Lett. 12, 2394–2397 (2010).
Duc, D. K. M., Fetizon, M. & Lazare, S. A short synthesis of (+)-isophyllocladene and (+)-phyllocladene. J. Chem. Soc., Chem. Commun. 8, 282 (1975).
Kasturi, T. R. & Chandra, R. Rearrangement of homobrendane derivatives. Total syntheses of racemic copacamphor, ylangocamphor, and their homologues. J. Org. Chem. 53, 3178–3183 (1988).
Michalak, M., Michalak, K., Urbanczyk-Lipkowska, Z. & Wicha, J. Synthetic studies on dicyclopenta[a,d]cyclooctane terpenoids: construction of the core structure of fusicoccins and ophiobolins on the route involving a Wagner–Meerwein rearrangement. J. Org. Chem. 76, 7497–7509 (2011).
Hosoyama, H., Shigemori, H. & Kobayashi, J. Further unexpected boron trifluoride-catalyzed reactions of taxoids with α- and β-4,20-epoxides. J. Chem. Soc., Perkin Trans. 1 3, 449–451 (2000).
Hur, S. & Bruice, T. C. Enzymes do what is expected (chalcone isomerase versus chorismate mutase). J. Am. Chem. Soc. 125, 1472–1473 (2003).
Merget, S., Catti, L., Piccini, G. & Tiefenbacher, K. Requirements for terpene cyclizations inside the supramolecular resorcinarene capsule: bound water and its protonation determine the catalytic activity. J. Am. Chem. Soc. 142, 4400–4410 (2020).
Zhang, Q. & Tiefenbacher, K. Terpene cyclization catalysed inside a self-assembled cavity. Nat. Chem. 7, 197–202 (2015).
Lossing, F. P. & Holmes, J. L. Stabilization energy and ion size in carbocations in the gas phase. J. Am. Chem. Soc. 106, 6917–6920 (1984).
Pulkkinen, E., Vedenlohkaisussa, F. & Toisiintumisista, T. Suom. Kemistil. A 30, 239–245 (1957).
Junqi, L. et al. Synthesis of many different types of organic small molecules using one automated process. Science 347, 1221–1226 (2015).
Blair, D. J. et al. Automated iterative Csp3-C bond formation. Nature 604, 92–97 (2022).
Zhang, Q., Rinkel, J., Goldfuss, B., Dickschat, J. S. & Tiefenbacher, K. Sesquiterpene cyclizations catalysed inside the resorcinarene capsule and application in the short synthesis of isolongifolene and isolongifolenone. Nat. Catal. 1, 609–615 (2018).
Stewart, J. J. P. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J. Mol. Model. 19, 1–32 (2013).
McCreadie, T. & Overton, K. H. The conversion of labdadienols into pimara-and rosa-dienes. J. Chem. Soc. C 312–316 (1971).
Ungur, N. D., Barba, A. N. & Vlad, P. F. Cyclization and rearrangement of diterpenoids. VII. Composition of the hydrocarbon fraction of a mixture of the products of cyclization of manool and sclareol by ordinary acids. Chem. Nat. Compd. 24, 612–614 (1988).
Wang, M., Wu, A., Pan, X. & Yang, H. Total synthesis of two naturally occurring bicyclo[3.2.1]octanoid neolignans. J. Org. Chem. 67, 5405–5407 (2002).
Kobayashi, J. & Shigemori, H. Bioactive taxoids from the Japanese yew Taxus cuspidata. Med. Res. Rev. 22, 305–328 (2002).
Schneider, F., Pan, L., Ottenbruch, M., List, T. & Gaich, T. The chemistry of nonclassical taxane diterpene. Acc. Chem. Res. 54, 2347–2360 (2021).
Vrček, I., Vrček, V. & Siehl, H. Quantum chemical study of degenerate hydride shifts in acyclic tertiary carbocations. J. Phys. Chem. A 106, 1604–1611 (2002).
Bannwarth, C. et al. Extended tight-binding quantum chemistry methods. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1493 (2021).
Stewart, J. J. P. Optimization of parameters for semiempirical methods. V. Modification of NDDO approximations and application to 70 elements. J. Mol. Model. 13, 1173–1213 (2007).
The modern open-source version of the Molecular Orbital PACkage (MOPAC). Version 22.0.4 GitHub https://github.com/openmopac/mopac (8 July 2022).
Atz, K., Isert, C., Böcker, M., Jiménez-Luna, J., & Schneider, G. Open-source Δ-quantum machine learning for medicinal chemistry. Preprint at https://doi.org/10.26434/chemrxiv-2021-fz6v7-v2 (2021).
Cristiano, M. et al. Investigations into the mechanism of action of nitrobenzene as a mild dehydrogenating agent under acid-catalysed conditions. Org. Biomol. Chem. 1, 565–574 (2003).
Shampine, L. F. & Reichelt, M. W. The MATLAB Ode Suite. SIAM J. Sci. Comput. https://doi.org/10.1137/S1064827594276424 (1997).
Powell, M. J. D. A Direct Search Optimization Method that Models the Objective and Constraint Functions by Linear Interpolation (Springer, 1994).
Gomez, S. & Hennart, J.-P. Advances in Optimization and Numerical Analysis (Springer Science & Business Media, 2013).
Powell, M. J. A view of algorithms for optimization without derivatives. Math. Today Bull. Inst. Math. Appl. 43, 170–174 (2007).
Gutierrez, O. et al. Carbonium vs. carbenium ion-like transition state geometries for carbocation cyclization–how strain associated with bridging affects 5-exo vs. 6-endo selectivity. Chem. Sci. 4, 3894–3898 (2013).
Pemberton, R. P. & Tantillo, D. J. Lifetimes of carbocations encountered along reaction coordinates for terpene formation. Chem. Sci. 5, 3301–3308 (2014).
Olah, G. A., Jeuell, C. L., Kelly, D. P. & Porter, R. D. Stable carbocations. CXIV. Structure of cyclopropylcarbinyl and cyclobutyl cations. J. Am. Chem. Soc. 94, 146–156 (1972).
Barkash, V. A. & Shubin, V. G. Contemporary Problems in Carbonium Ion Chemistry I/II (Springer, 1984).
Yokoo, K., Sakai, D. & Mori, K. Highly stereoselective synthesis of fused tetrahydropyrans via Lewis-acid-promoted double C(sp3)-H bond functionalization. Org. Lett. 22, 5801–5805 (2020).
Cui, C. et al. Total synthesis and target identification of the curcusone diterpenes. J. Am. Chem. Soc. 143, 4379–4386 (2021).
Sato, H., Takagi, T., Miyamoto, K. & Uchiyama, M. Theoretical study on the mechanism of spirocyclization in spiroviolene biosynthesis. Chem. Pharm. Bull. 69, 1034–1038 (2021).
Lauterbach, L., Rinkel, J. & Dickschat, J. S. Two bacterial diterpene synthases from Allokutzneria albata produce bonnadiene, phomopsene, and allokutznerene. Angew. Chem. Int. Ed. 57, 8280–8283 (2018).
Qin, B. et al. An unusual chimeric diterpene synthase from Emericella variecolor and its functional conversion into a sesterterpene synthase by domain swapping. Angew. Chem. Int. Ed. 55, 1658–1661 (2016).
Acknowledgements
Development of all codes and algorithms described in this work was supported by internal funds of Allchemy, Inc. (to T.K., B.M.-K., M.M., S.S. and W.B.). Experimental validations by S.B. and J.M. were supported in part by the Foundation for Polish Science (award no. TEAM/2017-4/38 to J.M.). Experimental validations by L.G. were supported by the National Science Centre, Poland (grant Maestro, grant no. 2018/30/A/ST5/00529). L.-D.S. received funding from the European Union’s Framework Programme for Research and Innovation Horizon 2020 (2014-2020) under the Marie Skłodowska-Curie grant agreement no. 836024. M.D.B. gratefully acknowledges support from an NIH MIRA award (grant no. R35GM118185). K.T. gratefully acknowledges support from the NCCR Catalysis (grant no. 180544), a National Centre of Competence in Research supported by the Swiss National Science Foundation. Analysis of pathways and writing of the paper by B.A.G. was supported by the Institute for Basic Science, Korea (Project Code no. IBS-R020-D1).
Author information
Contributions
T.K. codified most of the mechanistic-step rules with help from B.M.-K. and S.S. W.B. developed the network and kinetic codes and, with the help of T.K., the physical-organic constraints. L.-D.S., M.D.B. and K.T. conceived the substrate-controlled cyclization studies described in Fig. 5. L.-D.S. carried out all synthesis and characterization for the substrate-controlled cyclization studies under supervision from M.D.B. and K.T. S.B. performed experiments under the supervision of J.M. and B.A.G. L.G. helped with the identification and analysis of literature examples. M.M. developed the HopCat WebApp. B.A.G. conceived and supervised the project and wrote the paper with contributions from all co-authors.
Ethics declarations
Competing interests
The authors declare the following competing interests: T.K., W.B., B.M.-K., M.M., S.S. and B.A.G. are consultants and/or stakeholders of Allchemy, Inc. Allchemy software and its HopCat module are property of Allchemy, Inc., USA. All queries about access options to Allchemy, including academic collaborations, should be sent to saraszymkuc@allchemy.net.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 HopCat’s mechanistic analysis of a reaction yielding a fused tetrahydropyran.
An example of a problem not “seen” by the machine during training on 715 literature examples. In the original publication63, the authors focused on the double 1,5-H shifts as key steps and did not consider the full mechanism. HopCat’s calculations ran up to n = 4 generations and traced a complete and unique mechanistic pathway. This pathway starts with a series of carbonyl and allyl resonances that place the positive charge at the position available for a 1,5-H shift, followed by 1,6-olefin exo cyclization. Subsequently, the sequence of 1,5-H shift and 1,6-olefin exo cyclization steps is repeated to afford the tetrahydropyran’s bicyclic scaffold. The last two mechanistic steps along the pathway are: (i) carbonyl resonance to form an oxocarbenium species and (ii) elimination of the Lewis acid yielding the final, quenched product. The software’s solution agrees with the partial mechanism postulated in the original publication. a, A screenshot showing a simplified network (without stereochemistry). In reality, the network was generated with full stereochemistry and comprised ~28,000 nodes that cannot be clearly visualized as a miniature. b, Details of all mechanistic steps (for raw screenshots from HopCat, in traditional and atom-mapped visualization modalities, see Supplementary Section S5). Additional examples are also provided in Supplementary Section S5.
Extended Data Fig. 2 HopCat’s mechanistic analysis of a reaction leading to a tricyclic dienone.
HopCat solves another problem not “seen” in the 715-example training set. The dienone is an intermediate used in the recent synthesis of curcusone diterpenes64. In the original publication, the authors included a plausible arrow-pushing scheme of electron movements for the double deprotection-aldol sequence but did not support it with a more detailed mechanistic analysis. HopCat identifies the reaction’s product in G4 and proposes a plausible and unique mechanistic route. Starting from a carbocation generated via elimination of the substrate’s tertiary alcohol (bottom row of the network), this intermediate undergoes two consecutive resonances (allyl and carbonyl) that result in the formation of an oxocarbenium cation. Subsequent retro oxa-cyclization followed by ring closure constructs the seven-membered, central ring of the molecule. The last two steps describe deprotection of the enol ether. Formation of the oxocarbenium cation via carbonyl resonance makes the alkyl group on the oxygen a good leaving group, enabling its subsequent elimination and formation of the final product. The overall movement of electrons is consistent with the one proposed by the authors. a, A screenshot showing the network comprising ~2,000 nodes. b, Details of all mechanistic steps (for raw screenshots from HopCat, in traditional and atom-mapped visualization modalities, see Supplementary Section S5). Additional examples are also provided in Supplementary Section S5.
Extended Data Fig. 3 A contested and only recently resolved65 biosynthesis of spiroviolene relies on a macrocyclization step (1,11-olefin endo cyclization), which does not occur in the abiotic set of carbocation transformations.
Identifying the mechanistic pathway for the biosynthesis of spiroviolene has proven a computationally challenging problem: the pathway was not found within G7, and expansions to higher generations exceeded the available computing power. Accordingly, we implemented a “mixed” search strategy in which 7 generations were expanded from the substrate in the forward direction and 6 generations from the product in the retrosynthetic direction (using “reversed” mechanistic rules). This strategy considerably reduces the computational cost because the number of nodes in two smaller networks, each propagated to n generations with branching factor m, scales as $2m^{n}$ versus $m^{2n}$ for one forward network expanded to 2n generations (for n = m = 7, the difference is a factor of $m^{n}/2 \approx 400{,}000$). The algorithm then searched for common node(s) in the two networks and, when they were found, was able to concatenate a 10-step route. a, HopCat’s screenshot showing a grossly simplified network generated by a mixed forward-retro search. In reality, the network comprised 909,937 nodes that could not be clearly visualized as a miniature. HopCat’s shortest route is marked with purple lines and agrees with the recently revised pathway65. In the same network, rearrangement sequences leading to three other natural products were also found: phomopsene66 (red lines and frame), allokutznerene66 (orange) and variediene67 (green). b, Details of all mechanistic steps for spiroviolene’s mechanistic route. For raw HopCat screenshots of the sequences leading to all four natural products, in traditional and atom-mapped visualization modalities, see Supplementary Section S5. Note: Akin to Fig. 4 and Extended Data Figs. 1, 2, none of the biosyntheses shown in this figure were considered when extracting mechanistic steps from literature examples.
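The mixed forward-retro search and its scaling argument can be illustrated with a minimal Python sketch. This is not HopCat's code: the integer "molecules", the toy rules `toy_forward` and `toy_reverse`, and all function names are assumptions chosen only to make the idea runnable. Two breadth-first expansions are grown, one from the substrate and one from the product with reversed rules, and a route is concatenated through a node common to both.

```python
def expand(start, rules, generations):
    """Breadth-first expansion for a fixed number of generations.
    Returns {node: parent} for every node reached."""
    parents = {start: None}
    frontier = [start]
    for _ in range(generations):
        next_frontier = []
        for node in frontier:
            for new in rules(node):
                if new not in parents:
                    parents[new] = node
                    next_frontier.append(new)
        frontier = next_frontier
    return parents


def trace(parents, node):
    """Walk parent links from `node` back to the root of its network."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def mixed_search(substrate, product, forward, reverse, n_fwd, n_rev):
    """Expand forward and retro networks, then join them at a common node."""
    fwd = expand(substrate, forward, n_fwd)
    rev = expand(product, reverse, n_rev)
    common = set(fwd) & set(rev)
    if not common:
        return None
    # Choose the meeting node that gives the shortest concatenated route.
    meet = min(common, key=lambda c: len(trace(fwd, c)) + len(trace(rev, c)))
    return trace(fwd, meet)[::-1] + trace(rev, meet)[1:]


# Hypothetical toy rules: a "step" adds one or doubles; the reverse undoes it.
def toy_forward(n):
    return (n + 1, n * 2)


def toy_reverse(n):
    return (n - 1,) + ((n // 2,) if n % 2 == 0 else ())


route = mixed_search(1, 10, toy_forward, toy_reverse, n_fwd=2, n_rev=2)

# Scaling estimate from the caption, for n = m = 7.
m, n = 7, 7
two_small_networks = 2 * m**n      # two networks of n generations each
one_large_network = m**(2 * n)     # one network of 2n generations
saving = one_large_network / two_small_networks  # equals m**n / 2
```

In this toy case the two 2-generation networks meet at a single node and yield a 4-step route, while the scaling estimate reproduces the roughly 400,000-fold saving quoted in the caption.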
Extended Data Fig. 4 Theoretical studies of model H- and C-shifts.
a, System setup. For all unique configurations of substituents R1-R4 (-H and -Me were considered), atom X was dragged along distance vector r so as to simulate the shift. Initial geometry was chosen such that the C-X bond was approximately perpendicular to the plane of the carbocation. All trajectories were subsequently verified by visual inspection. b, H-shifts (X = H). Top three panels represent symmetric shifts (such that the orders of initial and resulting carbocations are the same), with the order of carbocation increasing from left to right. In the bottom row, two leftmost panels represent shifts in which the carbocation changes order by one, whereas the rightmost panel represents an extreme example of transition between first-order and tertiary carbocations. c, C-shifts (X = Me). Top three panels represent symmetric shifts (such that the orders of initial and resulting carbocations are the same), with the order of carbocation increasing from left to right. In the bottom row, two leftmost panels represent shifts in which the carbocation changes order by one, whereas the rightmost panel represents an extreme example of transition between first-order and tertiary carbocations. d, Theoretical studies of the carbocation association process. Each curve represents the SCS-MP2/aug-cc-pVDZ energy profile with the PCM model of water, modelling the approach of four nucleophiles (formaldehyde, water, methanol and ethene) towards CH3+ along vector R (scheme inserted in the top left of the panel). e, Boxplot representing experimental stabilities of carbocations taken from ref. 38 with respect to the CH3+ cation. The data were grouped according to the order of a carbocation (number of non-hydrogen atoms directly connected to the formally charged carbon atom), showing the general trend in the stability: increasing the order of a carbocation lowers the energy, on average, by 10-20 kcal/mol.
Extended Data Fig. 5 General synthetic scheme for the preparation of the precursors employed in Fig. 5.
An alkyl bromide is converted into the corresponding organozinc reagent by sequential treatment with t-BuLi and ZnCl2. This reagent is then used in a Negishi coupling with a vinyl iodide bearing a protected alcohol group. The coupling product is then deprotected to give the free alcohol, and the corresponding acetate is prepared by acetylation of this alcohol.
Supplementary information
Supplementary Information
Supplementary Information sections 1–10, including Tables 1–3 and Figs. 1–256.
Supplementary Video 1
HopCat’s visual tutorial. Video illustrating key stages of setting up and executing a search in HopCat’s WebApp. The video complements HopCat’s written tutorial in Supplementary information section 1.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Klucznik, T., Syntrivanis, LD., Baś, S. et al. Computational prediction of complex cationic rearrangement outcomes. Nature 625, 508–515 (2024). https://doi.org/10.1038/s41586-023-06854-3
DOI: https://doi.org/10.1038/s41586-023-06854-3