From mundane to surprising nonadditivity: drivers and impact on ML models

Journal of Computer-Aided Molecular Design

Abstract

Nonadditivity (NA) in structure-activity relationship (SAR) and structure-property relationship data is a rare but highly information-rich phenomenon. It can indicate conformational flexibility, structural rearrangements, and errors in assay results or structural assignment. While purely ligand-based conformational causes of NA are rather well understood and mundane, other factors are less well understood and cause surprising NA that strongly influences SAR analysis and machine learning (ML) model performance. Here we report a systematic analysis across a wide range of properties (20 on-target biological activities and 4 physicochemical ADME-related properties) to understand how frequently the different phenomena that may lead to NA occur. A set of novel descriptors was developed to characterize double transformation cycles (DTCs) and identify trends in NA. DTCs were classified into “surprising” and “mundane” categories, with the majority classed as mundane. We also examined commonalities among surprising cycles and found that LogP differences have the most significant impact on NA. NA behaved distinctly for on-target sets compared to ADME sets. Finally, we show that ML models struggle with highly nonadditive data, indicating that a better understanding of NA is an important direction for future research.
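The notion of a double transformation cycle can be made concrete with a small sketch. It assumes the commonly used picture in which four compounds related by two chemical transformations form a cycle, and NA is the difference between the effect of one transformation in the two contexts set by the other. The class `Cycle`, both functions, the corner labels and the example values are hypothetical illustrations, not the descriptors or the code of this study. The noise threshold further assumes independent measurements with the same standard deviation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    """Measured values (e.g. pIC50) on the four corners of a cycle.

    Assumed layout (hypothetical labels): transformation A maps
    c1 -> c2 and c3 -> c4; transformation B maps c1 -> c3 and c2 -> c4.
    """
    c1: float
    c2: float
    c3: float
    c4: float


def nonadditivity(cycle: Cycle) -> float:
    """Effect of transformation A after B minus its effect before B."""
    effect_before = cycle.c2 - cycle.c1
    effect_after = cycle.c4 - cycle.c3
    return effect_after - effect_before


def exceeds_noise(cycle: Cycle, assay_sd: float, z: float = 2.0) -> bool:
    """Flag cycles whose |NA| is larger than z times the expected noise.

    NA combines four measurements, so under the independence assumption
    its standard deviation is sqrt(4) * assay_sd = 2 * assay_sd.
    """
    return abs(nonadditivity(cycle)) > z * 2.0 * assay_sd


# Made-up values for illustration only.
additive = Cycle(c1=5.0, c2=6.0, c3=5.5, c4=6.5)
nonadditive = Cycle(c1=5.0, c2=6.0, c3=5.5, c4=8.0)

print(nonadditivity(additive))      # 0.0
print(nonadditivity(nonadditive))   # 1.5
print(exceeds_noise(nonadditive, assay_sd=0.3))  # True
```

In the first cycle transformation A adds one log unit in both contexts, so the cycle is additive. In the second cycle the same transformation adds 2.5 log units once B is present, giving an NA of 1.5 log units, which exceeds the assumed noise threshold of 1.2.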

Data availability

The code for performing nonadditivity analysis is available at https://github.com/Roche/NonadditivityAnalysis. While the datasets analyzed in this study are proprietary and not publicly accessible, the PDE10 dataset is provided in the repository to illustrate usage.

Abbreviations

NA: Nonadditivity
MMP: Matched molecular pair
DTC: Double transformation cycle
SAR: Structure–activity relationship
ML: Machine learning
RF: Random forest
CB2: Cannabinoid type 2
BACE1: Beta-Secretase 1
PDE10: Phosphodiesterase type 10
DPP4: Dipeptidyl peptidase-4
MAGL: Monoacylglycerol lipase
DDR1: Discoidin Domain Receptor Tyrosine Kinase 1
ADME: Absorption, Distribution, Metabolism and Excretion
GLYT1: Glycine transporter type-1
ATX: Autotaxin
SMN2: Survival of motor neuron 2
AEP: Asparagine endopeptidase

Acknowledgements

We would like to acknowledge each project team at Roche that contributed to the generation of the on-target data sets analyzed in this study, as well as Björn Wagner and Kenichi Umehara for their contributions to the ADME data sets. We thank Michael Reutlinger for providing valuable input and code for the ML models.

Author information

Contributions

L.G. and N.M. are shared first authors. N.M. wrote the code and performed NA analysis. L.G. and C.K. supervised the study and wrote the paper. J.G.C. provided input to the design of the study and wrote the paper.

Corresponding author

Correspondence to Laura Guasch.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Laura Guasch and Niels Maeder are co-first authors.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guasch, L., Maeder, N., Cumming, J.G. et al. From mundane to surprising nonadditivity: drivers and impact on ML models. J Comput Aided Mol Des 38, 26 (2024). https://doi.org/10.1007/s10822-024-00566-0
