research-article
Open access

Neural Mixed Effects for Nonlinear Personalized Predictions

Published: 09 October 2023

Abstract

Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models for studying both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning through their integration with neural networks, these integrations are currently limited to linear person-specific parameters, which rules out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience symptoms of depression. Furthermore, we evaluate NME for two model architectures, including neural conditional random fields (CRFs) for predicting affective state sequences, where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.
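To make the distinction between person-generic and person-specific parameters concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common mixed-effects decomposition in which each person's effective parameter is a shared value plus a person-specific offset. The function names (`predict`, `deviation_penalty`), the tiny tanh network, and the L2 penalty on the offsets are all hypothetical choices made for illustration; the abstract does not specify how NME regularizes or optimizes these parameters.

```python
import math


def predict(x, generic, specific):
    """Tiny one-hidden-layer network with generic + person-specific weights."""
    # Effective parameters: shared value plus this person's offset (assumed additive).
    w_hidden = [g + s for g, s in zip(generic["w_hidden"], specific["w_hidden"])]
    w_out = [g + s for g, s in zip(generic["w_out"], specific["w_out"])]
    # Offsets on w_hidden act before the tanh, so their effect on the
    # prediction is nonlinear; offsets on w_out act linearly.
    hidden = [math.tanh(w * x) for w in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def deviation_penalty(specific, strength=0.1):
    """Hypothetical L2 penalty pulling person-specific offsets toward zero."""
    return strength * sum(v * v for values in specific.values() for v in values)


# Illustrative values only; none of these numbers come from the paper.
generic = {"w_hidden": [0.5, -1.0], "w_out": [1.0, 0.5]}
no_offset = {"w_hidden": [0.0, 0.0], "w_out": [0.0, 0.0]}
person_a = {"w_hidden": [0.3, 0.0], "w_out": [0.0, -0.2]}

generic_prediction = predict(1.0, generic, no_offset)
person_a_prediction = predict(1.0, generic, person_a)
person_a_penalty = deviation_penalty(person_a)
```

With all offsets at zero, the model reduces to the person-generic network. A person with nonzero offsets gets a different prediction for the same input, and the penalty term grows as that person departs from the shared parameters.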


Information

Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023
858 pages
ISBN:9798400700552
DOI:10.1145/3577190
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. affective computing
  2. machine learning
  3. mixed effect models
  4. neural networks
  5. personalization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '23
Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
