Abstract
Coronary artery calcium (CAC) is a powerful tool to refine atherosclerotic cardiovascular disease (ASCVD) risk assessment. Despite growing interest in CAC, contemporary public attitudes toward it are not well described in the literature, yet they have important implications for shared decision-making around cardiovascular prevention. We used an artificial intelligence (AI) pipeline consisting of a semi-supervised natural language processing model and unsupervised machine learning techniques to analyze 5,606 CAC-related discussions on Reddit. A total of 91 discussion topics were identified and classified into 14 overarching thematic groups. These included the strong impact of CAC on therapeutic decision-making, ongoing non-evidence-based use of CAC testing, and the patient-perceived downsides of CAC testing (e.g., radiation risk). Sentiment analysis also revealed that most discussions had a neutral (49.5%) or negative (48.4%) sentiment. The results of this study demonstrate the potential of an AI-based approach to analyze large, publicly available social media data to generate insights into public perceptions about CAC, which may help guide strategies to improve shared decision-making around ASCVD management and public health interventions.
Atherosclerotic cardiovascular disease (ASCVD) remains the leading cause of death in the United States1. Earlier identification of and intervention for ASCVD are critical for reducing its morbidity and mortality, as over a third of all ASCVD deaths occur in individuals with no prior symptoms1. Detection of coronary artery calcification (CAC) by a specialized computed tomography (CT) scan (“CAC scan”) can help guide patients and clinicians in shared decision-making around cardiovascular risk assessment2. As such, CAC scans are endorsed by multiple medical societies as powerful tools for personalizing cardiovascular risk and preventive therapy recommendations3,4. CAC may also be a strong motivator for improving health behaviors, including lifestyle changes and adherence to preventive therapies like statins5,6.
While public interest in CAC has grown over time, current public perceptions about CAC are not well-described7. Understanding these beliefs about CAC is critical, as it may help frame shared decision-making discussions and guide public health interventions around ASCVD. Artificial intelligence (AI)-enabled analysis of large volumes of social media data can provide an efficient approach for analyzing contemporary public opinions on common health-related topics and allow for a systematic evaluation of emerging themes8. Reddit is a free and widely used social media platform with over 52 million daily active users and over 30 billion views every month9. In this study, we leverage an artificial intelligence pipeline using natural language processing and unsupervised learning to characterize real-world perceptions about CAC using discussions on Reddit.
We extracted a total of 5606 unique CAC-related discussions (1017 posts, 4589 comments) from 3545 unique users across 990 subreddits from March 29, 2008, through May 21, 2023 (Supplementary Fig. 1). The largest number of discussions from a single author was 26, while 3463 (97.7%) authors contributed fewer than six discussions each. The subreddits with the most discussions were r/keto (7.5%), r/Cholesterol (7.0%), and r/AskDocs (5.8% of all discussions). The number of CAC-related discussions increased by an average of 57.2% yearly. Using a pretrained, sentence-level Bidirectional Encoder Representations from Transformers (BERT) model, we embedded these discussions into a vectorized language space, in which they were further dimensionally reduced and clustered to identify a total of 91 topics (Fig. 1). These topics were further clustered to identify 14 overarching groups. The largest topics and groups centered around CAC testing to evaluate symptoms (e.g., palpitations, chest pain, and anxiety) and de-risking non-ischemic cardiovascular disease (groups 1, 5); interpreting CAC scores in the context of lifestyle and lipid results (groups 2, 4, 8, and 10); and the disadvantages of CAC testing (e.g., financial cost, radiation risk) (Table 1). Other notable topics included indications for CAC testing (e.g., topics 10, 27, 42, 48), CAC and statins (e.g., topics 24, 31, 34), how ketogenic diets can affect CAC (e.g., topics 16, 19), radiation exposure risk (e.g., topics 22, 45), insurance issues (e.g., topics 29, 30, 37, 50), and celebrities with CAC (e.g., topics 43, 55, 56). A separate pretrained BERT model was used to analyze the sentiment of each discussion, uncovering that 49.5% of discussions were neutral, 48.4% were negative, and 2.1% were positive. The average sentiment of all discussions remained stably neutral-to-negative (−0.42 to −0.50) each year from 2013 through 2023 (Supplementary Fig. 2).
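The relationship between the reported sentiment shares and the reported average sentiment can be checked with a short calculation. The sketch below is illustrative only: it assumes that the three labels are scored as −1 (negative), 0 (neutral), and +1 (positive), a mapping that is not stated in the text, and the names `SHARES`, `SCORE`, and `mean_sentiment` are hypothetical. Under that assumption, the pooled mean works out to roughly −0.46, which lies within the yearly range quoted above.

```python
# Sentiment shares reported in the text (fractions of all discussions).
SHARES = {"negative": 0.484, "neutral": 0.495, "positive": 0.021}

# ASSUMPTION: the numeric value assigned to each label is not given in the
# source; a simple -1 / 0 / +1 scoring is used here for illustration.
SCORE = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def mean_sentiment(shares, score):
    """Share-weighted mean of the per-label scores."""
    total = sum(shares.values())
    return sum(shares[label] * score[label] for label in shares) / total


overall = mean_sentiment(SHARES, SCORE)
print(f"pooled mean sentiment: {overall:.3f}")
```

Because a pooled mean is a weighted average of the yearly means, a value inside the reported yearly range is what one would expect if this scoring resembles the one actually used.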
Our AI-enabled analysis of public perceptions of CAC testing demonstrates how well our previously described algorithm for topic modeling generalizes to another clinical domain8. A powerful aspect of our pipeline is leveraging techniques in unsupervised machine learning that obviate the need for topic prespecification, which allows discovery of previously unexpected ideas (e.g., non-evidence-based use of CAC). Such topic modeling analyses can also provide clinical insights that may be further explored to test generated hypotheses. By harnessing the power of AI on pre-existing datasets, we demonstrate a fast, inexpensive method of gathering public opinions that would otherwise require time- and finance-intensive clinical registries and user surveys to collect. Through this efficient extraction and interpretation of large volumes of social media data, AI also offers the ability to continuously evaluate public sentiment over time, monitor for emerging topics, and stream clinical insights to key stakeholders that could impact clinical care.
Our study revealed several noteworthy insights about public perceptions around CAC testing. First, CAC testing had a strong impact on therapeutic decision-making. Many discussions emphasized the power of a CAC score of zero as a way of de-risking individuals and avoiding statin therapy. While a CAC-based de-escalation strategy is supported by practice guidelines, the presence of other risk-enhancing lifestyle or clinical factors (e.g., diabetes) may affect these decisions10. Conversely, many discussions where a non-zero CAC was noted demonstrated how these findings helped motivate lifestyle changes. Ultimately, CAC interpretation is nuanced, and our study highlights that public discussions around interpretation of CAC results may not always be guideline-concordant, underscoring the need for shared decision-making between patients and clinicians.
Second, there were several discussions surrounding non-evidence-based uses of CAC testing, including for evaluation of patients with cardiac symptoms, such as chest pain and palpitations. This may be discordant with current clinical guidelines, which endorse the use of CAC testing in primary prevention among asymptomatic patients, particularly those with intermediate ASCVD risk3. Many discussions also misapplied the negative predictive value of a CAC scan to the evaluation of non-specific symptoms typically not related to ASCVD risk assessment, which may further misrepresent the current indications for CAC to the public. Future work may focus on evaluating the dynamics of how such misinformation can be amplified in social media frameworks and ultimately help determine optimal strategies for containing its spread.
Third, we identified discussions regarding the disadvantages of CAC testing, including out-of-pocket costs due to lack of insurance coverage and radiation exposure. However, many individuals still found value in CAC testing despite costs and radiation. The cost-effectiveness of CAC has been reported elsewhere in the literature11. Although the radiation risk associated with CAC testing is minimal, similar to ambient radiation from living in large cities12, our work identified that patients may be concerned about this risk when deciding to pursue CAC testing.
Finally, we found that the sentiment around CAC-related discussions was mostly neutral-to-negative. This is consistent with prior studies evaluating healthcare discussions on Reddit, which identify a negative tone and expressions of sadness, fear, and anger that are believed to reflect the underlying patient experience in a complex healthcare environment13. This negativity bias is well reported in the media and can impact health outcomes14, suggesting the importance of public health efforts to moderate misinformation15.
This study should be interpreted in the context of its limitations. Discussions in this study reflect views of Reddit users, who have historically been younger and may not be broadly representative of patients at high risk of ASCVD16; however, CAC testing is most appropriate for lower and intermediate risk individuals. While a variety of search terms were used, this dataset may not capture all CAC-related discussions on Reddit if individuals use other terms to refer to CAC. The clustering techniques we employed may determine similarity based on linguistic rather than clinical concordance, which may lead to seemingly redundant topics and groups. This limitation highlights how AI can augment, but not replace, researchers in analyzing large datasets, and opens the door to consider how more advanced NLP techniques, like large language models, can improve this pipeline.
In this AI-enabled qualitative study of discussions on Reddit, we identified contemporary public perceptions and sentiments around CAC, which included the impact of CAC on therapeutic decision-making, non-evidence-based use of CAC testing, and the perceived downsides of CAC testing. The themes uncovered from this study highlight potential areas of patient concern and misinformation that can be addressed to improve shared decision-making around ASCVD management, improve statin adherence rates, and reduce ASCVD morbidity and mortality.
Methods
Dataset
Reddit (www.reddit.com) was used as the data source for this study17. It is composed of communities called ‘subreddits’ which are prefixed by “r/” and are focused on specific topics (e.g., r/AskDoctors, r/WorldNews, r/Keto). Users may interact with the platform by creating a “post” to initiate a new discussion thread and by commenting on other users’ posts as part of discussions (“comments”). Most subreddits, including all posts and comments contained within them, are openly accessible and visible without having to create a Reddit user account.
To create a list of CAC-related discussions from Reddit, an Application Programming Interface (API) called PushShift was used to search all the posts and comments on Reddit for case-insensitive matching on the following commonly used terms for CAC scans: “coronary artery calcium”, “coronary calcium”, “cac score”, “calcium score”, and “heart scan”7,18.
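The term matching described above can be sketched in a few lines. This is a minimal local illustration, not the study's retrieval code: the actual search was carried out through the PushShift API, whose request format is not reproduced here. The sketch assumes substring matching (so "cac score" also matches "CAC scores") and assumes each discussion is a dictionary with hypothetical `id` and `body` keys; neither detail is specified in the text.

```python
import re

# The five search terms listed in the text.
CAC_TERMS = [
    "coronary artery calcium",
    "coronary calcium",
    "cac score",
    "calcium score",
    "heart scan",
]

# One case-insensitive pattern covering all terms. re.escape keeps the terms
# literal. ASSUMPTION: plain substring matching, not whole-word matching.
CAC_PATTERN = re.compile(
    "|".join(re.escape(term) for term in CAC_TERMS), re.IGNORECASE
)


def is_cac_related(text):
    """Return True if the text mentions any of the CAC search terms."""
    return CAC_PATTERN.search(text) is not None


def filter_discussions(discussions):
    """Keep each unique discussion whose body mentions a CAC term.

    ASSUMPTION: every item is a dict with hypothetical 'id' and 'body' keys.
    """
    seen = set()
    kept = []
    for item in discussions:
        if item["id"] in seen:
            continue  # drop duplicates of an already-seen discussion
        seen.add(item["id"])
        if is_cac_related(item["body"]):
            kept.append(item)
    return kept
```

For example, `is_cac_related("My CAC Score came back at zero")` returns `True`, whereas a post that only mentions calcium supplements would not match any of the five terms.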
This study was deemed exempt from ethical review since it did not involve human subjects as defined in 45 United States’ Code of Federal Regulations (CFR) 46.102(f) or 21 CFR 50.3(g).
Data analysis
Details of the topic modeling and sentiment analysis used in this paper are described elsewhere8. Briefly, after preprocessing, discussions were embedded into a numerical representation using a pretrained, sentence-level Bidirectional Encoder Representations from Transformers (BERT) model called all-MiniLM-L6-v2 (ref. 19), which has been trained on over 600 million Reddit posts and a dataset containing over 12 million papers from medical journals. This embedding was then reduced to a lower-dimensional representation using the Uniform Manifold Approximation and Projection (UMAP) algorithm to improve the performance of Spectral Clustering, which grouped discussions into topics. Since topics may be similar in content but differentiated by other embedded features from the model (e.g., linguistic style, tone), a subsequent clustering analysis was performed to find overarching themes of discussion (“groups”). The numbers of topics and groups were determined automatically by optimizing the Silhouette Coefficient and Davies-Bouldin Index, which are mathematical measures of how similar discussions are within a cluster relative to how similar they are to discussions in other clusters. A separate BERT model, RoBERTa, pretrained on social media posts, was used to classify sentiment (i.e., “positive”, “neutral”, or “negative” classification of text).
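To make the cluster-number selection step more concrete, the sketch below computes the two validity measures by hand on toy two-dimensional points and picks the candidate partition with the best Silhouette Coefficient. It is an illustration under stated assumptions, not the authors' implementation: the toy points stand in for reduced embeddings, the candidate labelings are supplied directly instead of being produced by Spectral Clustering, and the rule of choosing by silhouette while reporting the Davies-Bouldin Index alongside is an assumption, since the text does not say how the two measures were combined. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def _group(points, labels):
    """Collect points by cluster label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return groups


def silhouette(points, labels):
    """Mean silhouette over all points (higher is better, range -1 to 1)."""
    groups = _group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in groups.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin Index (lower is better)."""
    groups = _group(points, labels)
    centroids = {
        k: tuple(sum(axis) / len(g) for axis in zip(*g)) for k, g in groups.items()
    }
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in g) / len(g)
        for k, g in groups.items()
    }
    ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
        for i in groups
    ]
    return sum(ratios) / len(ratios)


def pick_best(points, candidates):
    """Return the cluster count whose labeling has the highest silhouette."""
    return max(candidates, key=lambda k: silhouette(points, candidates[k]))


# Toy stand-in for reduced embeddings: three tight pairs of points.
POINTS = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
# Candidate partitions keyed by number of clusters (hand-written for the demo).
CANDIDATES = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}

best_k = pick_best(POINTS, CANDIDATES)
print("selected number of clusters:", best_k)
```

On this toy data both measures favor the three-cluster partition. In practice, library implementations such as scikit-learn's `silhouette_score` and `davies_bouldin_score` would normally be used in place of these hand-written functions.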
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The data used in this manuscript are available at https://github.com/sssomani/cac_reddit.
Code availability
The code used in this manuscript is available at https://github.com/sssomani/cac_reddit.
References
1. Tsao, C. W. et al. Heart disease and stroke statistics-2023 update: a report from the American heart association. Circulation 147, e93–e621 (2023).
2. Greenland, P. & Lloyd-Jones, D. M. Role of coronary artery calcium testing for risk assessment in primary prevention of atherosclerotic cardiovascular disease. JAMA Cardiol. 7, 219 (2022).
3. Grundy, S. M. et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol: a report of the American college of cardiology/American heart association task force on clinical practice guidelines. Circulation 139, e1082–e1143 (2019).
4. Golub, I. S. et al. Major global coronary artery calcium guidelines. JACC Cardiovasc. Imaging 16, 98–117 (2023).
5. Sandhu, A. T. et al. Incidental coronary artery calcium: opportunistic screening of previous nongated chest computed tomography scans to improve statin rates (NOTIFY-1 project). Circulation 147, 703–714 (2023).
6. Muhlestein, J. B. et al. Effect on patient adherence to primary prevention recommendations for statin therapy based on the national guidelines-supported pooled cohort risk equation or a coronary artery calcium score: preliminary findings from the vanguard study for the corcal randomized clinical outcomes trial. J. Am. Coll. Cardiol. 75, 5 (2020).
7. Dzaye, O. et al. Temporal trends and interest in coronary artery calcium scoring over time: an infodemiology study. Mayo Clin. Proc. Innov. Qual. Outcomes 5, 456–465 (2021).
8. Somani, S., van Buchem, M. M., Sarraju, A., Hernandez-Boussard, T. & Rodriguez, F. Artificial intelligence-enabled analysis of statin-related topics and sentiments on social media. JAMA Netw. Open. 6, e239747 (2023).
9. Curry, D. Reddit Revenue and Usage Statistics (2023) https://www.businessofapps.com/data/reddit-statistics/ (2020).
10. Patel, J. et al. Assessment of coronary artery calcium scoring to guide statin therapy allocation according to risk-enhancing factors. JAMA Cardiol. 6, 1161 (2021).
11. Venkataraman, P. et al. Cost-effectiveness of coronary artery calcium scoring in people with a family history of coronary disease. JACC Cardiovasc. Imaging 14, 1206–1217 (2021).
12. Gerber, T. C. & Gibbons, R. J. Weighing the risks and benefits of cardiac imaging with ionizing radiation. JACC Cardiovasc. Imaging 3, 528–535 (2010).
13. Maleki, N., Padmanabhan, B. & Dutta, K. The effect of monetary incentives on health care social media content: study based on topic modeling and sentiment analysis. J. Med. Internet Res. 25, e44307 (2023).
14. Indremo, M., Jodensvi, A. C., Arinell, H., Isaksson, J. & Papadopoulos, F. C. Association of media coverage on transgender health with referrals to child and adolescent gender identity clinics in Sweden. JAMA Netw. Open. 5, e2146531 (2022).
15. Trethewey, S. P. Medical misinformation on social media. Circulation 140, 1131–1133 (2019).
16. Stocking, G., Holcomb, J. & Mitchell, A. 1. Reddit News Users More Likely to be Male, Young and Digital in Their News Preferences https://www.pewresearch.org/journalism/2016/02/25/reddit-news-users-more-likely-to-be-male-young-and-digital-in-their-news-preferences/ (2016).
17. Reddit. Dive Into Anything https://www.reddit.com (2023).
18. Rodriguez, F. et al. Readability of online patient educational materials for coronary artery calcium scans and implications for health disparities. J. Am. Heart Assoc. 9, e017372 (2020).
19. Hugging Face. Sentence-Transformers/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 (2023).
Acknowledgements
Dr. Rodriguez was funded by grants from the NIH National Heart, Lung, and Blood Institute (1K01HL144607; R01HL168188), the American Heart Association/Harold Amos Faculty Development program, and the Doris Duke Charitable Foundation (Grant #2022051). These funding organizations played no role in study design, data analysis, manuscript preparation, or the decision to submit for publication.
Author information
Contributions
S.S. and F.R. conceived and designed the study. S.S. acquired the data. S.S., S.B., A.W.P. and S.S.J. helped analyze the data. S.S. drafted the manuscript. All authors reviewed the data and revised the manuscript.
Ethics declarations
Competing interests
F.R. reports consulting relationships with Healthpals, Novartis, NovoNordisk (CEC), Movano Health, Esperion Therapeutics, Kento Health, Inclusive Health, Arrowhead Pharmaceuticals, HeartFlow, and Edwards outside the submitted work. The remaining authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Somani, S., Balla, S., Peng, A.W. et al. Contemporary attitudes and beliefs on coronary artery calcium from social media using artificial intelligence. npj Digit. Med. 7, 83 (2024). https://doi.org/10.1038/s41746-024-01077-w
DOI: https://doi.org/10.1038/s41746-024-01077-w