Abstract
Clinical phenotyping is often a foundational requirement for obtaining datasets necessary for the development of digital health applications. Traditionally done via manual abstraction, this task is often a bottleneck in development due to time and cost requirements, therefore raising significant interest in accomplishing this task via in-silico means. Nevertheless, current in-silico phenotyping development tends to be focused on a single phenotyping task resulting in a dearth of reusable tools supporting cross-task generalizable in-silico phenotyping. In addition, in-silico phenotyping remains largely inaccessible for a substantial portion of potentially interested users. Here, we highlight the barriers to the usage of in-silico phenotyping and potential solutions in the form of a framework of several desiderata as observed during our implementation of such tasks. In addition, we introduce an example implementation of said framework as a software application, with a focus on ease of adoption, cross-task reusability, and facilitating the clinical phenotyping algorithm development process.
Introduction
The rapid proliferation of the Electronic Health Record (EHR) and the associated availability of voluminous digitized clinical data have led to tremendous interest in the development of digital health applications. Crucial to this is the ability to subset patients using clinical inclusion and exclusion criteria, a task commonly referred to as clinical phenotyping, patient screening, or cohort retrieval1,2 (see Fig. 1). Because this task has traditionally been conducted manually, there has been great interest in accelerating phenotyping via in-silico means3,4. Cross-task generalizable solutions for in-silico phenotyping, however, are not widespread5.
In this work, we introduce Intelligent Machine for Patient Accrual and Classification Tasks (IMPACT), a framework and an example implementation highlighting desiderata for accessible and re-usable in-silico phenotyping tools as observed through our efforts in delivering in-silico phenotyping solutions.
The IMPACT framework for accessible in-silico clinical phenotyping
Variations in task-specific factors such as complexity, required information, and desired results6 have hindered implementation of task-generalizable phenotyping solutions7,8. Here, we present several desiderata for in-silico phenotyping tools, as well as existing approaches, where applicable.
Desideratum I: Be infrastructure-flexible and scalable
Adapting software products is generally easier than switching computing infrastructure, necessitating flexibility in data inputs/outputs and computing infrastructure. This can be accomplished through built-in support for various popular setups, for both data repository type (e.g., SQL, Elasticsearch9, MongoDB10, BigQuery11, Fast Healthcare Interoperability Resources (FHIR)12 datastores) and data model (e.g., Observational Medical Outcomes Partnership (OMOP)13 and PCORnet14 Common Data Models (CDMs)).
In addition, tools must be scalable, as it would otherwise be infeasible to run phenotyping across large-scale datasets without significant engineering effort/time, particularly when involving data sources requiring natural language processing (NLP) or image processing to extract clinical information.
Desideratum II: Support both ranked score and boolean retrieval schemes
Determining patient classification as a boolean true/false may not always be ideal. Instead, score-based ranking on closeness of match may be appropriate15, particularly during algorithm refinement, when evidence may be missing (e.g., relevant information is not present in the data sources used). Boolean retrieval, where patients are classified as either fully matching or not matching a given phenotype, fails to return patients for whom any required evidence is missing. Conversely, ranked retrieval will surface patients that may be missing only a subset of the criteria for further review. Boolean retrieval, however, may still be appropriate once an algorithm matures (e.g., for large-scale cohort accrual), necessitating support for both retrieval modes.
Clinical CDMs such as OMOP13 and PCORnet14 possess boolean retrieval capabilities. Rank-based retrieval, however, is relatively less prevalent, and existing approaches focus on unstructured text. Examples of such efforts include the Electronic Medical Record Search Engine (EMERSE)16 and Cohort Retrieval Enhanced by the Analysis of TExt (CREATE)17 systems, as well as the adoption of various open-source frameworks such as Apache Lucene18, Solr19, and Elasticsearch9 for institution-specific implementations.
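The difference between the two retrieval modes can be illustrated with a minimal sketch. The patient records, criterion names, and the fraction-of-criteria score below are hypothetical placeholders chosen for illustration (the score is not the BM25+ scheme used later in this article).

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    matched: set  # names of criteria with supporting evidence in the data


# Hypothetical phenotype in which every listed criterion is required.
CRITERIA = ["diagnosis", "lab_test", "medication"]


def boolean_retrieve(patients, criteria):
    """Return only patients with evidence for every criterion."""
    return [p.patient_id for p in patients
            if all(c in p.matched for c in criteria)]


def ranked_retrieve(patients, criteria):
    """Return (patient_id, score) pairs, best match first.

    The score is the fraction of criteria met: a placeholder, not BM25+.
    """
    scored = [(p.patient_id,
               sum(c in p.matched for c in criteria) / len(criteria))
              for p in patients]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


patients = [
    Patient("A", {"diagnosis", "lab_test", "medication"}),
    Patient("B", {"diagnosis", "medication"}),  # lab result missing from source
    Patient("C", {"medication"}),
]

boolean_hits = boolean_retrieve(patients, CRITERIA)
ranked_hits = ranked_retrieve(patients, CRITERIA)
```

In this sketch, boolean retrieval returns only patient A, whereas ranked retrieval also surfaces patient B near the top of the list, where an abstractor can check whether the missing lab result is documented elsewhere.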
Desideratum III: Support multi-modal retrieval and result integration
Fully determining whether a patient matches a phenotype may not always be possible with the information contained within any single data source, requiring additional data sources, e.g., for information documented in clinical narratives20,21,22 as opposed to within structured EHR data records, or information from radiology images and associated reports23,24,25.
In addition, traditional EHR-based data sources are potentially biased in that underserved/underrepresented populations will be similarly underrepresented in the data, a significant concern for data-driven downstream applications26,27,28. Inclusion of additional data sources helps ameliorate this issue. For instance, if the site doing the in-silico phenotyping is a tertiary medical institution, a substantial amount of patient history will not be available in structured form (e.g., it may only be available via scanned images or clinical text). If only a structured data source is used for phenotyping, the results will be biased, as rural/underrepresented populations may have a substantial portion of their history captured in text or image29 and thus inaccessible to the phenotyping algorithm.
Multi-modal computation of complex phenotype definitions consequently complicates in-silico implementation. Manual overhead is introduced via identification of additional necessary data sources, query refinement to local data representations, scoring, and result integration.
These processes should therefore be supported within the tool itself, rather than being left to manual efforts. While solutions do exist for multi-server querying in the general domain (e.g., cross-server joins in SQL), such solutions tend to be difficult to set up, are limited to a single data type, and perform scoring on a per-data source basis, so that retrieval is not truly multi-modal.
Desideratum IV: Support extensions such that textual phenotype definitions can be autonomously converted into local code sets for review
Many phenotype definitions are distributed as textual descriptions30. For in-silico phenotyping, these textual descriptors are typically manually translated into equivalent institutional data source-computable representations31,32. Similarly, even for those phenotypes distributed as computable representations33,34,35, said representations will typically also need further refinement prior to local use, particularly if natural language processing (NLP) is involved36. Such conversions/refinements (e.g., disease names to International Classification of Diseases 10 codes, or appropriate textual variants for NLP-derived data) are typically done over multiple iterations3, bottlenecking new algorithm implementation.
Collectively predefining valuesets that correspond to a specific phenotype criterion before distribution of the phenotype definition has been proposed37. Usage, however, may not always be feasible for implementing institutions. For instance, while the Logical Observation Identifiers Names and Codes (LOINC) vocabulary is used to codify lab tests, some institutions may use an institution-local code-set without a LOINC mapping. Incorporating standard vocabularies in CDMs such as the OMOP CDM13 partially addresses this issue, but requiring usage of the CDM violates Desideratum I, and implementations are non-uniform5. In addition, the information required for a phenotyping task may not always be fully representable in the CDM. Explicitly defining such valuesets, while helpful as an initial reference point, will therefore often still require additional manual conversion.
To reduce manual burden, increase mapping reusability, and accelerate the implementation of new phenotype definitions, tools should therefore provide the capability to autonomously convert textual descriptions into local representations. An interface should be provided for abstractors to review/refine conversions. In addition, the capability for individual institutions to implement mappings to local datasets from textual descriptions should be provided. Existing examples of such autonomous mapping systems include Eligibility criteria Information Extraction (EliIE)30 and Criteria2Query38. General clinical NLP systems such as MedTagger39 and the Clinical Text Analysis Knowledge Extraction System (cTAKES)40 are also repurposable for this task.
Desideratum V: Maximize reusability and data reproducibility, minimize technical overhead, and enhance downstream generalizability
The domain expertise of typical users of phenotyping tools differs from that of those who possess the knowledge to integrate tools with local data sources and extract information from said data sources. Ideally, as the latter setup process tends to be the bottlenecking step for in-silico phenotyping algorithm implementation, toolsets should be reusable across multiple phenotyping tasks.
Beyond toolset reusability, however, individual phenotyping projects should also be reusable, from both monoinstitutional and multiinstitutional perspectives. As cohort retrieval is typically only an intermediate, but bottlenecking, step for other downstream applications, the ability to easily reuse identified cohorts is highly desirable to reduce duplicate development/phenotyping efforts31,41,42,43.
In addition, given that data reproducibility has been found critically lacking for datasets44,45,46,47, there is substantial benefit in centralized storage of both in-silico phenotyping algorithms and retrieved cohorts within a common toolset for later re-use and/or re-execution.
Finally, while cross-institution sharing of retrieved cohorts is unlikely due to privacy concerns, a common framework with sharable definitions will dramatically facilitate multi-institution phenotyping execution, facilitating development and evaluation of cross-institutionally generalizable digital health applications8,32,48.
These considerations are among the motivations behind clinical CDMs such as OMOP13 and PCORnet14.
Desideratum VI: Reflect that in-silico phenotyping is an iterative, human-in-the-loop process
The human interpretation and translation process from textual definitions to local data source representations can be highly subjective, leading to inter-abstractor variation both within and without a clinical institution32,49,50.
Consequently, iterative definition refinement is required. This may involve manual review by multiple clinical abstractors to identify missing data elements and adjudicate disagreements in definition interpretations, repeating until adequate performance is achieved51.
To support such algorithm development, refinement, and implementation processes, tools must therefore support: (a) editing/refining phenotype definitions, (b) surfacing evidence supporting classification for review, and (c) identifying abstraction differences for adjudication.
Graphical user frontends supporting querying against the various clinical common data models (e.g., OHDSI Atlas52) support accessible editing of phenotype definitions and review of returned results. Such systems, however, typically lack support for presenting supporting evidence and for relevance judgement, leading to the development of systems such as PRAI53 and CREATE17.
An example IMPACT implementation
Here, we present a full-stack in-silico phenotyping solution implementing these desiderata consisting of:
- A web-based frontend user interface (UI) for phenotyping criteria definition and execution, as well as result relevance judgement and adjudication.
- A middleware component supporting cohort management, phenotype definition and abstractor judgement retention, patient evidence retrieval, textual descriptions translation, and job scheduling.
- A backend that performs data source information retrieval and scoring, FHIR mapping, and writes match status, patient scores, and associated evidence to a database.
An overview of the system architecture using an example fully on-premises deployment is provided in Fig. 2. Additional example diagrams using other infrastructure setups can be found on our GitHub https://www.github.com/OHNLP/IMPACT. In the ensuing subsections, we will detail how IMPACT implements our listed desiderata.
Infrastructurally-agnostic, scalable, ranking-based patient-phenotype matching
To address scalability while maintaining flexibility across differing infrastructure setups, we implemented the backend using Apache Beam54, which is usable both across a wide variety of horizontally scaling frameworks, as well as on a single machine. For more details on horizontal scaling and the specific frameworks supported by the example IMPACT implementation, please refer to the Supplementary Information.
For ranked scoring, we leverage a modification of BM25+ (refs. 55,56) to score patients relative to how well they match the phenotype, where each patient is treated as a “document” and clinical entities such as a diagnosis or a lab test are “tokens” within said “document”. Firstly, leaf criteria (i.e., criteria that are not combinatorial boolean conditions such as “must have all of”, “at least n of”, “none of”, or similar, but rather descriptions of a condition, medication, etc.) are grouped by clinical entity type, and BM25+ scoring is run separately for each group. Specifically, the base BM25+ score for a given patient P and leaf criterion ci can be calculated as shown in eq. (1):
where N is the number of patients in the data source, n(ci) is the number of patients that leaf criterion ci matches, f(ci, P) is the number of distinct records for which patient P matches criterion ci, |P| is the patient term length (i.e., the number of entities of the same clinical data type (condition, medication, etc.) as ci), and avgplen is the average |P| across all patients in the cohort. The BM25+ scores of leaf criteria are then combined based on the boolean logic as defined by the phenotype definition. For OR (“must have at least n of”), the mean of the top scores of child criteria is used. For AND (“must have all of”), the mean score of all children is used. For NOT (“must not have”), the maximum of all child scores is multiplied by −1. For more details on the BM25+ algorithm, its selection as our default scoring algorithm, and associated hyperparameters, please refer to the Supplementary Information. A Java application programming interface (API) is also provided for implementing custom scoring algorithms.
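The scoring and combination rules described above can be sketched as follows. This is a minimal illustration rather than the IMPACT implementation: it assumes the unmodified BM25+ form of Lv and Zhai56 (the modification used in IMPACT may differ), treats the hyperparameter defaults k1, b, and delta as placeholders, and reads "the top scores" for OR as the top n child scores. All function names are hypothetical.

```python
import math


def bm25_plus(n_patients, n_matching, freq, plen, avg_plen,
              k1=1.2, b=0.75, delta=1.0):
    """Standard BM25+ score for one leaf criterion and one patient.

    n_patients -> N, n_matching -> n(c_i), freq -> f(c_i, P),
    plen -> |P|, avg_plen -> avgplen. k1, b and delta are assumed defaults.
    """
    if freq == 0:
        return 0.0  # BM25+ only scores terms that occur in the "document"
    idf = math.log((n_patients + 1) / n_matching)
    length_norm = 1 - b + b * plen / avg_plen
    return idf * ((k1 + 1) * freq / (k1 * length_norm + freq) + delta)


def combine_and(child_scores):
    """AND ("must have all of"): mean score of all children."""
    return sum(child_scores) / len(child_scores)


def combine_or(child_scores, n=1):
    """OR ("must have at least n of"): mean of the top n child scores.

    Using exactly the top n is an assumption of this sketch.
    """
    top = sorted(child_scores, reverse=True)[:n]
    return sum(top) / len(top)


def combine_not(child_scores):
    """NOT ("must not have"): maximum child score multiplied by -1."""
    return -1 * max(child_scores)


# Synthetic example: 1000 patients, 100 of whom match the criterion;
# this patient has 2 matching records and an average-length record set.
leaf = bm25_plus(n_patients=1000, n_matching=100, freq=2, plen=10, avg_plen=10)
phenotype_score = combine_and([leaf, combine_or([0.0, leaf], n=1)])
```

Because NOT negates the strongest child score, a patient with strong evidence for an excluded condition is pushed down the ranking rather than removed outright, which keeps such patients available for review.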
Data source flexibility via FHIR conversions, CDM support, and JSON-based plug and play configuration
For IMPACT, we chose to use HL7 Fast Healthcare Interoperability Resources (FHIR) R412 data structures as our internal representation for clinical data. For more details on FHIR and why it was chosen, please refer to the Supplementary Information.
So long as a mapping function can be written to produce FHIR resources, any data source can be used in IMPACT. To facilitate adoption, we supply built-in functions for common use cases. For SQL/JDBC compatible data sources, a configurable mapping function is provided that allows users to specify SQL queries and associated FHIR mappings via JavaScript Object Notation (JSON) config. For on-demand clinical NLP (i.e., artifacts extracted at runtime), we build upon our previous work57 to provide a clinical information extraction mapping function that extracts clinical entities from text and converts them58,59 to appropriate FHIR resources. Built-in support and mapping functions for the OMOP13 (including NLP tables) and PCORnet14 CDMs are also provided that allow for immediate, out-of-the-box use with minimal additional configuration. Custom mapping functions can also be included via implementation of a Java API.
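The idea of a JSON-configured SQL-to-FHIR mapping can be sketched as below. The configuration keys, table, and column names are hypothetical and do not reflect IMPACT's actual configuration schema; the sketch uses Python with an in-memory SQLite database purely for illustration, whereas the application itself exposes a Java API. Only a small subset of the FHIR R4 Condition resource is populated.

```python
import json
import sqlite3

# Hypothetical configuration: one SQL query plus a column-to-FHIR mapping.
CONFIG = json.loads("""
{
  "query": "SELECT person_id, dx_code FROM diagnoses ORDER BY person_id",
  "resourceType": "Condition",
  "patientColumn": "person_id",
  "codeColumn": "dx_code",
  "codeSystem": "http://hl7.org/fhir/sid/icd-10-cm"
}
""")


def rows_to_fhir(connection, config):
    """Run the configured query and map each row to a FHIR R4-style dict."""
    connection.row_factory = sqlite3.Row  # allow access to columns by name
    resources = []
    for row in connection.execute(config["query"]):
        resources.append({
            "resourceType": config["resourceType"],
            "subject": {"reference": f"Patient/{row[config['patientColumn']]}"},
            "code": {"coding": [{"system": config["codeSystem"],
                                 "code": row[config["codeColumn"]]}]},
        })
    return resources


# In-memory stand-in for an institutional SQL source (synthetic rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnoses (person_id TEXT, dx_code TEXT)")
conn.executemany("INSERT INTO diagnoses VALUES (?, ?)",
                 [("p1", "E11.9"), ("p2", "I10")])
resources = rows_to_fhir(conn, CONFIG)
```

Keeping the query and the field mapping in configuration rather than code means that pointing the same phenotyping logic at a different SQL source requires only a new JSON file.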
IMPACT supports cross-server data integration by allowing for an arbitrary number of data sources to be queried on any given phenotyping task so long as common patient IDs are used (or can be mapped) and a FHIR mapping function is defined. The data sources and mappings used for scoring are specified as part of a JSON configuration and can be customized on a per-project basis via the frontend GUI. Individual patient scores are computed per-data source and are then combined using a weighted summation (please refer to the Supplementary Information section on BM25+ scoring for more details).
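A minimal sketch of the weighted summation across data sources follows. The source names, weights, and scores are hypothetical, and treating a source with no score for a patient as contributing zero is an assumption of this sketch rather than documented behavior.

```python
def combine_sources(per_source_scores, weights):
    """Weighted sum of one patient's scores across data sources.

    A source that returned no score for the patient contributes 0
    (an assumption of this sketch).
    """
    return sum(weight * per_source_scores.get(source, 0.0)
               for source, weight in weights.items())


def rank_patients(scores_by_patient, weights):
    """Return (patient_id, combined_score) pairs, highest score first."""
    totals = {pid: combine_sources(scores, weights)
              for pid, scores in scores_by_patient.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-project weights and per-source patient scores.
weights = {"structured": 0.5, "nlp": 0.5}
scores_by_patient = {
    "p1": {"structured": 2.0, "nlp": 4.0},
    "p2": {"structured": 5.0},  # no evidence found in clinical text
}
ranking = rank_patients(scores_by_patient, weights)
```

Here patient p1 outranks p2 because evidence from both modalities contributes to the combined score, even though p2 scores higher on structured data alone.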
Autonomous NLP-based conversion of textual phenotype definitions
To generate data source-computable representations from textual definitions, the middleware component contains an integrated MedTagger39,57 pipeline to perform named entity recognition and entity linking to Unified Medical Language System (UMLS)60 concept codes (CUIs). For more information on the UMLS, coding systems, and the necessity of codeset mapping, please refer to the Supplementary Information. Each leaf criterion (i.e., some clinical entity that is part of the phenotype definition, as opposed to non-leaf criteria, which refer to the boolean logic such as “must have all/one/none of …” that links multiple leaf criteria together) automatically goes through this pipeline to generate a UMLS CUI code set if no computable representations are provided. This process can also be manually triggered by the end user. The UMLS CUIs are then converted to local data source formats depending on data source configurations. IMPACT offers built-in mapping to any UMLS source vocabulary, to the OHDSI Athena Vocabulary61, as well as to any UMLS subset for the on-demand NLP data source. In addition, manual mappings from UMLS CUIs can be provided via configuration. End users may also extend our Java API to implement their own mapping function.
The generated representations are then grouped by data source and displayed in the frontend web interface for refinement by clinical abstractors.
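The conversion from CUIs to local representations, grouped by data source, can be sketched as follows. The mapping tables, data source names, and the placeholder CUI C9999999 are hypothetical and for illustration only; an actual deployment would derive mappings from UMLS source vocabularies or institution-specific configuration, and the entity recognition step that produces the CUIs is not shown.

```python
from collections import defaultdict

# Hypothetical per-data-source mapping tables (illustrative entries only).
CUI_TO_LOCAL = {
    "structured_ehr": {"C0011860": ["ICD10CM:E11.9"]},
    "clinical_notes": {"C0011860": ["type 2 diabetes", "T2DM"]},
}


def map_cuis(cuis, mappings):
    """Convert CUIs to local representations, grouped by data source.

    Returns (mapped, unmapped); unmapped CUIs are kept per source so that
    an abstractor can review them and supply manual mappings.
    """
    mapped = defaultdict(list)
    unmapped = defaultdict(list)
    for source, table in mappings.items():
        for cui in cuis:
            if cui in table:
                mapped[source].extend(table[cui])
            else:
                unmapped[source].append(cui)
    return dict(mapped), dict(unmapped)


# C9999999 is a made-up CUI standing in for a concept with no local mapping.
mapped, unmapped = map_cuis(["C0011860", "C9999999"], CUI_TO_LOCAL)
```

Returning the unmapped concepts alongside the mapped ones reflects the review step described above: automatic conversion provides a starting point, and gaps are surfaced for refinement rather than silently dropped.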
Re-usable infrastructure and phenotype representations and associated implications on data reproducibility and downstream algorithm generalizability
Thus far, we have primarily discussed backend components that must be set up on initial deployment. Once this setup is complete, the system can be re-used across a large variety of phenotyping tasks without additional setup or technical expertise (unless the addition of more data sources is desired), thus greatly accelerating implementation of new phenotyping algorithms. In addition, common re-usable infrastructure greatly accelerates porting to multi-institutional settings, facilitating generalizable algorithm development.
The retention of abstractor curated representations of a phenotype by the middleware component enables later re-use. To maximize re-use, users may choose to publicize these collections of representations within the IMPACT platform and share with other users at the same institution.
Central storage of the refined algorithms and datasets on the middleware server also greatly enhances data provenance/reproducibility. Should the algorithm need to be re-run (e.g., on temporally updated data), the original local representations and associated refinements are retained, as well as a specific record of which datasets/data sources were queried in the original retrieval. Similarly, should it be desired to re-use the retrieved patient cohort itself, the retrieved cohort along with human judgements and associated query metadata is retained for immediate download.
Human in the loop evidence review and adjudication
The web frontend offers an interface for phenotype definition (Fig. 3) and displays a list of patients sorted by match score (Fig. 4), with the option to switch to boolean filtering. Upon patient selection, the user is presented with the definition. The abstractor can view the evidence and judge its correctness for each definition criterion (Fig. 5). Switching to adjudication mode lists judgment conflicts between all abstractors.
These capabilities bring several benefits. Firstly, having the relevant evidence aggregated and presented to the adjudicator by matching phenotype criterion accelerates determination of whether a given patient matches the query phenotype. In addition, to perform iterative refinement and fine-tuning of phenotyping algorithms, algorithm errors (and evidence associated with said errors) must first be identified. Having disagreement/adjudication functions built into the interface greatly facilitates this process. Finally, this interface/human-in-the-loop approach allows for the inclusion of external contextual information that may be absent from or contradict the clinical documentation itself, which may be helpful for certain use cases, e.g., “patient was contacted for a clinical trial, indicated that he had an undocumented positive/disqualifying smoking status”.
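The conflict listing performed in adjudication mode can be sketched as below. The data layout (judgements keyed by patient and criterion, with one boolean per abstractor) and all identifiers are hypothetical and are not taken from the IMPACT codebase.

```python
def find_conflicts(judgements):
    """List (patient_id, criterion) pairs on which abstractors disagree.

    judgements maps (patient_id, criterion) to {abstractor: bool}, where
    the boolean records whether that abstractor judged the criterion met.
    """
    conflicts = []
    for key, votes in sorted(judgements.items()):
        if len(set(votes.values())) > 1:  # more than one distinct judgement
            conflicts.append((key, votes))
    return conflicts


# Synthetic judgements from two hypothetical abstractors.
judgements = {
    ("p1", "diagnosis"): {"abstractor_1": True, "abstractor_2": True},
    ("p1", "smoking_status"): {"abstractor_1": True, "abstractor_2": False},
    ("p2", "diagnosis"): {"abstractor_1": False, "abstractor_2": False},
}
conflicts = find_conflicts(judgements)
```

Comparing judgements per criterion rather than per patient points the adjudicator directly at the evidence in dispute, which is where definition ambiguities tend to surface.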
Discussion
The desiderata presented here are not comprehensive: they are the results of our observations while implementing in-silico phenotyping, but experiences will vary. As such, we anticipate evolution in the framework as part of our open science efforts as feedback from users is incorporated. In addition, individual approaches to the various desiderata exist, but to our knowledge are spread across disparate toolsets and not integrated into a common solution. For example, while Atlas does offer phenotyping query execution, it is limited to using the OMOP CDM and does not support text retrieval. Similarly, EMERSE offers querying on text but has limited flexibility for working with multi-modal queries. Our current implementation is therefore intended to serve as a baseline that works reasonably and is easy to adopt/extend, but may not be state-of-the-art. To facilitate customization with other approaches, the application allows for modular component swapping.
A trade-off of infrastructure flexibility is runtime performance. Specifically, FHIR mapping is done on-demand to obviate instantiating a new data warehouse. Per our instrumentation, around 90% of runtime is spent on FHIR mapping. For reference, our observed performance using 128 central processing unit cores was 6 h for 1.9 million patients (with structured data and NLP). While this is still a significant improvement over manual efforts, pre-mapping/storing FHIR resources into a data store such as MongoDB or Elasticsearch, obviating on-demand mapping, would be more efficient.
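The figures above can be turned into a rough back-of-the-envelope estimate. The calculation below assumes perfect parallel efficiency across cores, takes the 90% share as exact, and assumes that pre-mapping would remove the mapping cost entirely; none of these will hold precisely in practice (reading pre-mapped resources from a data store has its own cost), so the results are indicative only.

```python
cores = 128
wall_hours = 6
patients = 1_900_000
mapping_share = 0.9  # approximate share of runtime spent on FHIR mapping

# Total compute consumed by the reported run.
core_hours = cores * wall_hours  # 768 core-hours

# Throughput per core, assuming work is spread evenly across cores.
patients_per_core_hour = patients / core_hours

# Compute attributable to on-demand FHIR mapping.
mapping_core_hours = mapping_share * core_hours

# Idealized wall-clock time if mapping cost were eliminated entirely.
residual_wall_hours = wall_hours * (1 - mapping_share)
```

Under these assumptions, most of the 768 core-hours is attributable to mapping, and the remaining retrieval and scoring work would take well under an hour of wall-clock time on the same hardware.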
Finally, while evaluations have previously been done on individual component implementations, a full evaluation in aggregate would be helpful. Due to the characteristics inherent to the phenotyping task, a meaningful systemic evaluation would require multiinstitutional deployment of the application and gold standard corpora development for each site across a variety of phenotyping tasks. For more details on this, please refer to the Supplementary Information. We have left such efforts to future work.
Conclusions
Rapid in-silico clinical phenotyping on large datasets is of critical importance to accelerate research and development in the digital health domain. In this article, we have outlined some underlying complications hindering implementation of in-silico phenotyping and presented a framework, accompanied by an example implementation, addressing them.
Data availability
Data used as part of our use-case testing for the IMPACT implementation is considered protected health information and would be difficult to share with anyone not involved in an IRB-approved collaboration with the Mayo Clinic. We do, however, provide manually generated synthetic data that can be used as a stand-in to evaluate front-end GUI functionality. Said synthetic data is distributed alongside the IMPACT software application code.
Code availability
The IMPACT implementation is open-source, code for which can be found at https://www.github.com/OHNLP/IMPACT. Please note that this repository is only the parent/tracking repository, and that IMPACT has several subcomponents each in their own GitHub repository. Links to the repositories for these subcomponents can be found in the README of the parent repository.
References
Weng, C., Tu, S. W., Sim, I. & Richesson, R. Formal representation of eligibility criteria: a literature review. J. Biomed. Inf. 43, 451–467 (2010).
Richesson, R. L., Horvath, M. M. & Rusincovitch, S. A. Clinical research informatics and electronic health record data. Yearb. Med. Inf. 9, 215–223 (2014).
Thadani, S. R., Weng, C., Bigger, J. T., Ennever, J. F. & Wajngurt, D. Electronic screening improves efficiency in clinical trial recruitment. J. Am. Med. Inf. Assoc. 16, 869–873 (2009).
Pathak, J., Kho, A. N. & Denny, J. C. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J. Am. Med. Inf. Assoc. 20, e206–e211 (2013).
Campion, T. R., Craven, C. K., Dorr, D. A. & Knosp, B. M. Understanding enterprise data warehouses to support clinical and translational research. J. Am. Med. Inf. Assoc. 27, 1352–1358 (2020).
Ross, J., Tu, S., Carini, S. & Sim, I. Analysis of eligibility criteria complexity in clinical trials. Summit Transl. Bioinform. 2010, 46–50 (2010).
Madigan, D. et al. Evaluating the impact of database heterogeneity on observational study results. Am. J. Epidemiol. 178, 645–651 (2013).
Fu, S. et al. Assessment of Data Quality Variability across Two EHR Systems through a Case Study of Post-Surgical Complications. AMIA Annu Symp. Proc. 2022, 196–205 (2022).
Elasticsearch B.V. Elasticsearch, https://github.com/elasticsearch/elasticsearch (2015).
MongoDB Inc. The MongoDB Database, https://github.com/mongodb/mongo (2009).
Google Inc. BigQuery: Enterprise Data Warehouse, https://cloud.google.com/bigquery (2011).
Health Level 7 International. Fast Healthcare Interoperability Resources (FHIR), https://hl7.org/fhir/R4/ (2019).
Overhage, J. M., Ryan, P. B., Reich, C. G., Hartzema, A. G. & Stang, P. E. Validation of a common data model for active safety surveillance research. J. Am. Med. Inf. Assoc. 19, 54–60 (2012).
Fleurence, R. L. et al. Launching PCORnet, a national patient-centered clinical research network. J. Am. Med. Inf. Assoc. 21, 578–582 (2014).
Yadav, H., Du, Z. & Joachims, T. Policy-Gradient Training of Fair and Unbiased Ranking Functions. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM SIGIR 2021, 1044–1053 (2021).
Hanauer, D. A. EMERSE: The Electronic Medical Record Search Engine. AMIA Annu. Symp. Proc. 2006 Annual Symposium of the American Medical Informatics Association, 941 (2006).
Liu, S. et al. Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation. JMIR Med. Inf. 8, e17376 (2020).
Apache Software Foundation. Apache Lucene, https://lucene.apache.org/ (2022).
Shahi, D. Apache Solr: A Practical Approach to Enterprise Search. (APress, 2015).
Wang, Y. et al. Clinical information extraction applications: A literature review. J. Biomed. Inform. 77, 34–49 (2018).
Fu, S. et al. Ascertainment of Delirium Status Using Natural Language Processing From Electronic Health Records. J. Gerontol. A Biol. Sci. Med Sci. 77, 524–530 (2022).
Sagheb, E. et al. Use of Natural Language Processing Algorithms to Identify Common Data Elements in Operative Notes for Knee Arthroplasty. J. Arthroplast. 36, 922–926 (2021).
Gao, F. et al. SD-CNN: A shallow-deep CNN for improved breast cancer diagnosis. Comput Med. Imaging Graph. 70, 53–62 (2018).
Sun, L. et al. Breast Mass Detection in Mammography Based on Image Template Matching and CNN. Sensors (Basel) 21 (2021). https://doi.org/10.3390/s21082855
Che, H., Brown, L. G., Foran, D. J., Nosher, J. L. & Hacihaliloglu, I. Liver disease classification from ultrasound using multi-scale CNN. Int J. Comput. Assist Radio. Surg. 16, 1537–1548 (2021).
Juhn, Y. J. et al. Assessing socioeconomic bias in machine learning algorithms in health care: a case study of the HOUSES index. J. Am. Med. Inf. Assoc. 29, 1142–1151 (2022).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Rajkomar, A., Hardt, M., Howell, M. D., Corrado, G. & Chin, M. H. Ensuring Fairness in Machine Learning to Advance Health Equity. Ann. Intern. Med. 169, 866–872 (2018).
Moon, S. et al. Salience of Medical Concepts of Inside Clinical Texts and Outside Medical Records for Referred Cardiovascular Patients. J. Health. Inf. Res. 3, 200–219 (2019).
Kang, T. et al. EliIE: An open-source information extraction system for clinical trial eligibility criteria. J. Am. Med. Inf. Assoc. 24, 1062–1071 (2017).
Gilbert, E. H., Lowenstein, S. R., Koziol-McLain, J., Barta, D. C. & Steiner, J. Chart reviews in emergency medicine research: Where are the methods? Ann. Emerg. Med. 27, 305–308 (1996).
Fu, S. et al. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC Med Inf. Decis. Mak. 20, 60 (2020).
Pagali, S. R., Kumar, R., Fu, S., Sohn, S. & Yousufuddin, M. Natural Language Processing CAM Algorithm Improves Delirium Detection Compared With Conventional Methods. Am. J. Med. Qual. (2022). https://doi.org/10.1097/JMQ.0000000000000090
Safarova, M. S., Liu, H. & Kullo, I. J. Rapid identification of familial hypercholesterolemia from electronic health records: The SEARCH study. J. Clin. Lipido. 10, 1230–1239 (2016).
Zeng, Z., Deng, Y., Li, X., Naumann, T. & Luo, Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 139–153 (2019).
Sohn, S. et al. Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions. J. Am. Med. Inf. Assoc. 25, 353–359 (2018).
Bodenreider, O. et al. The NLM value set authority center. Stud. Health Technol. Inf. 192, 1224 (2013).
Yuan, C. et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. J. Am. Med. Inf. Assoc. 26, 294–305 (2019).
Liu, H. et al. An information extraction framework for cohort identification using electronic health records. AMIA Jt Summits Transl. Sci. Proc. 2013, 149–153 (2013).
Savova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inf. Assoc. 17, 507–513 (2010).
Vassar, M. & Holzmann, M. The retrospective chart review: important methodological considerations. J. Educ. Eval. Health Prof. 10, 12 (2013).
Grishman, R., Huttunen, S. & Yangarber, R. Information extraction for enhanced access to disease outbreak reports. J. Biomed. Inf. 35, 236–246 (2002).
South, B. R. et al. Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease. BMC Bioinforma. 10, S12 (2009).
Anderson, W. P. Reproducibility: Stamp out shabby research conduct. Nature 519, 158 (2015).
Baker, D., Lidster, K., Sottomayor, A. & Amor, S. Reproducibility: Research-reporting standards fall short. Nature 492, 41 (2012).
Begley, C. G., Buchan, A. M. & Dirnagl, U. Robust research: Institutions must do their part for reproducibility. Nature 525, 25–27 (2015).
Kolker, E. et al. Reproducibility: In praise of open research measures. Nature 498, 170 (2013).
Chapman, W. W. et al. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J. Am. Med. Inf. Assoc. 18, 540–543 (2011).
Musen, M. A., Rohn, J. A., Fagan, L. M. & Shortliffe, E. H. Knowledge engineering for a clinical trial advice system: uncovering errors in protocol specification. Bull. Cancer 74, 291–296 (1987).
Leung, L. Y. et al. Agreement between neuroimages and reports for natural language processing-based detection of silent brain infarcts and white matter disease. BMC Neurol. 21, 189 (2021).
Fu, S. et al. Clinical concept extraction: A methodology review. J. Biomed. Inf. 109, 103526 (2020).
Observational Health Data Sciences and Informatics. OHDSI/Atlas - an Open Source Software Tool for Researchers to Conduct Scientific Analyses on Standardized Observational Data, https://github.com/OHDSI/Atlas (2022).
Wu, S. et al. in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 3412–3416 (European Language Resources Association (ELRA), Portoroz, Slovenia, 2016).
Apache Software Foundation. Apache Beam, https://beam.apache.org/ (2022).
Zaragoza, H. & Robertson, S. The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends® Inf. Retr. 3, 333–389 (2009).
Lv, Y. & Zhai, C. Lower-bounding term frequency normalization. Proceedings of the 20th ACM international conference on Information and knowledge management. CIKM '11, 7–16 (2011).
Wen, A. et al. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. NPJ Digit. Med. 2, 130 (2019).
Hong, N. et al. Integrating Structured and Unstructured EHR Data Using an FHIR-based Type System: A Case Study with Medication Data. AMIA Jt Summits Transl. Sci. Proc. 2017, 74–83 (2018).
Hong, N. et al. Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data. JAMIA Open 2, 570–579 (2019).
Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32, D267–D270 (2004).
Observational Health Data Sciences and Informatics. Athena: Observational Health Data Sciences and Informatics – OHDSI, https://athena.ohdsi.org/ (2022).
Acknowledgements
Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number U01TR002062 and by the National Library of Medicine under award number R01LM011934. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Use case testing of the IMPACT framework implementation was approved by the Mayo Clinic institutional review board (IRB # 20-001137) for human subject research. We gratefully acknowledge Michael Lin, Carmen Vodislav, Robert Gehrke, Kathryn Cook, David Strauss, Dania Helgeson, Thomas Kingsley, and Alexander Ryu from the Mayo Clinic for their constructive feedback during the IMPACT front-end development process. In addition, we gratefully acknowledge Samuel A McKinven for his editorial support with this paper.
Author information
Contributions
A.W., H.H.: Equal contribution to this paper. A.W., H.H., S.F., S.L., K.M., H.L.: Designed and implemented the framework. A.W., H.H., S.F., L.W.: Use case testing. K.E.R., S.D.B., W.R.H., H.L.: Direction on framework design and cohort retrieval approaches. H.L.: Project leadership. All authors reviewed and contributed expertise to this paper.
Ethics declarations
Competing interests
Author H.L. is an Editorial Board Member of npj Digital Medicine. They played no role in the peer review or decision to publish this paper. The authors declare no further financial or non-financial competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wen, A., He, H., Fu, S. et al. The IMPACT framework and implementation for accessible in silico clinical phenotyping in the digital era. npj Digit. Med. 6, 132 (2023). https://doi.org/10.1038/s41746-023-00878-9
DOI: https://doi.org/10.1038/s41746-023-00878-9