Automatic extraction of ICD-O-3 primary sites from cancer pathology reports

R Kavuluru, I Hands, EB Durbin… - AMIA Summits on …, 2013 - ncbi.nlm.nih.gov
AMIA Summits on Translational Science Proceedings, 2013 - ncbi.nlm.nih.gov
Abstract
Although registry-specific requirements exist, cancer registries primarily identify reportable cases using a combination of particular ICD-O-3 topography and morphology codes assigned to cancer case abstracts, of which free-text pathology reports form a main component. The codes are generally extracted from pathology reports by trained human coders, sometimes with the help of software programs. Here we present results that improve on the state of the art in automatic extraction of 57 generic sites from pathology reports using three representative machine learning algorithms for text classification. We use a dataset of 56,426 reports arising from 35 labs that report to the Kentucky Cancer Registry. Employing unigrams, bigrams, and named entities as features, our methods achieve a class-based micro F-score of 0.9 and macro F-score of 0.72. To our knowledge, this is the best result on extracting ICD-O-3 codes from pathology reports using a large number of possible codes. Given the large dataset we use (compared to other similar efforts), with reports from 35 different labs, we also expect our final models to generalize better when extracting primary sites from previously unseen reports.
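
The abstract reports both a micro and a macro F-score, and the gap between them (0.9 versus 0.72) is easier to read with a small worked example. The sketch below is illustrative only and is not the authors' code: it assumes single-label classification and the common textbook definitions of micro- and macro-averaged F-score, which may differ in detail from the paper's "class-based" variants. The function names `ngram_features` and `f_scores`, the whitespace tokenizer, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default) in one report.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def f_scores(gold, predicted):
    """Return (micro_f, macro_f) for single-label multiclass predictions."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F-score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute one F-score per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data (not from the paper): ten reports over three topography codes,
# with the single rare-class report (C34) misclassified as the frequent class.
gold = ["C50"] * 8 + ["C34", "C18"]
pred = ["C50"] * 8 + ["C50", "C18"]

micro, macro = f_scores(gold, pred)
print(f"micro F = {micro:.2f}, macro F = {macro:.2f}")  # micro F = 0.90, macro F = 0.65
print(sorted(ngram_features("invasive ductal carcinoma")))
```

In this toy case a single error on a rare class barely moves the micro F-score but pulls the macro F-score down sharply, because macro averaging weights every class equally regardless of how many reports it has. With 57 candidate sites of uneven frequency, a macro score below the micro score is the pattern one would expect.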