Development of NLP Algorithms or Other Mechanisms to Capture Molecular Markers from Tumors

This project applies natural language processing (NLP) to advance the capabilities of the Cancer Research Network’s (CRN’s) Informatics Core and to foster innovative research consistent with the mission of the Epidemiology of Prognosis & Outcomes Scientific Working Group (EPO). The proposed pilot develops and validates NLP systems that extract high-value information from unstructured clinical text, and it simplifies sharing these systems through self-installing, Web-based deployment models that dramatically lower the technical requirements of local adoption. Three compelling information extraction tasks have been chosen to illustrate the power of these NLP systems: extracting Oncotype DX scores, extracting KRAS molecular marker status, and identifying ICU utilization, a serious and costly complication of cancer treatment that is understudied outside of clinical trials. Two of the three NLP systems developed at Group Health (Oncotype DX and KRAS) will be tested under real-world conditions at Kaiser Colorado, using preexisting gold-standard chart abstraction as the reference. Key products of this project include portable, reusable NLP systems for extracting specific information from progress notes, and NLP-extracted pilot data to support R01 or comparable grant applications that address adherence to cancer treatment guidelines and/or investigate complications associated with different cancer regimens. We will also develop a draft data schema for incorporating NLP-extracted data into the Virtual Data Warehouse (VDW). The proposed project will advance the work of EPO in pursuit of its mission to make CRN the most complete and comprehensive data resource for cancer prognosis, treatment, and outcomes research in the United States.
