The healthcare delivery systems affiliated with the CRN have an ethical and legal obligation to safeguard the confidentiality of their members' and patients' medical information. The CRN operates as a distributed data network: each site retains its own data, and there is no central data repository. This structure protects the confidentiality of patient and provider data. To increase the quality and efficiency of research using electronic data within this structure, the CRN has developed standardized data resources. These include the Virtual Data Warehouse (VDW), cancer counters, and other site-specific extracts of electronic medical records. Some CRN Sites have substantial experience with natural language processing. In addition, the CRN operates under an NIH Certificate of Confidentiality that further shields CRN research information containing patient or provider identification from third-party discovery.
The CRN Informatics Core is responsible for maintaining and strengthening the CRN's data infrastructure, with a clear focus on cancer-related data and programming elements. The Informatics Core monitors data availability and quality across sites, and develops and maintains resources for scientists using or planning to use CRN data resources. To carry out these functions, the Informatics Core works closely with all other CRN committees, CRN Sites, and the Health Care Systems Research Network.
The Virtual Data Warehouse (VDW)
The VDW is a distributed data warehouse. That is, it is a federated database composed of standardized datasets stored behind security firewalls at each participating CRN site. The datasets include variables with identical names, formats, and specifications (including definitions, labels, and coding). Individual-level data at each CRN site remain under local control. The VDW is supported by a set of informatics tools, both hardware and software, that facilitate the storage, retrieval, processing, and management of VDW datasets. A set of access policies and procedures governs use of VDW resources. The CRN Informatics Core also maintains documentation of all elements of the VDW, working closely with the HCSRN VDW team to ensure complementarity rather than duplication of work. The VDW is described in more detail on the HCSRN's Collaboration Toolkit and in the table below.
Virtual Data Warehouse Elements
|Table Name||Description||Key Variables|
|Enrollment||Start and stop periods of enrollment||MRN, StartDate, EndDate, PrimaryCareProvider|
|Demographics|| ||MRN, Gender, BirthDate, Race|
|Utilization||A record per ambulatory visit or hospitalization||MRN, DXs, PXs, Providers, Dates, Encounter Type|
|Outpatient RX|| ||MRN, NDC, FillDate, RxMD, Amount|
|Tumor||Tumor Registry||MRN, DxDate, Site, Stage, Morphology|
|Laboratory Results|| ||MRN, Dates, TestType, PX, Result, OrderMD|
|Vital Signs|| ||MRN, MeasureDate, Height, Weight, Systolic, Diastolic|
|Social History|| ||MRN, MeasureDate, TobaccoUse, AlcoholUse|
|Death|| ||MRN, DeathDate, Cause of Death, Source|
|Provider||Links to provider codes in Utilization, RX, Lab, Enrollment||ProviderCode, Specialty, BirthYear, GraduationYear, Gender, Race|
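Because every site stores the same variable names, a site can check a local dataset against the shared specification before it is used in a study. The following is a minimal sketch of such a check, not the CRN's actual tooling. The variable names come from the table above, while `VDW_SPEC` and `missing_variables` are hypothetical names introduced for illustration.

```python
# Hypothetical subset of the VDW specification: table name -> required variables.
# Variable names are copied from the table above; the structure is an assumption.
VDW_SPEC = {
    "Enrollment": {"MRN", "StartDate", "EndDate", "PrimaryCareProvider"},
    "Tumor": {"MRN", "DxDate", "Site", "Stage", "Morphology"},
    "Social History": {"MRN", "MeasureDate", "TobaccoUse", "AlcoholUse"},
}


def missing_variables(table_name, columns):
    """Return the required variables absent from a site's local dataset, sorted."""
    required = VDW_SPEC[table_name]
    return sorted(required - set(columns))


# A site's local Tumor dataset that lacks one standardized variable.
site_tumor_columns = ["MRN", "DxDate", "Site", "Stage"]
print(missing_variables("Tumor", site_tumor_columns))  # ['Morphology']
```

An empty list means the local dataset carries every variable listed for that table. A real check would presumably also compare formats and coding, which this sketch does not attempt.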
To facilitate efficient study planning, the CRN created and maintains the Cancer Counter, which provides aggregated patient counts by tumor site, morphology, stage, health plan, vital status, race, gender, and Hispanic ethnicity. The Cancer Counter has proven invaluable for quickly estimating the potential study population size for new and developing cancer research proposals.
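The idea of an aggregated count can be illustrated with a short sketch. This is not the Cancer Counter's implementation. The records below are made-up toy data, and `count_by` is a hypothetical helper. The point is that only counts by chosen variables are produced, never individual identifiers.

```python
from collections import Counter

# Toy, made-up tumor records for illustration only.
tumor_rows = [
    {"MRN": "A1", "Site": "Breast", "Stage": "I"},
    {"MRN": "A2", "Site": "Breast", "Stage": "I"},
    {"MRN": "A3", "Site": "Breast", "Stage": "II"},
    {"MRN": "A4", "Site": "Colon", "Stage": "III"},
]


def count_by(rows, keys):
    """Count rows per combination of the given variables; MRNs are never returned."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


counts = count_by(tumor_rows, ["Site", "Stage"])
print(counts[("Breast", "I")])  # 2
```

Passing a different list of keys, such as `["Site"]` alone, gives a coarser count from the same rows.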
The CRN is instituting a standardized distributed query tool, PopMedNet, to extend the Cancer Counter concept to other types of medical utilization and enrollment data. To that end, the CRN is implementing "CRNnet", a new platform for secure distributed querying that uses the PopMedNet query engine. A lead scientist of the CRN Informatics Core is a key player in the development and maintenance of PopMedNet. The PopMedNet software platform is used by several other large-scale distributed data and research networks, including the FDA Mini-Sentinel and the NIH Collaborative Distributed Research Network. This standardized approach to distributed querying will facilitate collaboration inside and outside the CRN.
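The general pattern behind distributed querying can be sketched as follows: the same query runs locally at each site, and only aggregate results are returned and combined. This sketch does not use or represent the PopMedNet API. The function names, site names, and records are all hypothetical.

```python
from collections import Counter


def run_local_query(site_rows, tumor_site):
    """Runs at a site: returns only counts by stage, never row-level data."""
    return Counter(r["Stage"] for r in site_rows if r["Site"] == tumor_site)


def combine(site_results):
    """Runs at the coordinating center: sums the per-site aggregate counts."""
    total = Counter()
    for result in site_results.values():
        total.update(result)
    return total


# Made-up records standing in for data held separately at two sites.
sites = {
    "site_a": [{"Site": "Breast", "Stage": "I"}, {"Site": "Colon", "Stage": "II"}],
    "site_b": [{"Site": "Breast", "Stage": "I"}, {"Site": "Breast", "Stage": "II"}],
}

results = {name: run_local_query(rows, "Breast") for name, rows in sites.items()}
network_total = combine(results)
print(dict(network_total))  # {'I': 2, 'II': 1}
```

In this arrangement the coordinating step sees only the per-site counters, which matches the principle that individual-level data stay under local control.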
Electronic Medical Records (EMRs) and Natural Language Processing (NLP)
More CRN sites use EpicCare® than any other EMR system. EMRs allow researchers to manipulate and standardize free-text clinical data such as clinical assessment findings, image interpretations, pathology evaluations, hospital discharge summaries, and consultant evaluations. In addition to the standard physician user interface, the EMRs have a patient interface, where patients can view items in their medical record (such as visit summaries and laboratory test results), send secure messages to their physicians, and enter information into a health risk assessment survey or other survey instrument. This patient interface provides the CRN with opportunities for innovative interventions. Natural Language Processing (NLP) helps investigators identify the variety of sentences, clauses, words, symbols, and abbreviations that represent synonyms for a concept of research interest.
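As a much simplified illustration of matching synonyms for one concept, the sketch below searches a note for a small term list using a regular expression. It is far simpler than the NLP methods used at CRN sites. The term list, the example note, and the `find_mentions` helper are all hypothetical.

```python
import re

# Hypothetical synonym list for the concept "tobacco use".
# A real lexicon would be curated and validated by investigators.
TOBACCO_TERMS = ["smoker", "smokes", "tobacco use", "cigarettes"]

# One case-insensitive pattern that matches any listed term as a whole word or phrase.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in TOBACCO_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_mentions(note):
    """Return the matched terms, lowercased, in the order they appear in the note."""
    return [match.group(1).lower() for match in PATTERN.finditer(note)]


note = "Pt is a current Smoker; smokes 10 cigarettes daily."
print(find_mentions(note))  # ['smoker', 'smokes', 'cigarettes']
```

Simple term matching like this does not handle negation ("denies tobacco use") or context, which is one reason fuller NLP methods are needed for clinical text.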
Contact the CRN Informatics Core
For questions about the Informatics Core, please contact the Core's project manager, Monica Fujii, MPH