National Cancer Institute   US National Institutes of Health | www.cancer.gov

Informatics & Research Tools

The healthcare delivery systems affiliated with the CRN have an ethical and legal obligation to safeguard the confidentiality of their members' and patients' medical information. The CRN operates as a distributed data network: each site retains its own data, and there is no central data repository. This structure protects the confidentiality of patient and provider data. At the same time, the CRN has developed standardized data resources to increase the quality and efficiency of research using electronic data. These include the Virtual Data Warehouse (VDW), the Cancer Counter, and other site-specific extracts of electronic medical records. Some CRN sites also have substantial experience with natural language processing. In addition, the CRN operates under an NIH Certificate of Confidentiality, which further shields CRN research information containing patient or provider identification from third-party discovery.

Informatics Core

The CRN Informatics Core is responsible for maintaining and strengthening the CRN's data infrastructure, with a clear focus on cancer-related data and programming elements. The Informatics Core monitors data availability and quality across sites, and develops and maintains resources for scientists using or planning to use CRN data resources. To carry out these functions, the Informatics Core works closely with all other CRN committees, CRN Sites, and the HMO Research Network.

The Virtual Data Warehouse (VDW)

The VDW is a distributed data warehouse. That is, it is a federated database composed of standardized datasets stored behind security firewalls at each participating CRN site. The datasets include variables with identical names, formats, and specifications (including definitions, labels, and coding). Individual-level data at each CRN site remain under local control. The VDW is supported by a set of informatics tools (hardware and software) that facilitate storing, retrieving, processing, and managing VDW datasets. A set of access policies and procedures governs use of VDW resources. The CRN Informatics Core also maintains documentation of all elements of the VDW, working closely with the HMORN VDW team to ensure that the work is complementary rather than duplicative. The VDW is described in more detail on the HMORN's Collaboration Toolkit and in the table below.

Virtual Data Warehouse Elements

Table Name | Description | Key Variables
Enrollment | Start and stop periods of enrollment | MRN, StartDate, EndDate, PrimaryCareProvider
Demographics | | MRN, gender, BirthDate, Race
Utilization | A record per ambulatory visit or hospitalization | MRN, DXs, PXs, Providers, Dates, Encounter Type
Outpatient RX | | MRN, NDC, FillDate, RxMD, Amount
Tumor | Tumor Registry | MRN, DxDate, Site, Stage, Morphology
Laboratory Results | | MRN, Dates, TestType, PX, Result, OrderMD
Vital Signs | | MRN, MeasureDate, Height, Weight, Systolic, Diastolic
Social History | | MRN, MeasureDate, TobaccoUse, AlcoholUse
Death | | MRN, DeathDate, Cause of Death, Source
Provider | Links to provider codes in Utilization, RX, Lab, Enrollment | ProviderCode, Specialty, BirthYear, GraduationYear, Gender, Race
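
Because every site is expected to store variables with identical names, a simple conformance check can illustrate what standardization means in practice. The following is a minimal sketch, not a CRN tool: the dictionary copies the key variables for two tables from the table above, while the function name missing_variables and the example site extract are hypothetical and used only for illustration.

```python
# Key variables for two VDW tables, copied from the table above.
VDW_KEY_VARIABLES = {
    "Enrollment": ["MRN", "StartDate", "EndDate", "PrimaryCareProvider"],
    "Tumor": ["MRN", "DxDate", "Site", "Stage", "Morphology"],
}


def missing_variables(table_name, site_columns):
    """Return the key variables that a site's dataset lacks (exact name match)."""
    present = set(site_columns)
    return [v for v in VDW_KEY_VARIABLES[table_name] if v not in present]


# Hypothetical site extract of the Tumor table that lacks Morphology.
site_tumor_columns = ["MRN", "DxDate", "Site", "Stage"]
print(missing_variables("Tumor", site_tumor_columns))  # ['Morphology']
```

An empty list means the site's dataset carries every key variable listed for that table. Formats, labels, and coding would need their own checks, which this sketch does not attempt.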

Cancer Counters

To facilitate efficient study planning, the CRN created and maintains the Cancer Counter, which provides aggregated patient counts by tumor site, morphology, stage, health plan, vital status, race, gender, and Hispanic ethnicity. The Cancer Counter has proven invaluable for quickly estimating the potential study population size for new and developing cancer research proposals.
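
The kind of aggregation the Cancer Counter performs can be sketched in a few lines. This is an illustrative example under stated assumptions, not the CRN's implementation: the records are invented, the field names are borrowed from the Tumor table's key variables, and the function name count_patients_by is hypothetical.

```python
from collections import Counter

# Invented tumor records; A001 appears twice to show that patients are counted once.
tumor_records = [
    {"MRN": "A001", "Site": "Breast", "Stage": "I"},
    {"MRN": "A001", "Site": "Breast", "Stage": "I"},
    {"MRN": "A002", "Site": "Breast", "Stage": "II"},
    {"MRN": "A003", "Site": "Colon", "Stage": "I"},
    {"MRN": "A004", "Site": "Breast", "Stage": "I"},
]


def count_patients_by(records, *dimensions):
    """Count distinct patients (by MRN) for each combination of the dimensions."""
    seen = {(r["MRN"],) + tuple(r[d] for d in dimensions) for r in records}
    return Counter(key[1:] for key in seen)


counts = count_patients_by(tumor_records, "Site", "Stage")
print(counts[("Breast", "I")])  # 2
```

Only the aggregated counts leave the function; the MRNs are used for de-duplication and then discarded.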

The CRN is instituting a standardized distributed query tool, PopMedNet, to extend the Cancer Counter concept to other types of medical utilization and enrollment data. The CRN is implementing "CRNnet", a new platform for secure distributed querying that uses the PopMedNet query engine. A lead scientist of the CRN Informatics Core is a key player in the development and maintenance of PopMedNet. The PopMedNet software platform is used by several other large-scale distributed data and research networks, including the FDA Mini-Sentinel and the NIH Collaborative Distributed Research Network. This standardized approach to distributed querying will facilitate collaboration inside and outside the CRN.
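
The general idea of distributed querying, in which each site answers a question locally and shares only aggregate results, can be sketched as follows. This does not use or reproduce the PopMedNet API; the function names run_local_query and combine, the site names, and the records are all hypothetical.

```python
from collections import Counter


def run_local_query(site_records, tumor_site):
    """Run at one site: return counts by stage only, never individual-level rows."""
    return Counter(r["Stage"] for r in site_records if r["Site"] == tumor_site)


def combine(site_results):
    """Run by the requester: add up the aggregate counts returned by each site."""
    total = Counter()
    for result in site_results.values():
        total.update(result)
    return total


# Invented data held separately at two sites.
sites = {
    "site_a": [{"Site": "Breast", "Stage": "I"}, {"Site": "Colon", "Stage": "II"}],
    "site_b": [{"Site": "Breast", "Stage": "I"}, {"Site": "Breast", "Stage": "III"}],
}

results = {name: run_local_query(records, "Breast") for name, records in sites.items()}
network_total = combine(results)
print(dict(network_total))  # {'I': 2, 'III': 1}
```

The design point is that the same query definition runs unchanged at every site, which is only possible because the underlying datasets share names and coding.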

Electronic Medical Records (EMRs) and Natural Language Processing (NLP)

More CRN sites use EpicCare® than any other EMR system. EMRs allow researchers to manipulate and standardize free-text clinical data such as clinical assessment findings, image interpretations, pathology evaluations, hospital discharge summaries, and consultant evaluations. In addition to the standard physician user interface, the EMRs have a patient interface, where patients can view items in their medical record (such as visit summaries and laboratory test results), send secure messages to their physicians, and enter information into a health risk assessment survey or other survey instrument. This patient interface gives the CRN opportunities for innovative interventions. Natural Language Processing (NLP) helps investigators identify the variety of sentences, clauses, words, symbols, and abbreviations that represent synonyms for a concept of research interest.
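
As a very small illustration of mapping varied wording to one concept, the sketch below searches a free-text note for a list of synonyms and abbreviations. It is plain keyword matching rather than a full NLP system, and it makes no claim about the methods CRN sites use; the synonym list, the sample note, and the function name find_mentions are invented for this example.

```python
import re

# Hypothetical synonyms and abbreviations for the concept "tobacco use".
TOBACCO_TERMS = ["smoker", "smokes", "tobacco use", "cigarettes", "ppd"]

# One case-insensitive pattern that matches any term as a whole word or phrase.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in TOBACCO_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_mentions(note):
    """Return the matched terms, lowercased, in the order they appear in the note."""
    return [match.group(1).lower() for match in PATTERN.finditer(note)]


note = "Pt is a former Smoker, 1 PPD for 20 yrs. Denies alcohol."
print(find_mentions(note))  # ['smoker', 'ppd']
```

A matcher like this finds mentions but does not interpret them: it cannot tell "former smoker" from "current smoker" or handle negation, which is part of what fuller NLP approaches address.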

Contact the CRN Informatics Core

For questions about the Informatics Core, please contact the Core's project manager, Monica Fujii, MPH.
