This resource provides an easy way to compare two popular data models used in the US clinical research community: the PCORnet Common Data Model (CDM) and the HCSRN Virtual Data Warehouse (VDW) data model, which is used by the Cancer Research Network (CRN). Update: the CAPriCORN data model has been added and will be maintained as a separate branch whose version number matches the PCORnet Common Data Model version number.
Written by Tomas Mackevicius
The idea of creating data model mapping tables was born when SHCC MU-NCORP, together with the CCH Collaborative Research Unit, joined the Cancer Care Delivery Research (CCDR) initiative. The CCH Collaborative Research Unit already participates in the PCORI-sponsored CAPriCORN project, so it made sense to evaluate the data models and see whether the data is similar enough to be reused.
Each institution will have to evaluate its available data sources in order to prepare data sets compatible with the VDW data model. The process will be unique to each site, but if a site already participates in other research projects, reusing existing data sets might be a real time saver.
Initial statistics are pretty encouraging: 51% of the fields in similar tables of the PCORnet CDM and VDW data models are compatible!
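As a rough illustration of how such a figure can be derived from a mapping table, here is a minimal Python sketch. The rows, field names, and status labels are hypothetical and are not taken from the actual mapping tables; the point is only that the percentage is the share of mapped rows marked as compatible.

```python
from collections import Counter

# Hypothetical mapping rows: (PCORnet CDM field, VDW field or None, status).
# Field names and statuses are illustrative only, not copied from the mapping tables.
MAPPING = [
    ("DEMOGRAPHIC.PATID", "DEMOGRAPHICS.MRN", "compatible"),
    ("DEMOGRAPHIC.BIRTH_DATE", "DEMOGRAPHICS.BIRTH_DATE", "compatible"),
    ("DEMOGRAPHIC.SEX", "DEMOGRAPHICS.GENDER", "needs_recoding"),
    ("DEMOGRAPHIC.BIOBANK_FLAG", None, "no_match"),
]


def compatibility_share(rows):
    """Return the percentage of rows whose status is 'compatible'."""
    if not rows:
        return 0.0
    counts = Counter(status for _, _, status in rows)
    return 100.0 * counts["compatible"] / len(rows)


print(f"{compatibility_share(MAPPING):.0f}% of fields are compatible")
```

With these four made-up rows the script reports 50%; run against a full mapping table, the same calculation would give the overall compatibility figure.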
Download Data Model Mapping Tables - Last updated: 4/12/2016
These data model mapping tables will be helpful not only for sites that are currently involved in PCORI-sponsored projects, but for all sites participating in similar research projects. The mapping tables give you an overview of what other data might be beneficial to collect, especially if you're planning to be involved in PCORI projects in the future.
As you can see in the example below, we also use these mapping tables to track the availability of data on our own servers and in our own data sets. This way we start building a data dictionary dedicated to research projects.
It takes time to discover data
Even if you already have the whole infrastructure in place, a considerable amount of time might be required to discover the actual data and get access to it. Some EMR systems have a highly normalized database structure that makes it very difficult to find the data you need. Some vendors might not share the full data dictionary of the EMR with you, and even if they do, you may find the field names very cryptic. Most likely you will need to work with your vendor on a case-by-case basis to pinpoint specific data fields.
For example, we use the Cerner EMR system, which contains more than 10,000 tables. Even if we know that certain data is being captured, we may need to get in touch with the IT department or the EMR vendor to find out exactly where that information is stored.
Insights about structuring data
Recently I had an opportunity to meet Eugene M. Sadhu, Visiting Senior Research Specialist at UIC, who is also involved in the development of the CAPriCORN project. He shared some valuable insights that can be beneficial to everyone in the research field.
First, when designing data sets or creating subsets to match a certain data model, we have to be careful not to lose important metadata. For example, if we record medical procedures but do not record the date and time of those procedures, we lose the ability to correlate procedures with other related events. In the PCORnet Common Data Model, the procedure date was added only in version 2.0.
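To make the point about procedure dates concrete, here is a small Python sketch. The records and field names (`enc_id`, `admit`, `discharge`, `px_date`) are hypothetical and do not come from either data model. It shows that a procedure with a date can be placed inside an encounter, while one without a date cannot be correlated with anything.

```python
from datetime import date

# Hypothetical records; field names are illustrative, not taken from either data model.
encounters = [
    {"enc_id": "E1", "admit": date(2015, 3, 1), "discharge": date(2015, 3, 5)},
    {"enc_id": "E2", "admit": date(2015, 6, 10), "discharge": date(2015, 6, 12)},
]
procedures = [
    {"code": "P100", "px_date": date(2015, 3, 2)},
    {"code": "P200", "px_date": None},  # the date was never captured
]


def encounter_for(procedure, encounters):
    """Return the id of the encounter whose date range contains the procedure date."""
    px_date = procedure["px_date"]
    if px_date is None:
        return None  # without a date the procedure cannot be placed in time
    for enc in encounters:
        if enc["admit"] <= px_date <= enc["discharge"]:
            return enc["enc_id"]
    return None


for px in procedures:
    print(px["code"], "->", encounter_for(px, encounters))
```

The first procedure resolves to encounter `E1`; the second resolves to `None`, and no later processing can recover the missing link.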
My take on this is to think prospectively and collect that important metadata even if it is not required at this time, because it will very likely become part of a data model in the future. This might not be a big problem for sites where full data sets can be accessed at any time, but our experience shows that sometimes you will have only limited opportunities to get the data you need, and in that case it is always better to request more rather than less.
Second, when designing data models, we have to look at which queries will have to be executed and whether the particular database structure (normalized, denormalized, or some hybrid approach) suits them. Researchers might prefer normalized tables because they make statistical analysis easier, but normalization can make certain queries very complex. On the other hand, a denormalized data structure gives you elegant, fast queries and might give you a better understanding of the dynamics of the recorded events.
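The trade-off can be sketched with Python's built-in `sqlite3` module. The table and column names below (`patient`, `procedure_event`, `procedure_wide`) are made up for illustration and do not correspond to PCORnet CDM or VDW tables. The same question, how many patients of each sex had procedure `P100`, needs a join against the normalized tables but only a single-table query against the denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: patients and procedures live in separate tables.
    CREATE TABLE patient (patid TEXT PRIMARY KEY, sex TEXT);
    CREATE TABLE procedure_event (patid TEXT, px_code TEXT, px_date TEXT);
    INSERT INTO patient VALUES ('A', 'F'), ('B', 'M');
    INSERT INTO procedure_event VALUES
        ('A', 'P100', '2015-03-02'),
        ('B', 'P100', '2015-04-11'),
        ('A', 'P200', '2015-05-20');

    -- Denormalized: one wide row per procedure, patient attributes repeated.
    CREATE TABLE procedure_wide (patid TEXT, sex TEXT, px_code TEXT, px_date TEXT);
    INSERT INTO procedure_wide
        SELECT p.patid, p.sex, e.px_code, e.px_date
        FROM procedure_event e JOIN patient p ON p.patid = e.patid;
""")

# Normalized structure: the query has to join two tables.
normalized = conn.execute("""
    SELECT p.sex, COUNT(*) FROM procedure_event e
    JOIN patient p ON p.patid = e.patid
    WHERE e.px_code = 'P100'
    GROUP BY p.sex ORDER BY p.sex
""").fetchall()

# Denormalized structure: the same question against a single table.
denormalized = conn.execute("""
    SELECT sex, COUNT(*) FROM procedure_wide
    WHERE px_code = 'P100'
    GROUP BY sex ORDER BY sex
""").fetchall()

print(normalized)
print(denormalized)
```

Both queries return the same counts. The price of the simpler second query is that patient attributes are duplicated on every procedure row and have to be kept in sync.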
Data model mapping will be completed in two stages:
- Mapping will be done considering field names and meaning (completed in v1.0).
- Additional information about the differences in field properties and field meaning will be collected.
- A possibility to use Mirth Connect for the mapping of EMR data to data models used in research.
| Date | Version | Changes |
|---|---|---|
| 4/12/2016 | 3.0 | Updated to current data model versions |
| 4/9/2015 | 2.0 | Added CAPriCORN data model |