N3C-Formatted OMOP2OBO Mappings
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7249165
Resource description:
OMOP2OBO Mappings - N3C OMOP to OBO Working group
This repository stores OMOP2OBO mappings that have been processed and specifically formatted for use within the National COVID Cohort Collaborative (N3C) Enclave.
N3C OMOP to OBO Working Group: https://covid.cd2h.org/ontology
Accessing the N3C-Formatted Mappings
You can access the three OMOP2OBO HPO mapping files in the Enclave from the Knowledge store using the following link: https://unite.nih.gov/workspace/compass/view/ri.compass.main.folder.1719efcf-9a87-484f-9a67-be6a29598567.
The mapping set includes three files, but you only need to merge the following two files with existing data in the Enclave in order to be able to create the concept sets:
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv
The first file, OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv, contains columns for the OMOP concept ids and codes and specifies details such as whether the OMOP concept's descendants should be included when deriving the concept sets (defaults to FALSE). The second file, OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv, contains the mapping's label (i.e., the HPO CURIE and label, in the concept_set_id field) and its provenance/evidence (in the intention column).
Creating Concept Sets
Merge these two files on the codeset_id column, then join the result with existing Enclave tables such as concept and condition_occurrence to populate the actual concept sets. The name of each concept set is stored as a string in the concept_set_id column of OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv. Extracting the HPO CURIE and label requires applying a regex to this column; this is not ideal, but it is currently the best approach given the fields available in the Enclave.
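The merge on codeset_id can be sketched in Python using only the standard library. This is an illustration, not the working group's code: the two CSV files are replaced by small in-memory strings holding a subset of the documented columns, the values are taken from the example mapping in this section, and the function name merge_on_codeset_id is hypothetical. Inside the Enclave, the equivalent join would be written with whatever tooling is available there.

```python
import csv
import io

# Illustrative stand-ins for the two CSV files (subset of columns only).
EXPRESSION_ITEMS_CSV = """codeset_id,concept_id,code,codeSystem,includeDescendants
900000000,23868,69771008,SNOMED,False
"""

VERSION_CSV = """codeset_id,concept_set_id
900000000,[OMOP2OBO] hp_0002031-abnormal_esophagus_morphology
"""


def merge_on_codeset_id(items_file, version_file):
    """Inner-join expression-item rows with version rows on codeset_id."""
    # Index the version rows by codeset_id for constant-time lookup.
    versions = {row["codeset_id"]: row for row in csv.DictReader(version_file)}
    merged = []
    for item in csv.DictReader(items_file):
        version = versions.get(item["codeset_id"])
        if version is None:
            continue  # no matching version row; dropped, as in an inner join
        # On a column-name collision the expression-item value wins.
        merged.append({**version, **item})
    return merged


rows = merge_on_codeset_id(
    io.StringIO(EXPRESSION_ITEMS_CSV), io.StringIO(VERSION_CSV)
)
for row in rows:
    print(row["codeset_id"], row["concept_id"], row["concept_set_id"])
```

Both full files list created_by and created_at columns, so a real merge would need to decide how to handle those collisions (for example, by renaming them before the join); the sketch simply keeps the expression-item value.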
An example mapping is shown below (highlighting some of the most useful columns):
codeset_id: 900000000
concept_set_id: [OMOP2OBO] hp_0002031-abnormal_esophagus_morphology
concept: 23868
code: 69771008
codeSystem: SNOMED
includeDescendants: False
intention:
Mixed - This mapping was created using the OMOP2OBO mapping algorithm (https://github.com/callahantiff/OMOP2OBO).
The Mapping Category and Evidence supporting the mappings are provided below, by OMOP concept:
23868
*******
Mapping Category: Automatic Exact - Concept
------------------------------------------------
Mapping Provenance
------------------
OBO_DbXref-OMOP_ANCESTOR_SOURCE_CODE:snomed_69771008 | OBO_DbXref-OMOP_CONCEPT_SOURCE_CODE:snomed_69771008 | CONCEPT_SIMILARITY:HP_0002031_0.713
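One way to recover the HPO CURIE and label from concept_set_id is a small regex, sketched below in Python. The pattern is inferred solely from the single example value shown above and assumes every row follows the same "[OMOP2OBO] prefix_id-label" layout; the function name parse_concept_set_id is hypothetical, and the conversion of underscores to spaces in the label is an assumption about how the labels were encoded.

```python
import re

# Pattern inferred from one example value; other rows may not follow it.
CONCEPT_SET_ID_RE = re.compile(
    r"^\[OMOP2OBO\]\s+(?P<prefix>[a-z]+)_(?P<local_id>\d+)-(?P<label>.+)$"
)


def parse_concept_set_id(concept_set_id):
    """Return (curie, label), or None if the string does not match."""
    match = CONCEPT_SET_ID_RE.match(concept_set_id.strip())
    if match is None:
        return None
    # "hp" + "0002031" becomes the CURIE "HP:0002031".
    curie = f"{match['prefix'].upper()}:{match['local_id']}"
    # Assumption: underscores in the label stand in for spaces.
    label = match["label"].replace("_", " ")
    return curie, label


example = "[OMOP2OBO] hp_0002031-abnormal_esophagus_morphology"
print(parse_concept_set_id(example))
# → ('HP:0002031', 'abnormal esophagus morphology')
```

Returning None for non-matching strings lets a caller count and inspect rows that deviate from the assumed layout instead of silently mis-parsing them.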
Release Notes - v2.0.0
Preparation
In order to import data into the Enclave, the following items are needed:
Obtain API Token, which will be included in the authorization header (stored as GitHub Secret)
Obtain username hash from the Enclave
OMOP2OBO Mappings (v1.5.0)
Data
Concept Set Container (concept_set_container): CreateNewConceptSet
Concept Set Version (code_sets): CreateNewDraftOMOPConceptSetVersion
Concept Set Expression Items (concept_set_version_item): addCodeAsVersionExpression
Script
n3c_mapping_conversion.py
Generated Output
The codeset_id values must be self-generated (ideally from a conserved range) before beginning any of the API steps. The current list of assigned identifiers is stored in the file named omop2obo_enclave_codeset_id_dict_v2.0.0.json. Note that, to accommodate the 1:Many mappings, the codeset ids were re-generated and, rather than being mapped to HPO concepts, they are now mapped to SNOMED-CT concepts. This creates a cleaner mapping and will scale easily to future mapping builds.
To be consistent with OMOP tools, specifically Atlas, we have also created Atlas-formatted JSON files for each mapping, which are stored in the zipped directory named atlas_json_files_v2.0.0.zip. As mentioned above, to enable the representation of 1:Many mappings, the files are no longer named after HPO concepts; they are now named with the OMOP concept_id and label. Additional fields have been added within the JSON files that include the HPO ids, labels, mapping category, mapping logic, and mapping evidence.
File 1: concept_set_container
Generated Data: OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_container.csv
Columns:
concept_set_id
concept_set_name
intention
assigned_informatician
assigned_sme
project_id
status
stage
n3c_reviewer
alias
archived
created_by
created_at
File 2: concept_set_expression_items
Generated Data: OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv
Columns:
codeset_id
concept_id
code
codeSystem
ontology_id
ontology_label
mapping_category
mapping_logic
mapping_evidence
isExcluded
includeDescendants
includeMapped
item_id
annotation
created_by
created_at
File 3: concept_set_version
Generated Data: OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv
Columns:
codeset_id
concept_set_id
concept_set_version_title
project
source_application
source_application_version
created_at
atlas_json
most_recent_version
comments
intention
limitations
issues
update_message
status
has_review
reviewed_by
created_by
provenance
atlas_json_resource_url
parent_version_id
is_draft
Generated Output:
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_container.csv
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv
atlas_json_files_v2.0.0.zip
omop2obo_enclave_codeset_id_dict_v2.0.0.json
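Before merging or uploading, it can help to confirm that a generated CSV carries the columns listed above. The Python sketch below checks a header against the "File 2: concept_set_expression_items" column list; it assumes the CSV header uses exactly those names, and the function name missing_columns and the truncated sample header are hypothetical, for demonstration only.

```python
import csv
import io

# Column names copied from the "File 2: concept_set_expression_items" list.
EXPECTED_ITEM_COLUMNS = [
    "codeset_id", "concept_id", "code", "codeSystem", "ontology_id",
    "ontology_label", "mapping_category", "mapping_logic", "mapping_evidence",
    "isExcluded", "includeDescendants", "includeMapped", "item_id",
    "annotation", "created_by", "created_at",
]


def missing_columns(csv_file, expected):
    """Return the expected column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in expected if name not in present]


# Hypothetical header lacking the last two columns, for demonstration only.
sample = io.StringIO(",".join(EXPECTED_ITEM_COLUMNS[:-2]) + "\n")
print(missing_columns(sample, EXPECTED_ITEM_COLUMNS))
# → ['created_by', 'created_at']
```

The same function works for the other two files if given their column lists; with a real file, pass an open file handle instead of the in-memory sample.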
Created:
2022-10-27



