OMOP2OBO Measurement Mappings
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/6774443
Resource description:
OMOP2OBO Measurement Mappings V1.0
The mappings in this repository link OMOP standard measurement concepts (i.e., LOINC) to the Human Phenotype Ontology (HPO), Chemical Entities of Biological Interest (ChEBI), Vaccine Ontology (VO), National Center for Biotechnology Information Taxon Ontology (NCBITaxon), Protein Ontology (PRO), Cell Ontology (CL), and the Uber-anatomy Ontology (UBERON).
For each measurement, all levels of the test result (above, below, and within the reference range) were mapped, not only those deemed clinically relevant. Results outside the reference range that are not currently deemed clinically relevant (per the literature or consultation with a domain expert) were annotated to the nearest relevant ancestor concept in the ontology. For example, for Asparagus IgE Ab RAST class [Presence] in Serum (LOINC:15547-3), a result above the reference range was annotated with increased anti-plant-based food allergen IgE antibody level (HP:0410228). A low level of this antibody may not be clinically relevant, but it is still outside the reference range and was therefore annotated to the nearest applicable ancestor concept, abnormal immunoglobulin level (HP:0010701). There is one exception to this rule: for measured drugs and toxins (entities not normally found in the human body), normal results (those within the reference range) were annotated to the same HP concept as the clinically relevant result and logically negated. For example, for Amphetamine [Presence] in Urine by Screen (LOINC:19343-3), a positive finding was mapped to positive urine amphetamine test (HP:0500112), and a negative finding was mapped to the logical negation of that concept (NOT HP:0500112).
LOINC2HPO currently aligns LOINC to HP. The current work extends the existing LOINC2HPO annotations to match the OMOP2OBO mappings in two ways: (1) annotations were updated where new or more specific concepts had been added to HP; and (2) existing mappings were expanded to include the measurement substance (body fluids, tissues, and organs via UBERON), the entity being measured (chemicals, metabolites, or hormones via ChEBI; cell types via CL; and proteins and protein complexes via PR), and the species of the measured entity (organism taxonomy via NCBITaxon). Consistent with LOINC2HPO, all measurements lacking sufficient specimen detail (those measured in non-specific body substances) were annotated as “Unspecified Sample”, and all measurements without a valid result type were annotated as “Not Mapped test Type”. All modifications to the original LOINC2HPO annotations were recorded in the mapping evidence field, so users can identify where an original LOINC2HPO annotation was updated.
For this OMOP domain, the owl:complementOf (“not”; used to model normal test results), owl:intersectionOf (“and”), and owl:unionOf (“or”) constructors were used to build semantically expressive mappings.
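The result-level rule and its drug/toxin exception can be illustrated with a minimal Python sketch. The `Annotation` class, the `annotate_exogenous` helper, and the `ASPARAGUS_IGE` dictionary are hypothetical names introduced here for illustration; only the LOINC and HP identifiers come from the text above, and this is not the OMOP2OBO implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An HP concept, optionally logically negated (hypothetical helper type)."""
    hp_id: str
    negated: bool = False

    def __str__(self) -> str:
        return f"NOT {self.hp_id}" if self.negated else self.hp_id


# LOINC:15547-3 (Asparagus IgE): the high result has a specific concept,
# while the low result falls back to the nearest applicable ancestor concept.
ASPARAGUS_IGE = {
    "high": Annotation("HP:0410228"),  # increased anti-plant-based food allergen IgE
    "low": Annotation("HP:0010701"),   # abnormal immunoglobulin level (ancestor)
}


def annotate_exogenous(positive_hp_id: str, result_is_normal: bool) -> Annotation:
    """Drug/toxin exception: a normal result reuses the concept of the
    clinically relevant (positive) finding and is logically negated."""
    return Annotation(positive_hp_id, negated=result_is_normal)


# LOINC:19343-3 (Amphetamine [Presence] in Urine by Screen)
positive = annotate_exogenous("HP:0500112", result_is_normal=False)
negative = annotate_exogenous("HP:0500112", result_is_normal=True)
print(positive, "|", negative)  # HP:0500112 | NOT HP:0500112
```

Keeping the negation as a flag on the annotation, instead of a separate "negative test" concept, mirrors the text: both findings point at the same HP concept.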
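As a rough illustration of how the three constructors compose, the sketch below models a mapping as a small expression tree and renders it with the `not`, `and`, and `or` keywords. The `Not`, `And`, `Or`, and `render` names are assumptions made for this example, and the `EX:` identifiers are placeholders, not real ontology concepts; only `HP:0500112` appears in the source.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Not:  # stands in for owl:complementOf
    operand: "Expr"


@dataclass(frozen=True)
class And:  # stands in for owl:intersectionOf
    operands: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:  # stands in for owl:unionOf
    operands: Tuple["Expr", ...]


Expr = Union[str, Not, And, Or]  # a bare string is a concept identifier


def render(expr: Expr) -> str:
    """Render an expression tree as a readable, fully parenthesized string."""
    if isinstance(expr, str):
        return expr
    if isinstance(expr, Not):
        return f"not ({render(expr.operand)})"
    keyword = " and " if isinstance(expr, And) else " or "
    return "(" + keyword.join(render(e) for e in expr.operands) + ")"


# Normal (negative) amphetamine screen, as described in the text.
print(render(Not("HP:0500112")))  # not (HP:0500112)

# Placeholder identifiers showing a 1:Many mapping built from "and"/"or".
combined = And(("EX:0000001", Or(("EX:0000002", "EX:0000003"))))
print(render(combined))  # (EX:0000001 and (EX:0000002 or EX:0000003))
```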
Mapping Details
Mappings in this set were generated automatically, either using OMOP2OBO or using a bag-of-words embedding model with TF-IDF weighting. Cosine similarity was used to compute similarity scores between all pairwise combinations of OMOP and OBO concepts and ancestor concepts. To improve efficiency, the algorithm searches only the top n most similar results and keeps the top 75th percentile among all pairs with scores >= 0.25. Manually created mappings are also included.
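The similarity step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the OMOP2OBO code: the whitespace tokenizer, the smoothed IDF formula, the `top_n` default, the nearest-rank percentile, and the reading of "top 75th percentile" as "keep pairs scoring at or above the 75th-percentile score" are all choices made for this example. Only the 0.25 threshold, the 75th percentile, TF-IDF, and cosine similarity come from the description.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors as {token: weight} dicts (smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    return [
        {t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
         for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def suggest(omop_labels, obo_labels, top_n=3, min_score=0.25, percentile=75):
    """Return (omop_label, obo_label, score) candidate pairs."""
    vecs = tfidf_vectors(omop_labels + obo_labels)
    omop_vecs, obo_vecs = vecs[:len(omop_labels)], vecs[len(omop_labels):]
    candidates = []
    for i, u in enumerate(omop_vecs):
        # Search only the top_n most similar OBO labels for each OMOP label.
        ranked = sorted(((cosine(u, v), j) for j, v in enumerate(obo_vecs)),
                        reverse=True)[:top_n]
        candidates += [(omop_labels[i], obo_labels[j], s)
                       for s, j in ranked if s >= min_score]
    if not candidates:
        return []
    # Nearest-rank percentile over the scores that passed the threshold.
    scores = sorted(s for _, _, s in candidates)
    cutoff = scores[math.ceil(percentile / 100 * len(scores)) - 1]
    return [c for c in candidates if c[2] >= cutoff]


# Toy labels for illustration only.
pairs = suggest(["amphetamine presence in urine"],
                ["positive urine amphetamine test", "abnormal immunoglobulin level"])
for omop_label, obo_label, score in pairs:
    print(f"{omop_label} -> {obo_label} ({score:.2f})")
```

Per the mapping categories, suggestions of this kind were manually verified before being accepted, so scores like these are a ranking aid, not a final decision.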
Mapping Categories
Automatic Exact - Concept: Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1
Automatic Exact - Ancestor: Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1
Automatic Constructor - Concept: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
Automatic Constructor - Ancestor: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept ancestor-level; 1:Many
Manual: Hand mapping created using expert suggested resources; 1:1
Manual Constructor: Hand mapping created using expert suggested resources; 1:Many
Concept Similarity: Mapping suggested by similarity score and manually verified
UnMapped: No suitable mapping or not mapped type
Mapping Statistics
Additional statistics have been provided for the mappings and are shown in the table below. This table presents the counts of OMOP concepts by mapping category and ontology:
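| Mapping Category | HPO | UBERON | ChEBI | CL | PR | NCBITaxon |
|----------------------------------|-------|--------|-------|------|------|-----------|
| Automatic Exact - Concept | 20 | 1981 | 268 | 129 | 19 | 286 |
| Automatic Constructor - Concept | 49 | 5 | 0 | 24 | 0 | 0 |
| Automatic Exact - Ancestor | 43 | 426 | 1149 | 5 | 5 | 207 |
| Automatic Constructor - Ancestor | 0 | 1 | 12 | 1 | 0 | 0 |
| Concept Similarity | 113 | 50 | 160 | 35 | 45 | 56 |
| Manual | 10663 | 319 | 1446 | 185 | 1590 | 2357 |
| Manual Constructor | 49 | 1118 | 528 | 18 | 133 | 196 |
| UnMapped | 184 | 184 | 529 | 3688 | 2296 | 982 |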
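For readers who want to work with these counts, the short sketch below transcribes the table and computes per-ontology totals and the share of concepts in the UnMapped category. The variable and function names are hypothetical, and the totals are plain column sums: the source does not state the counting unit for each column, so they should not be read as anything more than arithmetic over the listed values.

```python
ONTOLOGIES = ["HPO", "UBERON", "ChEBI", "CL", "PR", "NCBITaxon"]

# Counts transcribed from the table, in ONTOLOGIES column order.
COUNTS = {
    "Automatic Exact - Concept":        [20, 1981, 268, 129, 19, 286],
    "Automatic Constructor - Concept":  [49, 5, 0, 24, 0, 0],
    "Automatic Exact - Ancestor":       [43, 426, 1149, 5, 5, 207],
    "Automatic Constructor - Ancestor": [0, 1, 12, 1, 0, 0],
    "Concept Similarity":               [113, 50, 160, 35, 45, 56],
    "Manual":                           [10663, 319, 1446, 185, 1590, 2357],
    "Manual Constructor":               [49, 1118, 528, 18, 133, 196],
    "UnMapped":                         [184, 184, 529, 3688, 2296, 982],
}


def column_totals(counts):
    """Sum each ontology column across all mapping categories."""
    return dict(zip(ONTOLOGIES, (sum(col) for col in zip(*counts.values()))))


def unmapped_share(counts):
    """Fraction of each column's total that falls in the UnMapped row."""
    totals = column_totals(counts)
    return {ont: n / totals[ont] for ont, n in zip(ONTOLOGIES, counts["UnMapped"])}


totals = column_totals(COUNTS)
for ont, share in unmapped_share(COUNTS).items():
    print(f"{ont}: total={totals[ont]}, unmapped={share:.1%}")
```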
Provenance and Versioning: The V1.0 deposited mappings were created by OMOP2OBO v1.0.0 in October 2022 using the OMOP Common Data Model V5.0 and OBO Foundry ontologies downloaded on September 14, 2020.
Caveats: Please note that these are the original mappings that were created for the preprint. They have not been updated to current versions of the ontologies. In our experience, this should result in very few errors, but we do suggest that you check the ontology concepts used against current versions of each ontology before using them.
Important Resources and Documentation
GitHub: OMOP2OBO
Project Wiki: OMOP2OBO - wiki
Zenodo Community: OMOP2OBO
Preprint Manuscript: 10.5281/zenodo.5716421
Created: 2023-03-29



