tBiomedL: Larger Semantic Table Annotations Benchmark for Biomedical Domain
Source: NIAID Data Ecosystem. Indexed 2026-05-01.
Download link:
https://zenodo.org/record/10283118
Description:
tBiomedL is a dataset for matching tabular data to a knowledge graph. It is derived for the biomedical domain and contains two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, in which each table represents a single entity. Ground truth data is provided with Wikidata as the target knowledge graph (KG).
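To make the difference between the two table types concrete, here is a minimal Python sketch. The rows are illustrative and are not taken from tBiomedL. The header row, the attribute-value layout of the entity table, and the helper `entities_described` are assumptions made for this example, not the dataset's documented file format.

```python
# Illustrative data only; these rows are not taken from tBiomedL.

# Horizontal relational table: each row after the (assumed) header
# describes one entity, and each column holds one attribute.
horizontal_table = [
    ["name", "chemical formula"],
    ["aspirin", "C9H8O4"],
    ["ibuprofen", "C13H18O2"],
    ["paracetamol", "C8H9NO2"],
]

# Entity table: the whole table describes a single entity,
# here laid out as one attribute-value pair per row (assumed layout).
entity_table = [
    ["name", "aspirin"],
    ["chemical formula", "C9H8O4"],
]


def entities_described(table, kind):
    """Count the entities a table describes, given its kind (hypothetical helper)."""
    if kind == "horizontal":
        return len(table) - 1  # every row after the header is one entity
    if kind == "entity":
        return 1  # the table as a whole is one entity
    raise ValueError(f"unknown table kind: {kind}")
```

The practical consequence is that a horizontal table yields one candidate entity per row, while an entity table yields exactly one for the whole table.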
tBiomedL is generated by KG2Tables using five levels of a recursive hierarchy of related concepts in Wikidata. It is the successor to tBiomed.
tBiomedL contains 860,479 entity and horizontal tables in total. This repository contains only a 1% sample of the full benchmark, together with its ground truth (gt) data. The full dataset is 27 GB. We will update this repository in the future with the full dataset, including the test fold and its ground truth data.
Please get in touch if you are interested in the full dataset.
The supported tasks for semantic table annotations are:
Topic Detection (TD) links the entire table to an entity or a class from the target KG.
Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.
Column Type Annotation (CTA) links individual table columns to classes from the target KG.
Column Property Annotation (CPA) identifies the relations from the target KG that hold between pairs of columns.
Row Annotation (RA) links an entire row to an entity or property from the target KG.
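The five tasks differ mainly in which part of a table an annotation points at. The sketch below encodes that in Python under stated assumptions: the `Annotation` record, its field names, and the identifiers such as `Q_TOPIC` are hypothetical placeholders for illustration, and they do not reflect the actual layout of the tBiomedL ground truth files or real Wikidata IDs.

```python
from dataclasses import dataclass
from typing import Optional

# Table coordinates each task needs, following the task definitions above.
REQUIRED_FIELDS = {
    "TD": (),                # the whole table
    "CEA": ("row", "col"),   # a single cell
    "CTA": ("col",),         # a single column
    "CPA": ("col", "col2"),  # a pair of columns
    "RA": ("row",),          # a single row
}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; not the dataset's file format."""
    task: str
    table_id: str
    kg_id: str  # placeholder identifiers are used below, not real Wikidata IDs
    row: Optional[int] = None
    col: Optional[int] = None
    col2: Optional[int] = None

    def is_well_formed(self) -> bool:
        """True if exactly the coordinates required by the task are set."""
        required = REQUIRED_FIELDS.get(self.task)
        if required is None:
            return False  # unknown task name
        for name in ("row", "col", "col2"):
            is_set = getattr(self, name) is not None
            if is_set != (name in required):
                return False
        return True


examples = [
    Annotation("TD", "table_0", "Q_TOPIC"),
    Annotation("CEA", "table_0", "Q_CELL", row=1, col=0),
    Annotation("CTA", "table_0", "Q_CLASS", col=0),
    Annotation("CPA", "table_0", "P_RELATION", col=0, col2=1),
    Annotation("RA", "table_0", "Q_ROW", row=1),
]
```

Keeping the required coordinates in one lookup table makes the scope of each task explicit: TD needs no coordinates, CEA needs both a row and a column, and CPA is the only task that refers to two columns at once.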
Created:
2023-12-07



