gabrielaltay/pubtator-central-bigbio-kb-2022-12-18
Source: Hugging Face | Updated: 2023-01-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/gabrielaltay/pubtator-central-bigbio-kb-2022-12-18
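Because the full dataset is over 100 GB, streaming a split is usually more practical than downloading everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository is still publicly available under this name. The helper names `stream_split` and `peek` are hypothetical and do not come from the dataset card.

```python
import os
from itertools import islice
from typing import Optional

REPO_ID = "gabrielaltay/pubtator-central-bigbio-kb-2022-12-18"


def stream_split(split: str = "validation", mirror: Optional[str] = None):
    """Return an iterable over one split without downloading it in full.

    `mirror` is optional, e.g. "https://hf-mirror.com". It is passed through
    the HF_ENDPOINT environment variable, which is only picked up if it is
    set before the Hugging Face libraries are first imported in the process.
    """
    if mirror:
        os.environ["HF_ENDPOINT"] = mirror
    # Imported lazily so the environment variable above can take effect.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


def peek(split: str = "validation", n: int = 3, mirror: Optional[str] = None):
    """Collect the first `n` records of a split as plain dictionaries."""
    return list(islice(stream_split(split, mirror), n))
```

Calling `peek("validation", 1)` should return a single record whose keys match the features listed in the card metadata; this requires network access.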
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: document_id
    dtype: string
  - name: passages
    list:
    - name: id
      dtype: string
    - name: type
      dtype: string
    - name: text
      sequence: string
    - name: offsets
      sequence:
        list: int32
  - name: entities
    list:
    - name: id
      dtype: string
    - name: type
      dtype: string
    - name: text
      sequence: string
    - name: offsets
      sequence:
        list: int32
    - name: normalized
      list:
      - name: db_name
        dtype: string
      - name: db_id
        dtype: string
  - name: events
    list:
    - name: id
      dtype: string
    - name: type
      dtype: string
    - name: trigger
      struct:
      - name: text
        sequence: string
      - name: offsets
        sequence:
          list: int32
    - name: arguments
      list:
      - name: role
        dtype: string
      - name: ref_id
        dtype: string
  - name: coreferences
    list:
    - name: id
      dtype: string
    - name: entity_ids
      sequence: string
  - name: relations
    list:
    - name: id
      dtype: string
    - name: type
      dtype: string
    - name: arg1_id
      dtype: string
    - name: arg2_id
      dtype: string
    - name: normalized
      list:
      - name: db_name
        dtype: string
      - name: db_id
        dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 101493304127
    num_examples: 33653973
  - name: validation
    num_bytes: 2115702473
    num_examples: 701124
  - name: test
    num_bytes: 2117460487
    num_examples: 701125
  download_size: 49786905438
  dataset_size: 105726467087
---
# Dataset Card for "pubtator-central-bigbio-kb-2022-12-18"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
gabrielaltay
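Summary of original information
Dataset overview
Dataset name
- Name: pubtator-central-bigbio-kb-2022-12-18
Dataset features
- id: string
- document_id: string
- passages: list with the following sub-features:
  - id: string
  - type: string
  - text: sequence of strings
  - offsets: sequence of int32 lists
- entities: list with the following sub-features:
  - id: string
  - type: string
  - text: sequence of strings
  - offsets: sequence of int32 lists
  - normalized: list with the following sub-features:
    - db_name: string
    - db_id: string
- events: list with the following sub-features:
  - id: string
  - type: string
  - trigger: struct with the following sub-features:
    - text: sequence of strings
    - offsets: sequence of int32 lists
  - arguments: list with the following sub-features:
    - role: string
    - ref_id: string
- coreferences: list with the following sub-features:
  - id: string
  - entity_ids: sequence of strings
- relations: list with the following sub-features:
  - id: string
  - type: string
  - arg1_id: string
  - arg2_id: string
  - normalized: list with the following sub-features:
    - db_name: string
    - db_id: string
- text: string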
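To make the nested layout above concrete, here is a minimal sketch that walks the `entities` list of one record and pairs each mention with its normalized database identifiers. The record `toy` is hand-written for illustration and is not taken from the dataset; the helper name `iter_entity_mentions` is hypothetical. The sketch also assumes that entity `offsets` index into the record's `text` field, which the card does not state.

```python
from typing import Dict, Iterator, List, Tuple


def iter_entity_mentions(record: Dict) -> Iterator[Tuple[str, str, str, List[str]]]:
    """Yield (entity id, entity type, surface text, ["db_name:db_id", ...])."""
    for entity in record.get("entities", []):
        # `text` is a sequence of strings, one per (possibly discontiguous) span.
        surface = " ".join(entity["text"])
        links = [f'{n["db_name"]}:{n["db_id"]}' for n in entity.get("normalized", [])]
        yield entity["id"], entity["type"], surface, links


# Hand-written toy record that follows the feature list; values are illustrative.
toy = {
    "id": "0",
    "document_id": "doc-1",
    "text": "Aspirin reduces fever.",
    "passages": [
        {"id": "p1", "type": "abstract",
         "text": ["Aspirin reduces fever."], "offsets": [[0, 22]]},
    ],
    "entities": [
        {"id": "e1", "type": "Chemical", "text": ["Aspirin"], "offsets": [[0, 7]],
         "normalized": [{"db_name": "MESH", "db_id": "D001241"}]},
    ],
    "events": [],
    "coreferences": [],
    "relations": [],
}

mentions = list(iter_entity_mentions(toy))

# Assumption check: each offset pair slices the matching span out of `text`.
for entity in toy["entities"]:
    for span_text, (start, end) in zip(entity["text"], entity["offsets"]):
        assert toy["text"][start:end] == span_text
```

Keeping `text` and `offsets` as parallel sequences lets a single entity cover several discontiguous spans, which is why the surface form is joined from a list rather than read as one string.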
Dataset splits
- Training set:
  - Bytes: 101493304127
  - Examples: 33653973
- Validation set:
  - Bytes: 2115702473
  - Examples: 701124
- Test set:
  - Bytes: 2117460487
  - Examples: 701125
Dataset size
- Download size: 49786905438 bytes
- Dataset size: 105726467087 bytes
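The split figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers listed above; the variable names are illustrative.

```python
SPLITS = {
    "train": {"num_bytes": 101493304127, "num_examples": 33653973},
    "validation": {"num_bytes": 2115702473, "num_examples": 701124},
    "test": {"num_bytes": 2117460487, "num_examples": 701125},
}
DOWNLOAD_SIZE = 49786905438
DATASET_SIZE = 105726467087

# Per-split byte counts should add up to the reported dataset size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Share of examples in each split.
fractions = {name: s["num_examples"] / total_examples for name, s in SPLITS.items()}

# Ratio of the compressed download to the prepared dataset on disk.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

for name, frac in fractions.items():
    print(f"{name}: {frac:.2%} of {total_examples} examples")
print(f"sizes consistent: {total_bytes == DATASET_SIZE}")
print(f"download / dataset size: {download_ratio:.2f}")
```

The byte counts sum exactly to the reported dataset size, and the example counts correspond to roughly a 96/2/2 split across training, validation, and test.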



