madaanpulkit/tab-wnut-flat
Source: Hugging Face. Updated 2023-12-01; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/madaanpulkit/tab-wnut-flat
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: tokens
    sequence: string
  - name: token_spans
    sequence:
      sequence: int64
  - name: tags
    sequence:
      class_label:
        names:
          '0': '0'
          '1': B-DIRECT-CODE
          '2': I-DIRECT-CODE
          '3': B-DIRECT-PERSON
          '4': I-DIRECT-PERSON
          '5': B-QUASI-DATETIME
          '6': I-QUASI-DATETIME
          '7': B-QUASI-PERSON
          '8': I-QUASI-PERSON
          '9': B-QUASI-LOC
          '10': I-QUASI-LOC
          '11': B-QUASI-QUANTITY
          '12': I-QUASI-QUANTITY
          '13': B-QUASI-CODE
          '14': I-QUASI-CODE
          '15': B-QUASI-ORG
          '16': I-QUASI-ORG
          '17': B-QUASI-DEM
          '18': I-QUASI-DEM
          '19': B-QUASI-MISC
          '20': I-QUASI-MISC
          '21': B-DIRECT-ORG
          '22': I-DIRECT-ORG
          '23': B-DIRECT-DATETIME
          '24': I-DIRECT-DATETIME
          '25': B-DIRECT-LOC
          '26': I-DIRECT-LOC
          '27': B-DIRECT-MISC
          '28': I-DIRECT-MISC
          '29': B-DIRECT-DEM
          '30': I-DIRECT-DEM
  - name: doc_id
    dtype: string
  splits:
  - name: train
    num_bytes: 67834874
    num_examples: 1112
  - name: dev
    num_bytes: 19919192
    num_examples: 541
  - name: test
    num_bytes: 20147904
    num_examples: 555
  download_size: 18198795
  dataset_size: 107901970
---
# Dataset Card for "tab-wnut-flat"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
madaanpulkit
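Summary of original information
Dataset overview
Features
- text: string.
- tokens: sequence of strings.
- token_spans: sequence of integer (int64) sequences.
- tags: sequence of class labels, with the following label names: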
- 0: 0
- 1: B-DIRECT-CODE
- 2: I-DIRECT-CODE
- 3: B-DIRECT-PERSON
- 4: I-DIRECT-PERSON
- 5: B-QUASI-DATETIME
- 6: I-QUASI-DATETIME
- 7: B-QUASI-PERSON
- 8: I-QUASI-PERSON
- 9: B-QUASI-LOC
- 10: I-QUASI-LOC
- 11: B-QUASI-QUANTITY
- 12: I-QUASI-QUANTITY
- 13: B-QUASI-CODE
- 14: I-QUASI-CODE
- 15: B-QUASI-ORG
- 16: I-QUASI-ORG
- 17: B-QUASI-DEM
- 18: I-QUASI-DEM
- 19: B-QUASI-MISC
- 20: I-QUASI-MISC
- 21: B-DIRECT-ORG
- 22: I-DIRECT-ORG
- 23: B-DIRECT-DATETIME
- 24: I-DIRECT-DATETIME
- 25: B-DIRECT-LOC
- 26: I-DIRECT-LOC
- 27: B-DIRECT-MISC
- 28: I-DIRECT-MISC
- 29: B-DIRECT-DEM
- 30: I-DIRECT-DEM
- doc_id: string.
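The tag IDs above can be mapped back to label names and grouped into entity spans. The following is a minimal sketch, not code shipped with the dataset. It assumes the `B-`/`I-` prefixes follow the usual BIO convention and that label `0` is the "outside" tag. The helper names `decode_tags` and `group_entities` and the sample sentence are hypothetical and are not taken from the data.

```python
# Label names in ID order, copied from the feature list above.
LABEL_NAMES = [
    "0",
    "B-DIRECT-CODE", "I-DIRECT-CODE",
    "B-DIRECT-PERSON", "I-DIRECT-PERSON",
    "B-QUASI-DATETIME", "I-QUASI-DATETIME",
    "B-QUASI-PERSON", "I-QUASI-PERSON",
    "B-QUASI-LOC", "I-QUASI-LOC",
    "B-QUASI-QUANTITY", "I-QUASI-QUANTITY",
    "B-QUASI-CODE", "I-QUASI-CODE",
    "B-QUASI-ORG", "I-QUASI-ORG",
    "B-QUASI-DEM", "I-QUASI-DEM",
    "B-QUASI-MISC", "I-QUASI-MISC",
    "B-DIRECT-ORG", "I-DIRECT-ORG",
    "B-DIRECT-DATETIME", "I-DIRECT-DATETIME",
    "B-DIRECT-LOC", "I-DIRECT-LOC",
    "B-DIRECT-MISC", "I-DIRECT-MISC",
    "B-DIRECT-DEM", "I-DIRECT-DEM",
]


def decode_tags(tag_ids):
    """Map integer tag IDs to their label names."""
    return [LABEL_NAMES[i] for i in tag_ids]


def group_entities(tokens, tag_ids):
    """Group tokens into (entity_type, tokens) spans, assuming BIO tagging."""
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, decode_tags(tag_ids)):
        prefix, _, entity_type = label.partition("-")
        starts_span = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if starts_span:
            # Close any open span, then open a new one.
            if current_tokens:
                entities.append((current_type, current_tokens))
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # Assumed "outside" tag (label "0"): close any open span.
            if current_tokens:
                entities.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, current_tokens))
    return entities


# Made-up sentence for illustration only.
sample_tokens = ["Mr", "John", "Smith", "was", "born", "in", "1970"]
sample_tags = [0, 3, 4, 0, 0, 0, 5]
sample_entities = group_entities(sample_tokens, sample_tags)
```

An `I-` tag that does not continue the current span is treated as the start of a new span here; that is one common lenient choice, not something the card specifies.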
Splits
- train: 1112 examples, 67834874 bytes.
- dev: 541 examples, 19919192 bytes.
- test: 555 examples, 20147904 bytes.
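One way to load these splits and confirm the example counts is sketched below, assuming the Hugging Face `datasets` library is installed and the repository ID from the title is still available on the Hub. The helper names `check_split_sizes` and `load_and_check` are hypothetical.

```python
# Example counts per split, copied from the card.
EXPECTED_EXAMPLES = {"train": 1112, "dev": 541, "test": 555}


def check_split_sizes(split_sizes):
    """Return {split: (observed, expected)} for every split that differs."""
    return {
        name: (split_sizes.get(name), expected)
        for name, expected in EXPECTED_EXAMPLES.items()
        if split_sizes.get(name) != expected
    }


def load_and_check(repo_id="madaanpulkit/tab-wnut-flat"):
    """Download the dataset and compare split sizes against the card."""
    # Imported lazily so check_split_sizes works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)  # requires network access
    sizes = {name: split.num_rows for name, split in dataset.items()}
    return dataset, check_split_sizes(sizes)
```

Calling `load_and_check()` returns the loaded splits together with a dictionary of mismatches, which is empty when the counts agree with the card.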
Dataset size
- Download size: 18198795 bytes
- Total dataset size: 107901970 bytes
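As a quick consistency check on the figures above, the per-split byte counts can be summed and compared with the reported total. This is a small sketch using only numbers from the card; the variable names are illustrative.

```python
# Byte counts copied from the card.
SPLIT_BYTES = {"train": 67834874, "dev": 19919192, "test": 20147904}
DATASET_SIZE = 107901970
DOWNLOAD_SIZE = 18198795

# The three splits should add up to the reported total dataset size.
total_bytes = sum(SPLIT_BYTES.values())
splits_match_total = total_bytes == DATASET_SIZE

# Ratio of prepared dataset size to download size.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE
```

The split sizes do sum to the reported total, and the prepared dataset is roughly 5.9 times larger than the download.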



