PhaniManda/autotrain-data-demo-on-token-classification

Hugging Face, updated 2023-06-22, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/PhaniManda/autotrain-data-demo-on-token-classification
Resource description:

---
task_categories:
- token-classification
---

# AutoTrain Dataset for project: demo-on-token-classification

## Dataset Description

This dataset has been automatically processed by AutoTrain for project demo-on-token-classification.

### Languages

The BCP-47 code for the dataset's language is unk.

## Dataset Structure

### Data Instances

A sample from this dataset looks as follows:

```json
[
  {
    "tokens": ["I", "will", "be", "traveling", "to", "Tokyo", "next", "month."],
    "tags": [13, 13, 13, 13, 13, 1, 0, 5]
  },
  {
    "tokens": ["The", "company", "Apple", "Inc.", "is", "based", "in", "California."],
    "tags": [13, 13, 3, 9, 13, 13, 13, 1]
  }
]
```

### Dataset Fields

The dataset has the following fields (also called "features"):

```json
{
  "tokens": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "tags": "Sequence(feature=ClassLabel(names=['B-DATE', 'B-LOC', 'B-MISC', 'B-ORG', 'B-PER', 'I-DATE', 'I-DATE,', 'I-LOC', 'I-MISC', 'I-ORG', 'I-ORG,', 'I-PER', 'I-PER,', 'O'], id=None), length=-1, id=None)"
}
```

### Dataset Splits

This dataset is split into a train and validation split. The split sizes are as follows:

| Split name | Num samples |
| ---------- | ----------- |
| train      | 21          |
| valid      | 9           |
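
The integer values in `tags` are positions in the `ClassLabel` name list shown under Dataset Fields. As a minimal sketch in plain Python (the helper name `decode_tags` is hypothetical and not part of the dataset or of AutoTrain), the first sample instance can be decoded like this:

```python
# Label names copied, in order, from the ClassLabel definition in the dataset card.
LABEL_NAMES = [
    "B-DATE", "B-LOC", "B-MISC", "B-ORG", "B-PER",
    "I-DATE", "I-DATE,", "I-LOC", "I-MISC", "I-ORG",
    "I-ORG,", "I-PER", "I-PER,", "O",
]


def decode_tags(tag_ids):
    """Map integer tag IDs to label names by list position (hypothetical helper)."""
    return [LABEL_NAMES[i] for i in tag_ids]


# First sample instance from the dataset card.
sample = {
    "tokens": ["I", "will", "be", "traveling", "to", "Tokyo", "next", "month."],
    "tags": [13, 13, 13, 13, 13, 1, 0, 5],
}

# Pair each token with its decoded label and print one pair per line.
decoded = list(zip(sample["tokens"], decode_tags(sample["tags"])))
for token, label in decoded:
    print(f"{token}\t{label}")
```

Under this mapping, "Tokyo" carries `B-LOC`, "next" carries `B-DATE`, "month." carries `I-DATE`, and all other tokens carry `O`.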
Provider:
PhaniManda
Summary of original information

Dataset overview

Dataset name

AutoTrain Dataset for project: demo-on-token-classification

Task category

  • Token classification (token-classification)

Language

  • Language code: unk

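Dataset structure

Data instances

  • Each instance contains:
    • tokens: a sequence of text tokens
    • tags: a sequence of labels, one of: 'B-DATE', 'B-LOC', 'B-MISC', 'B-ORG', 'B-PER', 'I-DATE', 'I-DATE,', 'I-LOC', 'I-MISC', 'I-ORG', 'I-ORG,', 'I-PER', 'I-PER,', 'O'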
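
The label names follow the B-/I-/O prefix convention commonly used for span labelling, so consecutive tokens can be grouped into entity spans. The sketch below is one possible way to do that in plain Python. The function names are hypothetical, and treating the trailing-comma variants (such as I-DATE,) as equivalent to their comma-less forms is an assumption made here, not something the dataset card states.

```python
def normalize(label):
    # Assumption: "I-DATE," and "I-DATE" denote the same class.
    return label.rstrip(",")


def extract_spans(tokens, labels):
    """Group B-/I-/O labelled tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, entity = normalize(label).partition("-")
        if prefix == "B" or (prefix == "I" and entity != current_type):
            # A B- label, or an I- label of a different type, starts a new span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity, [token]
        elif prefix == "I":
            # An I- label of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O" closes any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Label strings for the sentence "The company Apple Inc. is based in California."
tokens = ["The", "company", "Apple", "Inc.", "is", "based", "in", "California."]
labels = ["O", "O", "B-ORG", "I-ORG", "O", "O", "O", "B-LOC"]
print(extract_spans(tokens, labels))
```

Because the tokens keep their attached punctuation, the recovered span text does too (for example "California." with the full stop).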

Dataset fields

  • tokens: sequence of strings
  • tags: sequence of class labels
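
A record with these two fields can be sanity-checked before use. The following is a rough sketch in plain Python: the function name `validate_instance` is hypothetical, the label count of 14 is taken from the label list above, and the requirement that tokens and tags have equal length is an assumption based on the one-tag-per-token nature of token classification.

```python
NUM_LABELS = 14  # number of label names listed for the tags field


def validate_instance(instance):
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    tokens = instance.get("tokens")
    tags = instance.get("tags")

    # tokens: a sequence of strings
    if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
        problems.append("tokens must be a list of strings")

    # tags: a sequence of integer class IDs within the label range
    if not isinstance(tags, list) or not all(
        isinstance(t, int) and not isinstance(t, bool) and 0 <= t < NUM_LABELS
        for t in tags
    ):
        problems.append("tags must be a list of integers in [0, 14)")

    # Assumption: one tag per token.
    if isinstance(tokens, list) and isinstance(tags, list) and len(tokens) != len(tags):
        problems.append("tokens and tags must have the same length")

    return problems


good = {"tokens": ["Tokyo", "next", "month."], "tags": [1, 0, 5]}
bad = {"tokens": ["Tokyo", "next"], "tags": [1, 99, 5]}
print(validate_instance(good))
print(validate_instance(bad))
```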

Dataset splits

  • Training set samples: 21
  • Validation set samples: 9
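
For reference, the proportions implied by these counts can be worked out directly. This is only an arithmetic sketch; the split names `train` and `valid` are the ones used in the original dataset card.

```python
# Sample counts per split, as listed above.
split_sizes = {"train": 21, "valid": 9}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} samples ({fractions[name]:.0%} of {total})")
```

With 30 samples in total, this corresponds to a 70/30 split between training and validation.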