PhaniManda/autotrain-data-demo-on-token-classification
Source: Hugging Face. Updated: 2023-06-22. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/PhaniManda/autotrain-data-demo-on-token-classification
Resource description:
---
task_categories:
- token-classification
---
# AutoTrain Dataset for project: demo-on-token-classification
## Dataset Description
This dataset has been automatically processed by AutoTrain for project demo-on-token-classification.
### Languages
The BCP-47 code for the dataset's language is unk.
## Dataset Structure
### Data Instances
A sample from this dataset looks as follows:
```json
[
{
"tokens": [
"I",
"will",
"be",
"traveling",
"to",
"Tokyo",
"next",
"month."
],
"tags": [
13,
13,
13,
13,
13,
1,
0,
5
]
},
{
"tokens": [
"The",
"company",
"Apple",
"Inc.",
"is",
"based",
"in",
"California."
],
"tags": [
13,
13,
3,
9,
13,
13,
13,
1
]
}
]
```
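In both samples, `tokens` and `tags` have the same length, so each token pairs with exactly one integer tag. The sketch below illustrates that pairing with the first sample. The helper name `pair_tokens_with_tags` is hypothetical and not part of AutoTrain, and the length check is an assumption based on these two samples rather than a documented guarantee.

```python
from typing import List, Tuple


def pair_tokens_with_tags(tokens: List[str], tags: List[int]) -> List[Tuple[str, int]]:
    """Zip tokens with their tag ids, rejecting misaligned instances."""
    if len(tokens) != len(tags):
        raise ValueError(
            f"length mismatch: {len(tokens)} tokens vs {len(tags)} tags"
        )
    return list(zip(tokens, tags))


# First sample from the dataset card, copied verbatim.
sample = {
    "tokens": ["I", "will", "be", "traveling", "to", "Tokyo", "next", "month."],
    "tags": [13, 13, 13, 13, 13, 1, 0, 5],
}

pairs = pair_tokens_with_tags(sample["tokens"], sample["tags"])
for token, tag in pairs:
    print(f"{token}\t{tag}")
```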
### Dataset Fields
The dataset has the following fields (also called "features"):
```json
{
"tokens": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
"tags": "Sequence(feature=ClassLabel(names=['B-DATE', 'B-LOC', 'B-MISC', 'B-ORG', 'B-PER', 'I-DATE', 'I-DATE,', 'I-LOC', 'I-MISC', 'I-ORG', 'I-ORG,', 'I-PER', 'I-PER,', 'O'], id=None), length=-1, id=None)"
}
```
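Each integer in `tags` indexes into the `ClassLabel` name list above, so `13` is `O`, `1` is `B-LOC`, and so on. The following is a minimal pure-Python sketch of that decoding, plus one possible way to group `B-`/`I-` runs into entity spans. The function names `decode_tags` and `extract_entities` are hypothetical. Stripping the trailing comma from labels such as `I-DATE,` is an assumption about how those labels should be read, not something the dataset card states.

```python
from typing import List, Tuple

# Label names copied from the ClassLabel definition, in index order.
LABEL_NAMES = [
    "B-DATE", "B-LOC", "B-MISC", "B-ORG", "B-PER", "I-DATE", "I-DATE,",
    "I-LOC", "I-MISC", "I-ORG", "I-ORG,", "I-PER", "I-PER,", "O",
]


def decode_tags(tags: List[int]) -> List[str]:
    """Map integer tag ids to their label names."""
    return [LABEL_NAMES[t] for t in tags]


def extract_entities(tokens: List[str], tags: List[int]) -> List[Tuple[str, str]]:
    """Group B-/I- runs into (entity_type, text) spans.

    Assumption: labels with a trailing comma (e.g. 'I-DATE,') are treated
    as the comma-free label. An I- tag that does not continue the current
    span is treated like 'O'.
    """
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, decode_tags(tags)):
        prefix, _, entity_type = label.rstrip(",").partition("-")
        if prefix == "B":
            flush()
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


tokens = ["I", "will", "be", "traveling", "to", "Tokyo", "next", "month."]
tags = [13, 13, 13, 13, 13, 1, 0, 5]
labels = decode_tags(tags)
entities = extract_entities(tokens, tags)
print(labels)
print(entities)
```

If the dataset is loaded with the Hugging Face `datasets` library, the `ClassLabel` feature's own `int2str` method can replace the hand-copied list.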
### Dataset Splits
This dataset is split into a train split and a validation split. The split sizes are as follows:
| Split name | Num samples |
| ------------ | ------------------- |
| train | 21 |
| valid | 9 |
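
As a quick check on the table, the two splits total 30 samples. A short sketch of the proportions, with the dictionary keys taken from the split names above and the variable names chosen for illustration:

```python
# Split sizes copied from the table above.
split_sizes = {"train": 21, "valid": 9}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, fraction in fractions.items():
    print(f"{name}: {split_sizes[name]} samples ({fraction:.0%})")
```

With only 9 validation samples, per-label metrics on this split are likely to be noisy.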
Provider:
PhaniManda
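Summary of original information
Dataset overview
Dataset name
AutoTrain Dataset for project: demo-on-token-classification
Task category
- Token classification (token-classification)
Language
- Language code: unk
Dataset structure
Data instances
- Each instance contains:
  - tokens: a sequence of text tokens
  - tags: a sequence of labels, drawn from: `B-DATE`, `B-LOC`, `B-MISC`, `B-ORG`, `B-PER`, `I-DATE`, `I-DATE,`, `I-LOC`, `I-MISC`, `I-ORG`, `I-ORG,`, `I-PER`, `I-PER,`, `O`
Dataset fields
- tokens: sequence of strings
- tags: sequence of class labels
Dataset splits
- Training samples: 21
- Validation samples: 9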



