uzair921/SKILLSPAN
Source: Hugging Face. Updated: 2024-05-19. Indexed: 2024-07-22.
Download link:
https://hf-mirror.com/datasets/uzair921/SKILLSPAN
Resource description:
---
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-Skill
          '2': I-Skill
  splits:
  - name: train
    num_bytes: 1652752
    num_examples: 3076
  - name: validation
    num_bytes: 715196
    num_examples: 1397
  - name: test
    num_bytes: 758463
    num_examples: 1523
  download_size: 550979
  dataset_size: 3126411
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
This dataset is intended for natural language processing, specifically Named Entity Recognition (NER) of skill mentions. Each example has two features: tokens and ner_tags. tokens is the sequence of words or symbols in the text, and ner_tags is the corresponding sequence of class labels, one per token: O (0) for tokens outside any skill, B-Skill (1) for the first token of a skill, and I-Skill (2) for tokens that continue a skill. The dataset is divided into train, validation, and test splits containing 3076, 1397, and 1523 examples respectively. The download size is 550979 bytes and the dataset size is 3126411 bytes, which equals the sum of the byte counts of the three splits.
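The tagging scheme described above can be turned back into skill spans with a few lines of code. The following is a minimal sketch, not part of the dataset itself: the function name `extract_skill_spans`, the handling of a stray I-Skill tag, and the example sentence are all assumptions made for illustration. Only the label ids (0, 1, 2) come from the dataset card.

```python
from typing import List, Tuple

# Label ids as listed in the dataset card: 0 -> O, 1 -> B-Skill, 2 -> I-Skill.
ID2LABEL = {0: "O", 1: "B-Skill", 2: "I-Skill"}


def extract_skill_spans(
    tokens: List[str], ner_tags: List[int]
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, text) for every skill span in one example."""
    if len(tokens) != len(ner_tags):
        raise ValueError("tokens and ner_tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index of the first token of the span being built, if any

    for i, tag_id in enumerate(ner_tags):
        label = ID2LABEL[tag_id]
        if label == "B-Skill":
            # A new span begins; close the previous one first.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif label == "I-Skill":
            # Assumption: an I-Skill with no preceding B-Skill opens a span.
            if start is None:
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None

    # Close a span that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Hypothetical example sentence, not taken from the dataset.
example_tokens = ["Experience", "with", "Python", "programming", "and", "SQL", "."]
example_tags = [0, 0, 1, 2, 0, 1, 0]
skill_spans = extract_skill_spans(example_tokens, example_tags)
print(skill_spans)
```

Joining tokens with a single space is a simplification; the original text spacing is not recoverable from the token list alone.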
Provider:
uzair921
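Summary of original information
Dataset overview
Features
- tokens:
  - Type: sequence of strings
- ner_tags:
  - Type: sequence
  - Class labels:
    - 0: O
    - 1: B-Skill
    - 2: I-Skill
Data splits
- train:
  - Number of examples: 3076
  - Bytes: 1652752
- validation:
  - Number of examples: 1397
  - Bytes: 715196
- test:
  - Number of examples: 1523
  - Bytes: 758463
Dataset size
- Download size: 550979 bytes
- Total size: 3126411 bytes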
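The split and size figures above can be cross-checked with simple arithmetic. This short sketch copies the numbers from the listing into a dictionary (the variable names are illustrative) and computes the total example count, each split's share, and whether the per-split byte counts add up to the reported total size.

```python
# Figures copied from the listing above.
SPLITS = {
    "train": {"num_examples": 3076, "num_bytes": 1652752},
    "validation": {"num_examples": 1397, "num_bytes": 715196},
    "test": {"num_examples": 1523, "num_bytes": 758463},
}
DOWNLOAD_SIZE = 550979
DATASET_SIZE = 3126411

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Fraction of all examples that falls into each split.
shares = {name: s["num_examples"] / total_examples for name, s in SPLITS.items()}

print(f"total examples: {total_examples}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"split bytes sum to reported total: {total_bytes == DATASET_SIZE}")
print(f"download size / total size: {DOWNLOAD_SIZE / DATASET_SIZE:.3f}")
```

The splits come to roughly 51% train, 23% validation, and 25% test, and the download is a bit under a fifth of the total size.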
Configuration
- config_name: default
- Data file paths:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
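With a single default configuration and three named splits, the dataset can presumably be loaded through the Hugging Face `datasets` library. The sketch below assumes that library is installed, that network access is available, and that the repository id `uzair921/SKILLSPAN` shown in the download link is still valid. The helper name `load_skillspan` is hypothetical.

```python
REPO_ID = "uzair921/SKILLSPAN"
SPLIT_NAMES = ("train", "validation", "test")


def load_skillspan(split: str = "train"):
    """Load one split of the dataset from the Hugging Face Hub.

    Assumes the `datasets` package is installed and the Hub is reachable.
    The import is done lazily so this module can be imported without it.
    """
    if split not in SPLIT_NAMES:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLIT_NAMES}")
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


if __name__ == "__main__":
    train = load_skillspan("train")
    # Each example should expose the two features listed above.
    print(train[0]["tokens"])
    print(train[0]["ner_tags"])
```

If the listing is accurate, the train split should contain 3076 examples, which `len(train)` would confirm.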



