Sirawipa/hosxp_ner_code

Name: Sirawipa/hosxp_ner_code
Creator: Sirawipa
Published: 2024-07-19 09:41:30
License: No description available

Source: Hugging Face. Updated: 2024-07-19. Indexed: 2024-07-13.

Download link:

https://hf-mirror.com/datasets/Sirawipa/hosxp_ner_code

Overview:

This dataset is intended for named entity recognition (NER) and has two features, words and ner. words holds the sequence of tokens in each text, and ner holds the corresponding sequence of entity labels in BIO format, with 11 classes such as B-USAGE, I-USAGE, and O. The data is split into a training set of 1537 examples and a validation set of 171 examples. The download size is 53355 bytes and the total dataset size is 598101 bytes.

Provider:

Sirawipa

Summary of original information

Dataset overview

Features

words: sequence of strings
ner: sequence of NER class labels, with the following class ids:
- 0: B-USAGE
- 1: I-USAGE
- 2: O
- 3: B-DOSE
- 4: B-FREQ
- 5: I-FREQ
- 6: B-TIME
- 7: I-TIME
- 8: B-UNIT
- 9: I-UNIT
- 10: I-DOSE
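
The id-to-label table above can be turned into a small decoder. The following is a minimal sketch, not the dataset author's code: the helper names (`decode_tags`, `extract_spans`) are hypothetical, and the sample words and tag ids are made up for illustration rather than taken from the dataset.

```python
# Class ids as listed above: list index == class id.
NER_LABELS = [
    "B-USAGE", "I-USAGE", "O", "B-DOSE", "B-FREQ", "I-FREQ",
    "B-TIME", "I-TIME", "B-UNIT", "I-UNIT", "I-DOSE",
]
ID2LABEL = dict(enumerate(NER_LABELS))
LABEL2ID = {label: i for i, label in ID2LABEL.items()}


def decode_tags(tag_ids):
    """Map integer class ids to their label strings."""
    return [ID2LABEL[i] for i in tag_ids]


def extract_spans(words, tag_ids):
    """Group BIO-tagged words into (entity_type, [words]) spans.

    An I- tag that does not continue an open span of the same type
    is treated like "O": it closes any open span and is dropped.
    """
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, decode_tags(tag_ids)):
        if tag.startswith("B-"):
            if current_words:
                spans.append((current_type, current_words))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(word)
        else:
            if current_words:
                spans.append((current_type, current_words))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, current_words))
    return spans


# Hypothetical example (not a real row from the dataset).
sample_words = ["take", "1", "tablet", "after", "meals"]
sample_tags = [0, 3, 8, 6, 7]
print(extract_spans(sample_words, sample_tags))
```

Spans are returned as word lists rather than joined strings, because the right separator depends on the language of the text, which this listing does not state.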

Dataset splits

train:
- Number of examples: 1537
- Size in bytes: 538383
validation:
- Number of examples: 171
- Size in bytes: 59718

Dataset size

Download size: 53355 bytes
Total dataset size: 598101 bytes
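
The figures above are internally consistent, which a few lines of arithmetic can confirm. This is only a sanity-check sketch using the numbers listed on this page; the variable names are illustrative. It assumes, as is usual for Hugging Face dataset metadata, that the download size refers to the stored (typically compressed) files while the total size refers to the loaded data.

```python
# Figures copied from the listing above.
splits = {
    "train": {"examples": 1537, "bytes": 538383},
    "validation": {"examples": 171, "bytes": 59718},
}
download_size = 53355
dataset_size = 598101

total_examples = sum(s["examples"] for s in splits.values())
total_bytes = sum(s["bytes"] for s in splits.values())

# The per-split byte counts should add up to the total dataset size.
assert total_bytes == dataset_size

train_fraction = splits["train"]["examples"] / total_examples
size_ratio = dataset_size / download_size

print(f"total examples: {total_examples}")
print(f"train fraction: {train_fraction:.3f}")
print(f"dataset size / download size: {size_ratio:.1f}")
```

The split works out to roughly 90% training and 10% validation.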

Configuration

default:
- Training data path: data/train-*
- Validation data path: data/validation-*
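
The path patterns above map data files to splits. The sketch below shows that mapping with the standard-library `fnmatch` module, plus one way the dataset might be loaded with the Hugging Face `datasets` library. It is an illustration under stated assumptions: `split_of` and `load_hosxp_ner` are hypothetical helper names, the example file name is invented, and the load call needs the `datasets` package and network access, so it is wrapped in a function and not executed here.

```python
from fnmatch import fnmatch

REPO_ID = "Sirawipa/hosxp_ner_code"

# Glob patterns from the "default" configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_of(path):
    """Return the split whose pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_hosxp_ner():
    """Load both splits from the Hub (requires `datasets` and network).

    The import is local so this module can be used without the
    `datasets` package installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID)


# Hypothetical file name, used only to demonstrate pattern matching.
print(split_of("data/train-00000-of-00001.parquet"))
```

If loading succeeds, the result should expose `train` and `validation` splits, each with the `words` and `ner` columns described above.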
