Daxtra/EducationParsingSFT-Roberta-BIOES-Augmentations

Name: Daxtra/EducationParsingSFT-Roberta-BIOES-Augmentations
Creator: Daxtra
Published: 2025-03-27 14:51:46
License: No description available

Source: Hugging Face. Updated 2025-03-27; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/Daxtra/EducationParsingSFT-Roberta-BIOES-Augmentations

Overview:

Each sample in this text dataset has five fields: input_ids, attention_mask, labels, tokens, and text. input_ids and attention_mask are integer sequences that serve as the model input and the attention mask; labels is an integer sequence, likely the supervision targets for training; tokens is the tokenized text; and text is the original raw text. The dataset is split into 10 parts of roughly 30,000 samples each, which makes it suitable for large-scale text processing tasks.
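The dataset name suggests that labels follow the BIOES tagging scheme (Begin, Inside, Outside, End, Single). As a minimal sketch of how the tokens and labels fields might be combined into entity spans, the snippet below uses a hypothetical id-to-tag mapping (ID2TAG) and a hypothetical DEGREE entity type, since the listing does not document the real label set. It also assumes that tokens and labels are aligned one-to-one and that -100 marks ignored positions, a common Hugging Face convention that is not confirmed for this dataset.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping: the real label ids and entity types are not documented.
ID2TAG = {0: "O", 1: "B-DEGREE", 2: "I-DEGREE", 3: "E-DEGREE", 4: "S-DEGREE"}


def bioes_spans(
    tokens: List[str], label_ids: List[int], ignore_id: int = -100
) -> List[Tuple[str, str]]:
    """Group aligned tokens and BIOES label ids into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    for token, label_id in zip(tokens, label_ids):
        if label_id == ignore_id:
            # Assumed convention: skip positions excluded from the loss.
            continue
        tag = ID2TAG[label_id]
        if tag == "O":
            current, current_type = [], None
            continue
        prefix, entity_type = tag.split("-", 1)
        if prefix == "S":
            # Single-token entity.
            spans.append((entity_type, token))
            current, current_type = [], None
        elif prefix == "B":
            # Start a new multi-token entity.
            current, current_type = [token], entity_type
        elif prefix in ("I", "E") and current_type == entity_type:
            current.append(token)
            if prefix == "E":
                # Close the entity on its End tag.
                spans.append((entity_type, " ".join(current)))
                current, current_type = [], None
        else:
            # Malformed sequence (I/E without a matching B): drop the partial span.
            current, current_type = [], None
    return spans


# Toy sample shaped like the tokens and labels fields described above.
sample = {
    "tokens": ["PhD", ",", "Master", "of", "Arts"],
    "labels": [4, 0, 1, 2, 3],
}
print(bioes_spans(sample["tokens"], sample["labels"]))
```

Joining tokens with spaces is a simplification; RoBERTa-style subword tokens would likely need the tokenizer's own decoding step to recover the surface text.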

Provider:

Daxtra
