Daxtra/EducationParsingSFT-Roberta-BIOES-Augmentations
Hugging Face. Updated 2025-03-27. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Daxtra/EducationParsingSFT-Roberta-BIOES-Augmentations
Description:
The dataset contains text data. Each sample has five fields: input_ids, attention_mask, labels, tokens, and text. input_ids and attention_mask are integer sequences that serve as the model input and the attention mask; labels is an integer sequence that may serve as the supervision target for a learning task; tokens is the tokenized text sequence; and text is the original raw text. The dataset is divided into 10 parts of about 30,000 samples each (roughly 300,000 samples in total), which makes it suitable for large-scale text processing tasks.
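The field layout described above can be illustrated with a small structural check. This is a minimal sketch, not the provider's tooling: the helper `check_sample`, the example record, and all values in it are hypothetical, and the rule that input_ids and attention_mask have equal length is an assumption based on common tokenizer behavior rather than something the listing states. Whether labels align one-to-one with input_ids is not stated either, so the sketch does not check it.

```python
from typing import Any, Dict, List

# The five fields named in the dataset description.
REQUIRED_FIELDS = ("input_ids", "attention_mask", "labels", "tokens", "text")
# Fields the description says are integer sequences.
INTEGER_FIELDS = ("input_ids", "attention_mask", "labels")


def check_sample(sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty list means none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in sample]
    if problems:
        return problems

    for name in INTEGER_FIELDS:
        if not all(isinstance(value, int) for value in sample[name]):
            problems.append(f"{name} must contain only integers")

    if not all(isinstance(token, str) for token in sample["tokens"]):
        problems.append("tokens must contain only strings")

    if not isinstance(sample["text"], str):
        problems.append("text must be a string")

    # Assumption (not stated in the listing): the mask has one entry per input id.
    if len(sample["input_ids"]) != len(sample["attention_mask"]):
        problems.append("input_ids and attention_mask differ in length")

    return problems


# Hypothetical record: every value below is invented for illustration only.
example = {
    "input_ids": [0, 101, 102, 103, 2],
    "attention_mask": [1, 1, 1, 1, 1],
    "labels": [0, 1, 2, 3, 0],
    "tokens": ["<s>", "BSc", "Computer", "Science", "</s>"],
    "text": "BSc Computer Science",
}

print(check_sample(example))
```

A check like this could be run over each record of a part, however the records are loaded, before they are passed to a model.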
Provider:
Daxtra



