tsch00001/wikipedia-eu-shuffled

Name: tsch00001/wikipedia-eu-shuffled
Creator: tsch00001
Published: 2025-01-26 15:40:35
License: No description available

Source: Hugging Face. Updated: 2025-01-26. Indexed: 2025-02-15.

Download link:

https://hf-mirror.com/datasets/tsch00001/wikipedia-eu-shuffled

Resource overview:

Each sample in this dataset contains the original text, the tokenized sequence of that text, and the number of tokens. The dataset has a training split only, with 416,347 samples and a total size of approximately 1.37 GB. It is suited to NLP tasks that involve text analysis and processing.
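The three fields described above can be sanity-checked with a short sketch. The listing does not give the column names, so `text`, `tokens`, and `num_tokens` below are assumptions to adjust once the real schema is known. The repository id is taken from the download link, and loading relies on the Hugging Face `datasets` library with network access.

```python
from typing import Any, Mapping

# Hypothetical column names: the listing describes the three fields
# only in prose, so rename these to match the actual schema.
TEXT_COL = "text"
TOKENS_COL = "tokens"
COUNT_COL = "num_tokens"


def check_sample(sample: Mapping[str, Any]) -> bool:
    """Return True when the text is a string and the stored token
    count equals the length of the token sequence."""
    return (
        isinstance(sample[TEXT_COL], str)
        and len(sample[TOKENS_COL]) == sample[COUNT_COL]
    )


def load_train_split():
    """Stream the train split from the Hugging Face Hub.

    Requires the `datasets` package and network access; streaming
    avoids downloading the full ~1.37 GB before the first sample.
    """
    from datasets import load_dataset  # lazy import: check_sample works without it

    return load_dataset(
        "tsch00001/wikipedia-eu-shuffled", split="train", streaming=True
    )


def preview(n: int = 3) -> None:
    """Print the column names and consistency check for the first n samples."""
    for i, sample in enumerate(load_train_split()):
        if i >= n:
            break
        print(sorted(sample.keys()), check_sample(sample))
```

Calling `preview()` prints the actual column names, which is the quickest way to confirm or correct the assumed ones before running `check_sample` over a larger slice.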

Provided by:

tsch00001
