tsch00001/wikipedia-ar-shuffled

Name: tsch00001/wikipedia-ar-shuffled
Creator: tsch00001
Published: 2025-01-26 16:57:20
License: No description available

Source: Hugging Face. Updated: 2025-01-26. Indexed: 2025-02-15.

Download link:

https://hf-mirror.com/datasets/tsch00001/wikipedia-ar-shuffled


Resource description:

The dataset has three features: the text content (text), the tokenized sequence of the text (tokens), and the number of tokens in the text (token_count). The training split (train) contains approximately 1,219,201 samples, with a total file size of about 6.18 GB. The dataset is suitable for natural language processing tasks such as text classification and tokenization.
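As a rough illustration of the schema described above, the following sketch streams records and checks that each one carries the three listed fields. It assumes the Hugging Face `datasets` library is installed and that the repository ID matches the dataset name on this page. The helper names `stream_records` and `check_record` are hypothetical, and the check that `token_count` equals `len(tokens)` is an assumption, since the listing does not state how the count is derived.

```python
from typing import Any, Dict, Iterator

REPO_ID = "tsch00001/wikipedia-ar-shuffled"
EXPECTED_FIELDS = ("text", "tokens", "token_count")


def stream_records(repo_id: str = REPO_ID) -> Iterator[Dict[str, Any]]:
    """Yield records from the train split without downloading all ~6.18 GB first."""
    # Imported lazily so check_record below works without the library installed.
    from datasets import load_dataset

    yield from load_dataset(repo_id, split="train", streaming=True)


def check_record(record: Dict[str, Any]) -> bool:
    """Return True if the record has the three listed fields and a consistent count.

    Assumption: token_count equals len(tokens); the listing does not confirm this.
    """
    if any(field not in record for field in EXPECTED_FIELDS):
        return False
    return record["token_count"] == len(record["tokens"])
```

Calling `check_record(next(stream_records()))` would test the first streamed record. Streaming is used here only to avoid fetching the full files up front; a regular `load_dataset` call without `streaming=True` would download the whole split.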

Provided by:

tsch00001
