iammytoo/wiki20240601

Name: iammytoo/wiki20240601
Creator: iammytoo
Published: 2024-06-19 03:44:47
License: no description available

Hugging Face: updated 2024-06-19, indexed 2024-06-29

Download link:

https://hf-mirror.com/datasets/iammytoo/wiki20240601

Resource overview:

The dataset provides the following feature fields: id, title, text, paragraphs, abstract, wikitext, date_created, date_modified, templates, and url. The paragraphs field is further divided into paragraph_id, tag, text, and title. The dataset has a single train split containing 6,943,871 examples with a total size of 106,858,849,511 bytes. The download size is 53,895,429,870 bytes.

Provider:

iammytoo

Summary of original information

Dataset overview

Dataset features

id: int64
title: string
text: string
paragraphs: contains the following sub-features
- paragraph_id: int64
- tag: string
- text: string
- title: string
abstract: string
wikitext: string
date_created: string
date_modified: string
templates: sequence of string
url: string
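
To make the field list above easier to work with in code, here is a minimal sketch of the record layout as Python type hints. It assumes that `paragraphs` is a list of per-paragraph records (the listing does not say whether it is a list of records or a record of lists) and that `int64` and `string` map to Python `int` and `str`. The names `Paragraph`, `WikiRecord`, `TOP_LEVEL_FIELDS`, and `missing_fields` are hypothetical helpers, not part of the dataset.

```python
from typing import List, TypedDict


class Paragraph(TypedDict):
    """One entry of the `paragraphs` field (assumed to be a list of records)."""
    paragraph_id: int
    tag: str
    text: str
    title: str


class WikiRecord(TypedDict):
    """One example, with the fields named in the feature list."""
    id: int
    title: str
    text: str
    paragraphs: List[Paragraph]
    abstract: str
    wikitext: str
    date_created: str   # stored as a string, not a timestamp type
    date_modified: str  # stored as a string, not a timestamp type
    templates: List[str]
    url: str


# Top-level field names, in the order the listing gives them.
TOP_LEVEL_FIELDS = (
    "id", "title", "text", "paragraphs", "abstract",
    "wikitext", "date_created", "date_modified", "templates", "url",
)


def missing_fields(record: dict) -> List[str]:
    """Return the top-level field names that are absent from `record`."""
    return [name for name in TOP_LEVEL_FIELDS if name not in record]
```

A quick check such as `missing_fields(example)` on the first few examples can confirm whether the actual data matches this assumed layout.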

Dataset splits

train: 6,943,871 examples, 106,858,849,511 bytes

Dataset size

Download size: 53,895,429,870 bytes
Dataset size: 106,858,849,511 bytes
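
The byte counts above are easier to plan around once converted. The following sketch derives a few figures from the three numbers in the listing; it uses only those numbers, and the function name `summarize` is a hypothetical helper.

```python
# Figures copied from the listing.
NUM_EXAMPLES = 6_943_871
DATASET_BYTES = 106_858_849_511
DOWNLOAD_BYTES = 53_895_429_870

GIB = 1024 ** 3  # bytes per gibibyte


def summarize() -> dict:
    """Derive human-readable size figures from the raw byte counts."""
    return {
        "dataset_gib": DATASET_BYTES / GIB,
        "download_gib": DOWNLOAD_BYTES / GIB,
        "download_to_dataset_ratio": DOWNLOAD_BYTES / DATASET_BYTES,
        "avg_bytes_per_example": DATASET_BYTES / NUM_EXAMPLES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

In round terms, the download is roughly 50 GiB, the dataset is roughly 100 GiB once loaded, and an average example is about 15 KB.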

Configuration

config_name: default
data_files:
- split: train
  path: data/train-*
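
Given the configuration above (config `default`, split `train`), one possible way to read the data is the Hugging Face `datasets` library in streaming mode, which avoids downloading the full dataset up front. This is a sketch under assumptions: it presumes `datasets` is installed and that the repository is reachable on the Hub under the same ID shown in the download link. The function name `stream_examples` is hypothetical.

```python
from itertools import islice
from typing import Iterator

REPO_ID = "iammytoo/wiki20240601"  # repository ID from the listing
CONFIG_NAME = "default"            # config_name from the listing
SPLIT = "train"                    # the only split listed


def stream_examples(limit: int = 3) -> Iterator[dict]:
    """Yield up to `limit` examples without downloading the whole dataset."""
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, CONFIG_NAME, split=SPLIT, streaming=True)
    yield from islice(dataset, limit)


if __name__ == "__main__":
    for example in stream_examples():
        # Print two of the fields named in the feature list.
        print(example["id"], example["title"])
```

Streaming is chosen here because of the dataset's size; dropping `streaming=True` would instead download and cache all of the files matched by `data/train-*`.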
