orgcatorg/wikipedia
Source: Hugging Face | Updated: 2024-09-02 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/orgcatorg/wikipedia
Resource description:
---
dataset_info:
- config_name: bn
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: en_url
    dtype: string
  - name: en_title
    dtype: string
  - name: en_text
    dtype: string
  splits:
  - name: train
    num_bytes: 1167115208
    num_examples: 156143
  download_size: 441690826
  dataset_size: 1167115208
- config_name: hi
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: en_url
    dtype: string
  - name: en_title
    dtype: string
  - name: en_text
    dtype: string
  splits:
  - name: train
    num_bytes: 793684300
    num_examples: 166726
  download_size: 302408181
  dataset_size: 793684300
- config_name: id
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1177273270
    num_examples: 688206
  download_size: 610697793
  dataset_size: 1177273270
- config_name: ms
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 442552369
    num_examples: 373189
  download_size: 220484368
  dataset_size: 442552369
- config_name: th
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: en_url
    dtype: string
  - name: en_title
    dtype: string
  - name: en_text
    dtype: string
  splits:
  - name: train
    num_bytes: 1185165416
    num_examples: 165827
  download_size: 460749899
  dataset_size: 1185165416
- config_name: tl
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: en_url
    dtype: string
  - name: en_title
    dtype: string
  - name: en_text
    dtype: string
  splits:
  - name: train
    num_bytes: 135399058
    num_examples: 47470
  download_size: 73818762
  dataset_size: 135399058
- config_name: vi
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: en_url
    dtype: string
  - name: en_title
    dtype: string
  - name: en_text
    dtype: string
  splits:
  - name: train
    num_bytes: 1938478921
    num_examples: 1294721
  download_size: 896915549
  dataset_size: 1938478921
configs:
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ms
  data_files:
  - split: train
    path: ms/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: tl
  data_files:
  - split: train
    path: tl/train-*
- config_name: vi
  data_files:
  - split: train
    path: vi/train-*
---
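The metadata above lists seven configs, each with a single `train` split. The following is a minimal sketch of how one of them might be loaded with the Hugging Face `datasets` library. It assumes that `datasets` is installed, that the repository id `orgcatorg/wikipedia` (taken from the download link) is still available on the Hub, and that the column layout matches the metadata. The helper names `expected_columns` and `load_config` are hypothetical and exist only for this illustration.

```python
# Column layout as declared in the dataset_info metadata.
BASE_COLUMNS = ["id", "url", "title", "text"]
ENGLISH_COLUMNS = ["en_url", "en_title", "en_text"]

# Per the metadata, these configs carry the extra en_* columns ...
CONFIGS_WITH_ENGLISH = {"bn", "hi", "th", "tl", "vi"}
# ... and these list only the four base columns.
CONFIGS_WITHOUT_ENGLISH = {"id", "ms"}


def expected_columns(config_name):
    """Return the column names the metadata declares for a config."""
    if config_name in CONFIGS_WITH_ENGLISH:
        return BASE_COLUMNS + ENGLISH_COLUMNS
    if config_name in CONFIGS_WITHOUT_ENGLISH:
        return list(BASE_COLUMNS)
    raise ValueError(f"unknown config: {config_name!r}")


def load_config(config_name, streaming=True):
    """Load the train split of one config (hypothetical helper).

    Streaming is the default here because several configs are
    hundreds of megabytes to download.
    """
    expected_columns(config_name)  # fail early on an unknown config name
    from datasets import load_dataset  # third-party; imported lazily

    return load_dataset(
        "orgcatorg/wikipedia",  # repository id from the download link
        config_name,
        split="train",
        streaming=streaming,
    )
```

A call such as `next(iter(load_config("tl")))` would then yield one record as a dictionary, which can be compared against `expected_columns("tl")`. This requires network access.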
Provider:
orgcatorg
Summary of original information
Dataset overview
Config name: bn
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Size: 1038091832 bytes
    - Examples: 152346
- Download size: 370416687 bytes
- Dataset size: 1038091832 bytes
Config name: default
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Size: 442552369 bytes
    - Examples: 373189
- Download size: 220484368 bytes
- Dataset size: 442552369 bytes
Config name: hi
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Size: 688967167 bytes
    - Examples: 165061
- Download size: 243873402 bytes
- Dataset size: 688967167 bytes
Config name: th
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Size: 1049519117 bytes
    - Examples: 164082
- Download size: 384393642 bytes
- Dataset size: 1049519117 bytes
Config name: vi
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Size: 1639428813 bytes
    - Examples: 1293101
- Download size: 741817372 bytes
- Dataset size: 1639428813 bytes
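The byte and example counts in the summary above are easier to compare as derived figures. The sketch below copies the summary's numbers into a dictionary and computes the average article size and the download-to-dataset size ratio per config. The names `SUMMARY`, `bytes_per_example`, and `download_ratio` are illustrative only and are not part of the dataset.

```python
# Figures copied verbatim from the per-config summary above.
SUMMARY = {
    "bn": {"dataset_size": 1038091832, "num_examples": 152346, "download_size": 370416687},
    "default": {"dataset_size": 442552369, "num_examples": 373189, "download_size": 220484368},
    "hi": {"dataset_size": 688967167, "num_examples": 165061, "download_size": 243873402},
    "th": {"dataset_size": 1049519117, "num_examples": 164082, "download_size": 384393642},
    "vi": {"dataset_size": 1639428813, "num_examples": 1293101, "download_size": 741817372},
}


def bytes_per_example(stats):
    """Average uncompressed size of one record, in bytes."""
    return stats["dataset_size"] / stats["num_examples"]


def download_ratio(stats):
    """Download size as a fraction of the uncompressed dataset size."""
    return stats["download_size"] / stats["dataset_size"]


# Print one line of derived figures per config.
for name, stats in SUMMARY.items():
    print(
        f"{name:8s} {bytes_per_example(stats):10.1f} bytes/example  "
        f"download/dataset = {download_ratio(stats):.2f}"
    )

total_examples = sum(stats["num_examples"] for stats in SUMMARY.values())
largest = max(SUMMARY, key=lambda name: SUMMARY[name]["num_examples"])
print(f"total examples: {total_examples}, largest config: {largest}")
```

These are simple ratios of the listed values, so they are only as current as the summary itself.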



