mismayil/tr_wikipedia
Source: Hugging Face. Updated: 2024-05-15. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/mismayil/tr_wikipedia
Resource description:
---
license: mit
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 986434058
    num_examples: 530000
  - name: validation
    num_bytes: 9689504
    num_examples: 4454
  download_size: 546073184
  dataset_size: 996123562
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
The dataset has four features: id (int64), url (string), title (string), and text (string). It is divided into two splits: a training split (530,000 examples, 986,434,058 bytes) and a validation split (4,454 examples, 9,689,504 bytes). The download size is 546,073,184 bytes, and the total dataset size is 996,123,562 bytes. There is a single configuration, default, whose training and validation data files match the paths data/train-* and data/validation-* respectively.
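The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch using only the numbers from the card; the constant and function names are illustrative, and it assumes that download_size refers to the files fetched and dataset_size to the generated splits.

```python
# Split statistics copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 986_434_058, "num_examples": 530_000},
    "validation": {"num_bytes": 9_689_504, "num_examples": 4_454},
}
DOWNLOAD_SIZE = 546_073_184  # bytes, as listed in the card
DATASET_SIZE = 996_123_562   # bytes, as listed in the card


def total_bytes(splits: dict) -> int:
    """Sum num_bytes over all splits."""
    return sum(info["num_bytes"] for info in splits.values())


def mean_bytes_per_example(info: dict) -> float:
    """Average size of one example in a split, in bytes."""
    return info["num_bytes"] / info["num_examples"]


def download_ratio() -> float:
    """Download size as a fraction of the total dataset size."""
    return DOWNLOAD_SIZE / DATASET_SIZE


# The two split sizes add up exactly to the listed dataset size.
assert total_bytes(SPLITS) == DATASET_SIZE

for name, info in SPLITS.items():
    print(f"{name}: {mean_bytes_per_example(info):.1f} bytes per example")
print(f"download / dataset size: {download_ratio():.3f}")
```

The sum check confirms that dataset_size is the total of the two split sizes, while the ratio gives a rough idea of how much smaller the download is than the stated dataset size.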
Provider:
mismayil
Original information summary
Dataset overview
License
- MIT
Dataset information
Features
- id: int64
- url: string
- title: string
- text: string
Splits
- train:
  - Bytes: 986434058
  - Examples: 530000
- validation:
  - Bytes: 9689504
  - Examples: 4454
Sizes
- Download size: 546073184
- Dataset size: 996123562
Configurations
- default:
  - Data files:
    - train: data/train-*
    - validation: data/validation-*
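As a hedged illustration of how the default configuration might be used, the sketch below loads one split with the Hugging Face `datasets` library and formats a record. It assumes `datasets` is installed and that the repository is reachable; `load_split` and `preview` are hypothetical helper names, not part of the dataset or the library.

```python
from typing import Any, Dict

REPO_ID = "mismayil/tr_wikipedia"                 # repository name from this page
EXPECTED_FIELDS = ("id", "url", "title", "text")  # features listed in the card


def load_split(split: str = "validation"):
    """Load one split from the Hub (requires network access)."""
    # Imported lazily so the rest of the module works without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=split)


def preview(record: Dict[str, Any], width: int = 80) -> str:
    """Return a one-line summary of a record that has the four card fields."""
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    # Collapse whitespace so multi-line article text fits on one line.
    snippet = " ".join(record["text"].split())[:width]
    return f'{record["id"]} | {record["title"]} | {snippet}'
```

A typical use would be `print(preview(load_split("validation")[0]))`. The validation split is the smaller of the two (4,454 examples), so it is the cheaper one to inspect first.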



