bourdoiscatie/wikipedia_fr_2022_250K
Source: Hugging Face · Updated 2024-06-05 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/bourdoiscatie/wikipedia_fr_2022_250K
Resource description:
---
dataset_info:
  features:
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: wiki_id
    dtype: int32
  - name: views
    dtype: float32
  - name: paragraph_id
    dtype: int32
  - name: langs
    dtype: int32
  - name: emb
    sequence: float32
  splits:
  - name: train
    num_bytes: 1160521080
    num_examples: 250000
  download_size: 1258146076
  dataset_size: 1160521080
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
The dataset has seven features: title (string), text (string), wiki_id (32-bit integer), views (32-bit float), paragraph_id (32-bit integer), langs (32-bit integer), and emb (a sequence of 32-bit floats). It has a single train split with 250,000 examples. The download size is 1,258,146,076 bytes (about 1.26 GB) and the dataset size is 1,160,521,080 bytes (about 1.16 GB).
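As a rough illustration of how the train split described above might be read, the sketch below opens it with the Hugging Face `datasets` library in streaming mode, so the full download is not needed up front. It assumes that `datasets` is installed and that the repository is still reachable under this id. The helper names `load_train` and `preview` are hypothetical, not part of the dataset.

```python
from itertools import islice

REPO_ID = "bourdoiscatie/wikipedia_fr_2022_250K"


def load_train(streaming=True):
    """Open the train split. `datasets` is imported lazily (assumed installed)."""
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=streaming)


def preview(rows, n=3):
    """Return (title, first 80 characters of text, embedding length) for the first n rows."""
    return [(r["title"], r["text"][:80], len(r["emb"])) for r in islice(rows, n)]
```

Calling `preview(load_train())` would fetch only the first few rows over the network. `preview` works on any iterable of dictionaries with the same keys, so it can be tried offline on hand-made rows.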
Provided by:
bourdoiscatie
Original information summary
Dataset overview
Dataset features
- title: string
- text: string
- wiki_id: 32-bit integer
- views: 32-bit float
- paragraph_id: 32-bit integer
- langs: 32-bit integer
- emb: sequence of 32-bit floats
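The feature list above can be mirrored as a typed record, which is handy for checking rows before further processing. This is a minimal sketch using only the standard library. The class name `WikiParagraph` and the sample values are invented for illustration, and the card does not document what each column means beyond its type.

```python
from dataclasses import dataclass
from typing import List

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


@dataclass
class WikiParagraph:
    title: str
    text: str
    wiki_id: int
    views: float
    paragraph_id: int
    langs: int
    emb: List[float]

    def __post_init__(self):
        # The card declares these three columns as int32, so reject values outside that range.
        for name in ("wiki_id", "paragraph_id", "langs"):
            value = getattr(self, name)
            if not isinstance(value, int) or not INT32_MIN <= value <= INT32_MAX:
                raise ValueError(f"{name} must be an int32 value, got {value!r}")
        if not isinstance(self.title, str) or not isinstance(self.text, str):
            raise ValueError("title and text must be strings")


# Hypothetical record: every value here is made up for illustration.
example = WikiParagraph(
    title="Paris",
    text="Paris est la capitale de la France.",
    wiki_id=1,
    views=100.0,
    paragraph_id=0,
    langs=1,
    emb=[0.1, 0.2, 0.3],
)
```

A row from the real dataset could be wrapped the same way with `WikiParagraph(**row)`, assuming the row is a dictionary with exactly these seven keys.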
Dataset splits
- Training set (train):
  - Number of examples: 250000
  - Data size: 1160521080 bytes
Dataset size
- Download size: 1258146076 bytes
- Total dataset size: 1160521080 bytes
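The byte counts above are easier to reason about after a little arithmetic. The sketch below uses only the figures stated on this page to derive the average size of one example and the ratio of download size to dataset size. The helper names are hypothetical.

```python
NUM_EXAMPLES = 250_000
DATASET_SIZE = 1_160_521_080   # bytes, as listed above
DOWNLOAD_SIZE = 1_258_146_076  # bytes, as listed above


def to_gb(n_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes):
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


# Average bytes per example, about 4,642.
bytes_per_example = DATASET_SIZE / NUM_EXAMPLES

# Download size relative to dataset size, about 1.084.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"dataset:  {to_gb(DATASET_SIZE):.2f} GB ({to_gib(DATASET_SIZE):.2f} GiB)")
print(f"download: {to_gb(DOWNLOAD_SIZE):.2f} GB ({to_gib(DOWNLOAD_SIZE):.2f} GiB)")
print(f"average example: {bytes_per_example:.0f} bytes")
print(f"download / dataset: {download_ratio:.3f}")
```

By these figures the download is roughly 8% larger than the dataset size. The page does not explain why.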
Configuration
- Config name: default
- Data files:
  - Split: train
  - Path: data/train-*
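The `data/train-*` path above is a glob pattern that selects the files belonging to the train split. The sketch below approximates that matching with the standard library's `fnmatch`. The file names in `repo_files` are hypothetical, since the actual shard names are not listed on this page, and the Hub's own pattern resolution may differ in detail.

```python
from fnmatch import fnmatch

PATTERN = "data/train-*"


def files_for_split(paths, pattern=PATTERN):
    """Return the paths matching the split's glob pattern, in sorted order."""
    # Note: fnmatch's `*` also matches `/`, unlike some shell globbing rules.
    return sorted(p for p in paths if fnmatch(p, pattern))


# Hypothetical repository listing, invented for illustration.
repo_files = [
    "README.md",
    "data/train-00000-of-00003.parquet",
    "data/train-00001-of-00003.parquet",
    "data/train-00002-of-00003.parquet",
    "data/test-00000-of-00001.parquet",
]

train_files = files_for_split(repo_files)
```

Here `train_files` holds the three `data/train-...` entries and leaves out `README.md` and the hypothetical test shard.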



