pietrolesci/wikitext-103-raw-v1_gpt2-20k
Source: Hugging Face. Updated 2023-11-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pietrolesci/wikitext-103-raw-v1_gpt2-20k
Resource description:
---
dataset_info:
- config_name: packed
  features:
  - name: input_ids
    sequence: int64
  - name: uid
    dtype: int64
  splits:
  - name: test
    num_bytes: 2313688
    num_examples: 1124
  - name: train
    num_bytes: 968728180
    num_examples: 470257
  - name: validation
    num_bytes: 2027524
    num_examples: 985
  download_size: 0
  dataset_size: 973069392
- config_name: surprisals
  features:
  - name: surprisals
    sequence: float64
  - name: token_ids
    sequence: int64
  - name: uids
    dtype: int64
  - name: batch_idx
    dtype: int64
  - name: step
    dtype: int64
  splits:
  - name: step_10999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_10999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_3999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_3999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_0_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_0_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_4999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_4999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_1999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_1999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_8999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_8999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_7999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_7999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_13999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_13999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_2999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_2999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_11999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_11999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_12999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_12999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_6999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_6999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_9999_train
    num_bytes: 1933696784
    num_examples: 470257
  - name: step_9999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_5999_validation
    num_bytes: 4050320
    num_examples: 985
  - name: step_5999_train
    num_bytes: 1933696784
    num_examples: 470257
  download_size: 21176694739
  dataset_size: 30999903344
configs:
- config_name: packed
  data_files:
  - split: test
    path: packed/test-*
  - split: train
    path: packed/train-*
  - split: validation
    path: packed/validation-*
- config_name: surprisals
  data_files:
  - split: step_10999_validation
    path: surprisals/step_10999_validation-*
  - split: step_10999_train
    path: surprisals/step_10999_train-*
  - split: step_3999_validation
    path: surprisals/step_3999_validation-*
  - split: step_3999_train
    path: surprisals/step_3999_train-*
  - split: step_0_validation
    path: surprisals/step_0_validation-*
  - split: step_0_train
    path: surprisals/step_0_train-*
  - split: step_999_validation
    path: surprisals/step_999_validation-*
  - split: step_999_train
    path: surprisals/step_999_train-*
  - split: step_4999_train
    path: surprisals/step_4999_train-*
  - split: step_4999_validation
    path: surprisals/step_4999_validation-*
  - split: step_1999_train
    path: surprisals/step_1999_train-*
  - split: step_1999_validation
    path: surprisals/step_1999_validation-*
  - split: train
    path: surprisals/train-*
  - split: step_8999_train
    path: surprisals/step_8999_train-*
  - split: step_8999_validation
    path: surprisals/step_8999_validation-*
  - split: step_7999_train
    path: surprisals/step_7999_train-*
  - split: step_7999_validation
    path: surprisals/step_7999_validation-*
  - split: step_13999_train
    path: surprisals/step_13999_train-*
  - split: step_13999_validation
    path: surprisals/step_13999_validation-*
  - split: step_2999_validation
    path: surprisals/step_2999_validation-*
  - split: step_2999_train
    path: surprisals/step_2999_train-*
  - split: step_11999_train
    path: surprisals/step_11999_train-*
  - split: step_11999_validation
    path: surprisals/step_11999_validation-*
  - split: step_12999_validation
    path: surprisals/step_12999_validation-*
  - split: step_12999_train
    path: surprisals/step_12999_train-*
  - split: step_6999_train
    path: surprisals/step_6999_train-*
  - split: step_6999_validation
    path: surprisals/step_6999_validation-*
  - split: step_9999_train
    path: surprisals/step_9999_train-*
  - split: step_9999_validation
    path: surprisals/step_9999_validation-*
  - split: step_5999_validation
    path: surprisals/step_5999_validation-*
  - split: step_5999_train
    path: surprisals/step_5999_train-*
---
# Dataset Card for "wikitext-103-raw-v1_gpt2-20k"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
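The card gives no usage instructions, so the following is only a minimal sketch of how the `surprisals` config might be loaded with the `datasets` library. The repository id, config name, and split naming scheme come from the metadata above; the helper names `surprisal_split` and `load_surprisals` are hypothetical, and streaming is an assumption chosen to avoid downloading the full config.

```python
REPO_ID = "pietrolesci/wikitext-103-raw-v1_gpt2-20k"

# Checkpoint steps that appear in the split names: 0, 999, 1999, up to 13999.
STEPS = [0] + [k * 1000 - 1 for k in range(1, 15)]


def surprisal_split(step: int, partition: str) -> str:
    """Build a split name such as 'step_999_train' (hypothetical helper)."""
    if step not in STEPS:
        raise ValueError(f"unknown step {step}; expected one of {STEPS}")
    if partition not in ("train", "validation"):
        raise ValueError("partition must be 'train' or 'validation'")
    return f"step_{step}_{partition}"


def load_surprisals(step: int, partition: str, streaming: bool = True):
    """Load one surprisal split (hypothetical helper; needs network access)."""
    # Imported lazily so the naming helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        REPO_ID,
        "surprisals",
        split=surprisal_split(step, partition),
        streaming=streaming,
    )
```

For example, `load_surprisals(999, "validation")` would request the `step_999_validation` split. The `packed` config can be loaded the same way by passing `"packed"` as the config name with a split of `train`, `validation`, or `test`.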
Provider:
pietrolesci
Summary of original information
Dataset overview
Dataset configurations
- Config name: packed
  - Features:
    - input_ids: sequence of int64
    - uid: int64
  - Splits:
    - test: 2313688 bytes, 1124 examples
    - train: 968728180 bytes, 470257 examples
    - validation: 2027524 bytes, 985 examples
  - Download size: 0 bytes
  - Dataset size: 973069392 bytes
- Config name: surprisals
  - Features:
    - surprisals: sequence of float64
    - token_ids: sequence of int64
    - uids: int64
    - batch_idx: int64
    - step: int64
  - Splits (N is one of 0, 999, 1999, 2999, 3999, 4999, 5999, 6999, 7999, 8999, 9999, 10999, 11999, 12999, 13999):
    - train and each step_N_train: 1933696784 bytes, 470257 examples
    - each step_N_validation: 4050320 bytes, 985 examples
  - Download size: 21176694739 bytes
  - Dataset size: 30999903344 bytes
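As a quick consistency check on the sizes listed above, the per-split byte counts can be summed and compared with the reported dataset sizes. This is a small sketch using only numbers from the metadata; the variable names are illustrative.

```python
# Per-split byte counts copied from the dataset metadata.
packed_bytes = {"test": 2313688, "train": 968728180, "validation": 2027524}

TRAIN_BYTES = 1933696784      # every surprisals train split
VALIDATION_BYTES = 4050320    # every surprisals validation split
steps = [0] + [k * 1000 - 1 for k in range(1, 15)]  # 0, 999, up to 13999

# The surprisals config has one plain `train` split plus a train and a
# validation split for each of the 15 steps, 31 splits in total.
surprisal_bytes = {"train": TRAIN_BYTES}
for step in steps:
    surprisal_bytes[f"step_{step}_train"] = TRAIN_BYTES
    surprisal_bytes[f"step_{step}_validation"] = VALIDATION_BYTES

packed_total = sum(packed_bytes.values())
surprisal_total = sum(surprisal_bytes.values())

# Both totals agree with the reported dataset_size values.
assert packed_total == 973069392
assert surprisal_total == 30999903344

print(f"packed: {packed_total / 2**30:.2f} GiB")
print(f"surprisals: {surprisal_total / 2**30:.2f} GiB across {len(surprisal_bytes)} splits")
```

Both sums match the reported `dataset_size` fields, so the split list in the metadata appears complete.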
Data file configuration
- Config name: packed
  - Data files:
    - test: packed/test-*
    - train: packed/train-*
    - validation: packed/validation-*
- Config name: surprisals
  - Data files: every split listed above reads from surprisals/<split name>-*, for example:
    - train: surprisals/train-*
    - step_0_train: surprisals/step_0_train-*
    - step_0_validation: surprisals/step_0_validation-*
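The data file entries above are glob patterns, one per split. The sketch below illustrates how such a pattern selects shard files, using Python's `fnmatch` as a stand-in for the matching the Hub performs. The shard file names are hypothetical (the metadata does not list them), and the Hub's own pattern resolution may differ in detail.

```python
from fnmatch import fnmatchcase

# Hypothetical shard names; the real file names are not given in the card.
files = [
    "surprisals/train-00000-of-00004.parquet",
    "surprisals/step_999_train-00000-of-00004.parquet",
    "surprisals/step_9999_train-00000-of-00004.parquet",
    "surprisals/step_999_validation-00000-of-00001.parquet",
]


def files_for(pattern: str) -> list[str]:
    """Return the files whose full path matches the glob pattern."""
    return [name for name in files if fnmatchcase(name, pattern)]


# The literal prefix before `*` keeps step 999 and step 9999 apart.
step_999_train = files_for("surprisals/step_999_train-*")

# The plain `train` pattern does not pick up the step_N_train shards,
# because the pattern must match from the start of the path.
plain_train = files_for("surprisals/train-*")

print(step_999_train)
print(plain_train)
```

Because each pattern fixes the full split name before the wildcard, no pattern in the config overlaps with another.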



