Kyle1668/pythia-semantic-memorization-perplexities
Source: Hugging Face; updated 2023-09-19; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Kyle1668/pythia-semantic-memorization-perplexities
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: memories.deduped.12b
    path: data/memories.deduped.12b-*
  - split: memories.duped.12b
    path: data/memories.duped.12b-*
  - split: memories.duped.6.9b
    path: data/memories.duped.6.9b-*
  - split: pile.duped.6.9b
    path: data/pile.duped.6.9b-*
  - split: memories.duped.70m
    path: data/memories.duped.70m-*
  - split: memories.duped.160m
    path: data/memories.duped.160m-*
  - split: memories.duped.410m
    path: data/memories.duped.410m-*
  - split: pile.duped.70m
    path: data/pile.duped.70m-*
  - split: pile.duped.160m
    path: data/pile.duped.160m-*
  - split: pile.duped.410m
    path: data/pile.duped.410m-*
  - split: memories.duped.1.4b
    path: data/memories.duped.1.4b-*
  - split: memories.duped.1b
    path: data/memories.duped.1b-*
  - split: memories.duped.2.8b
    path: data/memories.duped.2.8b-*
  - split: pile.duped.1.4b
    path: data/pile.duped.1.4b-*
  - split: pile.duped.1b
    path: data/pile.duped.1b-*
  - split: pile.duped.2.8b
    path: data/pile.duped.2.8b-*
  - split: pile.duped.12b
    path: data/pile.duped.12b-*
  - split: memories.deduped.70m
    path: data/memories.deduped.70m-*
  - split: memories.deduped.160m
    path: data/memories.deduped.160m-*
  - split: memories.deduped.410m
    path: data/memories.deduped.410m-*
  - split: pile.deduped.70m
    path: data/pile.deduped.70m-*
  - split: pile.deduped.160m
    path: data/pile.deduped.160m-*
  - split: pile.deduped.410m
    path: data/pile.deduped.410m-*
  - split: memories.deduped.6.9b
    path: data/memories.deduped.6.9b-*
  - split: pile.deduped.6.9b
    path: data/pile.deduped.6.9b-*
  - split: pile.deduped.12b
    path: data/pile.deduped.12b-*
  - split: memories.deduped.2.8b
    path: data/memories.deduped.2.8b-*
  - split: pile.deduped.2.8b
    path: data/pile.deduped.2.8b-*
  - split: memories.deduped.1.4b
    path: data/memories.deduped.1.4b-*
  - split: memories.deduped.1b
    path: data/memories.deduped.1b-*
  - split: pile.deduped.1.4b
    path: data/pile.deduped.1.4b-*
  - split: pile.deduped.1b
    path: data/pile.deduped.1b-*
dataset_info:
  features:
  - name: index
    dtype: int32
  - name: prompt_perplexity
    dtype: float32
  - name: generation_perplexity
    dtype: float32
  - name: sequence_perplexity
    dtype: float32
  splits:
  - name: memories.deduped.12b
    num_bytes: 29939456
    num_examples: 1871216
  - name: memories.duped.12b
    num_bytes: 38117248
    num_examples: 2382328
  - name: memories.duped.6.9b
    num_bytes: 33935616
    num_examples: 2120976
  - name: pile.duped.6.9b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.duped.70m
    num_bytes: 7423248
    num_examples: 463953
  - name: memories.duped.160m
    num_bytes: 11034768
    num_examples: 689673
  - name: memories.duped.410m
    num_bytes: 15525456
    num_examples: 970341
  - name: pile.duped.70m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.160m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.410m
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.duped.1.4b
    num_bytes: 21979552
    num_examples: 1373722
  - name: memories.duped.1b
    num_bytes: 20098256
    num_examples: 1256141
  - name: memories.duped.2.8b
    num_bytes: 26801232
    num_examples: 1675077
  - name: pile.duped.1.4b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.1b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.2.8b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.12b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.70m
    num_bytes: 6583168
    num_examples: 411448
  - name: memories.deduped.160m
    num_bytes: 9299120
    num_examples: 581195
  - name: memories.deduped.410m
    num_bytes: 12976624
    num_examples: 811039
  - name: pile.deduped.70m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.160m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.410m
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.6.9b
    num_bytes: 26884704
    num_examples: 1680294
  - name: pile.deduped.6.9b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.12b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.2.8b
    num_bytes: 21683376
    num_examples: 1355211
  - name: pile.deduped.2.8b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.1.4b
    num_bytes: 16769552
    num_examples: 1048097
  - name: memories.deduped.1b
    num_bytes: 16525840
    num_examples: 1032865
  - name: pile.deduped.1.4b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.1b
    num_bytes: 80000000
    num_examples: 5000000
  download_size: 1891778367
  dataset_size: 1595577216
---
# Dataset Card for "pythia-semantic-memorization-perplexities"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
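
The split names in the front matter all follow the visible pattern `<memories|pile>.<deduped|duped>.<size>`, with eight sizes from `70m` to `12b`. The card itself does not document what each part means, so the sketch below only relies on the naming pattern. It assumes the Hugging Face `datasets` library is installed and the repository is reachable; `REPO_ID`, `MODEL_SIZES`, `split_name`, and `load_split` are hypothetical helper names introduced here for illustration, not part of the dataset or the library.

```python
REPO_ID = "Kyle1668/pythia-semantic-memorization-perplexities"

# Sizes that appear in the split names listed in the card's front matter.
MODEL_SIZES = ("70m", "160m", "410m", "1b", "1.4b", "2.8b", "6.9b", "12b")
SOURCES = ("memories", "pile")
VARIANTS = ("deduped", "duped")


def split_name(source: str, variant: str, size: str) -> str:
    """Build a split name such as 'memories.deduped.12b', rejecting unknown parts."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return f"{source}.{variant}.{size}"


def load_split(source: str, variant: str, size: str):
    """Download one split (assumption: network access and `datasets` installed)."""
    # Imported lazily so split_name() stays usable without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split_name(source, variant, size))


# Enumerate every split name the pattern produces (2 * 2 * 8 = 32).
ALL_SPLITS = [
    split_name(source, variant, size)
    for source in SOURCES
    for variant in VARIANTS
    for size in MODEL_SIZES
]
```

For example, `load_split("memories", "deduped", "70m")` would request the smallest `memories` split, whose rows carry the four columns declared under `features`.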
Provider:
Kyle1668
Summary of original information
Dataset overview
Dataset configuration
- config_name: default
- data_files: multiple splits, each with its own file path and size, such as memories.deduped.12b and memories.duped.12b.
Dataset information
- features:
  - name: index, prompt_perplexity, generation_perplexity, sequence_perplexity
  - dtype: int32, float32, float32, float32
- splits:
  - name: the split names, such as memories.deduped.12b and memories.duped.12b.
  - num_bytes: the size of each split in bytes; for example, memories.deduped.12b is 29939456 bytes.
  - num_examples: the number of examples in each split; for example, memories.deduped.12b has 1871216 examples.
- download_size: 1891778367
- dataset_size: 1595577216
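
The declared dtypes imply a fixed row width: one `int32` plus three `float32` columns is 16 bytes, and the reported `num_bytes` values are consistent with that (for example, 1871216 × 16 = 29939456). The sketch below checks this arithmetic against figures copied from the card. The constant and function names are hypothetical, and the check only covers the metadata, not the downloaded files.

```python
# Bytes per value for the dtypes declared under `features`.
DTYPE_BYTES = {"int32": 4, "float32": 4}
FEATURES = {
    "index": "int32",
    "prompt_perplexity": "float32",
    "generation_perplexity": "float32",
    "sequence_perplexity": "float32",
}
ROW_BYTES = sum(DTYPE_BYTES[dtype] for dtype in FEATURES.values())

# num_examples for the sixteen `memories` splits, copied from the card.
MEMORIES_EXAMPLES = {
    "memories.deduped.70m": 411448,
    "memories.deduped.160m": 581195,
    "memories.deduped.410m": 811039,
    "memories.deduped.1b": 1032865,
    "memories.deduped.1.4b": 1048097,
    "memories.deduped.2.8b": 1355211,
    "memories.deduped.6.9b": 1680294,
    "memories.deduped.12b": 1871216,
    "memories.duped.70m": 463953,
    "memories.duped.160m": 689673,
    "memories.duped.410m": 970341,
    "memories.duped.1b": 1256141,
    "memories.duped.1.4b": 1373722,
    "memories.duped.2.8b": 1675077,
    "memories.duped.6.9b": 2120976,
    "memories.duped.12b": 2382328,
}

# Each of the sixteen `pile` splits reports exactly 5,000,000 examples.
PILE_SPLIT_COUNT = 16
PILE_EXAMPLES = 5_000_000

# dataset_size as reported in the card.
DATASET_SIZE = 1_595_577_216


def expected_num_bytes(num_examples: int) -> int:
    """Size of a split if every row occupies exactly ROW_BYTES bytes."""
    return num_examples * ROW_BYTES


total_examples = sum(MEMORIES_EXAMPLES.values()) + PILE_SPLIT_COUNT * PILE_EXAMPLES
total_bytes = expected_num_bytes(total_examples)

print(f"row width: {ROW_BYTES} bytes")
print(f"total examples: {total_examples}")
print(f"matches dataset_size: {total_bytes == DATASET_SIZE}")
```

The same row width explains why every `pile` split is listed at 80000000 bytes: 5000000 rows at 16 bytes each.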



