
Kyle1668/pythia-semantic-memorization-perplexities

Source: Hugging Face
Updated: 2023-09-19
Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/Kyle1668/pythia-semantic-memorization-perplexities
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: memories.deduped.12b
    path: data/memories.deduped.12b-*
  - split: memories.duped.12b
    path: data/memories.duped.12b-*
  - split: memories.duped.6.9b
    path: data/memories.duped.6.9b-*
  - split: pile.duped.6.9b
    path: data/pile.duped.6.9b-*
  - split: memories.duped.70m
    path: data/memories.duped.70m-*
  - split: memories.duped.160m
    path: data/memories.duped.160m-*
  - split: memories.duped.410m
    path: data/memories.duped.410m-*
  - split: pile.duped.70m
    path: data/pile.duped.70m-*
  - split: pile.duped.160m
    path: data/pile.duped.160m-*
  - split: pile.duped.410m
    path: data/pile.duped.410m-*
  - split: memories.duped.1.4b
    path: data/memories.duped.1.4b-*
  - split: memories.duped.1b
    path: data/memories.duped.1b-*
  - split: memories.duped.2.8b
    path: data/memories.duped.2.8b-*
  - split: pile.duped.1.4b
    path: data/pile.duped.1.4b-*
  - split: pile.duped.1b
    path: data/pile.duped.1b-*
  - split: pile.duped.2.8b
    path: data/pile.duped.2.8b-*
  - split: pile.duped.12b
    path: data/pile.duped.12b-*
  - split: memories.deduped.70m
    path: data/memories.deduped.70m-*
  - split: memories.deduped.160m
    path: data/memories.deduped.160m-*
  - split: memories.deduped.410m
    path: data/memories.deduped.410m-*
  - split: pile.deduped.70m
    path: data/pile.deduped.70m-*
  - split: pile.deduped.160m
    path: data/pile.deduped.160m-*
  - split: pile.deduped.410m
    path: data/pile.deduped.410m-*
  - split: memories.deduped.6.9b
    path: data/memories.deduped.6.9b-*
  - split: pile.deduped.6.9b
    path: data/pile.deduped.6.9b-*
  - split: pile.deduped.12b
    path: data/pile.deduped.12b-*
  - split: memories.deduped.2.8b
    path: data/memories.deduped.2.8b-*
  - split: pile.deduped.2.8b
    path: data/pile.deduped.2.8b-*
  - split: memories.deduped.1.4b
    path: data/memories.deduped.1.4b-*
  - split: memories.deduped.1b
    path: data/memories.deduped.1b-*
  - split: pile.deduped.1.4b
    path: data/pile.deduped.1.4b-*
  - split: pile.deduped.1b
    path: data/pile.deduped.1b-*
dataset_info:
  features:
  - name: index
    dtype: int32
  - name: prompt_perplexity
    dtype: float32
  - name: generation_perplexity
    dtype: float32
  - name: sequence_perplexity
    dtype: float32
  splits:
  - name: memories.deduped.12b
    num_bytes: 29939456
    num_examples: 1871216
  - name: memories.duped.12b
    num_bytes: 38117248
    num_examples: 2382328
  - name: memories.duped.6.9b
    num_bytes: 33935616
    num_examples: 2120976
  - name: pile.duped.6.9b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.duped.70m
    num_bytes: 7423248
    num_examples: 463953
  - name: memories.duped.160m
    num_bytes: 11034768
    num_examples: 689673
  - name: memories.duped.410m
    num_bytes: 15525456
    num_examples: 970341
  - name: pile.duped.70m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.160m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.410m
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.duped.1.4b
    num_bytes: 21979552
    num_examples: 1373722
  - name: memories.duped.1b
    num_bytes: 20098256
    num_examples: 1256141
  - name: memories.duped.2.8b
    num_bytes: 26801232
    num_examples: 1675077
  - name: pile.duped.1.4b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.1b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.2.8b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.duped.12b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.70m
    num_bytes: 6583168
    num_examples: 411448
  - name: memories.deduped.160m
    num_bytes: 9299120
    num_examples: 581195
  - name: memories.deduped.410m
    num_bytes: 12976624
    num_examples: 811039
  - name: pile.deduped.70m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.160m
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.410m
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.6.9b
    num_bytes: 26884704
    num_examples: 1680294
  - name: pile.deduped.6.9b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.12b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.2.8b
    num_bytes: 21683376
    num_examples: 1355211
  - name: pile.deduped.2.8b
    num_bytes: 80000000
    num_examples: 5000000
  - name: memories.deduped.1.4b
    num_bytes: 16769552
    num_examples: 1048097
  - name: memories.deduped.1b
    num_bytes: 16525840
    num_examples: 1032865
  - name: pile.deduped.1.4b
    num_bytes: 80000000
    num_examples: 5000000
  - name: pile.deduped.1b
    num_bytes: 80000000
    num_examples: 5000000
  download_size: 1891778367
  dataset_size: 1595577216
---

# Dataset Card for "pythia-semantic-memorization-perplexities"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
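
Every split name in the card follows the same three-part pattern, so a split can be selected programmatically. The following is a minimal sketch, not part of the card: `SplitName`, `parse_split`, and `load_split` are hypothetical helpers, and the field labels `source`, `variant`, and `size` are one reading of the names that the card itself does not document. `load_split` assumes the Hugging Face `datasets` package is installed and that the Hub is reachable.

```python
from typing import NamedTuple


class SplitName(NamedTuple):
    # Field labels are an assumption; the card does not define them.
    source: str   # first component, "memories" or "pile"
    variant: str  # second component, "duped" or "deduped"
    size: str     # remaining component, e.g. "70m", "1.4b", "12b"


def parse_split(name: str) -> SplitName:
    """Break a split name such as 'memories.deduped.1.4b' into three parts."""
    # The size label can contain a dot ("1.4b"), so split at most twice.
    source, variant, size = name.split(".", 2)
    return SplitName(source, variant, size)


def load_split(name: str):
    """Load one split from the Hub (requires network and `datasets`)."""
    from datasets import load_dataset  # lazy import keeps parsing usable offline

    return load_dataset(
        "Kyle1668/pythia-semantic-memorization-perplexities", split=name
    )


print(parse_split("memories.deduped.1.4b"))
```

Calling `load_split("memories.deduped.70m")` would fetch the smallest split listed in the card (411448 rows), which makes it a reasonable first download when exploring the data.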

Provider:
Kyle1668
Summary of original information

Dataset overview

Dataset configuration

  • config_name: default
  • data_files: multiple splits, each with its own file path and size, e.g. memories.deduped.12b and memories.duped.12b.

Dataset information

  • features:

    • name: index, prompt_perplexity, generation_perplexity, sequence_perplexity
    • dtype: int32, float32, float32, float32
  • splits:

    • name: multiple split names, e.g. memories.deduped.12b and memories.duped.12b.
    • num_bytes: size of each split in bytes, e.g. 29939456 bytes for memories.deduped.12b.
    • num_examples: number of examples in each split, e.g. 1871216 examples for memories.deduped.12b.
  • download_size: 1891778367

  • dataset_size: 1595577216
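
The reported sizes are consistent with the listed dtypes: one int32 and three float32 fields come to 16 bytes per row, and each split's num_bytes equals 16 times its num_examples. The sketch below checks this for three splits, using figures copied from the card; the constant names are illustrative only.

```python
# Bytes per value for the dtypes listed in the card.
DTYPE_BYTES = {"int32": 4, "float32": 4}

FEATURES = {
    "index": "int32",
    "prompt_perplexity": "float32",
    "generation_perplexity": "float32",
    "sequence_perplexity": "float32",
}

# One row holds one value per feature: 4 fields x 4 bytes.
ROW_BYTES = sum(DTYPE_BYTES[dtype] for dtype in FEATURES.values())

# (num_bytes, num_examples) for a few splits, copied from the card.
SPLITS = {
    "memories.deduped.12b": (29939456, 1871216),
    "memories.duped.70m": (7423248, 463953),
    "pile.duped.6.9b": (80000000, 5000000),
}

for name, (num_bytes, num_examples) in SPLITS.items():
    # Fails loudly if a split's size is not rows x row width.
    assert num_bytes == ROW_BYTES * num_examples, name
    print(f"{name}: {num_examples} rows x {ROW_BYTES} B = {num_bytes} B")
```

The same relation explains why every pile split reports exactly 80000000 bytes: each contains 5000000 rows.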
