HuggingFaceM4/general-pmd-synthetic-testing

Source: Hugging Face. Updated: 2022-10-07. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HuggingFaceM4/general-pmd-synthetic-testing
Description:
License: bigscience-openrail-m

This dataset is designed to be used in testing. It is derived from the general-pmd/localized_narratives__ADE20k dataset.

The current splits are: `['100.unique', '100.repeat', '300.unique', '300.repeat', '1k.unique', '1k.repeat', '10k.unique', '10k.repeat']`.

The `unique` splits ensure uniqueness across `text` entries.

The `repeat` splits repeat the same 10 unique records. They are useful for debugging memory leaks: the records are always the same, which removes record variation from the equation.

The default split is `100.unique`.

The full process of this dataset creation, including which records were used to build it, is documented inside [general-pmd-synthetic-testing.py](https://huggingface.co/datasets/HuggingFaceM4/general-pmd-synthetic-testing/blob/main/general-pmd-synthetic-testing.py).
Provider:
HuggingFaceM4
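Summary of original information

Dataset overview

Dataset source

  • This dataset is derived from the general-pmd/localized_narratives__ADE20k dataset.

Intended use

  • Designed for testing purposes.

Dataset structure

  • The dataset is divided into several splits: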
    • 100.unique
    • 100.repeat
    • 300.unique
    • 300.repeat
    • 1k.unique
    • 1k.repeat
    • 10k.unique
    • 10k.repeat
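
The split names listed above appear to combine a nominal size with a kind (`unique` or `repeat`). As a minimal sketch, assuming the prefix is the intended row count and that a trailing `k` means a factor of 1,000 (the source does not state this explicitly), a hypothetical helper `parse_split` could turn each name into a `(size, kind)` pair:

```python
# Split names copied from the dataset description.
SPLITS = ['100.unique', '100.repeat', '300.unique', '300.repeat',
          '1k.unique', '1k.repeat', '10k.unique', '10k.repeat']


def parse_split(name):
    """Split a name such as '1k.unique' into (nominal_size, kind).

    Assumption (not confirmed by the source): the prefix is a row count
    and a trailing 'k' multiplies it by 1000.
    """
    size_text, kind = name.split(".")
    if kind not in ("unique", "repeat"):
        raise ValueError(f"unexpected split kind: {kind!r}")
    if size_text.endswith("k"):
        size = int(size_text[:-1]) * 1000
    else:
        size = int(size_text)
    return size, kind


# Map every known split name to its parsed (size, kind) pair.
parsed = {name: parse_split(name) for name in SPLITS}
```

A helper like this can be handy for parametrizing tests over sizes, for example running a quick check on the 100-row splits and a slower one on the 10k splits.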

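Split characteristics

  • The unique splits ensure uniqueness across text entries.
  • The repeat splits repeat the same 10 records; they are useful for debugging memory leaks because they remove record variation from the equation.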
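
To illustrate why repeated records help with memory-leak debugging, here is a rough sketch that does not touch the real dataset. It builds a stand-in for a `repeat` split (10 records cycled to 1,000 rows) and uses the standard-library `tracemalloc` module to compare a deliberately leaky processing step with a clean one. The record contents and the names `make_repeat_records`, `traced_growth`, `leaky_step`, and `clean_step` are all hypothetical.

```python
import tracemalloc


def make_repeat_records(n_total, n_unique=10):
    """Stand-in for a `repeat` split: the same n_unique records, cycled."""
    base = [{"text": f"synthetic caption {i}"} for i in range(n_unique)]
    return [base[i % n_unique] for i in range(n_total)]


def traced_growth(step, records):
    """Return how many traced bytes remain allocated after processing."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for record in records:
        step(record)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_cache = []


def leaky_step(record):
    # Deliberate bug: a fresh copy of every processed text is kept alive.
    _cache.append(record["text"] * 100)


def clean_step(record):
    # Reads the record but retains nothing.
    len(record["text"])


records = make_repeat_records(1000)
leak = traced_growth(leaky_step, records)
clean = traced_growth(clean_step, records)
print(f"leaky step grew by {leak} bytes, clean step by {clean} bytes")
```

Because the input never varies, any steady growth in memory must come from the processing code rather than from differences between records, which is the point the description makes about the `repeat` splits.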

Default split

  • The default split is 100.unique.
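
As a sketch of how a test suite might select a split, the hypothetical helpers below fall back to `100.unique` when no split is requested and pass the result to `load_dataset` from the Hugging Face `datasets` library. This has not been run against the hub here. Since the dataset is built by a loading script, some versions of the library may require opting in to remote code or may not support script-based datasets at all, so treat the loading call as an assumption to verify.

```python
DATASET_ID = "HuggingFaceM4/general-pmd-synthetic-testing"
KNOWN_SPLITS = ("100.unique", "100.repeat", "300.unique", "300.repeat",
                "1k.unique", "1k.repeat", "10k.unique", "10k.repeat")
DEFAULT_SPLIT = "100.unique"  # stated as the default in the description


def resolve_split(split=None):
    """Return a valid split name, falling back to the documented default."""
    if split is None:
        return DEFAULT_SPLIT
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {KNOWN_SPLITS}")
    return split


def load_testing_split(split=None):
    """Load one split of the testing dataset (requires network access)."""
    # Imported lazily so resolve_split works without the library installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=resolve_split(split))
```

Calling `load_testing_split()` would request `100.unique`, while `load_testing_split("1k.repeat")` would request the larger repeating split.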

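Dataset creation process

  • The full creation process, including which records were used to build the dataset, is documented inside general-pmd-synthetic-testing.py.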
