HuggingFaceM4/general-pmd-synthetic-testing

Name: HuggingFaceM4/general-pmd-synthetic-testing
Creator: HuggingFaceM4
Published: 2022-10-07 03:12:13
License: bigscience-openrail-m

Hugging Face. Updated 2022-10-07. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/HuggingFaceM4/general-pmd-synthetic-testing

Resource description:

License: bigscience-openrail-m

This dataset is designed to be used in testing. It is derived from the general-pmd/localized_narratives__ADE20k dataset.

The current splits are: `['100.unique', '100.repeat', '300.unique', '300.repeat', '1k.unique', '1k.repeat', '10k.unique', '10k.repeat']`.

The `unique` splits ensure uniqueness across `text` entries.

The `repeat` splits repeat the same 10 unique records. They are useful for debugging memory leaks: because the records are always the same, record variation is removed from the equation.

The default split is `100.unique`.

The full process of creating this dataset, including which records were used to build it, is documented inside [general-pmd-synthetic-testing.py](https://huggingface.co/datasets/HuggingFaceM4/general-pmd-synthetic-testing/blob/main/general-pmd-synthetic-testing.py).
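The memory-leak debugging use described for the `repeat` splits can be sketched with the standard-library `tracemalloc` module. This is only an illustration under stated assumptions: the `memory_growth` helper, the `process` callback, and the placeholder records are hypothetical and are not part of the dataset or its build script. The idea is that when the same records are fed repeatedly, any steady growth in traced memory points at the processing code rather than at the data.

```python
import tracemalloc


def memory_growth(process, records, epochs=5):
    """Return the change in traced bytes after each pass over the same records.

    `process` is a hypothetical per-record callback standing in for the code
    under test (for example, a preprocessing or training step).
    """
    tracemalloc.start()
    growth = []
    try:
        # Warm-up pass so one-time allocations are not counted as growth.
        for record in records:
            process(record)
        previous, _peak = tracemalloc.get_traced_memory()
        for _ in range(epochs):
            for record in records:
                process(record)
            current, _peak = tracemalloc.get_traced_memory()
            growth.append(current - previous)
            previous = current
    finally:
        tracemalloc.stop()
    return growth


# Placeholder records (assumption): only the `text` field is named in the card.
sample_records = [{"text": f"caption {i}"} for i in range(10)]

retained = []  # simulates a leak: every processed value is kept alive
leaky_growth = memory_growth(lambda r: retained.append(r["text"] * 1000), sample_records)
clean_growth = memory_growth(lambda r: len(r["text"] * 1000), sample_records)
```

With identical input on every pass, `leaky_growth` keeps increasing while `clean_growth` stays close to zero, which is the kind of signal the `repeat` splits are meant to make easy to read.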

Provider:

HuggingFaceM4

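Summary of original information

Dataset overview

Dataset source

This dataset is derived from the general-pmd/localized_narratives__ADE20k dataset.

Dataset purpose

It is designed for testing.

Dataset structure

The dataset provides the following splits: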
- 100.unique
- 100.repeat
- 300.unique
- 300.repeat
- 1k.unique
- 1k.repeat
- 10k.unique
- 10k.repeat
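The split names listed above follow a `<size>.<kind>` pattern, which a small helper can enumerate and parse. The sketch below is hedged on several points: `parse_split` and `load_split` are hypothetical helper names, reading the numeric prefix as a record count is an assumption the card implies but does not state, and `load_split` relies on the Hugging Face `datasets` package and network access. Because this dataset is built by a loading script, some `datasets` versions may require extra opt-in or may not support it at all.

```python
# Split names exactly as listed in the dataset card.
SPLITS = [
    f"{size}.{kind}"
    for size in ("100", "300", "1k", "10k")
    for kind in ("unique", "repeat")
]


def parse_split(name):
    """Return (count, kind) for a split name such as '1k.repeat'.

    Assumption: the prefix is a record count, with 'k' meaning thousands.
    """
    size, kind = name.split(".")
    if kind not in ("unique", "repeat"):
        raise ValueError(f"unknown split kind: {kind!r}")
    count = int(size[:-1]) * 1000 if size.endswith("k") else int(size)
    return count, kind


def load_split(name="100.unique"):
    """Load one split from the Hub (needs `datasets` and network access)."""
    if name not in SPLITS:
        raise ValueError(f"unknown split: {name!r}")
    # Imported lazily so the helpers above work without the package installed.
    from datasets import load_dataset

    return load_dataset("HuggingFaceM4/general-pmd-synthetic-testing", split=name)
```

Calling `load_split()` with no argument requests `100.unique`, the split the card names as the default.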

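Split characteristics

The unique splits ensure that text entries are unique.
The repeat splits repeat the same 10 records; they are intended for debugging memory leaks, since they remove record variation as a factor.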
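The difference between the two kinds of split can be illustrated with a toy construction. This is not the dataset's actual build procedure, which lives in general-pmd-synthetic-testing.py. The helper names and placeholder records below are hypothetical, and only the `text` field is taken from the description.

```python
from itertools import cycle, islice


def make_repeat_split(base_records, n):
    """Build n records by cycling over the same few base records."""
    return list(islice(cycle(base_records), n))


def texts_are_unique(records):
    """Check that no two records share the same `text` value."""
    texts = [record["text"] for record in records]
    return len(set(texts)) == len(texts)


# Placeholder data (assumption): real records hold captions from the source dataset.
base_records = [{"text": f"caption {i}"} for i in range(10)]
repeat_100 = make_repeat_split(base_records, 100)
unique_100 = [{"text": f"caption {i}"} for i in range(100)]
```

Here `texts_are_unique(unique_100)` holds, while `repeat_100` contains only 10 distinct `text` values spread over 100 records.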

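Default split

The default split is 100.unique.

Dataset creation process

The full creation process, including which records were used, is documented in the general-pmd-synthetic-testing.py file.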
