HuggingFaceM4/general-pmd-synthetic-testing
Source: Hugging Face. Updated: 2022-10-07. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HuggingFaceM4/general-pmd-synthetic-testing
Description:
---
license: bigscience-openrail-m
---
This dataset is designed for use in testing. It is derived from the general-pmd/localized_narratives__ADE20k dataset.
The current splits are: `['100.unique', '100.repeat', '300.unique', '300.repeat', '1k.unique', '1k.repeat', '10k.unique', '10k.repeat']`.
The `unique` splits ensure uniqueness across `text` entries.
The `repeat` splits repeat the same 10 unique records. They are useful for debugging memory leaks: because the records are always the same, record variation is removed from the equation.
The default split is `100.unique`.
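
The split names listed above can be handled programmatically. The following is a minimal sketch, not part of the original card: `parse_split` and `load_split` are hypothetical helper names, and reading `1k` as 1,000 records is an assumption based on the naming. Loading goes through the `datasets` library's `load_dataset`; because this dataset is built by a loading script, some `datasets` versions may require `trust_remote_code=True` or may not support it at all.

```python
from typing import Tuple

REPO_ID = "HuggingFaceM4/general-pmd-synthetic-testing"

# Split names as listed in the dataset card.
SPLITS = [
    "100.unique", "100.repeat", "300.unique", "300.repeat",
    "1k.unique", "1k.repeat", "10k.unique", "10k.repeat",
]


def parse_split(name: str) -> Tuple[int, str]:
    """Turn a split name such as '1k.repeat' into (1000, 'repeat').

    Assumption: a trailing 'k' in the size part means "times 1000".
    """
    size, kind = name.split(".")
    if kind not in ("unique", "repeat"):
        raise ValueError(f"unexpected split kind: {kind!r}")
    count = int(size[:-1]) * 1000 if size.endswith("k") else int(size)
    return count, kind


def load_split(name: str = "100.unique"):
    """Load one split by name (hypothetical helper; needs network access)."""
    if name not in SPLITS:
        raise ValueError(f"unknown split {name!r}; expected one of {SPLITS}")
    # Imported lazily so parse_split works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)[name]
```

Calling `load_split("10k.repeat")` would return that split, while `parse_split` can be used in test fixtures to assert the expected record count without downloading anything.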
The full process of creating this dataset, including which records were used to build it, is documented inside [general-pmd-synthetic-testing.py](https://huggingface.co/datasets/HuggingFaceM4/general-pmd-synthetic-testing/blob/main/general-pmd-synthetic-testing.py).
Provided by:
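
To make the difference between the two kinds of split concrete, here is an illustrative sketch of how `unique` and `repeat` splits could be assembled from a stream of records. It is not the code from the linked script, which remains the authoritative reference. The function names `build_unique` and `build_repeat` are hypothetical, and records are assumed to be dictionaries with a `text` field.

```python
from itertools import cycle, islice
from typing import Dict, Iterable, List

Record = Dict[str, str]


def build_unique(records: Iterable[Record], n: int) -> List[Record]:
    """Collect up to n records, skipping any whose `text` was already seen."""
    seen = set()
    out: List[Record] = []
    for rec in records:
        if rec["text"] in seen:
            continue
        seen.add(rec["text"])
        out.append(rec)
        if len(out) == n:
            break
    return out


def build_repeat(records: Iterable[Record], n: int, base: int = 10) -> List[Record]:
    """Cycle over the first `base` unique records until n records are produced."""
    seed = build_unique(records, base)
    return list(islice(cycle(seed), n))
```

With a `repeat` split built this way, every pass over the data sees the same 10 records, so any growth in memory between passes cannot be attributed to differences in the input.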
HuggingFaceM4
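Summary of original information
Dataset overview
Dataset source
- This dataset is derived from the general-pmd/localized_narratives__ADE20k dataset.
Dataset purpose
- Designed for testing.
Dataset structure
- The dataset is divided into several subsets: 100.unique, 100.repeat, 300.unique, 300.repeat, 1k.unique, 1k.repeat, 10k.unique, 10k.repeat.
Subset characteristics
The unique subsets ensure uniqueness across text entries. The repeat subsets repeat the same 10 records and are intended for memory-leak debugging, since they remove record variation as a factor.
Default subset
- The default subset is 100.unique.
Dataset creation process
- The full creation process, including which records were used, is documented in the general-pmd-synthetic-testing.py file.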



