Mitsua/safe-commons-pd-3m
Hugging Face | Updated 2025-02-13 | Indexed 2024-12-21
Download link:
https://hf-mirror.com/datasets/Mitsua/safe-commons-pd-3m
Resource summary:
Safe Commons PD 3M is a balanced, safe-to-use dataset of public domain / CC0 images. All images and texts come from Wikimedia Commons and Wikidata and are strictly filtered. The images are in the public domain or licensed under CC0; the texts are licensed under CC0 or CC BY-SA. The dataset contains no synthetic data, such as AI-generated images or captions. To ensure the highest level of safety and to avoid knowledge leakage from existing pretrained models, the build process uses no AI-generated captions, aesthetic scoring, or CLIP score filtering. Instead, the data is filtered at the word level against a Wikidata-based lexical database and then checked visually by hand. Image deduplication, diversification balancing, and machine translation are also applied to improve the quality and diversity of the dataset. The stated goal is a public domain image and text dataset that is both high quality and safe.
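As a rough illustration of the word-level filtering mentioned above, the following Python sketch drops records whose caption contains a blocked word. It is not the dataset's actual pipeline: the blocklist contents, the `caption` field name, and the helper names `tokenize`, `is_safe`, and `filter_records` are all assumptions made for this example.

```python
import re

# Hypothetical blocklist for illustration only. The dataset's real word list
# is not reproduced here.
BLOCKED_WORDS = {"weapon", "gore"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_safe(caption, blocked=BLOCKED_WORDS):
    """Return True when no whole-word token of the caption is blocked."""
    return not any(token in blocked for token in tokenize(caption))


def filter_records(records, blocked=BLOCKED_WORDS):
    """Keep only records whose 'caption' field (assumed name) passes the check."""
    return [r for r in records if is_safe(r["caption"], blocked)]
```

Matching on whole tokens rather than substrings keeps unrelated words that merely contain a blocked string. A production filter would likely also need multilingual tokenization, which this sketch does not attempt.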
Provided by:
Mitsua



