
terminusresearch/ideogram-75k

Source: Hugging Face | Updated: 2024-07-12 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/terminusresearch/ideogram-75k
Description:
---
license: agpl-3.0
---

# Ideogram-75k

## Dataset Details

This dataset is not authorised by, curated by, or related to Ideogram.

#### This dataset contains the `ideogram-25k` dataset contents. Do not use both!

### Dataset Description

- **Curated by:** @pseudoterminalx
- **License:** AGPLv3.

**Note**: All models created using this dataset are a derivative of it, and must have an open release under a permissible or copyleft license.

### Dataset Sources

Pulled ~75,000 images from Ideogram, a proprietary image generation service that excels at typography.

## Uses

- Fine-tuning or training text-to-image models and classifiers
- Analysis of Ideogram user bias

## Dataset Structure

- Filenames are an SHA256 hash of the image data, and can be used to verify the integrity.
- The `caption` column was obtained by asking Microsoft Florence2 (ft) to accurately describe what it sees.

## Dataset Creation

### Curation Rationale

Ideogram's users focus on typography generations, which makes it a suitable source for a lot of high quality typography data.

As a synthetic data source, its outputs are free of copyright concerns.

#### Data Collection and Processing

Used a custom Selenium application in Python that monitors the Ideogram service for posts and immediately saves them to disk.

Data is deduplicated by its SHA256 hash.

## Bias, Risks, and Limitations

As the captions all currently come from a single synthetic source, the bias of the Llava 34B captioner is present throughout this dataset.

More captions will be added.

## Citation

If there is any model built using this dataset or any further augmentations (eg. new captions) are added, this page & Terminus Research should be cited.
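The card states that filenames are the SHA256 hash of the image data and that the data is deduplicated by that hash. A minimal sketch of both checks follows, assuming the images have been exported to local files whose name is the lowercase hex digest plus an extension. That naming detail and the helper names `sha256_of_file`, `verify_image`, and `find_duplicates` are illustrative assumptions, not part of the dataset's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path) -> bool:
    """Check that the filename stem matches the SHA256 of the file contents.

    Assumption: the filename is "<hex digest>.<extension>".
    """
    return path.stem.lower() == sha256_of_file(path)


def find_duplicates(paths):
    """Group paths by content hash and return the hashes seen more than once."""
    by_hash = {}
    for p in paths:
        by_hash.setdefault(sha256_of_file(p), []).append(p)
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}
```

Running `verify_image` over every file in a download directory would flag corrupted or renamed images, and `find_duplicates` would surface any byte-identical copies that slipped through. If the dataset is consumed in another container format, the same digest would need to be computed over the raw image bytes instead of a file on disk.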
Provider:
terminusresearch