
bigdata-pw/Dataception

Hugging Face · Updated 2024-08-14 · Indexed 2025-04-08
Download link:
https://hf-mirror.com/datasets/bigdata-pw/Dataception
Resource description:
---
configs:
- config_name: dataception
  data_files:
  - path: dataception.parquet
    split: train
  default: true
license: odc-by
task_categories:
- text-to-image
tags:
- training
pretty_name: Dataception
---

# Dataset Card for Dataception

Dataset of datasets.

## Dataset Details

### Dataset Description

Dataception is a dataset of image-text training datasets.

Datasets were collected from CivitAI. Thanks to all the users who curated the original datasets and shared them ❤️

Datasets have been processed and converted into WebDataset format.

To ensure consistent format `.png` images are re-encoded to `jpg` with 90% quality and `.jpeg` extension renamed to `.jpg`.

Contents of each WebDataset are named with zero-filled sequence number e.g. `00000.txt`.

Each dataset is accompanied by the original model metadata in `.json` format.

#### Notes

Some datasets were excluded from this release:

- If multiple subdirectories containing image-text pairs were detected; this structure is used in some training UIs to denote multiple concepts. These datasets will be reviewed to determine appropriate processing into WebDataset format.
- If no image-text pairs were detected. These datasets will be reviewed to determine whether captions are in the filenames or just not available.
- If the file is not actually a `.zip`; some cases of `.rar` files have been detected which is unsupported by `zipfile` library that the current processing script uses.
- If the `.zip` uses an unknown compression type that is unsupported by `zipfile` library. This excludes `deflate64` (used by Windows), which is unsupported by default, thanks to patching from `zipfile_deflate64` library.
- Both cases of unsupported files will be processed at a later date.

**Curated by:** hlky

**License:** Open Data Commons Attribution License (ODC-By) v1.0

## Dataset Structure

### Dataception

- **id**: Model Version ID.
- **modelId**: Model ID.
- **name**: Model name.
- **type**: Model type.
- **baseModel**: Base model used for the original training.
- **nsfwLevel**: Content rating for the original model and dataset.
- **dataset_wds**: Path to the WebDataset.
- **dataset_json**: Path to the `.json` metadata.

### Individual datasets

`WebDataset` with `.jpg` and `.txt`.

## Uses

Potential uses include:

- Training practice; easily compare your result to the original model.
- Retraining; with modern tooling and newer base models.

# Citation Information

```
@misc{Dataception,
  author = {hlky},
  title = {Dataception},
  year = {2024},
  publisher = {hlky},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/datasets/bigdata-pw/Dataception}}
}
```

## Attribution Information

```
Contains information from [Dataception](https://huggingface.co/datasets/bigdata-pw/Dataception)
which is made available under the [ODC Attribution License](https://opendatacommons.org/licenses/by/1-0/).
```
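The individual datasets described in the card pair each `.jpg` with a `.txt` caption under a shared zero-filled key such as `00000`. As a rough sketch of how such a file could be read, the snippet below assumes each dataset is a plain tar archive (the usual WebDataset container) and uses only the Python standard library. The helper names `make_demo_shard` and `iter_image_text_pairs` are hypothetical and not part of the dataset or of any library. The demo shard is built in memory with placeholder bytes so the example runs without downloading anything.

```python
import io
import tarfile
from collections import defaultdict


def make_demo_shard(samples):
    """Build an in-memory tar laid out as the card describes: NNNNN.jpg + NNNNN.txt.

    Hypothetical helper for this example only; the image bytes are placeholders.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for index, (image_bytes, caption) in enumerate(samples):
            key = f"{index:05d}"  # zero-filled sequence number, e.g. 00000
            members = ((".jpg", image_bytes), (".txt", caption.encode("utf-8")))
            for ext, payload in members:
                info = tarfile.TarInfo(name=key + ext)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_image_text_pairs(fileobj):
    """Yield (key, jpg_bytes, caption) for every key with both a .jpg and a .txt.

    Assumption: the archive is a plain tar and captions are UTF-8 text.
    All members are read into memory, which is fine for small datasets.
    """
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, dot, ext = member.name.rpartition(".")
            if not dot:
                continue  # skip members without an extension
            grouped[key][ext] = tar.extractfile(member).read()
    for key in sorted(grouped):
        parts = grouped[key]
        if "jpg" in parts and "txt" in parts:
            yield key, parts["jpg"], parts["txt"].decode("utf-8")


# Demo with two placeholder samples.
shard = make_demo_shard([
    (b"\xff\xd8fake-jpeg-0", "a red fox"),
    (b"\xff\xd8fake-jpeg-1", "a blue bird"),
])
pairs = list(iter_image_text_pairs(shard))
for key, image, caption in pairs:
    print(key, len(image), caption)
```

For a real dataset, the same function could be given a file opened in binary mode from the location listed in the `dataset_wds` column, provided the tar assumption holds for that file.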
Provided by: bigdata-pw