
danbooru2024-sfw

Source: ModelScope community. Updated 2025-12-04; indexed 2024-12-21.
Download link:
https://modelscope.cn/datasets/deepghs/danbooru2024-sfw
Description:
# 🎨 Danbooru2024 Dataset

![Dataset Size](https://img.shields.io/badge/Images-6.5M-blue) ![Language](https://img.shields.io/badge/Languages-EN%20|%20JA-green) ![Category](https://img.shields.io/badge/Category-Image%20Classification-orange)

## 📊 Dataset Overview

The **Danbooru2024** dataset is a comprehensive collection of animation and illustration artwork, derived from the official Danbooru platform. It contains approximately **6.5 million high-quality, user-annotated images** with corresponding tags and textual descriptions.

The dataset is filtered from an original set of **8.3 million entries**, excluding NSFW-rated and **opt-out** entries to create a more accessible and audience-friendly resource. It addresses the problems of overly crawled booru databases by providing a curated, well-structured alternative.

## ✨ Features

### 📋 Metadata Support

Metadata is provided in Parquet format. Example usage:

```python
# install the necessary packages; you can choose pyarrow or fastparquet
#%pip install pandas pyarrow

from tqdm.auto import tqdm
import pandas as pd

tqdm.pandas()  # register progress_apply

# read the parquet file
df = pd.read_parquet('metadata.parquet')
print(df.head())  # check the first 5 rows
#print(df.columns)  # list the columns of the dataframe

necessary_columns = [
    "created_at", "score", "rating", "tag_string",
    "up_score", "down_score", "fav_count",
]
df = df[necessary_columns]  # shrink the dataframe to only the necessary columns

# convert to timezone-aware datetimes so they compare cleanly with the UTC bounds below
df['created_at'] = pd.to_datetime(df['created_at'], utc=True)

datetime_start = pd.Timestamp('2007-01-01', tz='UTC')
datetime_end = pd.Timestamp('2008-01-01', tz='UTC')
subdf = df[(df['created_at'] >= datetime_start) & (df['created_at'] < datetime_end)]

# count posts per rating
print(subdf['rating'].value_counts())

# export the sub-dataframe
subdf.to_parquet('metadata-2007.parquet')
```

### 📥 Partial Downloads

To simplify downloading specific entries, use the **[CheeseChaser](https://github.com/deepghs/cheesechaser)** library:

```python
#%pip install cheesechaser  # >=0.2.0

from cheesechaser.datapool import Danbooru2024SfwDataPool
from cheesechaser.query import DanbooruIdQuery

pool = Danbooru2024SfwDataPool()

#my_waifu_ids = DanbooruIdQuery(['surtr_(arknights)', 'solo'])
# the query above only works when Danbooru is accessible; if it is not, use the following:

import pandas as pd

# read the parquet file, loading only the necessary columns
df = pd.read_parquet('metadata.parquet', columns=['id', 'tag_string'])

# str.contains treats the pattern as a regex, so the brackets in
# surtr_(arknights) must be escaped
subdf = df[
    df['tag_string'].str.contains(r'surtr_\(arknights\)')
    & df['tag_string'].str.contains('solo')
]

# take post IDs from the 'id' column; the default index holds row positions, not post IDs
ids = subdf['id'].tolist()
print(ids[:5])  # check the first 5 ids

# download Danbooru images tagged surtr + solo to the directory /data/exp2_surtr
pool.batch_download_to_directory(
    resource_ids=ids,
    dst_dir='/data/exp2_surtr',
    max_workers=12,
)
```

# Terms and Conditions for Dataset Use

## General Use Requirements

- **User Responsibility**: Users must possess sufficient knowledge and expertise to use the dataset appropriately. Any derived works or outputs created using the dataset are the sole responsibility of the user. The creators of the dataset do not offer any guarantees or warranties regarding the outcomes or uses of such derived works.

## License Agreement

- **Mandatory Agreement**: Use of this dataset is contingent upon the user's acceptance of the associated LICENSE terms. Users who do not agree are prohibited from using the dataset.
- **Modifications and Updates**: The dataset may be updated or changed over time. Such modifications are governed by the same LICENSE terms and conditions.
- **Opt-Out Compliance**: The dataset follows the opt-out policy of the original booru database. Any changes to that policy will be reflected and respected in the dataset.
## Prohibited Uses

The dataset explicitly prohibits the following activities:

1. **Harmful or Malicious Activities**:
   - Using the dataset or its outputs to harass, threaten, or intimidate individuals or groups.
   - Spreading false or misleading information.
   - Any use intended to cause harm to individuals, organizations, or society.
2. **Illegal Activities**:
   - Generating content or outputs that violate local, national, or international laws.
   - Any use that breaches applicable regulations or promotes unlawful actions.
3. **Unethical or Offensive Content Modification**:
   - Modifying the dataset to produce controversial material that goes against ethical guidelines or community standards.
   - Any use that could incite hate, violence, or discrimination.

## User Agreement and Acknowledgment

By using this dataset, users explicitly agree to:

- Adhere to the conditions specified in the LICENSE.
- Take full responsibility for how the dataset and its outputs are used, including any consequences resulting from that use.

## Disclaimer

- **No Warranties**: The creators of the dataset provide it "as is" and make no warranties regarding the dataset's quality, reliability, or fitness for any particular purpose.
- **Indemnification**: Users agree to indemnify and hold harmless the creators against any claims, damages, or liabilities arising from their use of the dataset.

## 🏷️ Dataset Information

- **License**: Other
- **Task Categories**:
  - Image Classification
  - Zero-shot Image Classification
  - Text-to-Image
- **Languages**:
  - English
  - Japanese
- **Tags**:
  - Art
  - Anime
- **Size Category**: 1M < n < 10M
- **Annotation Creators**: No annotation
- **Source Datasets**: [Danbooru](https://danbooru.donmai.us)

---

*Note: This dataset is provided for research and development purposes. Please ensure compliance with all applicable usage terms and conditions.*
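
The tag filter in the Partial Downloads example relies on substring matching, so a pattern such as `solo` would also match longer tags like `solo_focus`. The following is a minimal sketch of exact tag matching, assuming `tag_string` holds space-separated tags (as in Danbooru's own post metadata). The helper `has_all_tags` and the three sample rows are hypothetical and exist only for illustration; they are not part of the dataset or of CheeseChaser.

```python
import pandas as pd


def has_all_tags(tag_string: str, required: set) -> bool:
    """Return True if every required tag appears as a whole token in tag_string."""
    # Assumption: tags are separated by whitespace.
    return required.issubset(tag_string.split())


# Tiny in-memory stand-in for metadata.parquet (hypothetical rows).
df = pd.DataFrame({
    "id": [101, 102, 103],
    "tag_string": [
        "1girl solo surtr_(arknights) sword",
        "1girl solo_focus surtr_(arknights)",
        "2girls surtr_(arknights) solo",
    ],
})

required = {"surtr_(arknights)", "solo"}

# Keep only rows whose tag list contains every required tag exactly.
mask = df["tag_string"].apply(lambda s: has_all_tags(s, required))
ids = df.loc[mask, "id"].tolist()
print(ids)
```

Because the comparison is done on whole tokens, no regex escaping is needed for the brackets in `surtr_(arknights)`. On the full metadata file a row-wise `apply` may be slow, so the substring filter can still serve as a cheap first pass before the exact check.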

Provider:
maas
Created:
2024-12-03