
danbooru2024-webp-4Mpixel

ModelScope community. Updated 2024-12-20; indexed 2024-12-21.
Download link:
https://modelscope.cn/datasets/deepghs/danbooru2024-webp-4Mpixel
Resource description:
# 🎨 Danbooru2024 Webp 4MPixel Dataset

![Dataset Size](https://img.shields.io/badge/Images-8005010-blue) ![Language](https://img.shields.io/badge/Languages-EN%20|%20JA-green) ![Category](https://img.shields.io/badge/Category-Image%20Classification-orange)

## 📊 Dataset Overview

The **Danbooru2024-Webp** dataset is a comprehensive collection of animation and illustration artwork derived from the official Danbooru platform. It contains approximately **8.0 million high-quality, user-annotated images** (the badge above lists 8,005,010) with corresponding tags and textual descriptions.

This dataset is a WebP version of Danbooru2024 in which images are resized with a 4-megapixel (4MP) target.

## ✨ Features

### 📋 Metadata Support

Includes metadata in Parquet format. Example code for usage:

```python
# Install the necessary packages; you can choose pyarrow or fastparquet
#%pip install pandas pyarrow

from tqdm.auto import tqdm
import pandas as pd

tqdm.pandas()  # register progress_apply

# Read the Parquet file
df = pd.read_parquet('metadata.parquet')
print(df.head())  # check the first 5 rows
#print(df.columns)  # you can check the columns of the dataframe

# Despite the variable name, these are the columns to keep
necessary_rows = [
    "created_at", "score", "rating", "tag_string",
    "up_score", "down_score", "fav_count",
]
df = df[necessary_rows]  # shrink the dataframe to only the necessary columns
df['created_at'] = pd.to_datetime(df['created_at'])  # convert to datetime

datetime_start = pd.Timestamp('2007-01-01', tz='UTC')
datetime_end = pd.Timestamp('2008-01-01', tz='UTC')
# Keep posts created in 2007 (start inclusive, end exclusive)
subdf = df[(df['created_at'] >= datetime_start) & (df['created_at'] < datetime_end)]
print(subdf.head())

# Alternative: download images with the cheesechaser library
#%pip install "cheesechaser>=0.2.0"
from cheesechaser.datapool import Danbooru2024WebpDataPool
from cheesechaser.query import DanbooruIdQuery

pool = Danbooru2024WebpDataPool()
#my_waifu_ids = DanbooruIdQuery(['surtr_(arknights)', 'solo'])
# The query above only works when Danbooru is accessible; if not, use the following:

# Read only the necessary columns
df = pd.read_parquet('metadata.parquet', columns=['id', 'tag_string'])
# str.contains interprets the pattern as a regex, so the brackets in
# surtr_(arknights) must be escaped
subdf = df[df['tag_string'].str.contains('surtr_\\(arknights\\)') & df['tag_string'].str.contains('solo')]
# Takes the ids from the index; if `id` is loaded as a regular column,
# use subdf['id'].tolist() instead
ids = subdf.index.tolist()
print(ids[:5])  # check the first 5 ids

# Download Danbooru images tagged surtr + solo to the directory /data/exp2_surtr
pool.batch_download_to_directory(
    resource_ids=ids,
    dst_dir='/data/exp2_surtr',
    max_workers=12,
)
```

# Terms and Conditions for Dataset Use

## General Use Requirements

- **User Responsibility**: Users must possess sufficient knowledge and expertise to use the dataset appropriately. Any derived works or outputs created using the dataset are the sole responsibility of the user. The creators of the dataset do not offer any guarantees or warranties regarding the outcomes or uses of such derived works.

## License Agreement

- **Mandatory Agreement**: Usage of this dataset is contingent upon the user's acceptance of the associated LICENSE terms. Users who do not agree to these terms are prohibited from using the dataset.
- **Modifications and Updates**: The dataset may be subject to updates or changes over time. These modifications are governed by the same LICENSE terms and conditions.
- **Opt-Out Compliance**: The dataset aligns with the opt-out policy of the original booru database. If applicable, any modifications to this policy will be reflected and respected in the dataset.

## Prohibited Uses

The dataset explicitly prohibits the following activities:

1. **Harmful or Malicious Activities**:
   - Using the dataset or its outputs to harass, threaten, or intimidate individuals or groups.
   - Spreading false or misleading information.
   - Any use intended to cause harm to individuals, organizations, or society.
2. **Illegal Activities**:
   - Generating content or outputs that violate local, national, or international laws.
   - Any use that breaches applicable regulations or promotes unlawful actions.
3. **Unethical or Offensive Content Modification**:
   - Modifying the dataset to produce controversial materials that go against ethical guidelines or community standards.
   - Any use that could incite hate, violence, or discrimination.

## User Agreement and Acknowledgment

By using this dataset, users explicitly agree to:

- Adhere to the conditions specified in the LICENSE.
- Take full responsibility for how the dataset and its outputs are utilized, including any consequences resulting from their use.

## Disclaimer

- **No Warranties**: The creators of the dataset provide it "as is" and make no warranties regarding the dataset's quality, reliability, or fitness for any particular purpose.
- **Indemnification**: Users agree to indemnify and hold harmless the creators against any claims, damages, or liabilities arising from their use of the dataset.

## 🏷️ Dataset Information

- **License**: Other
- **Task Categories**:
  - Image Classification
  - Zero-shot Image Classification
  - Text-to-Image
- **Languages**:
  - English
  - Japanese
- **Tags**:
  - Art
  - Anime
- **Size Category**: 1M < n < 10M
- **Annotation Creators**: No annotation
- **Source Datasets**: [Danbooru](https://danbooru.donmai.us)

---

*Note: This dataset is provided for research and development purposes. Please ensure compliance with all applicable usage terms and conditions.*
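
As a supplement to the tag-filtering step in the metadata example above: a substring match such as `str.contains('solo')` also matches longer tags like `solo_focus`. The sketch below shows one possible way to require exact tag matches instead. It assumes `tag_string` holds space-separated tags, as in Danbooru metadata; the helper name `filter_by_tags` and the small demo table are hypothetical and stand in for `metadata.parquet`.

```python
import pandas as pd


def filter_by_tags(df, required_tags, column="tag_string"):
    """Return rows whose space-separated tag string contains every required tag.

    Hypothetical helper for illustration; it is not part of pandas or cheesechaser.
    """
    required = set(required_tags)
    # Split each tag string on whitespace; missing values become an empty list.
    tag_lists = df[column].fillna("").str.split()
    mask = tag_lists.map(lambda tags: required.issubset(tags))
    return df[mask]


# Tiny synthetic stand-in for metadata.parquet (ids and tags are made up).
demo = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "tag_string": [
            "1girl solo surtr_(arknights) sword",
            "1girl solo_focus surtr_(arknights)",  # has solo_focus, not solo
            "2girls surtr_(arknights) solo",
        ],
    }
)

matched = filter_by_tags(demo, ["surtr_(arknights)", "solo"])
matched_ids = matched["id"].tolist()
print(matched_ids)  # → [1, 3]
```

Because the tags are compared as whole strings, no regex escaping of the brackets in `surtr_(arknights)` is needed. The resulting id list could then be passed as `resource_ids` to `pool.batch_download_to_directory` in the same way as in the example above.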

Provider:
maas
Created:
2024-12-03