midjourney-v6-520k-raw

Name: midjourney-v6-520k-raw
Creator: maas
Published: 2026-05-22 03:25:58
License: No description provided

ModelScope community · Updated 2026-05-22 · Indexed 2024-08-31

Download link:

https://modelscope.cn/datasets/OmniData/midjourney-v6-520k-raw


Resource description:

# Synthetic Dataset: MJv6-520k

Pulled from Midjourney on 19 Jun 2024 and filtered down to single images only.

Japanese captions are converted to English via GPT-3.5 and stored in the `gpt_caption` column. The original captions are available in the `original_text` column.

Each image file has a metadata JSON file and a txt file with the same name. The metadata is the same as in the parquet table. The text file is for use in SimpleTuner or Kohya for training.

**This dataset contains the full images.**

Code to compile parquet:

```py
"""
Python.
"""

# A script to compile all .json files in the pwd into a parquet file

column_types = {
    "id": "int64",
    "version": "str",
    "arguments": "str",
    "original_text": "str",
    "caption": "str",
    "gpt_caption": "str",
    "width": "int",
    "height": "int",
    "reactions": "dict"
}

# Map column types to their corresponding pandas types
import pandas as pd

column_types = {k: pd.api.types.infer_dtype(v) for k, v in column_types.items()}

# Read all .json files in the pwd
import json
import os

data = []
for file in os.listdir():
    if file.endswith(".json"):
        with open(file, "r") as f:
            data.append(json.load(f))

# Convert the data to a DataFrame
df = pd.DataFrame(data)

# Convert the columns to the correct types
for col, dtype in column_types.items():
    df[col] = df[col].astype(dtype)

# Save the DataFrame to a parquet file
df.to_parquet("train.parquet")

# Print the first few rows of the DataFrame
print(df.head())
```
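The compile script above passes each type name (for example `"int64"`) to `pd.api.types.infer_dtype`, which inspects values rather than parsing type names, so the mapping it builds is unlikely to match the types listed in `column_types`. The following is a minimal sketch of a variant that casts to explicit pandas dtypes instead. It is an assumption about the intended result, not the author's method: the `COLUMN_DTYPES` mapping and the `load_metadata` helper are hypothetical names, and `reactions` is left uncast because `"dict"` is not a pandas dtype.

```python
import json
from pathlib import Path

import pandas as pd

# Assumed target dtypes, based on the column list in the original script.
# "reactions" is deliberately omitted: it holds nested data, so it stays
# a generic object column.
COLUMN_DTYPES = {
    "id": "int64",
    "version": "string",
    "arguments": "string",
    "original_text": "string",
    "caption": "string",
    "gpt_caption": "string",
    "width": "int64",
    "height": "int64",
}


def load_metadata(directory="."):
    """Read every .json file in `directory` into one typed DataFrame."""
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open("r", encoding="utf-8") as f:
            records.append(json.load(f))

    df = pd.DataFrame(records)
    if df.empty:
        return df

    # Cast only the columns that are actually present in the metadata.
    present = {col: dtype for col, dtype in COLUMN_DTYPES.items() if col in df.columns}
    return df.astype(present)
```

Calling `load_metadata(".")` and then `df.to_parquet("train.parquet")` would reproduce the last step of the original script; writing parquet requires an engine such as `pyarrow` to be installed.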


Provider:

maas

Created:

2024-07-27

Collection summary

Dataset introduction