tgokhale/sr2d_visor

Name: tgokhale/sr2d_visor
Creator: tgokhale
Published: 2023-06-12 04:49:57
License: CC BY-NC-ND 4.0

Source: Hugging Face. Updated 2023-06-12; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/tgokhale/sr2d_visor

Resource description:

---
license: cc-by-nc-nd-4.0
viewer: false
---

# Benchmarking Spatial Relationships in Text-to-Image Generation

*Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang*

- We introduce a large-scale challenge dataset SR2D that contains sentences describing two objects and the spatial relationship between them.
- We introduce a metric called VISOR (short for **V**erify**I**ng **S**patial **O**bject **R**elationships) to quantify spatial reasoning performance.
- VISOR and SR2D can be used off-the-shelf with any text-to-image model.

## SR2D Dataset

Our dataset is hosted [here](https://huggingface.co/datasets/tgokhale/sr2d_visor). It contains:

1. The text prompt dataset in `.json` format (`text_spatial_rel_phrases.json`)
2. Images generated using 7 models (GLIDE, CogView2, DALLE-mini, Stable Diffusion, GLIDE + Stable Diffusion + CDM, and Stable Diffusion v2.1)

Alternatively, the text prompt dataset can also be accessed from [`text_spatial_rel_phrases.json`](https://github.com/microsoft/VISOR/blob/main/text_spatial_rel_phrases.json). It contains all examples from the current version of the dataset (31680 text prompts) accompanied by the corresponding metadata. This dataset can also be generated by running the script `python create_spatial_phrases.py`.

## GitHub repository

The GitHub repository for [VISOR](https://github.com/microsoft/VISOR/) contains code for generating images with prompts from the SR2D dataset and evaluating the generated images using VISOR.

## References

Code for text-to-image generation:

1. GLIDE: https://github.com/openai/glide-text2im
2. DALLE-mini: https://github.com/borisdayma/dalle-mini
3. CogView2: https://github.com/THUDM/CogView2
4. Stable Diffusion: https://github.com/CompVis/stable-diffusion
5. Composable Diffusion Models: https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
6. OpenAI API for DALLE-2: https://openai.com/api/

## Citation

If you find SR2D or VISOR useful in your research, please use the following citation:

```
@article{gokhale2022benchmarking,
  title={Benchmarking Spatial Relationships in Text-to-Image Generation},
  author={Gokhale, Tejas and Palangi, Hamid and Nushi, Besmira and Vineet, Vibhav and Horvitz, Eric and Kamar, Ece and Baral, Chitta and Yang, Yezhou},
  journal={arXiv preprint arXiv:2212.10015},
  year={2022}
}
```

Provided by:

tgokhale

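Summary of original information

SR2D dataset overview

Dataset introduction

Name: SR2D
Description: A challenge dataset of sentences that each describe two objects and the spatial relationship between them.
Usage: Can be used with any text-to-image generation model.

Dataset contents

Text prompt data: Provided in .json format as text_spatial_rel_phrases.json, containing 31680 text prompts and their corresponding metadata.
Generated images: Images generated using 7 models (GLIDE, CogView2, DALLE-mini, Stable Diffusion, GLIDE + Stable Diffusion + CDM, and Stable Diffusion v2.1).

Dataset access

Link: SR2D dataset (https://huggingface.co/datasets/tgokhale/sr2d_visor)
Text prompt data: Also available as text_spatial_rel_phrases.json in the VISOR GitHub repository (https://github.com/microsoft/VISOR/blob/main/text_spatial_rel_phrases.json).
Generation script: The dataset can also be generated by running python create_spatial_phrases.py.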
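
As a rough illustration of working with the prompt file once it has been downloaded, the sketch below loads text_spatial_rel_phrases.json and reports only its top-level shape. The dataset card does not describe the JSON schema, so the code deliberately makes no assumption about field names; the helper names `load_prompts` and `summarize` and the local file path are hypothetical.

```python
import json
from pathlib import Path


def load_prompts(path):
    """Parse the prompt file and return whatever JSON object it contains."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Return the container type name and the number of top-level entries."""
    return type(data).__name__, len(data)


if __name__ == "__main__":
    # Assumed local path: the file is expected to have been downloaded already.
    path = Path("text_spatial_rel_phrases.json")
    if path.exists():
        kind, count = summarize(load_prompts(path))
        print(f"Loaded a {kind} with {count} top-level entries")
    else:
        print(f"{path} not found; download it first")
```

If the top-level container holds one entry per prompt, the reported count should match the 31680 prompts stated above; inspecting a single entry is the safest way to learn the actual metadata fields.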

Citation

@article{gokhale2022benchmarking,
  title={Benchmarking Spatial Relationships in Text-to-Image Generation},
  author={Gokhale, Tejas and Palangi, Hamid and Nushi, Besmira and Vineet, Vibhav and Horvitz, Eric and Kamar, Ece and Baral, Chitta and Yang, Yezhou},
  journal={arXiv preprint arXiv:2212.10015},
  year={2022}
}

Collected summary

Dataset introduction