tgokhale/sr2d_visor
Source: Hugging Face. Updated 2023-06-12; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/tgokhale/sr2d_visor
---
license: cc-by-nc-nd-4.0
viewer: false
---
# Benchmarking Spatial Relationships in Text-to-Image Generation
*Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang*
- We introduce a large-scale challenge dataset SR<sub>2D</sub> that contains sentences describing two objects and the spatial relationship between them.
- We introduce a metric called VISOR (short for **V**erify**I**ng **S**patial **O**bject **R**elationships) to quantify spatial reasoning performance.
- VISOR and SR<sub>2D</sub> can be used off-the-shelf with any text-to-image model.
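This README does not spell out how VISOR is computed. As a rough illustration of what verifying a spatial relationship between two objects could look like, the sketch below assumes that an object detector has already produced one bounding box per object, that the relations are `left`, `right`, `above`, and `below`, and that a relation is judged by comparing box centroids. All of these are assumptions for illustration, and the names `Box`, `relation_holds`, and `spatial_accuracy` are hypothetical; this is not the authors' implementation, which lives in the VISOR GitHub repository.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (origin at top-left)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def centroid(self):
        # Midpoint of the box along each axis.
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relation_holds(box_a, box_b, relation):
    """Return True if object A stands in `relation` to object B (assumed centroid rule)."""
    (ax, ay), (bx, by) = box_a.centroid(), box_b.centroid()
    checks = {
        "left": ax < bx,
        "right": ax > bx,
        "above": ay < by,  # image y coordinates grow downward
        "below": ay > by,
    }
    return checks[relation]


def spatial_accuracy(samples):
    """Fraction of samples where both objects were detected and the relation holds.

    Each sample is a tuple (box_a, box_b, relation); a box is None when the
    (hypothetical) detector did not find that object.
    """
    if not samples:
        return 0.0
    correct = sum(
        1
        for a, b, rel in samples
        if a is not None and b is not None and relation_holds(a, b, rel)
    )
    return correct / len(samples)
```

With this sketch, an image counts as correct only when both objects are found and their centroids are ordered as the prompt requires, so a missing object is scored the same as a wrong layout.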
## SR<sub>2D</sub> Dataset
Our dataset is hosted [here](https://huggingface.co/datasets/tgokhale/sr2d_visor). It contains:
1. The text prompt dataset in `.json` format (`text_spatial_rel_phrases.json`)
2. Images generated using 7 models (GLIDE, CogView2, DALLE-mini, Stable Diffusion, GLIDE + Stable Diffusion + CDM, and Stable Diffusion v2.1)
Alternatively, the text prompt dataset can also be accessed from [`text_spatial_rel_phrases.json`](https://github.com/microsoft/VISOR/blob/main/text_spatial_rel_phrases.json). It contains all examples from the current version of the dataset (31680 text prompts), accompanied by the corresponding metadata.
This dataset can also be generated by running the script `python create_spatial_phrases.py`.
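Once `text_spatial_rel_phrases.json` has been downloaded, a quick sanity check is to load it and count the entries. The sketch below uses only the Python standard library. The file's schema is not documented in this README, so the code assumes only that the top level is either a list of records or a dictionary of records; the helper name `load_prompts` is hypothetical.

```python
import json
from pathlib import Path


def load_prompts(path):
    """Load the prompt file and return its top-level entries as a list."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top level is a list of records or a dict of records.
    # The real schema is not documented here, so both cases are handled.
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


if __name__ == "__main__":
    local_copy = Path("text_spatial_rel_phrases.json")
    if local_copy.exists():
        prompts = load_prompts(local_copy)
        # The README states the current version holds 31680 text prompts.
        print(f"{len(prompts)} entries loaded")
        print(prompts[0])  # inspect one record to learn the metadata fields
```

Printing a single record first is a cheap way to discover the metadata fields before writing any code that depends on them.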
## GitHub repository
The GitHub repository for [VISOR](https://github.com/microsoft/VISOR/) contains code for generating images with prompts from the SR<sub>2D</sub> dataset and evaluating the generated images using VISOR.
## References
Code for text-to-image generation:
1. GLIDE: https://github.com/openai/glide-text2im
2. DALLE-mini: https://github.com/borisdayma/dalle-mini
3. CogView2: https://github.com/THUDM/CogView2
4. Stable Diffusion: https://github.com/CompVis/stable-diffusion
5. Composable Diffusion Models: https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
6. OpenAI API for DALLE-2: https://openai.com/api/
## Citation
If you find SR<sub>2D</sub> or VISOR useful in your research, please use the following citation:
```
@article{gokhale2022benchmarking,
title={Benchmarking Spatial Relationships in Text-to-Image Generation},
author={Gokhale, Tejas and Palangi, Hamid and Nushi, Besmira and Vineet, Vibhav and Horvitz, Eric and Kamar, Ece and Baral, Chitta and Yang, Yezhou},
journal={arXiv preprint arXiv:2212.10015},
year={2022}
}
```
Provider: tgokhale
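Summary of original information
SR<sub>2D</sub> dataset overview
Dataset introduction
- Name: SR<sub>2D</sub>
- Description: a challenge dataset of sentences describing two objects and the spatial relationship between them.
- Use: works with any text-to-image generation model.
Dataset contents
- Text prompt data: provided in `.json` format as `text_spatial_rel_phrases.json`, containing 31680 text prompts and their corresponding metadata.
- Generated images: images generated using 7 models (GLIDE, CogView2, DALLE-mini, Stable Diffusion, GLIDE + Stable Diffusion + CDM, and Stable Diffusion v2.1).
Dataset access
- Link: [SR<sub>2D</sub> dataset](https://huggingface.co/datasets/tgokhale/sr2d_visor)
- Text prompt data: also available via `text_spatial_rel_phrases.json`.
- Generation script: the dataset can be generated by running `python create_spatial_phrases.py`.
Related tools
- VISOR: a metric for quantifying spatial reasoning performance; it can be used together with the SR<sub>2D</sub> dataset.
- GitHub repository: VISOR contains code for generating images and for evaluating the generated images.
Citation
@article{gokhale2022benchmarking, title={Benchmarking Spatial Relationships in Text-to-Image Generation}, author={Gokhale, Tejas and Palangi, Hamid and Nushi, Besmira and Vineet, Vibhav and Horvitz, Eric and Kamar, Ece and Baral, Chitta and Yang, Yezhou}, journal={arXiv preprint arXiv:2212.10015}, year={2022} }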
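Collected summary
Background and challenges
Background overview
SR<sub>2D</sub> is a dataset for evaluating how well text-to-image generation models understand spatial relationships. It contains 31680 text prompts and images generated by 7 models, and it introduces the VISOR metric for performance evaluation.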



