
tgokhale/sr2d_visor

Hugging Face | Updated 2023-06-12 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/tgokhale/sr2d_visor
Resource description:
---
license: cc-by-nc-nd-4.0
viewer: false
---

# Benchmarking Spatial Relationships in Text-to-Image Generation

*Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang*

- We introduce a large-scale challenge dataset SR<sub>2D</sub> that contains sentences describing two objects and the spatial relationship between them.
- We introduce a metric called VISOR (short for **V**erify**I**ng **S**patial **O**bject **R**elationships) to quantify spatial reasoning performance.
- VISOR and SR<sub>2D</sub> can be used off-the-shelf with any text-to-image model.

## SR<sub>2D</sub> Dataset

Our dataset is hosted [here](https://huggingface.co/datasets/tgokhale/sr2d_visor). It contains:

1. The text prompt dataset in `.json` format (`text_spatial_rel_phrases.json`)
2. Images generated using 7 models (GLIDE, CogView2, DALLE-mini, Stable Diffusion, GLIDE + Stable Diffusion + CDM, and Stable Diffusion v2.1)

Alternatively, the text prompt dataset can also be accessed from [`text_spatial_rel_phrases.json`](https://github.com/microsoft/VISOR/blob/main/text_spatial_rel_phrases.json). It contains all examples from the current version of the dataset (31680 text prompts) accompanied by the corresponding metadata. This dataset can also be generated by running the script `python create_spatial_phrases.py`.

## GitHub repository

The GitHub repository for [VISOR](https://github.com/microsoft/VISOR/) contains code for generating images with prompts from the SR<sub>2D</sub> dataset and evaluating the generated images using VISOR.

## References

Code for text-to-image generation:

1. GLIDE: https://github.com/openai/glide-text2im
2. DALLE-mini: https://github.com/borisdayma/dalle-mini
3. CogView2: https://github.com/THUDM/CogView2
4. Stable Diffusion: https://github.com/CompVis/stable-diffusion
5. Composable Diffusion Models: https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
6. OpenAI API for DALLE-2: https://openai.com/api/

## Citation

If you find SR<sub>2D</sub> or VISOR useful in your research, please use the following citation:

```
@article{gokhale2022benchmarking,
  title={Benchmarking Spatial Relationships in Text-to-Image Generation},
  author={Gokhale, Tejas and Palangi, Hamid and Nushi, Besmira and Vineet, Vibhav and Horvitz, Eric and Kamar, Ece and Baral, Chitta and Yang, Yezhou},
  journal={arXiv preprint arXiv:2212.10015},
  year={2022}
}
```

Provider:
tgokhale
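Summary of original information

SR<sub>2D</sub> dataset overview

Dataset introduction

  • Name: SR<sub>2D</sub>
  • Description: A challenge dataset of sentences that each describe two objects and the spatial relationship between them.
  • Use: Works with any text-to-image generation model.

Dataset contents

  • Text prompt data: Provided in .json format as text_spatial_rel_phrases.json, containing 31680 text prompts and their corresponding metadata.
  • Generated images: Images generated using 7 models (GLIDE, CogView2, DALLE-mini, Stable Diffusion, GLIDE + Stable Diffusion + CDM, and Stable Diffusion v2.1).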
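
As a minimal sketch of working with the prompt file, the snippet below loads a local copy of `text_spatial_rel_phrases.json` with Python's standard `json` module and reports the top-level container type and the number of entries. The file's internal schema is not described on this page, so the code deliberately assumes no field names. The helper name `summarize_prompt_file` and the local file location are hypothetical.

```python
import json
from pathlib import Path


def summarize_prompt_file(path):
    """Load a JSON file and return (container type name, top-level entry count)."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # The schema is not documented here, so only generic facts are reported.
    if isinstance(data, (list, dict)):
        return type(data).__name__, len(data)
    # A bare scalar at the top level counts as a single entry.
    return type(data).__name__, 1


if __name__ == "__main__":
    # Assumption: the file was downloaded next to this script.
    local_copy = Path("text_spatial_rel_phrases.json")
    if local_copy.exists():
        kind, count = summarize_prompt_file(local_copy)
        print(f"top-level {kind} with {count} entries")
    else:
        print(f"{local_copy} not found; download it first")
```

Inspecting the container before reading individual fields is a cautious first step: it shows whether the entry count matches the 31680 prompts stated above without relying on any particular key layout.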

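Dataset access

  • Hugging Face: https://huggingface.co/datasets/tgokhale/sr2d_visor
  • Prompt file on GitHub: https://github.com/microsoft/VISOR/blob/main/text_spatial_rel_phrases.json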

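Related tools

  • VISOR: A metric for quantifying spatial reasoning performance; it can be used together with the SR<sub>2D</sub> dataset.
  • GitHub repository: The VISOR repository contains code for generating images and for evaluating the generated images.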
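
This page does not spell out how VISOR is computed. Purely as an illustration, one plausible way to verify a 2D spatial relationship is to compare the centroids of the two objects' detected bounding boxes. The sketch below follows that assumption and is not taken from the VISOR repository: the box format `(x_min, y_min, x_max, y_max)`, the four relation names, and the functions `relation_holds` and `fraction_correct` are all hypothetical.

```python
def centroid(box):
    """Return the (x, y) center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relation_holds(box_a, box_b, relation):
    """Check by centroid whether object A stands in `relation` to object B."""
    ax, ay = centroid(box_a)
    bx, by = centroid(box_b)
    # Image coordinates: x grows to the right, y grows downward.
    checks = {
        "left of": ax < bx,
        "right of": ax > bx,
        "above": ay < by,
        "below": ay > by,
    }
    if relation not in checks:
        raise ValueError(f"unsupported relation: {relation!r}")
    return checks[relation]


def fraction_correct(samples):
    """Share of samples whose relation holds.

    Each sample is (box_a, box_b, relation). A box of None means the
    detector missed that object, which is counted as a failure.
    """
    if not samples:
        return 0.0
    correct = sum(
        1
        for box_a, box_b, relation in samples
        if box_a is not None
        and box_b is not None
        and relation_holds(box_a, box_b, relation)
    )
    return correct / len(samples)


if __name__ == "__main__":
    demo = [
        ((0, 0, 10, 10), (20, 0, 30, 10), "left of"),  # A is left of B
        ((0, 0, 10, 10), (0, 20, 10, 30), "below"),    # A is actually above B
        (None, (0, 0, 10, 10), "above"),               # object A not detected
    ]
    print(fraction_correct(demo))
```

Counting a missing detection as a failure is one design choice among several; a score restricted to images where both objects were detected would instead divide by the number of such images.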


Collected summary
Dataset introduction
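Background and challenges
Background overview
SR2D is a dataset for evaluating how well text-to-image generation models understand spatial relationships. It contains 31680 text prompts and images generated by 7 models, and it introduces the VISOR metric for performance evaluation.
The content above was collected and summarized by 遇见数据集.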