GRIT

ModelScope community | Updated 2026-05-15 | Indexed 2024-06-08

Download link: https://modelscope.cn/datasets/swift/GRIT
# GRIT: Large-Scale Training Corpus of Grounded Image-Text Pairs

### Dataset Description

- **Repository:** [Microsoft unilm](https://github.com/microsoft/unilm/tree/master/kosmos-2)
- **Paper:** [Kosmos-2](https://arxiv.org/abs/2306.14824)

### Dataset Summary

We introduce GRIT, a large-scale dataset of Grounded Image-Text pairs, built from image-text pairs in [COYO-700M](https://github.com/kakaobrain/coyo-dataset) and LAION-2B. We construct a pipeline that extracts text spans (i.e., noun phrases and referring expressions) from each caption and links them to their corresponding image regions. More details can be found in the [paper](https://arxiv.org/abs/2306.14824).

### Supported Tasks

During construction, we excluded image-caption pairs for which no bounding boxes were retained. This procedure resulted in a high-quality image-caption subset of COYO-700M, which we will validate in the future.

Furthermore, this dataset contains text-span-bounding-box pairs. It can therefore be used in many location-aware mono/multimodal tasks, such as phrase grounding, referring expression comprehension, referring expression generation, and open-world object detection.

### Data Instance

One instance is

```python
{
    'key': '000373938',
    'clip_similarity_vitb32': 0.353271484375,
    'clip_similarity_vitl14': 0.2958984375,
    'id': 1795296605919,
    'url': "https://www.thestrapsaver.com/wp-content/uploads/customerservice-1.jpg",
    'caption': 'a wire hanger with a paper cover that reads we heart our customers',
    'width': 1024,
    'height': 693,
    'noun_chunks': [
        [19, 32, 0.019644069503434333, 0.31054004033406574, 0.9622142865754519, 0.9603442351023356, 0.79298526],
        [0, 13, 0.019422357885505368, 0.027634161214033764, 0.9593302408854166, 0.969467560450236, 0.67520964]
    ],
    'ref_exps': [
        [19, 66, 0.019644069503434333, 0.31054004033406574, 0.9622142865754519, 0.9603442351023356, 0.79298526],
        [0, 66, 0.019422357885505368, 0.027634161214033764, 0.9593302408854166, 0.969467560450236, 0.67520964]
    ]
}
```

- `key`: The file name generated when downloading COYO-700M with img2dataset (can be ignored).
- `clip_similarity_vitb32`: The cosine similarity between the text and image (ViT-B/32) embeddings computed by [OpenAI CLIP](https://github.com/openai/CLIP), provided by COYO-700M.
- `clip_similarity_vitl14`: The cosine similarity between the text and image (ViT-L/14) embeddings computed by [OpenAI CLIP](https://github.com/openai/CLIP), provided by COYO-700M.
- `id`: Unique 64-bit integer ID in COYO-700M.
- `url`: The image URL.
- `caption`: The corresponding caption.
- `width`: The width of the image.
- `height`: The height of the image.
- `noun_chunks`: The noun chunks (extracted by [spaCy](https://spacy.io/)) that have associated bounding boxes (predicted by [GLIP](https://github.com/microsoft/GLIP)). Each inner list contains, in order: the start of the noun chunk in the caption, the end of the noun chunk in the caption, normalized x_min, normalized y_min, normalized x_max, normalized y_max, and the confidence score.
- `ref_exps`: The corresponding referring expressions, in the same format as `noun_chunks`. If a noun chunk has no expansion, it is copied unchanged.

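The field layout above can be turned into phrases and pixel-space boxes with a few lines of Python. The following is a minimal sketch, not part of the GRIT tooling: it assumes that the start and end values are character offsets into `caption` with an exclusive end (which matches the example instance), and the names `GroundedSpan`, `decode_spans`, and `min_score` are hypothetical helpers introduced only for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class GroundedSpan(NamedTuple):
    """One text span with its bounding box in pixel coordinates."""
    phrase: str
    box_px: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    score: float


def decode_spans(caption: str, width: int, height: int,
                 entries: Sequence[Sequence[float]],
                 min_score: float = 0.0) -> List[GroundedSpan]:
    """Decode `noun_chunks` or `ref_exps` entries of one instance.

    Assumption: (start, end) are character offsets, end exclusive.
    `min_score` is an illustrative filter, not a dataset field.
    """
    spans = []
    for start, end, x0, y0, x1, y1, score in entries:
        if score < min_score:
            continue
        phrase = caption[int(start):int(end)]
        # Boxes are normalized to [0, 1]; scale by the image size.
        box = (x0 * width, y0 * height, x1 * width, y1 * height)
        spans.append(GroundedSpan(phrase, box, score))
    return spans


# The example instance from the dataset card (unused fields omitted).
instance = {
    "caption": "a wire hanger with a paper cover that reads we heart our customers",
    "width": 1024,
    "height": 693,
    "noun_chunks": [
        [19, 32, 0.019644069503434333, 0.31054004033406574,
         0.9622142865754519, 0.9603442351023356, 0.79298526],
        [0, 13, 0.019422357885505368, 0.027634161214033764,
         0.9593302408854166, 0.969467560450236, 0.67520964],
    ],
    "ref_exps": [
        [19, 66, 0.019644069503434333, 0.31054004033406574,
         0.9622142865754519, 0.9603442351023356, 0.79298526],
        [0, 66, 0.019422357885505368, 0.027634161214033764,
         0.9593302408854166, 0.969467560450236, 0.67520964],
    ],
}

for field in ("noun_chunks", "ref_exps"):
    for span in decode_spans(instance["caption"], instance["width"],
                             instance["height"], instance[field]):
        print(field, repr(span.phrase), span.box_px, span.score)
```

On the example instance, the two `noun_chunks` entries decode to "a paper cover" and "a wire hanger", and the second `ref_exps` entry covers the whole caption.
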
### Download image

We recommend using the [img2dataset](https://github.com/rom1504/img2dataset) tool to download the images.

1. Download the metadata. You can download it by cloning the current repository:

```bash
git lfs install
git clone https://huggingface.co/datasets/zzliang/GRIT
```

2. Install [img2dataset](https://github.com/rom1504/img2dataset).

```bash
pip install img2dataset
```

3. Download the images. Replace `/path/to/GRIT_dataset/grit-20m` with the local path to this repository.

```bash
img2dataset --url_list /path/to/GRIT_dataset/grit-20m --input_format "parquet" \
    --url_col "url" --caption_col "caption" --output_format webdataset \
    --output_folder /tmp/grit --processes_count 4 --thread_count 64 --image_size 256 \
    --resize_only_if_bigger=True --resize_mode="keep_ratio" --skip_reencode=True \
    --save_additional_columns '["id","noun_chunks","ref_exps","clip_similarity_vitb32","clip_similarity_vitl14"]' \
    --enable_wandb False
```

You can adjust some parameters according to your needs (e.g., `processes_count`, `thread_count`, `image_size`, `save_additional_columns`). More img2dataset options are documented [here](https://github.com/rom1504/img2dataset#api).

### Citation Information

If you use this dataset in any project or research, please cite our paper and COYO-700M:

```
@article{Kosmos2,
  title={Kosmos-2: Grounding Multimodal Large Language Models to the World},
  author={Zhiliang Peng and Wenhui Wang and Li Dong and Yaru Hao and Shaohan Huang and Shuming Ma and Furu Wei},
  journal={ArXiv},
  year={2023},
  volume={abs/2306.14824}
}

@misc{kakaobrain2022coyo-700m,
  title = {COYO-700M: Image-Text Pair Dataset},
  author = {Minwoo Byeon, Beomhee Park, Haecheon Kim, Sungjun Lee, Woonhyuk Baek, Saehoon Kim},
  year = {2022},
  howpublished = {\url{https://github.com/kakaobrain/coyo-dataset}},
}
```
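
Once the img2dataset command from the download step has finished, the per-sample metadata can be read back from the output shards using only the Python standard library. This is a rough sketch under stated assumptions: it assumes the `webdataset` output consists of tar shards in which each sample has a `<key>.json` member holding the caption and the columns listed in `--save_additional_columns`; the function name `iter_shard_metadata` is hypothetical, and the exact JSON keys should be checked against your own download.

```python
import json
import sys
import tarfile
from typing import Dict, Iterator, Optional


def iter_shard_metadata(path: Optional[str] = None,
                        fileobj=None) -> Iterator[Dict]:
    """Yield the parsed JSON metadata of every sample in one tar shard.

    Assumption: each sample stores its metadata in a member whose
    name ends with ".json" (image bytes and captions are skipped).
    """
    with tarfile.open(name=path, fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".json"):
                payload = tar.extractfile(member).read()
                yield json.loads(payload)


if __name__ == "__main__":
    # Usage: python read_shards.py /tmp/grit/00000.tar [more shards...]
    for shard_path in sys.argv[1:]:
        for meta in iter_shard_metadata(path=shard_path):
            chunks = meta.get("noun_chunks") or []
            print(meta.get("caption"), len(chunks))
```

Reading the JSON members directly avoids decoding any images, which keeps a quick sanity check of `noun_chunks` and `ref_exps` cheap even on large shards.
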

Provider: maas

Created: 2024-06-05