
CoSyn-point

ModelScope community. Updated 2025-11-27; indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/CoSyn-point
Resource description:
# CoSyn-point

CoSyn-point is a collection of diverse computer-generated images that are annotated with queries and answer points. It can be used to train models to return points in the image in response to a user query.

The data was created by using the [Claude large language model](https://claude.ai/) to generate code that can be executed to render an image. The code used to generate this data is [open source](https://github.com/allenai/pixmo-docs). Synthetic question-answer data is also available in a [separate repo](https://huggingface.co/datasets/allenai/CoSyn-400k).

Quick links:

- 📃 [CoSyn Paper](https://arxiv.org/pdf/2502.14846)
- 📃 [Molmo Paper](https://molmo.allenai.org/paper.pdf)

## Loading

Load the data with:

```python
import datasets

point_dataset = datasets.load_dataset("allenai/CoSyn-point", split="train")
```

## Data Format

The rendered image is included in the dataset directly:

```python
print(point_dataset[0]["image"])
# >>> <PIL.PngImagePlugin.PngImageFile image mode=RGB size=2400x1200 at 0x7F362070CEB0>
```

Each image is matched with multiple query-point pairs:

```python
for q, a in zip(point_dataset[0]["questions"], point_dataset[0]["answer_points"]):
    print(q, a)
# >>>
# Find the main title that introduces the storytelling platform for Italian football matches. {'x': [50.0], 'y': [5.9]}
# Find the podcast host who provides commentary on the historic Milan vs. Inter derby match from 2001. {'x': [64.9], 'y': [49.1]}
# Find the button that allows users to participate in match discussions with other fans. {'x': [14.8], 'y': [68.4]}
# Find the score display of the historic Milan Derby where AC Milan achieved their remarkable victory. {'x': [53.7], 'y': [43.8]}
# Find the poll option to indicate that the 1982 World Cup match between Italy and Brazil was the most impactful. {'x': [14.3], 'y': [74.3]}
```

The points are in normalized format, where (0, 0) is the upper left corner of the image and (100, 100) is the lower right.

## Splits

The data is divided into validation and train splits. These splits are "unofficial" because we do not generally use this data for evaluation anyway. However, they reflect what we used when training.

## License

This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output images derived from code generated by Claude that are subject to Anthropic [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup). The questions were generated by GPT-4o Mini and are subject to [separate terms](https://openai.com/policies/row-terms-of-use) governing their use.

## Citation

Please cite the following papers if you use this dataset in your work.

```bibtex
@article{yang2025scaling,
  title={Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation},
  author={Yang, Yue and Patel, Ajay and Deitke, Matt and Gupta, Tanmay and Weihs, Luca and Head, Andrew and Yatskar, Mark and Callison-Burch, Chris and Krishna, Ranjay and Kembhavi, Aniruddha and others},
  journal={arXiv preprint arXiv:2502.14846},
  year={2025}
}
```

```bibtex
@article{deitke2024molmo,
  title={Molmo and pixmo: Open weights and open data for state-of-the-art multimodal models},
  author={Deitke, Matt and Clark, Christopher and Lee, Sangho and Tripathi, Rohun and Yang, Yue and Park, Jae Sung and Salehi, Mohammadreza and Muennighoff, Niklas and Lo, Kyle and Soldaini, Luca and others},
  journal={arXiv preprint arXiv:2409.17146},
  year={2024}
}
```
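
## Example: Converting Points to Pixels

The normalized point format described under Data Format can be mapped back to pixel coordinates for plotting or for computing distances. The sketch below is not part of the original dataset card: the helper name `points_to_pixels` is hypothetical, and it assumes a simple linear mapping in which 100 corresponds to the full image width (for x) or height (for y).

```python
from typing import Dict, List, Tuple


def points_to_pixels(
    answer_points: Dict[str, List[float]], width: int, height: int
) -> List[Tuple[float, float]]:
    """Convert normalized points (0-100 on each axis) to pixel coordinates.

    Hypothetical helper: assumes (0, 0) is the upper left corner and
    (100, 100) is the lower right corner, scaled linearly.
    """
    xs, ys = answer_points["x"], answer_points["y"]
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of entries")
    return [(x / 100.0 * width, y / 100.0 * height) for x, y in zip(xs, ys)]


# Values copied from the first query-point pair shown under Data Format,
# with the 2400x1200 image size from the printed PIL image.
example = {"x": [50.0], "y": [5.9]}
print(points_to_pixels(example, width=2400, height=1200))
```

With a loaded record, the width and height can be read from the PIL image, whose `size` attribute is a `(width, height)` tuple, and passed in alongside an entry from `answer_points`.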

Provider:
maas
Created:
2025-05-29