five

PLM-Image-Auto

收藏
魔搭社区2025-12-05 更新2025-05-24 收录
下载链接:
https://modelscope.cn/datasets/facebook/PLM-Image-Auto
下载链接
链接失效反馈
官方服务:
资源简介:
# Dataset Card for PLM-Image Auto [\[📃 Tech Report\]](https://arxiv.org/abs/2504.13180) [\[📂 Github\]](https://github.com/facebookresearch/perception_models/) Sythetic image captions and QAs used in PLM, please refer to the paper, Section 3, for more details. The sythetic annotations covers: SA1B, Openimages, Obejct365, ArxivQA, UCSF, PDFAcc. ## Dataset Structure ### Image Captions (SA1B, Openimages, Obejct365) Data fields are : - `image_id`: a `string` feature, unique identifier for the image. - `image`: a `string` feature, the actual image path in the correspoding data folder. - `conversations`: a `list` of `dict` feature, with the actual prompt and caption used to train PLM. A sample from SA1B Captions looks as follows: ``` { "image_id": "sa_1488824.jpg", "image": "segment_everything/release_040523_batched/batch_133/sa_1488824.jpg", "conversations": [ {"from": "human", "value": "<image>\nDescribe the image."}, { "from": "assistant", "value": "The image depicts a cityscape with a prominent tower in the center, surrounded by various buildings and infrastructure. * The tower is the main focus of the image, standing tall and slender with a distinctive shape. It has a series of rings or platforms around its middle section, giving it a unique appearance. * The tower is surrounded by a mix of old and new buildings, including a large gray building on the right side of the image. There are also several smaller buildings and structures in the background, adding to the urban landscape. * In the foreground, there is a canal or waterway that runs along the bottom of the image. It is lined with greenery and has a few trees and bushes along its banks. * On the left side of the image, there is a bridge or overpass that spans the canal. It has a metal railing and appears to be a pedestrian or bicycle path. * The sky above the city is clear and blue, with the sun shining brightly in the top center of the image. This creates a sense of warmth and energy in the scene. * Overall, the image presents a vibrant and dynamic cityscape, with a mix of old and new architecture, natural elements, and human activity.", }, ], } ``` ### Image Captions and QA (ArxivQA) Data fields are : - `uuid`: a `string` feature, unique identifier for the sample. - `image`: a `string` feature, the actual image path in the correspoding data folder. For instance `2004.07107/figure_0_0.jpg` represent the paper id `2004.07107`, Page 0, and Figure 0. - `conversations`: a `list` of `dict` feature, with the actual conversation used to train PLM. A sample from SA1B Captions looks as follows: ``` { "image": "2004.07107/figure_0_0.jpg" "uuid": "8caa2cf3-d16f-42e2-b94c-e866f87ec77c", "conversations": [ {"from": "human", "value": "<image>\nExplain the image in detail."}, {"from": "assistant", "value": "The image is a diagram of the chemical potential of a single-particle excitation in a material. The diagram shows the energy gap between the valence and conduction states, with the chemical potential (\u03bc) marked at the center. The diagram also shows the single-particle excitation, which is represented by an orange dot above the conduction state.\n\n**Key Features:**\n\n* **Chemical Potential (\u03bc):** Marked at the center of the diagram, representing the energy level at which the number of electrons in the material is equal to the number of holes.\n* **Energy Gap:** The energy difference between the valence and conduction states, represented by a horizontal line.\n* **Valence State:** The energy level of the valence electrons, represented by a gray box on the right side of the diagram.\n* **Conduction State:** The energy level of the conduction electrons, represented by a gray box on the left side of the diagram.\n* **Single-Particle Excitation:** Represented by an orange dot above the conduction state, indicating the excitation of a single electron from the valence state to the conduction state.\n* **Single-Hole Excitation:** Represented by a green circle below the valence state, indicating the creation of a hole in the valence state.\n\n**Relationships:**\n\n* The chemical potential (\u03bc) is located at the center of the diagram, indicating that it is the average energy level of the electrons in the material.\n* The energy gap between the valence and conduction states is represented by a horizontal line, indicating that it is a fixed energy difference.\n* The single-particle excitation is represented by an orange dot above the conduction state, indicating that it is an excitation of a single electron from the valence state to the conduction state.\n* The single-hole excitation is represented by a green circle below the valence state, indicating that it is the creation of a hole in the valence state.\n\nOverall, the diagram provides a clear representation of the chemical potential and energy gap in a material, as well as the single-particle and single-hole excitations that occur within it."}, {"from": "human", "value": "What is represented by the orange dot above the conduction state in the diagram?\nOptions:\n(A) Energy gap\n(B) Chemical potential\n(C) Single-hole excitation\n(D) Single-particle excitation\n"}, {"from": "assistant", "value": "(D) Single-particle excitation."}, ], } ``` ### Document QA (UCSF, PDFAcc) Data fields are : - `uuid`: a `string` feature, unique identifier for the sample. - `image`: a `string` feature, the actual image path in the correspoding data folder. For instance `pdfa-eng-train-0045/3018390_0.jpg` represent the paper shard `pdfa-eng-train-0045`, Document 3018390, and Page 0. - `conversations`: a `list` of `dict` feature, with the actual conversation used to train PLM. A sample from SA1B Captions looks as follows: ``` { "uuid": "886d95d3-8e13-4b55-ab3a-3517d4edb1b4_3018390_page_0", "image": "pdfa-eng-train-0045/3018390_0.jpg", "conversations": [ {"from": "human", "value": "<image>\nHow can I get more information about the brunch buffet offer?"}, {"from": "assistant", "value": "By contacting the Private Dining Director, Jamie Cawood, at 859.977.2602 or jamiec@bhglex.com"}, {"from": "human", "value": "What is the overall content of this document?"}, {"from": "assistant", "value": "The document describes a brunch buffet offer, including menu items, pricing, and additional fees, as well as contact information for the Private Dining Director."}, {"from": "human", "value": "What is the price per person for the brunch buffet?"}, {"from": "assistant", "value": "$24"}, {"from": "human", "value": "What is the additional cost for the Filet Medallion Station?"}, {"from": "assistant", "value": "$5"}, {"from": "human", "value": "What additional fees will be added to the total sales?"}, {"from": "assistant", "value": "A 14% facility fee, $100 attendant fee, and 6% sales tax"} ] } ``` ### Data Stats | | Sample | Type | | ----------- | ----------- | ----------- | | sa1b | 9360168 | Captions | | obj365 | 3439046 | MCQAs | | openimages | 1740864 | Captions | | arxivqa | 1859680 | Captions+ QAs | | pdfacc | 12024670 | QAs | | ucsf | 5953490 | QAs | ### Licensing Information This data is an output from Llama 3.2, and subject to the Llama 3.2 license (https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE). Use of the data to train, fine tune, or otherwise improve an AI model, which is distributed or made available, shall also include "Llama" at the beginning of any such AI model name. ### Citation Information Cite as: ``` @article{cho2025PerceptionLM, title={PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding}, author={Jang Hyun Cho and Andrea Madotto and Effrosyni Mavroudi and Triantafyllos Afouras and Tushar Nagarajan and Muhammad Maaz and Yale Song and Tengyu Ma and Shuming Hu and Hanoona Rasheed and Peize Sun and Po-Yao Huang and Daniel Bolya and Suyog Jain and Miguel Martin and Huiyu Wang and Nikhila Ravi and Shashank Jain and Temmy Stark and Shane Moon and Babak Damavandi and Vivian Lee and Andrew Westbury and Salman Khan and Philipp Kr\"{a}henb\"{u}hl and Piotr Doll{\'a}r and Lorenzo Torresani and Kristen Grauman and Christoph Feichtenhofer}, journal={arXiv}, year={2025} } ```

# PLM-Image Auto 数据集卡片 [📃 技术报告](https://arxiv.org/abs/2504.13180) [📂 Github 仓库](https://github.com/facebookresearch/perception_models/) 本数据集包含用于PLM(Perception Language Model)的合成图像描述与问答数据,详细说明请参阅论文第3节。合成标注数据集涵盖SA1B、OpenImages、Object365、ArxivQA、UCSF及PDFAcc。 ## 数据集结构 ### 图像描述(SA1B、OpenImages、Object365) 数据字段包括: - `image_id`:字符串类型特征,为图像的唯一标识符。 - `image`:字符串类型特征,对应数据文件夹中的实际图像路径。 - `conversations`:字典列表类型特征,包含用于训练PLM的实际提示词与图像描述内容。 SA1B图像描述的示例如下: { "image_id": "sa_1488824.jpg", "image": "segment_everything/release_040523_batched/batch_133/sa_1488824.jpg", "conversations": [ {"from": "human", "value": "<image> 描述该图像。"}, { "from": "assistant", "value": "该图像展现了一处城市景观,中央矗立着一座显眼的塔楼,周围环绕着各类建筑与基础设施。* 塔楼是画面的核心焦点,修长挺拔且造型独特,中部环绕着一系列环形平台或结构,外观别具一格。* 塔楼周边新旧建筑错落分布,画面右侧可见一栋大型灰色建筑,背景中还有多栋小型建筑与设施,共同构成了都市风貌。* 画面前景处有一条运河或水道沿底部延伸,沿岸点缀着绿植与些许树木灌木丛。* 画面左侧有一座桥梁或过街通道横跨水道,配有金属栏杆,看起来可供行人或自行车通行。* 城市上空晴朗湛蓝,太阳位于画面顶部中央,洒下明媚的光线,为场景增添了温暖与活力。* 整体而言,这幅图像呈现出充满活力的都市景观,融合了新旧建筑、自然元素与人类活动痕迹。", }, ], } ### 图像描述与问答(ArxivQA) 数据字段包括: - `uuid`:字符串类型特征,为样本的唯一标识符。 - `image`:字符串类型特征,对应数据文件夹中的实际图像路径。例如`2004.07107/figure_0_0.jpg`代表论文ID为`2004.07107`的第0页第0幅图。 - `conversations`:字典列表类型特征,包含用于训练PLM的实际对话内容。 ArxivQA的示例如下: { "image": "2004.07107/figure_0_0.jpg", "uuid": "8caa2cf3-d16f-42e2-b94c-e866f87ec77c", "conversations": [ {"from": "human", "value": "<image> 详细解释该图像。"}, {"from": "assistant", "value": "该图像为某材料中单粒子激发的化学势示意图。图中展示了价态与传导态之间的能隙,中心处标记了化学势(μ)。同时还展示了单粒子激发,以传导态上方的橙色圆点表示。 **核心特征:** * **化学势(μ):** 标记在示意图中心,代表材料中电子数与空穴数相等时的能级。 * **能隙:** 价态与传导态之间的能量差,以一条水平线表示。 * **价态:** 价电子的能级,以示意图右侧的灰色方框表示。 * **传导态:** 传导电子的能级,以示意图左侧的灰色方框表示。 * **单粒子激发:** 以传导态上方的橙色圆点表示,代表单个电子从价态激发至传导态的过程。 * **单空穴激发:** 以价态下方的绿色圆圈表示,代表价态中产生一个空穴的过程。 **相互关系:** * 化学势(μ)位于示意图中心,代表材料中电子的平均能级。 * 价态与传导态之间的能隙以水平线表示,代表固定的能量差值。 * 单粒子激发以传导态上方的橙色圆点表示,代表单个电子从价态激发至传导态的过程。 * 单空穴激发以价态下方的绿色圆圈表示,代表价态中产生一个空穴的过程。 总体而言,该示意图清晰展示了材料中的化学势与能隙,以及其中发生的单粒子与单空穴激发过程。"}, {"from": "human", "value": "该图中传导态上方的橙色圆点代表什么? 选项: (A) 能隙 (B) 化学势 (C) 单空穴激发 (D) 单粒子激发 "}, {"from": "assistant", "value": "(D) 单粒子激发。"}, ], } ### 文档问答(UCSF、PDFAcc) 数据字段包括: - `uuid`:字符串类型特征,为样本的唯一标识符。 - `image`:字符串类型特征,对应数据文件夹中的实际图像路径。例如`pdfa-eng-train-0045/3018390_0.jpg`代表文档分片`pdfa-eng-train-0045`中的文档3018390的第0页。 - `conversations`:字典列表类型特征,包含用于训练PLM的实际对话内容。 UCSF文档问答的示例如下: { "uuid": "886d95d3-8e13-4b55-ab3a-3517d4edb1b4_3018390_page_0", "image": "pdfa-eng-train-0045/3018390_0.jpg", "conversations": [ {"from": "human", "value": "<image> 我如何获取更多关于早午餐自助餐套餐的信息?"}, {"from": "assistant", "value": "可联系私人餐饮主管Jamie Cawood,电话:859.977.2602,邮箱:jamiec@bhglex.com"}, {"from": "human", "value": "该文档的整体内容是什么?"}, {"from": "assistant", "value": "该文档介绍了早午餐自助餐套餐,包含菜品、定价及额外费用,并提供了私人餐饮主管的联系方式。"}, {"from": "human", "value": "早午餐自助餐的人均价格是多少?"}, {"from": "assistant", "value": "24美元"}, {"from": "human", "value": "菲力Medallion套餐额外收取多少费用?"}, {"from": "assistant", "value": "5美元"}, {"from": "human", "value": "总销售额中还需添加哪些额外费用?"}, {"from": "assistant", "value": "14%的场地费、100美元的服务人员费用以及6%的销售税"} ] } ### 数据统计 | | 样本数量 | 数据类型 | | ----------- | ----------- | ----------- | | SA1B | 9360168 | 图像描述 | | Object365 | 3439046 | 多项选择问答 | | OpenImages | 1740864 | 图像描述 | | ArxivQA | 1859680 | 图像描述+问答 | | PDFAcc | 12024670 | 问答 | | UCSF | 5953490 | 问答 | ### 许可信息 本数据为Llama 3.2的生成产物,受Llama 3.2许可协议(https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)约束。若使用该数据训练、微调或以其他方式改进人工智能模型并进行分发或公开提供,则此类人工智能模型的名称需以“Llama”作为前缀。 ### 引用信息 引用格式如下: @article{cho2025PerceptionLM, title={PerceptionLM: 面向精细视觉理解的开源数据与模型}, author={Jang Hyun Cho and Andrea Madotto and Effrosyni Mavroudi and Triantafyllos Afouras and Tushar Nagarajan and Muhammad Maaz and Yale Song and Tengyu Ma and Shuming Hu and Hanoona Rasheed and Peize Sun and Po-Yao Huang and Daniel Bolya and Suyog Jain and Miguel Martin and Huiyu Wang and Nikhila Ravi and Shashank Jain and Temmy Stark and Shane Moon and Babak Damavandi and Vivian Lee and Andrew Westbury and Salman Khan and Philipp Krähenbühl and Piotr Dollár and Lorenzo Torresani and Kristen Grauman and Christoph Feichtenhofer}, journal={arXiv}, year={2025} }
提供机构:
maas
创建时间:
2025-05-20
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作