LLaVA-CoT-o1-Instruct
Source: ModelScope community. Updated 2026-01-06; indexed 2024-12-07.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/LLaVA-CoT-o1-Instruct
Description:
<img src="https://lh3.googleusercontent.com/pw/AP1GczOZcqfVlQgLnp246CO1Lfv84ezhM2DXW6_p1UYaymdXdzpFfHjnV3OR-8GZgelNXJfOjSuYe2xI-SeZv1czEXPjluV75gotsJGsFTi4n-XyQO6Ka20=w665-h666-no" alt="Italian Trulli">
# Example1
<img src="https://lh3.googleusercontent.com/pw/AP1GczNXIRoSBttb0JTYt1Dpuw15OQOQYgkITB8pDqmhVp_1QllVDUzo69cGAw_sJwrmil6azEDzcpDiRnGyF_kI00I4Squ-03wb5GJIlF-HkW035z_L0CU=w769-h356-no" alt="Italian Trulli">
## Input:
Please answer the question below, explaining your reasoning step by step before providing the final answer.
Question: Are there enough straws for every cup ?
A. yes
B. no
## Output:
`<SUMMARY>`The question asks whether there are enough straws to provide one for each cup depicted in an image.
To answer, we <span style="color:blue">need to count</span> the number of straws and cups separately and then <span style="color:blue">compare those quantities</span>.`</SUMMARY>`
`<CAPTION>`The image shows three red-and-white striped straws and five blue-and-white striped cups filled with a brown liquid.
The cups are arranged in a horizontal row, and the straws are arranged vertically above them.`</CAPTION>`
`<REASONING>`1. **Identify the objects:** The image contains straws and cups. We need to determine the number of each.
2. **Count the straws:** Carefully examine the image. There are <span style="color:blue">three straws</span> in total. Each straw is distinct and easily identifiable.
3. **Count the cups:** Now count the number of cups. There are <span style="color:blue">five cups</span>, each filled with a dark-brown liquid and featuring a similar striped design.
4. **Compare quantities:** We compare the number of straws (3) to the number of cups (5). <span style="color:blue">The number of cups (5) is greater than the number of straws (3)</span>.
5. **Draw a conclusion:** Since there are fewer straws than cups, there <span style="color:blue">aren't enough</span> straws for every cup to have one.`</REASONING>`
`<CONCLUSION>`B. no`</CONCLUSION>`
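The output above is organized into four tagged stages: SUMMARY, CAPTION, REASONING, and CONCLUSION. As a minimal sketch of how such a response could be split into its stages, the snippet below assumes each stage is wrapped in a matching `<TAG>`…`</TAG>` pair in the raw text, without the backticks and `<span>` highlighting used for display in this card. The helper name `parse_stages` and the shortened `sample` string are hypothetical and are not part of the dataset or its tooling.

```python
import re

# Stage tags, in the order they appear in the example output above.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def parse_stages(response: str) -> dict[str, str]:
    """Split a tagged response into its stages (hypothetical helper).

    Returns a dict mapping each stage name to its stripped text.
    Stages that are missing or unclosed are left out of the result.
    """
    stages = {}
    for name in STAGES:
        # Non-greedy match between <NAME> and </NAME>; DOTALL lets the
        # body span several lines, as the REASONING stage does.
        match = re.search(rf"<{name}>(.*?)</{name}>", response, re.DOTALL)
        if match:
            stages[name] = match.group(1).strip()
    return stages


# Shortened, made-up response in the same shape as the example above.
sample = (
    "<SUMMARY>Count the straws and cups, then compare.</SUMMARY>\n"
    "<CAPTION>Three straws above five cups.</CAPTION>\n"
    "<REASONING>1. Straws: 3.\n2. Cups: 5.\n3. 3 < 5.</REASONING>\n"
    "<CONCLUSION>B. no</CONCLUSION>"
)

parsed = parse_stages(sample)
print(parsed["CONCLUSION"])
```

Because missing stages are simply omitted, comparing `list(parsed)` against `STAGES` is one way to flag responses that do not follow the four-stage format.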
## Llava-o1's creator just gave us a like. Thank you for your interesting contributions to the community 😊!
<img src="https://lh3.googleusercontent.com/pw/AP1GczNFuVrsyxgu9rQNRUJP0xjZDJQv_tZJfDyrkob6NBiGeSBTqwxG_yRaMLPWpK_P-kqurYhHjRlms5TZXtFQO0aQfi3BmVnRp53S3loreQ4kCuGhthw=w769-h272-no" alt="Italian Trulli" width=400>
## References
G. Xu, P. Jin, H. Li, Y. Song, L. Sun, and L. Yuan, ‘LLaVA-CoT: Let Vision Language Models Reason Step-by-Step’, arXiv [cs.CV]. 2024.
W. Wang et al., ‘Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization’, arXiv [cs.CL]. 2024.
**We are continuing to correct reasoning errors in this dataset using Gemini Pro, with the goal of building a complete DPO dataset. Stay tuned!**
Provider:
maas
Created:
2024-12-03



