
Supplementary data for the paper 'Putting ChatGPT Vision (GPT-4V) to the test: Risk perception in traffic images'

4TU.ResearchData, updated 2024-04-16, indexed 2026-04-23
Download link:
https://data.4tu.nl/datasets/dfbe6de4-d559-49cd-a7c6-9bebe5d43d50/2
Description:
Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of ‘risk’ in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: 1) repeating the prompt under effectively identical conditions increases validity, 2) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and 3) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model’s validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was r = 0.83, indicating that population-level human risk can be predicted using AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.
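
The validity measure described above (the correlation between model-based risk scores and mean human risk scores across images) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic random numbers, not the dataset's values, and the names `pearson_r`, `aggregate_ratings`, `regression_validity`, and `object_count` are hypothetical and not taken from the paper or its supplementary files. The sketch only shows the general idea of averaging repeated ratings and of adding a second predictor in an ordinary least-squares regression.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def aggregate_ratings(ratings):
    """Average ratings over repetitions; input shape (n_images, n_repeats)."""
    return np.asarray(ratings, dtype=float).mean(axis=1)


def regression_validity(features, target):
    """Fit ordinary least squares with an intercept and return the
    correlation between the fitted values and the target (in-sample)."""
    target = np.asarray(target, dtype=float)
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return pearson_r(X @ coef, target)


# Synthetic stand-in data (assumption): 210 images, 10 repeated ratings each.
rng = np.random.default_rng(0)
n_images, n_repeats = 210, 10
human = rng.normal(size=n_images)  # stand-in for mean human risk scores
model = human[:, None] + rng.normal(scale=2.0, size=(n_images, n_repeats))
object_count = 0.5 * human + rng.normal(size=n_images)  # hypothetical feature

# Validity of a single rating versus the mean of repeated ratings.
r_single = pearson_r(model[:, 0], human)
r_mean = pearson_r(aggregate_ratings(model), human)

# Validity after adding the hypothetical object-detection feature.
predictors = np.column_stack([aggregate_ratings(model), object_count])
r_combined = regression_validity(predictors, human)

print(f"single rating: r = {r_single:.2f}")
print(f"mean of {n_repeats} ratings: r = {r_mean:.2f}")
print(f"mean rating + feature: r = {r_combined:.2f}")
```

Because the regression is evaluated in-sample, adding a predictor can never lower the fitted correlation, so a real analysis would also need a significance test or held-out evaluation to support a claim like hypothesis 3.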

Provided by:
Driessen, Tom
Created:
2024-04-16