Supplementary data for the paper 'Putting ChatGPT Vision (GPT-4V) to the test: Risk perception in traffic images'

Name: Supplementary data for the paper 'Putting ChatGPT Vision (GPT-4V) to the test: Risk perception in traffic images'
Creator: 4TU.ResearchData
Published: 2024-04-16 07:08:17
License: No description available

DataCite Commons (updated 2024-04-16, indexed 2024-07-03)

Download link:

https://data.4tu.nl/datasets/dfbe6de4-d559-49cd-a7c6-9bebe5d43d50

Description:

Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of ‘risk’ in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: 1) repeating the prompt under effectively identical conditions increases validity, 2) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and 3) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model’s validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was r = 0.83, indicating that population-level human risk can be predicted using AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.
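The first hypothesis (repeating a prompt and aggregating the outputs raises validity) can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the dataset's actual ratings or the authors' analysis code: the function names, the noise level, and the number of repeats are all assumptions chosen for illustration. It models each model rating as the human score plus independent noise, then compares the Pearson correlation for a single rating against the correlation for the mean of repeated ratings.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_images=210, n_repeats=20, noise_sd=2.0, seed=0):
    """Return (validity of one rating, validity of the averaged rating).

    All values are synthetic. noise_sd and n_repeats are hypothetical
    parameters, not figures taken from the paper.
    """
    rng = random.Random(seed)
    # Stand-in for the mean human risk score of each image.
    human = [rng.gauss(0, 1) for _ in range(n_images)]
    # Each repeated model rating = human score + independent noise.
    ratings = [
        [h + rng.gauss(0, noise_sd) for _ in range(n_repeats)]
        for h in human
    ]
    single = [r[0] for r in ratings]        # one prompt per image
    averaged = [mean(r) for r in ratings]   # mean over repeated prompts
    return pearson(human, single), pearson(human, averaged)


r_single, r_mean = simulate()
print(f"single prompt:  r = {r_single:.2f}")
print(f"mean of prompts: r = {r_mean:.2f}")
```

Under this noise model, averaging cancels part of the independent error, so the averaged rating correlates more strongly with the human score than any single rating does. This is the same reasoning that makes a multi-item questionnaire more reliable than a single item.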

Provider:

4TU.ResearchData

Created:

2024-04-15