Descriptive Analyses and Correlations.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Descriptive_Analyses_and_Correlations_/28515109
Description:
Purpose
The present study compared the performance of a Large Language Model (LLM; ChatGPT) and human interviewers in interviewing children about a mock event they had witnessed.
Methods
Children aged 6-8 (N = 78) were randomly assigned to the LLM (n = 40) or the human interviewer condition (n = 38). In the experiment, the children were asked to watch a video filmed by the researchers that depicted behavior including elements that could be misinterpreted as abusive in other contexts, and then answer questions posed by either an LLM (presented by a human researcher) or a human interviewer.
Results
Irrespective of condition, recommended (vs. not recommended) questions elicited more correct information. The LLM posed fewer questions overall, but there was no difference between conditions in the proportion of questions that were of the types recommended by the literature. There were no differences between the LLM and human interviewers in unique correct information elicited, but questions posed by the LLM (vs. humans) elicited more unique correct information per question. The LLM (vs. humans) also elicited less false information overall, but there was no difference in false information elicited per question.
Conclusions
The findings show that the LLM was competent in formulating questions that adhere to best practice guidelines, while human interviewers asked more follow-up questions on the children's responses in trying to find out what the children had witnessed. The results indicate that LLMs could possibly be used to support child investigative interviewers. However, substantial further investigation is warranted to ascertain the utility of LLMs in more realistic investigative interview settings.
Created:
2025-02-28



