Question Categories.

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/Question_Categories_/28515103

Description:

Purpose: The present study compared the performance of a Large Language Model (LLM; ChatGPT) and human interviewers in interviewing children about a mock event they witnessed.

Methods: Children aged 6-8 (N = 78) were randomly assigned to the LLM condition (n = 40) or the human interviewer condition (n = 38). In the experiment, the children watched a video filmed by the researchers. The video depicted behavior that included elements that could be misinterpreted as abusive in other contexts. The children then answered questions posed either by the LLM (presented by a human researcher) or by a human interviewer.

Results: Irrespective of condition, recommended (vs. not recommended) questions elicited more correct information. The LLM posed fewer questions overall, but there was no difference in the proportion of questions recommended by the literature. There was no difference between the LLM and human interviewers in the total unique correct information elicited. However, questions posed by the LLM (vs. humans) elicited more unique correct information per question. The LLM (vs. humans) also elicited less false information overall, but there was no difference in false information elicited per question.

Conclusions: The findings show that the LLM was competent in formulating questions that adhere to best-practice guidelines. Human interviewers asked more questions that followed up on the children's responses as they tried to find out what the children had witnessed. The results indicate that LLMs could possibly be used to support child investigative interviewers. However, substantial further investigation is warranted to ascertain the utility of LLMs in more realistic investigative interview settings.
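
The Results contrast totals with per-question rates. The Python sketch below is a minimal illustration of that arithmetic only. It uses made-up counts, not data from this study or from the Question Categories dataset, and a hypothetical `InterviewTally` class. It shows that equal totals can coexist with different per-question rates when one interviewer asks fewer questions. The reverse also holds: different totals can coexist with equal per-question rates.

```python
from dataclasses import dataclass


@dataclass
class InterviewTally:
    """Hypothetical per-condition tallies; field names are illustrative only."""
    questions_asked: int
    recommended_questions: int
    unique_correct_details: int
    false_details: int

    def recommended_proportion(self) -> float:
        # Share of questions that fall into a "recommended" category.
        return self.recommended_questions / self.questions_asked

    def correct_per_question(self) -> float:
        # Unique correct details divided by the number of questions asked.
        return self.unique_correct_details / self.questions_asked

    def false_per_question(self) -> float:
        # False details divided by the number of questions asked.
        return self.false_details / self.questions_asked


# Made-up numbers chosen only to mirror the *pattern* described in the Results.
llm = InterviewTally(questions_asked=10, recommended_questions=8,
                     unique_correct_details=20, false_details=2)
human = InterviewTally(questions_asked=20, recommended_questions=16,
                       unique_correct_details=20, false_details=4)

for name, t in (("LLM", llm), ("Human", human)):
    print(f"{name}: recommended={t.recommended_proportion():.2f}, "
          f"correct/q={t.correct_per_question():.2f}, "
          f"false/q={t.false_per_question():.2f}")
# LLM: recommended=0.80, correct/q=2.00, false/q=0.20
# Human: recommended=0.80, correct/q=1.00, false/q=0.20
```

This computes descriptive ratios only. It does not reproduce the study's statistical comparisons.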

Created:

2025-02-28
