Simulating Survey Respondents with Large Language Models through Impulse Variables and Persona Conditioning
Source: DataCite Commons | Updated: 2026-05-07 | Indexed: 2025-04-15
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/RQGKHW
Description:
Human survey samples are slow and expensive to recruit, and psychometric evidence on whether large language models (LLMs) can stand in for human respondents remains thin. This paper proposes a prompting framework that conditions the LLM on demographic variables, Big Five personality anchors, and what we call "impulse variables", a pragmatic device that assigns each synthetic persona a discrete (low/medium/high) level on the central constructs of a measurement model and counteracts the response homogeneity that frontier LLMs typically display. We validate the framework using GPT-4o-mini, ChatGPT (GPT-4o web interface), and Llama-3.1-70b on a published instrument measuring self-esteem and abusive supervision, and on a held-out networking instrument that was not present in any pre-training corpus. The synthetic data reproduce the directional structure of a recent meta-analysis with acceptable measurement fit, while still showing the inflated reliability and mild positive trait drift typical of LLM responses. The framework makes scale pre-testing and instrument refinement tractable at a fraction of human-sample recruitment cost, with explicit boundary conditions on where it should not be applied.
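The persona conditioning described above can be illustrated with a minimal Python sketch. It is not the authors' code: the prompt wording, the demographic fields, the value ranges, and the balanced full-factorial assignment of impulse levels are all assumptions made for illustration, and every name in it (Persona, impulse_grid, make_personas, build_prompt) is hypothetical. Only the low/medium/high levels, the Big Five traits, and the two construct names come from the description. No LLM is called; the sketch only builds the prompt text.

```python
import itertools
import random
from dataclasses import dataclass

# Discrete impulse levels and construct names taken from the description.
LEVELS = ("low", "medium", "high")
CONSTRUCTS = ("self-esteem", "abusive supervision")
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")


@dataclass
class Persona:
    """Hypothetical container for one synthetic respondent."""
    age: int
    gender: str
    occupation: str
    big_five: dict   # trait -> level
    impulses: dict   # construct -> level


def impulse_grid(constructs):
    """Every combination of levels across constructs (an assumed design)."""
    return [dict(zip(constructs, combo))
            for combo in itertools.product(LEVELS, repeat=len(constructs))]


def make_personas(n_per_cell, seed=0):
    """Draw n_per_cell personas for each impulse combination."""
    rng = random.Random(seed)
    personas = []
    for impulses in impulse_grid(CONSTRUCTS):
        for _ in range(n_per_cell):
            personas.append(Persona(
                age=rng.randint(22, 64),                       # assumed range
                gender=rng.choice(["female", "male"]),         # assumed categories
                occupation=rng.choice(["nurse", "engineer", "clerk"]),
                big_five={t: rng.choice(LEVELS) for t in BIG_FIVE},
                impulses=dict(impulses),
            ))
    return personas


def build_prompt(persona, item):
    """Assemble an illustrative prompt; the wording is an assumption."""
    traits = ", ".join(f"{k}: {v}" for k, v in persona.big_five.items())
    impulses = ", ".join(f"{k}: {v}" for k, v in persona.impulses.items())
    return (
        f"You are a {persona.age}-year-old {persona.gender} {persona.occupation}.\n"
        f"Personality ({traits}).\n"
        f"Standing on the measured constructs ({impulses}).\n"
        f"Rate the statement from 1 (strongly disagree) to 5 (strongly agree): "
        f"\"{item}\""
    )


personas = make_personas(n_per_cell=2, seed=0)
example_prompt = build_prompt(personas[0], "I feel that I have a number of good qualities.")
```

Spreading personas evenly across the impulse levels is one way to force variance into the synthetic sample, which is the stated purpose of impulse variables; whether the original study balanced the levels in this way is not stated in the description.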
Provider:
Harvard Dataverse
Created:
2024-09-13



