Large Language Model Responses to the Short Phronesis Measure
Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-18.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/DKKZMR
Description:
Phronesis, the ability to perceive morally important aspects of situations, regulate emotions in the context of sound judgment, and act in accordance with one's deepest values, has historically been considered a uniquely human capacity. This study examines whether large language models (LLMs) generate response patterns consistent with Aristotelian phronesis, as operationalized by the Short Phronesis Measure (SPM), and whether those patterns are sensitive to prompt framing. Fifty-five LLM variants from 9 major model families (Claude, ChatGPT, Gemini, Grok, Copilot, DeepSeek, Mistral, Perplexity, and Llama) were evaluated on all SPM items under two conditions: a standard baseline prompt and an unbiased prompt. LLM responses were compared with a human sample (n = 1,985). Data were analyzed using ANOVA, Mann-Whitney U tests, Pearson correlation, t-tests, Cronbach's alpha, and omega coefficients. LLMs produced high scores across all 10 phronesis subscales, significantly exceeding human normative values. Unbiased prompting produced significant and systematic reductions in most deliberative, identity, and affective subscales (|d| = 0.645-1.251), while perceptual subscales remained unchanged. Cronbach's alpha and omega analyses showed near-perfect internal consistency on most subscales, and the reliability of Virtue Identification improved significantly from unacceptable (α = .314) to acceptable (α = .758, ω = .781) under unbiased prompting. These findings indicate that LLMs simulate phronesis dimensions. Despite scoring higher than humans on all 10 subscales, many LLMs still have room for improvement in capturing the nuanced, experience-based nature of phronesis.
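For readers who want to recompute the reliability and effect-size statistics named in the description, the following is a minimal sketch rather than the authors' analysis code. It assumes each subscale is available as one row of item scores per respondent, and it uses the pooled-standard-deviation form of Cohen's d; the description does not state which variant of d was used, and the function names and toy layout here are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: list of respondents, each a list of item scores (assumed layout).
    """
    k = len(rows[0])                      # number of items in the subscale
    items = list(zip(*rows))              # transpose to one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD (an assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5
```

Both functions use sample variances (n - 1 in the denominator), and both require at least two respondents and non-constant scores, otherwise the denominators are zero.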
Provider:
Harvard Dataverse
Created:
2026-05-04



