Evaluating Visual Moral Dilemmas in Human and LLMs: Insights from the Moral Machine Experiment
Source: DataONE | Updated: 2026-01-29 | Indexed: 2026-02-07
Download link:
https://search.dataone.org/view/sha256:6eeb5f1768e94ebd9dfdcb39ddfd2dc05c605cde50df52634b420e42460d6e47
Description:
Abstract: This study examined how large language models (LLMs) respond to visual moral dilemmas in the context of autonomous vehicles (AVs) and compared their preferences to the human baseline patterns established by the Moral Machine Experiment. It tested 44 LLM variants on 13 visual Moral Machine dilemmas, evaluated the LLMs' initial responses, compared them to baseline human responses, and investigated the effects of "unbiased" prompts and interventions. The study found significant differences between LLMs and humans in four dimensions: Status (Cohen's d = 0.428, p = .014), Law (Cohen's d = 0.680, p < .001), Age (Cohen's d = 0.487, p = .006), and Quantity (Cohen's d = -0.382, p = .030), which indicates that LLMs prioritize utility over legal compliance and social status. In addition, LLM families differed significantly in Action (p = .007), Fitness (p < .001), and Age (p = .029). "Unbiased" prompting produced no significant changes (all p > .05), which indicates that moral preferences are deeply embedded in training rather than sensitive to prompt wording. Finally, interventions (celebrities, historical figures, criminals) had selective effects, with a significant change in species preferences (p = .013, η² = .066) and a statistically non-significant but practically meaningful effect on fitness (p = .059, η² = .044). These findings show that while current LLMs share some similarities with the original Moral Machine Experiment and the human data in this study, they also show some biases and differences, and that deployment in safety-critical AV decision-making requires substantial refinement and further study.
Keywords: Large Language Models, AI Ethics, Moral Machine Experiment, Autonomous Vehicles, Human-AI Alignment
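For readers unfamiliar with the effect sizes quoted in the abstract, the following is a minimal sketch of how Cohen's d is commonly computed for two independent groups using a pooled standard deviation. It is not taken from the dataset: the abstract does not state which variant of d was used or which group is subtracted from which, so the function name `cohens_d`, the pooled-variance formula, the sign convention, and the sample values are all assumptions for illustration only.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample SD.

    Assumption: this is the textbook pooled-variance form; the study may
    have used a different variant (e.g. a paired or corrected estimate).
    A positive value means group_a has the higher mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical preference scores, invented purely for illustration.
    llm_scores = [0.62, 0.70, 0.55, 0.81, 0.66]
    human_scores = [0.50, 0.58, 0.47, 0.69, 0.61]
    print(f"Cohen's d = {cohens_d(llm_scores, human_scores):.3f}")
```

By the usual rule of thumb, magnitudes around 0.2, 0.5, and 0.8 are read as small, medium, and large effects, so the values reported above (roughly 0.38 to 0.68 in absolute terms) would fall in the small-to-medium range.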
Created:
2026-02-01



