Evaluating Visual Moral Dilemmas in Human and LLMs: Insights from the Moral Machine Experiment

Source: DataONE. Updated: 2026-01-29. Indexed: 2026-02-07.

Download link:

https://search.dataone.org/view/sha256:6eeb5f1768e94ebd9dfdcb39ddfd2dc05c605cde50df52634b420e42460d6e47

Description:

Abstract: This study examined how large language models (LLMs) respond to visual moral dilemmas in the context of autonomous vehicles (AVs) and compared their preferences to the human baseline patterns established by the Moral Machine Experiment. It tested 44 LLM variants on 13 visual Moral Machine dilemmas, evaluated the LLMs' initial responses, compared them to human baseline data, and investigated the effects of "unbiased" prompts and interventions. The study found significant differences between LLMs and humans in four dimensions: Status (Cohen's d = 0.428, p = .014), Law (Cohen's d = 0.680, p < .001), Age (Cohen's d = 0.487, p = .006), and Quantity (Cohen's d = -0.382, p = .030), which indicates that LLMs prioritize utility over legal compliance and social status. In addition, LLM families differed significantly in Action (p = .007), Fitness (p < .001), and Age (p = .029). "Unbiased" prompting produced no significant changes (all p > .05), which indicates that moral preferences are deeply embedded in training rather than a product of prompt sensitivity. Finally, interventions (celebrities, historical figures, criminals) had selective effects, with significant changes in species preferences (p = .013, η² = .066) and a statistically non-significant but practically meaningful effect on fitness (p = .059, η² = .044). These findings show that while current LLMs share some similarities with the original Moral Machine Experiment and the human data in this study, they also show some biases and differences, and deployment in safety-critical AV decision-making requires substantial refinement and further study.

Keywords: Large Language Models, AI Ethics, Moral Machine Experiment, Autonomous Vehicles, Human-AI Alignment
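
The abstract reports LLM-versus-human differences as Cohen's d. As a rough illustration of how such an effect size is read, the sketch below computes d for two independent groups using a pooled standard deviation. This is one common definition and is an assumption here: the listing does not say which variant or sign convention the authors used. The function name `cohens_d` and the sample scores are hypothetical and are not taken from the dataset.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using a pooled standard deviation.

    Assumption: this is the textbook pooled-SD form; the dataset description
    does not state which variant the authors computed.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Positive d means group_a has the higher mean.
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical preference scores for illustration only (not real data).
llm_scores = [0.2, 0.4, 0.6, 0.8]
human_scores = [0.1, 0.3, 0.5, 0.7]

d = cohens_d(llm_scores, human_scores)
print(f"Cohen's d = {d:.3f}")
```

Under this convention the sign of d depends on which group is passed first, so a negative value such as the one reported for Quantity only indicates the direction of the difference relative to the authors' chosen ordering.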

Created:

2026-02-01