Beyond Human Bias? The Halo Effect Paradox and Reversal in Product Design Evaluation by LLMs
Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/d4tzvjsysx/1
Description:
This study examines how consistently Large Language Models (such as GPT-4o) align with human experts when evaluating product design proposals, and which biases may arise in the process. The experiment used submissions from a well-known design competition. GPT-4o scored the designs under three prompt conditions: (1) a structured evaluation framework; (2) the structured framework plus background information; and (3) the structured framework plus background information and suppression instructions. Its scores under each condition were then compared with those of human experts. Under the structured framework alone, the model's scores closely matched those of the human evaluators, indicating that it can recognize differences when comparing multiple design schemes. When background information was added, however, consistency decreased significantly, indicating that the model exhibited a cognitive bias similar to the halo effect. Adding suppression instructions effectively mitigated this bias. The study offers insights for optimizing the use of Large Language Models in product design evaluation and for improving the reliability of intelligent evaluation methods.
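The comparison described above can be sketched as a per-condition agreement check between model scores and expert scores. The description does not say which agreement statistic the study used, so Spearman rank correlation here is an assumption, and the function names, condition labels, and score values are hypothetical placeholders that are not taken from the dataset.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder scores for five designs. These are made-up illustration values,
# NOT results from the study or its dataset.
expert_scores = [7.5, 6.0, 8.2, 5.1, 6.8]
model_scores_by_condition = {
    "framework_only": [7.0, 6.2, 8.0, 5.5, 6.5],
    "framework_plus_background": [8.5, 8.0, 8.4, 8.1, 8.6],
    "framework_plus_background_plus_suppression": [7.2, 6.1, 8.1, 5.8, 6.9],
}

for condition, model_scores in model_scores_by_condition.items():
    rho = spearman(expert_scores, model_scores)
    print(f"{condition}: Spearman rho = {rho:.2f}")
```

To apply this to the actual data, replace the placeholder lists with the expert and model scores from the files at the download link. A rank-based statistic is used in this sketch because it compares the ordering of designs rather than absolute score levels; other agreement measures may suit the data equally well.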



