The creative musical achievement of AI systems compared to music students: A replication of the study by Schreiber et al. (2024)
Source: PsychArchives | Updated: 2025-08-19 | Indexed: 2026-04-25
Download link:
https://hdl.handle.net/20.500.12034/16514
Abstract:
Although AI systems have made significant progress over the last two years in generating cultural products such as literature, poems, and music, it remains unclear whether the aesthetic quality of these products improves in step with the performance gains of the underlying large language models (LLMs). We replicated the study by Schreiber et al. (2024) to test whether the creative performance of selected LLMs in the musical domain had improved over the past two years. In an online rating experiment based on a melody continuation paradigm, 75 melodic continuations generated by the AI systems Qwen 2 (Version 72B Instruct), Llama 3 (Version 70B Instruct), and ChatGPT (Version 4) were compared with 23 solutions composed by humans. N = 54 listeners (music students) then rated the aesthetic quality of the sound examples on four criteria (convincing, logical and meaningful, interesting, and liking). The first main finding was that the human-composed solutions outperformed all three AI systems on all four dependent variables (large effect sizes, 1.11 ≤ dz ≤ 2.51), confirming the finding of Schreiber et al. (2024). The second main finding was that listeners discriminated between AI- and human-generated solutions with a mean (and meaningful) sensitivity of d′ = 1.09. We conclude that merely increasing the volume of training of AI systems does not guarantee a corresponding improvement in the creative musical output they produce under controlled conditions.
Status: peer reviewed, published version
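The abstract reports two standard statistics: the paired-samples effect size dz and the signal-detection sensitivity index d′. The following is a minimal Python sketch that assumes the textbook definitions: dz is the mean of the paired differences divided by their standard deviation, and d′ = z(hit rate) − z(false-alarm rate). This summary does not describe the study's exact procedure, such as how hits were defined or whether extreme rates were corrected. The function names and example numbers are hypothetical illustrations, not data from the study.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(F) from signal detection theory.

    Rates must lie strictly between 0 and 1. Rates of exactly 0 or 1 would
    need a correction (e.g. log-linear), which this sketch does not apply.
    """
    if not (0 < hit_rate < 1 and 0 < false_alarm_rate < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return z(hit_rate) - z(false_alarm_rate)


def cohen_dz(condition_a: Sequence[float], condition_b: Sequence[float]) -> float:
    """Paired-samples effect size: mean of the differences / SD of the differences."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return mean(diffs) / stdev(diffs)  # stdev uses the n - 1 (sample) denominator


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not data from the study.
    # Assumption: a "hit" means a human-composed continuation judged as human,
    # and a "false alarm" means an AI-generated continuation judged as human.
    print(f"d' = {d_prime(0.70, 0.30):.2f}")
    human = [5.1, 4.8, 5.6, 4.9]  # hypothetical mean rating per listener, human items
    ai = [3.9, 3.5, 4.4, 4.1]     # same listeners, AI items (paired by position)
    print(f"dz = {cohen_dz(human, ai):.2f}")
```

In this framing, d′ = 0 corresponds to chance-level discrimination. The dz function assumes that each listener contributes one score per condition, so the two lists are paired element by element.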
Provider:
PsychOpen GOLD
Created:
2025-08-19



