Responses of the GPT-4o mini and Bielik-11B-v2 Large Language Models in Polish Law Tasks, Evaluated for Accuracy, Limitations Signalling, and the Occurrence of Hallucinations
Source: DataCite Commons. Updated: 2026-03-10. Indexed: 2026-05-04.
Download link:
https://uj.rodbuk.pl/citation?persistentId=doi:10.57903/UJ/2XPZKE
Description:
This dataset contains 120 responses to prompts about Polish law generated by two large language models, GPT-4o mini and Bielik-11B-v2-instruct, both of which were available to users free of charge in January 2025. The responses were evaluated for overall accuracy, for the ability to counter the erroneous assumption present in the prompt, and for whether the model signalled limitations in its response. The rulings cited by the models were also checked to determine whether they existed or were hallucinated; this aspect was not taken into account in the overall evaluation of the correctness of the responses. The dataset contains two final files that form the basis for further research. The first is a database of the 120 model responses, evaluated for overall accuracy and for the model's signalling of limitations. The second is a database of 369 rulings cited in the models' responses, evaluated for the existence of their signatures, the existence of signatures associated with the correct type of ruling, the authority and date of issue, and the thematic consistency of the cited ruling with the field of law to which the prompt related.
The dataset was created as part of a research project funded by the National Science Centre, Poland, entitled "The Understandability Requirement of Machine Learning Systems Used in the Application of Law" (No. 2022/45/N/HS5/00871).
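The structure of the second file described above (one record per cited ruling, with a set of yes/no checks) could be summarised per model roughly as in the following minimal sketch. Everything in it is an assumption made for illustration: the field names, the model labels, and the placeholder records are hypothetical and are not taken from the dataset, and treating a ruling as hallucinated when its signature does not exist is only one possible reading of the evaluation criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CitedRuling:
    # Hypothetical field names; the dataset's actual columns may differ.
    model: str
    signature_exists: bool
    type_matches: bool
    authority_and_date_match: bool
    topic_matches: bool


def nonexistent_signature_rate(
    rulings: Iterable[CitedRuling], model: str
) -> Optional[float]:
    """Share of a model's cited rulings whose signature does not exist.

    Returns None when the model cited no rulings at all.
    """
    cited = [r for r in rulings if r.model == model]
    if not cited:
        return None
    missing = sum(1 for r in cited if not r.signature_exists)
    return missing / len(cited)


# Placeholder records for illustration only; NOT values from the dataset.
sample = [
    CitedRuling("model_a", True, True, True, True),
    CitedRuling("model_a", False, False, False, False),
    CitedRuling("model_b", True, True, False, True),
]

print(nonexistent_signature_rate(sample, "model_a"))
```

With the real files, the placeholder list would be replaced by rows read from the rulings database, and the same function could be applied to each of the two models in turn.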
Provider:
Jagiellonian University in Kraków
Created:
2026-03-09



