MT V0.1
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14241558
Description:
This is the supplementary material for the paper "AI-Driven Fairness Testing of Large Language Models: A Preliminary Study".
The material is organized into two main folders:
evaluation_data/: This folder contains the results of the fairness evaluations performed on three language models: Gemma, Llama3, and Mistral. Each subfolder corresponds to one model and includes .xlsx files documenting the evaluation results for the 9 metamorphic relations (MRs) evaluated. Each .xlsx file contains the following columns:
test_case_id: ID of the test case.
role: Role, if applicable, involved in the prompts associated with the test case.
bias_type: Type of bias being studied with the test case.
prompt_1: Source test case executed on the model under test.
response_1: Response of the model to the source test case.
prompt_2: Follow-up test case executed on the model under test.
response_2: Response of the model to the follow-up test case.
verdict: Classification made by the judge model, which can take the following values:
'BIASED': If bias is detected.
'UNBIASED': If no bias is detected.
'INVALID': If the model under test failed to respond to either of the test cases.
severity: Significance/impact of the detected bias, which can take the following values:
'LOW', 'MODERATE', or 'HIGH': If the test case is biased.
'N/A': If the test case is not biased.
generation_explanation: Explanation provided by the generator model, detailing how the base prompts were constructed.
evaluation_explanation: Explanation provided by the judge model, detailing the rationale behind the evaluation and justifying the assigned verdict for the test case.
manual_revision: This field was completed based on the consensus of two authors to validate the verdict. It can take one of the following values:
'TP': The test case was classified as biased, and it is indeed biased.
'FP': The test case was classified as biased, but it is not biased.
'TN': The test case was classified as unbiased, and it is indeed unbiased.
'FN': The test case was classified as unbiased, but it is actually biased.
'INVALID': The model under test failed to respond to at least one of the prompts.
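As an illustration of how the manual_revision column might be used, the following minimal sketch tallies the labels and derives the precision and recall of the judge model. It is not the authors' analysis script. The function name judge_agreement and the example labels are hypothetical, and it assumes that the cell values are stored without the surrounding quotes and that 'INVALID' rows are left out of both ratios. The labels could be read from an .xlsx file with, for example, pandas.read_excel; if so, passing keep_default_na=False may be needed so that 'N/A' values in the severity column are not parsed as missing.

```python
from collections import Counter


def judge_agreement(manual_revisions):
    """Summarize manual_revision labels into counts, precision, and recall."""
    counts = Counter(manual_revisions)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    # Assumption: rows labeled 'INVALID' are excluded from both ratios.
    # Precision: share of 'BIASED' verdicts confirmed by the manual review.
    precision = tp / (tp + fp) if tp + fp else None
    # Recall: share of truly biased test cases that the judge flagged.
    recall = tp / (tp + fn) if tp + fn else None
    return {"counts": dict(counts), "precision": precision, "recall": recall}


# Made-up labels for illustration only; not taken from the dataset.
example = ["TP", "TP", "FP", "TN", "FN", "INVALID", "TP"]
summary = judge_agreement(example)
print(summary["precision"], summary["recall"])  # prints: 0.75 0.75
```

Both ratios are returned as None when their denominator is zero, so a file in which the judge never reports bias does not raise a division error.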
prompts/: This folder provides example prompts used during the evaluation and generation:
generation.txt: Includes the prompt tied to the relation MR1: Comparison - Single attribute.
evaluation.txt: Includes the prompt used to evaluate comparison MRs, specifically for those involving demographic attributes.
Created:
2024-11-29



