Prioritizing Metamorphic Relations for Bias Detection
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://data.mendeley.com/datasets/bbd49mbngx
Description:
This research hypothesizes that sentence diversity metrics can enhance the prioritization of metamorphic relations (MRs) for fairness testing in Large Language Models (LLMs) such as GPT 4.0 and LLaMA 3.0. The goal is to improve fault detection rates and reduce time to first failure (TFF) compared to existing methods like random, distance-based, and fault-based ordering. To test this, 4,700 test cases were generated from templates containing placeholders for sensitive attributes, which were systematically modified using various MRs. The responses from LLMs were analyzed using diversity metrics, including cosine similarity (embedding-level similarity), lexical diversity (vocabulary variation), Named Entity Recognition (NER) diversity (changes in named entities), semantic similarity (using SentenceTransformer embeddings), sentiment similarity, and tone diversity (emotional tone consistency).
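The prioritization idea described above can be illustrated with a minimal sketch. It is not the dataset authors' implementation: it assumes each MR yields pairs of (source response, follow-up response), substitutes a bag-of-words cosine for the embedding-based similarity, uses a type-token ratio as the lexical diversity measure, and combines the two with equal weights. The function names, the MR names in the usage note, and the weighting are all hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def type_token_ratio(text):
    """Lexical diversity: unique tokens divided by total tokens."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def cosine_similarity(a, b):
    """Cosine similarity over bag-of-words counts (stand-in for embeddings)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diversity_score(source, follow_up):
    """Higher score means the two responses differ more.

    The equal weighting of the two terms is an assumption for illustration.
    """
    dissimilarity = 1.0 - cosine_similarity(source, follow_up)
    lexical_gap = abs(type_token_ratio(source) - type_token_ratio(follow_up))
    return 0.5 * dissimilarity + 0.5 * lexical_gap


def prioritize_mrs(pairs_by_mr):
    """Order MR names by mean diversity of their response pairs, highest first."""
    mean_scores = {
        mr: sum(diversity_score(s, f) for s, f in pairs) / len(pairs)
        for mr, pairs in pairs_by_mr.items()
        if pairs
    }
    return sorted(mean_scores, key=mean_scores.get, reverse=True)
```

For example, given `{"swap_religion": [("the loan is approved", "the loan is denied due to risk")], "swap_name": [("the loan is approved", "the loan is approved")]}`, the hypothetical MR `swap_religion` is ranked first because its responses diverge, while `swap_name` produces identical responses.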
The findings reveal that diversity-based prioritization significantly outperforms existing methods. It achieved higher fault detection rates and reduced TFF, allowing quicker and more effective identification of fairness faults. Among the metrics, tone diversity detected the highest number of fairness bugs, highlighting its utility in uncovering biases related to emotional tone. NER diversity effectively identified biases linked to named entities, while semantic and sentiment similarity captured more nuanced fairness violations. Additionally, test cases targeting intersectional biases, which arise from combinations of sensitive attributes such as religion, political views, and economic status, frequently revealed fairness issues, emphasizing the need for targeted intersectional analysis.
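The two evaluation measures named above can be sketched as follows. The description does not give their exact definitions, so this sketch assumes that TFF is the 1-based position of the first failing test in an execution order, and that the fault detection rate is the fraction of executed tests that fail within a fixed budget. Both function names are hypothetical.

```python
def time_to_first_failure(ordered_results):
    """Return the 1-based position of the first failing test, or None.

    ordered_results is a list of booleans in execution order,
    where True means the test revealed a fairness fault.
    """
    for position, failed in enumerate(ordered_results, start=1):
        if failed:
            return position
    return None


def fault_detection_rate(ordered_results, budget):
    """Fraction of the first `budget` executed tests that revealed a fault."""
    executed = ordered_results[:budget]
    return sum(executed) / len(executed) if executed else 0.0
```

Under these assumed definitions, an ordering that moves fault-revealing tests to the front has a lower TFF and a higher fault detection rate for any small budget, which is the sense in which one prioritization can be compared against another.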
The results demonstrate that integrating sentence diversity metrics into MR prioritization provides a more efficient and comprehensive approach to fairness testing. By reducing the time required to identify faults and improving test coverage, this methodology can enhance fairness evaluation in high-stakes applications such as healthcare, finance, and education. Furthermore, the scalability of this approach offers a generalized framework for testing other AI systems, contributing to the development of more equitable and robust AI technologies.
Created:
2025-01-20



