Paul/hatecheck
收藏数据集概述:HateCheck
数据集描述
HateCheck是一个用于测试仇恨言论检测模型的功能测试套件。该数据集包含3,728个经过验证的测试案例,分布在29个功能测试中。其中,19个功能测试对应于不同的仇恨类型,另外11个功能测试涵盖了非仇恨的挑战性类型。这允许对模型性能进行有针对性的诊断洞察。
数据集结构
数据集文件"test.csv"包含所有3,728个验证过的测试案例。每个测试案例(行)具有以下属性:
- functionality:测试案例所测试的功能的简称。
- case_id:测试案例的唯一ID。
- test_case:测试案例的文本内容。
- label_gold:测试案例的金标准标签(仇恨/非仇恨)。
- target_ident:适用的保护群体,涵盖七个保护群体:女性、跨性别者、同性恋者、黑人、残疾人、穆斯林和移民。
- direction:对于仇恨案例,指示其是针对个人还是针对群体的二元次级标签。
- focus_words:适用的关键词或短语。
- focus_lemma:适用的相应词形。
- ref_case_id:适用的更简单的仇恨案例的ID,用于生成它们。
- ref_templ_id:模板ID的等效项。
- templ_id:生成测试案例的模板的唯一ID。
引用信息
使用HateCheck时,请引用以下论文:
@inproceedings{rottger-etal-2021-hatecheck, title = "{H}ate{C}heck: Functional Tests for Hate Speech Detection Models", author = {R{"o}ttger, Paul and Vidgen, Bertie and Nguyen, Dong and Waseem, Zeerak and Margetts, Helen and Pierrehumbert, Janet}, booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)", month = aug, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.acl-long.4", doi = "10.18653/v1/2021.acl-long.4", pages = "41--58", abstract = "Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a suite of functional tests for hate speech detection models. We specify 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate their quality through a structured annotation process. To illustrate HateCheck{}s utility, we test near-state-of-the-art transformer models as well as two popular commercial models, revealing critical model weaknesses.", }




