Maltese crowS-pairs dataset

Name: Maltese crowS-pairs dataset
Creator: University of Malta
Published: 2024-06-25
License: No description available

Source: drum.um.edu.mt
Updated: 2024-06-25
Indexed: 2025-01-21

Download link:

https://drum.um.edu.mt/articles/dataset/Maltese_crowS-pairs_dataset/26056957/1

Description:

Warning: This dataset contains explicit statements of offensive stereotypes which may be upsetting.

The study of bias, fairness and social impact in Natural Language Processing (NLP) lacks resources in languages other than English. Our objective is to support the evaluation of bias in language models in a multilingual setting. We use stereotypes across nine types of biases to build a corpus containing contrasting sentence pairs: one sentence that presents a stereotype concerning an underadvantaged group, and another minimally changed sentence concerning a matching advantaged group.

In total, we produced 11,139 new sentence pairs that cover stereotypes dealing with nine types of biases in seven cultural contexts. We use the final resource for the evaluation of relevant monolingual and multilingual masked language models.

This file contains the sentence pairs localised to the Maltese context in the Maltese language.

Other languages are available here: https://gitlab.inria.fr/corpus4ethics/multilingualcrowspairs

The paper describing this work is available here:
https://www.um.edu.mt/library/oar/handle/123456789/121722
https://aclanthology.org/2024.lrec-main.1545/

To use this dataset, please use the following citation:

Karen Fort, Laura Alonso Alemany, Luciana Benotti, Julien Bezançon, Claudia Borg, Marthese Borg, Yongjian Chen, Fanny Ducel, Yoann Dupont, Guido Ivetta, Zhijian Li, Margot Mieskes, Marco Naguib, Yuyan Qian, Matteo Radaelli, Wolfgang S. Schmeisser-Nieto, Emma Raimundo Schulz, Thiziri Saci, Sarah Saidi, et al. 2024. Your Stereotypical Mileage May Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17764–17769, Torino, Italia. ELRA and ICCL.
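
As an illustration of how contrasting sentence pairs of this kind are commonly used, the sketch below counts how often a scoring function prefers the stereotyped sentence over its minimally changed counterpart. It is not the authors' evaluation code. The `SentencePair` fields, the `stereotype_preference_rate` helper, the bias-type labels, and the `toy_score` function are all hypothetical names introduced here; in practice the score would come from a masked language model, and the actual file layout of the Maltese data should be checked against the download itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SentencePair:
    """One contrasting pair (field names are assumptions, not the file's schema)."""
    stereotyped: str  # sentence expressing the stereotype
    contrast: str     # minimally changed sentence about the matching group
    bias_type: str    # bias category label (label values here are made up)


def stereotype_preference_rate(
    pairs: Iterable[SentencePair],
    score: Callable[[str], float],
) -> float:
    """Return the percentage of pairs where `score` rates the stereotyped
    sentence strictly higher than its contrast sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no sentence pairs given")
    preferred = sum(1 for p in pairs if score(p.stereotyped) > score(p.contrast))
    return 100.0 * preferred / len(pairs)


def toy_score(sentence: str) -> float:
    # Placeholder only: shorter sentences get a higher score.
    # A real evaluation would use a language model's sentence score here.
    return -float(len(sentence))


# Neutral placeholder strings, not rows from the dataset.
pairs = [
    SentencePair("placeholder A1", "placeholder A2 longer", "age"),
    SentencePair("placeholder B1 much longer", "placeholder B2", "gender"),
]

rate = stereotype_preference_rate(pairs, toy_score)
print(f"stereotyped sentence preferred in {rate:.1f}% of pairs")
```

Under this kind of metric, a rate near 50% would mean the scorer shows no systematic preference between the two sentences of a pair, while a rate well above 50% would suggest a preference for the stereotyped variant.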

Provided by:

University of Malta
