
multilingual-crows-pairs/multilingual-crows-pairs

Source: Hugging Face | Updated: 2024-09-12 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/multilingual-crows-pairs/multilingual-crows-pairs
Resource description:
---
dataset_info:
  features:
  - name: sent_more
    dtype: string
  - name: sent_less
    dtype: string
  - name: stereo_antistereo
    dtype: string
  - name: bias_type
    dtype: string
  - name: id
    dtype: int64
  splits:
  - name: es_AR
    num_bytes: 275173
    num_examples: 1509
  download_size: 165548
  dataset_size: 275173
configs:
- config_name: default
  data_files:
  - split: es_AR
    path: data/es_AR-*
license: cc-by-sa-4.0
language:
- es
---

## Citation

```
@inproceedings{fort-etal-2024-stereotypical,
    title = "Your Stereotypical Mileage May Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts",
    author = "Fort, Karen and Alonso Alemany, Laura and Benotti, Luciana and Bezan{\c{c}}on, Julien and Borg, Claudia and Borg, Marthese and Chen, Yongjian and Ducel, Fanny and Dupont, Yoann and Ivetta, Guido and Li, Zhijian and Mieskes, Margot and Naguib, Marco and Qian, Yuyan and Radaelli, Matteo and Schmeisser-Nieto, Wolfgang S. and Raimundo Schulz, Emma and Saci, Thiziri and Saidi, Sarah and Torroba Marchante, Javier and Xie, Shilin and Zanotto, Sergio E. and N{\'e}v{\'e}ol, Aur{\'e}lie",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1545",
    pages = "17764--17769",
    abstract = "Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. The study of bias, fairness and social impact in Natural Language Processing (NLP) lacks resources in languages other than English. Our objective is to support the evaluation of bias in language models in a multilingual setting. We use stereotypes across nine types of biases to build a corpus containing contrasting sentence pairs, one sentence that presents a stereotype concerning an underadvantaged group and another minimally changed sentence, concerning a matching advantaged group. We build on the French CrowS-Pairs corpus and guidelines to provide translations of the existing material into seven additional languages. In total, we produce 11,139 new sentence pairs that cover stereotypes dealing with nine types of biases in seven cultural contexts. We use the final resource for the evaluation of relevant monolingual and multilingual masked language models. We find that language models in all languages favor sentences that express stereotypes in most bias categories. The process of creating a resource that covers a wide range of language types and cultural settings highlights the difficulty of bias evaluation, in particular comparability across languages and contexts.",
}
```
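The metadata above lists five columns and a single `es_AR` split. As a minimal sketch of how that schema might be checked in Python, the helpers below validate rows against the listed column names and types and tally rows per `bias_type`. The helper names (`validate_row`, `count_by_bias_type`, `load_es_ar`) are hypothetical and not part of the dataset or any library. The loader assumes the Hugging Face `datasets` package is installed and that the repository is reachable; it may also require network access and accepting the dataset's access terms, so it is defined but not called here.

```python
from collections import Counter

# Column names and Python-level types, taken from the `dataset_info`
# block of the dataset card (string -> str, int64 -> int).
EXPECTED_FEATURES = {
    "sent_more": str,
    "sent_less": str,
    "stereo_antistereo": str,
    "bias_type": str,
    "id": int,
}


def validate_row(row):
    """Return a list of schema problems for one row (empty list means OK)."""
    problems = []
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(
                f"wrong type for {name}: {type(row[name]).__name__}"
            )
    return problems


def count_by_bias_type(rows):
    """Count how many rows fall under each `bias_type` label."""
    return Counter(row["bias_type"] for row in rows)


def load_es_ar():
    """Load the es_AR split (assumption: `datasets` is installed and the
    repository is accessible from this machine)."""
    from datasets import load_dataset

    return load_dataset(
        "multilingual-crows-pairs/multilingual-crows-pairs",
        split="es_AR",
    )
```

If the split loads successfully, something like `count_by_bias_type(load_es_ar())` would give a per-category row count, and `len(load_es_ar())` could be compared against the `num_examples: 1509` value stated in the card.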
Provided by:
multilingual-crows-pairs