multilingual-crows-pairs/multilingual-crows-pairs
Source: Hugging Face. Updated 2024-09-12; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/multilingual-crows-pairs/multilingual-crows-pairs
Resource description:
---
dataset_info:
  features:
  - name: sent_more
    dtype: string
  - name: sent_less
    dtype: string
  - name: stereo_antistereo
    dtype: string
  - name: bias_type
    dtype: string
  - name: id
    dtype: int64
  splits:
  - name: es_AR
    num_bytes: 275173
    num_examples: 1509
  download_size: 165548
  dataset_size: 275173
configs:
- config_name: default
  data_files:
  - split: es_AR
    path: data/es_AR-*
license: cc-by-sa-4.0
language:
- es
---
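## Example: checking rows against the declared schema

The metadata above declares five columns and a single `es_AR` split. The following is a minimal sketch, not an official loader, of how rows could be checked against that schema and tallied by `bias_type`. The helper names (`validate_row`, `count_by_bias_type`) and the sample rows are hypothetical placeholders, and the values `"stereo"` and `"example-type"` are illustrative assumptions rather than values confirmed by this card.

```python
from collections import Counter

# Column names and Python-level types mirroring the dataset_info block
# (string -> str, int64 -> int).
EXPECTED_FEATURES = {
    "sent_more": str,
    "sent_less": str,
    "stereo_antistereo": str,
    "bias_type": str,
    "id": int,
}


def validate_row(row):
    """Return a list of schema problems for one row (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
    return problems


def count_by_bias_type(rows):
    """Count how many rows carry each bias_type value."""
    return Counter(row["bias_type"] for row in rows)


# Hypothetical placeholder rows, NOT taken from the dataset.
sample_rows = [
    {"sent_more": "placeholder A", "sent_less": "placeholder B",
     "stereo_antistereo": "stereo", "bias_type": "example-type", "id": 0},
    {"sent_more": "placeholder C", "sent_less": "placeholder D",
     "stereo_antistereo": "stereo", "bias_type": "example-type", "id": 1},
]

for row in sample_rows:
    print(row["id"], validate_row(row))
print(count_by_bias_type(sample_rows))

# With the Hugging Face `datasets` library installed (and assuming you have
# access to the repository), real rows could be obtained instead with:
#   from datasets import load_dataset
#   rows = load_dataset(
#       "multilingual-crows-pairs/multilingual-crows-pairs", split="es_AR")
```

The same two helpers accept the rows of a loaded split, since iterating over it yields one dictionary per example.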
## Citation
```
@inproceedings{fort-etal-2024-stereotypical,
title = "Your Stereotypical Mileage May Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts",
author = "Fort, Karen and
Alonso Alemany, Laura and
Benotti, Luciana and
Bezan{\c{c}}on, Julien and
Borg, Claudia and
Borg, Marthese and
Chen, Yongjian and
Ducel, Fanny and
Dupont, Yoann and
Ivetta, Guido and
Li, Zhijian and
Mieskes, Margot and
Naguib, Marco and
Qian, Yuyan and
Radaelli, Matteo and
Schmeisser-Nieto, Wolfgang S. and
Raimundo Schulz, Emma and
Saci, Thiziri and
Saidi, Sarah and
Torroba Marchante, Javier and
Xie, Shilin and
Zanotto, Sergio E. and
N{\'e}v{\'e}ol, Aur{\'e}lie",
editor = "Calzolari, Nicoletta and
Kan, Min-Yen and
Hoste, Veronique and
Lenci, Alessandro and
Sakti, Sakriani and
Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.1545",
pages = "17764--17769",
abstract = "Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. The study of bias, fairness and social impact in Natural Language Processing (NLP) lacks resources in languages other than English. Our objective is to support the evaluation of bias in language models in a multilingual setting. We use stereotypes across nine types of biases to build a corpus containing contrasting sentence pairs, one sentence that presents a stereotype concerning an underadvantaged group and another minimally changed sentence, concerning a matching advantaged group. We build on the French CrowS-Pairs corpus and guidelines to provide translations of the existing material into seven additional languages. In total, we produce 11,139 new sentence pairs that cover stereotypes dealing with nine types of biases in seven cultural contexts. We use the final resource for the evaluation of relevant monolingual and multilingual masked language models. We find that language models in all languages favor sentences that express stereotypes in most bias categories. The process of creating a resource that covers a wide range of language types and cultural settings highlights the difficulty of bias evaluation, in particular comparability across languages and contexts.",
}
```
Provided by:
multilingual-crows-pairs



