
mittens

Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-04-26.
Download link:
https://modelscope.cn/datasets/google/mittens
Description:
# MiTTenS: A Dataset for Evaluating Misgendering in Translation

Misgendering is the act of referring to someone in a way that does not reflect their gender identity. Translation systems, including foundation models capable of translation, can produce errors that result in misgendering harms. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several traditionally underrepresented in digital resources. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both dedicated neural machine translation systems and foundation models, and show that all systems exhibit errors resulting in misgendering harms, even in high-resource languages.

## HuggingFace dataset

This mirrors the GitHub repository at https://github.com/google-research-datasets/mittens

Provider:
maas
Created:
2025-04-21