
mGeNTE-supplementary

ModelScope Community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/FBK-MT/mGeNTE-supplementary
Description:
Supplementary data released alongside the paper: **[Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE](https://arxiv.org/html/2501.09409v3#:~:text=To%20address%20this%20gap%2C%20we%20introduce%20mGeNTE%2C%20an,inclusive%20translation%20with%20state-of-the-art%20instruction-following%20language%20models%20%28LMs%29.)**.

The official mGeNTE dataset is at [this link](https://huggingface.co/datasets/FBK-MT/mGeNTE).

## Structure

The supplementary material is organized as follows:

- `attributions`: contains all the token-level attributions, to be used with the class defined in our main repository at [this link](https://github.com/g8a9/mgente-gap). We also include a `processed_` version of each file, in which the contributions are aggregated by context part as described in the paper.
- `gnt`: contains the gender-neutral evaluation labels for the translations, as estimated by an LLM judge as described in the paper.
- `manual_eval`: contains TSV files manually annotated with gender-neutral evaluation labels for the translations, which, as described in the paper, are used to estimate the accuracy of the LLM judge. The manual evaluation guidelines are available at [this link](https://github.com/g8a9/mgente-gap/tree/main/guidelines).
- `translations`: contains all the raw translations as generated by the models described in the paper.

## Contacts

If you have any questions about this material, feel free to open an issue on this repository or on [the official one](https://github.com/g8a9/mgente-gap).
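To make the folder layout described under Structure more concrete, here is a minimal Python sketch for browsing a local copy of the material. It relies only on what is stated above: the four top-level folder names, the `processed_` prefix for aggregated attribution files, and the fact that `manual_eval` holds TSV files. Everything else is an assumption made for illustration: the function names `index_supplementary`, `read_tsv`, and `agreement` are hypothetical, the file names and column headers are not documented here, and the simple agreement ratio is only one plausible way to estimate judge accuracy, not necessarily the procedure used in the paper.

```python
import csv
from pathlib import Path

# Top-level folders named in the Structure section.
FOLDERS = ("attributions", "gnt", "manual_eval", "translations")


def index_supplementary(root):
    """Map each top-level folder to the sorted relative paths of its files.

    Attribution files whose name starts with "processed_" are listed under a
    separate "attributions_processed" key (a hypothetical key name).
    """
    root = Path(root)
    index = {name: [] for name in FOLDERS}
    index["attributions_processed"] = []
    for name in FOLDERS:
        folder = root / name
        if not folder.is_dir():
            continue  # folder missing from this local copy
        for path in sorted(folder.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            if name == "attributions" and path.name.startswith("processed_"):
                index["attributions_processed"].append(rel)
            else:
                index[name].append(rel)
    return index


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def agreement(manual_labels, judge_labels):
    """Fraction of positions where the judge label equals the manual label.

    Assumption: both sequences are aligned item by item.
    """
    if not manual_labels or len(manual_labels) != len(judge_labels):
        raise ValueError("label sequences must be non-empty and equally long")
    matches = sum(1 for m, j in zip(manual_labels, judge_labels) if m == j)
    return matches / len(manual_labels)
```

A typical use would be to call `index_supplementary` on the directory holding the downloaded files, load one of the listed `manual_eval` TSV files with `read_tsv`, inspect its actual column headers, and only then pair its labels with the corresponding `gnt` labels for `agreement`.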
## Citation

If you use any of the materials related to the paper, please cite:

```bibtex
@misc{savoldi2025mindinclusivitygapmultilingual,
  title={Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE},
  author={Beatrice Savoldi and Giuseppe Attanasio and Eleonora Cupin and Eleni Gkovedarou and Janiça Hackenbuchner and Anne Lauscher and Matteo Negri and Andrea Piergentili and Manjinder Thind and Luisa Bentivogli},
  year={2025},
  eprint={2501.09409},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.09409},
}
```

Provider: maas
Created: 2025-10-03