ag2003/morphogen
Source: Hugging Face. Updated 2026-04-23; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ag2003/morphogen
Description:
MORPHOGEN is a morphologically grounded, large-scale multilingual benchmark designed to evaluate the gender-aware generation capabilities of Large Language Models (LLMs) in three typologically diverse languages: French, Arabic, and Hindi. The core task, GENFORM, requires a model to rewrite a first-person sentence in the opposite gender while preserving the original meaning, fluency, and syntactic structure. The dataset contains 9,999 French pairs, 2,719 Arabic pairs, and 7,610 Hindi pairs. Sentences contain up to seven gendered elements and cover a range of morphological phenomena, including verb conjugation, adjectives, and role nouns.
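To make the GENFORM setup concrete, here is a minimal sketch of how rewritten sentences could be compared against reference rewrites. It is only an illustration: the description above does not say how the benchmark scores outputs, so the exact-match criterion, the helper names `normalize` and `exact_match_rate`, and the two French example sentences are all assumptions and are not taken from the dataset. Only the pair counts come from the description.

```python
import unicodedata

# Pair counts as stated in the dataset description.
PAIR_COUNTS = {"French": 9_999, "Arabic": 2_719, "Hindi": 7_610}


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so visually identical strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match_rate(predictions, references):
    """Fraction of predictions identical to their reference after normalization.

    Hypothetical metric for illustration; not the benchmark's documented scoring.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Made-up French examples: feminine reference rewrites of masculine first-person sentences.
references = ["Je suis contente.", "Je suis allée au marché."]
# The second prediction misses the participle agreement ("allé" instead of "allée").
predictions = ["Je suis contente.", "Je suis allé au marché."]

total_pairs = sum(PAIR_COUNTS.values())
score = exact_match_rate(predictions, references)
```

Exact match is deliberately strict: a rewrite that misses a single gendered element counts as wrong, which matters for sentences that carry several gendered elements at once. A softer, per-element score would need annotations that this sketch does not assume.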
Provided by:
ag2003



