liu-nlp/minimal_pair_mpararel
Hugging Face · Updated 2025-12-12 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/liu-nlp/minimal_pair_mpararel
Description:
Minimal Pair mParaRel (multilingual) is a multilingual minimal-pair dataset comprising five language-specific subsets: English (en), Persian (fa), Icelandic (is), Estonian (et), and Swedish (sv). It was originally introduced to study the factual consistency of multilingual pretrained language models and was later used in research on scaling strategies for efficient language adaptation. The data files are in Parquet format, and the dataset is licensed under Apache-2.0.
Provided by:
liu-nlp
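A minimal sketch of loading one language subset with the Hugging Face `datasets` library follows. It assumes that each subset is exposed as a dataset configuration named by its language code (`en`, `fa`, `is`, `et`, `sv`). The config names, the helper `load_subset`, and the split layout are assumptions for illustration and are not taken from the dataset card. Loading requires network access to the Hub.

```python
# Hypothetical helper: assumes each language subset is a `datasets`
# configuration named by its language code (not confirmed by the card).
SUBSETS = ("en", "fa", "is", "et", "sv")
REPO_ID = "liu-nlp/minimal_pair_mpararel"


def load_subset(lang: str):
    """Load one language subset of the dataset (needs network access)."""
    if lang not in SUBSETS:
        raise ValueError(f"unknown subset {lang!r}; expected one of {SUBSETS}")
    # Imported lazily so the validation above works without `datasets` installed.
    from datasets import load_dataset

    # Returns a DatasetDict keyed by split; the split names are an assumption
    # left to the repository's own layout.
    return load_dataset(REPO_ID, name=lang)
```

For example, `load_subset("sv")` would fetch the Swedish subset. If the repository instead stores the subsets as plain Parquet directories without named configurations, `load_dataset(REPO_ID, data_dir=lang)` is the likely alternative.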



