FarsTail

arXiv · Updated 2021-07-08 · Indexed 2024-06-21
Download link:
https://github.com/dml-qom/FarsTail
Resource overview:
FarsTail is the first natural language inference (NLI) dataset for Persian, developed by a research team at the University of Qom, Iran. It contains 10,367 samples derived from 3,539 multiple-choice questions, chosen so that the data reflects authentic language use. To ensure data quality, the construction process extracts text snippets from the web to formulate hypotheses and uses multiple rounds of annotation to keep labels consistent. Beyond NLI, FarsTail can also support question answering, text summarization, semantic search, and machine translation. It is intended to advance NLP for Persian and other low-resource languages.
Provider:
Department of Computer Engineering and IT, University of Qom, Iran
Created:
2020-09-18