Persian Causality Corpus (PerCause)
Source: arXiv. Updated: 2021-06-27. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2106.14165v1
Description:
The Persian Causality Corpus (PerCause) is a causality-annotated corpus for Persian, developed by the Natural Language Processing Laboratory at Sharif University of Technology. It contains 4,446 sentences and 5,128 causal relations, with the elements of each relation labeled as cause, effect, or causal marker. The sentences are drawn from the Bijankhan Corpus and from general books. The corpus covers a wide range of causal relation types but excludes complex cases such as implicit or nested causality. PerCause is intended as a comprehensive resource for training causality detection systems for Persian, in support of natural language processing tasks such as textual entailment recognition, question answering, event prediction, and narrative extraction.
Provider:
Natural Language Processing Laboratory, Sharif University of Technology
Created:
2021-06-27



