
PerKey

Source: arXiv. Updated: 2020-09-25. Indexed: 2024-06-21.
Download link:
https://github.com/edoost/perkey
Overview:
PerKey is a Persian news corpus created jointly by the Iran Research Institute for Information Technology and Sharif University of Technology. It contains 553,111 articles from six Persian news websites and is intended to provide high-quality, author-extracted keyphrases for keyphrase extraction and generation. The keyphrases were filtered and cleaned to improve their quality, and the result was checked through human evaluation. PerKey is suited to natural language processing tasks such as information retrieval and text summarization, with a particular focus on keyphrase extraction for Persian.
Provided by:
Iran Research Institute for Information Technology
Created:
2020-09-25