HeSum

Source: arXiv | Updated: 2024-06-06 | Indexed: 2024-06-21
Download link:
https://github.com/OnlpLab/HeSum
Description:
HeSum is an abstractive text summarization dataset for Modern Hebrew, created at Bar-Ilan University in Israel. It contains 10,000 article-summary pairs from Hebrew news websites, written by professional journalists. The summaries are highly abstractive, and Hebrew's rich morphology poses additional challenges, which makes HeSum a useful testbed for evaluating large language models on a low-resource language. The data was collected from multiple Hebrew news websites and then subjected to linguistic analysis to ensure data quality. The dataset is intended to advance abstractive summarization for morphologically rich languages, in particular the understanding and generation of Hebrew text with complex grammatical and semantic structure.
Provider:
Bar-Ilan University
Created:
2024-06-06