LatinNLP/latin-summarizer-dataset
Source: Hugging Face | Updated: 2025-06-13 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/LatinNLP/latin-summarizer-dataset
Description:
The Latin Summarizer Dataset is a comprehensive collection of Latin texts designed to support natural language processing research on Latin, a low-resource language. It provides parallel data for several tasks, including translation (Latin-to-English) and summarization (extractive and abstractive). The dataset aggregates texts from multiple sources, including Latin Wikipedia, Grosenthal, Opus, and the Latin Library, and contains raw text, cleaned versions, human-created translations and summaries, and machine-generated summaries and translations from Google's Gemini. It is organized into multiple configurations to suit different task requirements. With over 320,000 total rows, it is a valuable resource for training and evaluating models on complex generation tasks in Latin.
Provided by:
LatinNLP



