LatinNLP/latin-summarizer-dataset

Name: LatinNLP/latin-summarizer-dataset
Creator: LatinNLP
Published: 2025-06-13 03:39:54
License: Not specified

Source: Hugging Face. Updated 2025-06-13; indexed 2025-11-01.

Download link:

https://hf-mirror.com/datasets/LatinNLP/latin-summarizer-dataset

Description:

The Latin Summarizer Dataset is a comprehensive collection of Latin texts designed to support natural language processing research on Latin, a low-resource language. It provides parallel data for several tasks, including translation (Latin to English) and summarization (extractive and abstractive). The dataset aggregates texts from multiple sources, including Latin Wikipedia, Grosenthal, Opus, and The Latin Library, and contains raw text, cleaned versions, human-created translations and summaries, and machine-generated summaries and translations from Google's Gemini. It is organized into multiple configurations to suit different task requirements. With over 320,000 rows in total, it is a resource for training and evaluating models on complex generation tasks in Latin.
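Because the dataset is split into several configurations, a reasonable first step is to list them before loading one. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable; the helper names `list_configs` and `load_config` are hypothetical, and the actual configuration names are not stated here, so they are discovered at runtime rather than hard-coded.

```python
# Sketch only: helper names are illustrative, not part of the dataset's own tooling.
REPO_ID = "LatinNLP/latin-summarizer-dataset"


def list_configs(repo_id: str = REPO_ID) -> list[str]:
    """Return the configuration names published for the dataset repository."""
    # Imported lazily so the module can be inspected without `datasets` installed.
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)


def load_config(config_name: str, repo_id: str = REPO_ID):
    """Load one configuration; the name should come from list_configs()."""
    from datasets import load_dataset

    # The second positional argument of load_dataset is the configuration name.
    return load_dataset(repo_id, config_name)
```

A typical session would call `list_configs()` first, pick the entry that matches the task (translation or summarization), and pass it to `load_config(...)`. Both calls need network access to the Hugging Face Hub.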

Provided by:

LatinNLP
