avemio/German-RAG-ORPO-Long-Context-Alpaca-HESSIAN-AI

Name: avemio/German-RAG-ORPO-Long-Context-Alpaca-HESSIAN-AI
Creator: avemio
Published: 2025-02-06 15:33:10
License: No description available

Hugging Face: updated 2025-02-06, indexed 2025-04-08

Download link:

https://hf-mirror.com/datasets/avemio/German-RAG-ORPO-Long-Context-Alpaca-HESSIAN-AI

Resource overview:

The German-RAG-ORPO Long-Context Alpaca dataset is a specialized collection for fine-tuning language models on RAG (Retrieval-Augmented Generation) capabilities. It consists of synthetically generated data inspired by Tencent's related research. It comprises three subsets, each with a different number of examples: hard-qa-with-multiple-references, qa-meeting-attendee-topic, and qa-meeting-topic. The source data are enhanced German Wikipedia content, the PersonaHub dataset, and synthetically generated data. Intended tasks include question answering, summarization, text classification, extractive recall, OCR correction, and question answering with multiple references.
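As a rough illustration of how the three subsets might be pulled from Hugging Face, here is a minimal sketch using the `datasets` library. It assumes that the subset names listed above are exposed as configuration names and that each has a `train` split; neither is confirmed by this listing. The helper names `load_subset` and `describe_all` are hypothetical.

```python
REPO_ID = "avemio/German-RAG-ORPO-Long-Context-Alpaca-HESSIAN-AI"

# Subset names as given in the description. Treating them as `datasets`
# configuration names is an assumption, not something this listing confirms.
SUBSETS = (
    "hard-qa-with-multiple-references",
    "qa-meeting-attendee-topic",
    "qa-meeting-topic",
)


def load_subset(name, split="train"):
    """Load one subset (hypothetical helper).

    Requires `pip install datasets` and network access.
    The default split name "train" is an assumption.
    """
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name, split=split)


def describe_all():
    """Print the row count and column names of every subset (hypothetical helper)."""
    for name in SUBSETS:
        ds = load_subset(name)
        print(name, ds.num_rows, ds.column_names)
```

Calling `describe_all()` prints the size and columns of each subset, which is a quick way to check the actual schema before fine-tuning, since the listing does not state the column names.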

Provider:

avemio
