ReDiX/QA-ita-200k

Name: ReDiX/QA-ita-200k
Creator: ReDiX
Published: 2025-01-07 08:28:24
License: not specified

Hugging Face | Updated: 2025-01-07 | Indexed: 2024-12-14

Download link:

https://hf-mirror.com/datasets/ReDiX/QA-ita-200k

Description:

QA-ITA-200k is a synthetically generated Italian question-answering dataset containing 202k question-context-answer records, designed for RAG fine-tuning. Each record holds the record source, a generated question, a context passage, and an answer generated from that context. The dataset is intended for fine-tuning LLMs on RAG tasks and for fine-tuning embedding models on Italian retrieval tasks. The content comes mainly from Wikipedia, which is distributed under CC BY-SA 4.0. The dataset itself is stated to be licensed under CC BY 4.0, which allows free sharing and adaptation provided proper attribution is given.
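To make the record structure concrete, here is a minimal Python sketch of how one question-context-answer record might be turned into a prompt/completion pair for RAG fine-tuning. The field names (`source`, `question`, `context`, `answer`), the Italian prompt template, and the sample record are illustrative assumptions; the listing does not give the actual column names, so check them against the dataset before use.

```python
from typing import Dict


def to_rag_example(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one question-context-answer record into a prompt/completion pair.

    The keys "context", "question", and "answer" are assumed names, not
    confirmed column names of the dataset.
    """
    prompt = (
        "Contesto:\n" + record["context"].strip() + "\n\n"
        "Domanda: " + record["question"].strip() + "\n"
        "Risposta:"
    )
    # Leading space so the completion continues naturally after "Risposta:".
    return {"prompt": prompt, "completion": " " + record["answer"].strip()}


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "source": "wikipedia",
    "question": "Qual è la capitale d'Italia?",
    "context": "Roma è la capitale d'Italia.",
    "answer": "Roma.",
}

example = to_rag_example(sample)
print(example["prompt"])
print(example["completion"])
```

In practice the records would likely be loaded with the Hugging Face `datasets` library (for example `load_dataset("ReDiX/QA-ita-200k")`) and mapped through a function like this one, after adjusting the keys to the real column names.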

Provider:

ReDiX
