ReDiX/QA-ita-200k
Source: Hugging Face | Updated: 2025-01-07 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/ReDiX/QA-ita-200k
Summary:
QA-ITA-200k is a synthetically generated Italian question-answering dataset of 202k question-context-answer records, designed for RAG fine-tuning. Its content comes mainly from Wikipedia and therefore follows the CC BY-SA 4.0 license. Each record contains the record source, a generated question, a context passage, and an answer generated from that context. The dataset is intended for fine-tuning LLMs on RAG tasks and for fine-tuning embedding models for Italian retrieval. The dataset is licensed under CC BY 4.0, which permits free sharing and adaptation provided proper attribution is given.
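The two intended uses can be illustrated with a minimal sketch. It assumes each record exposes four fields named `source`, `question`, `context`, and `answer`; these names are hypothetical and should be checked against the actual columns on the dataset card (records could be loaded with the Hugging Face `datasets` library via `load_dataset("ReDiX/QA-ita-200k")`, but the split and column names are not confirmed here). The prompt template is an arbitrary illustration, not the format used by the dataset authors.

```python
from dataclasses import dataclass
from typing import Iterable


# Hypothetical record layout: the field names are assumptions based on the
# description (source, question, context, answer), not confirmed column names.
@dataclass
class QARecord:
    source: str
    question: str
    context: str
    answer: str


def to_rag_example(rec: QARecord) -> dict:
    """Build a prompt/completion pair for fine-tuning an LLM on RAG.

    The template is an illustrative choice: the model sees the retrieved
    context plus the question and learns to produce the grounded answer.
    """
    prompt = (
        "Answer the question using only the given context.\n\n"
        f"Context:\n{rec.context}\n\n"
        f"Question: {rec.question}\n"
        "Answer:"
    )
    return {"prompt": prompt, "completion": " " + rec.answer}


def to_embedding_pairs(records: Iterable[QARecord]) -> list[tuple[str, str]]:
    """Return (query, positive passage) pairs for training a retrieval
    embedding model, e.g. with an in-batch-negatives contrastive loss."""
    return [(r.question, r.context) for r in records]


# Usage with a made-up record (not taken from the dataset).
rec = QARecord(
    source="wikipedia",
    question="Qual è la capitale d'Italia?",
    context="Roma è la capitale della Repubblica Italiana.",
    answer="Roma.",
)
example = to_rag_example(rec)
pairs = to_embedding_pairs([rec])
```

Using the question as the query and its source context as the positive passage is the simplest way to turn question-context records into retrieval training data; other records in the same batch can then serve as negatives.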
Provided by:
ReDiX



