ReDiX/QA-ita-200k

Source: Hugging Face. Updated: 2025-01-07. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/ReDiX/QA-ita-200k
Description:
QA-ITA-200k is a synthetically generated Italian question-answering dataset containing 202k question-context-answer records, designed for RAG fine-tuning. Each record includes the record source, a generated question, a context passage, and an answer generated from that context. The dataset is intended for fine-tuning LLMs on RAG tasks and for fine-tuning embedding models on Italian retrieval tasks. The content comes mainly from Wikipedia and is therefore described as following the CC BY-SA 4.0 license; the dataset itself is listed as licensed under CC BY 4.0, which allows free sharing and adaptation provided proper attribution is given.
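As a rough illustration of the record structure described above, the sketch below shows how one record might be turned into a prompt/completion pair for RAG fine-tuning and into a (query, passage) pair for embedding training. The field names (source, question, context, answer), the Italian prompt template, and the sample record are all assumptions made for illustration; they are not taken from the dataset, so check the dataset card for the actual column names.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    # Hypothetical field names mirroring the four parts listed in the description.
    source: str
    question: str
    context: str
    answer: str


def to_rag_example(record: QARecord) -> dict:
    """Build a prompt/completion pair that grounds the answer in the context."""
    prompt = (
        "Contesto:\n"
        f"{record.context}\n\n"
        f"Domanda: {record.question}\n"
        "Risposta:"
    )
    # Leading space so the completion tokenizes cleanly after "Risposta:".
    return {"prompt": prompt, "completion": " " + record.answer}


def to_retrieval_pair(record: QARecord) -> tuple[str, str]:
    """Return a (query, positive passage) pair for embedding-model training."""
    return record.question, record.context


# Made-up sample record, not real dataset content.
example = QARecord(
    source="wikipedia",
    question="Qual è la capitale d'Italia?",
    context="Roma è la capitale d'Italia.",
    answer="Roma.",
)
```

Calling to_rag_example(example) yields a dictionary suitable for supervised fine-tuning, while to_retrieval_pair(example) gives the question and its context as a positive pair for contrastive training.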
Provided by:
ReDiX