HiTZ/RAG_eu
Source: Hugging Face. Updated: 2026-01-23. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/HiTZ/RAG_eu
Description:
This dataset is a collection of three domain-specific datasets in Basque (eu) designed for model evaluation. It covers three task types across three domains: news articles (news), parliamentary speeches (parl), and legal texts (bopv). The tasks are: 1. Domain Classification (DC): predict the domain of a given text snippet. 2. Question Answerability Prediction (QAP): determine whether a question can be answered from a given context. 3. Information Retrieval (IR): retrieve the relevant passages or documents for a given query. The dataset is intended as a benchmark for models oriented toward the Basque language, including Retrieval-Augmented Generation (RAG) systems, across diverse domain-specific scenarios. It supports the assessment of classification and retrieval performance in low-resource NLP research.
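The three tasks above are typically scored with simple metrics. The following is a minimal sketch, assuming accuracy for the DC and QAP labels and recall@k for IR; the dataset card does not prescribe these metrics, and the function names, query and passage identifiers, and toy data are hypothetical illustrations rather than the dataset's actual schema.

```python
from typing import Dict, List, Set


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions that match the gold labels (usable for DC or QAP)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean share of each query's relevant passages found in its top-k results (IR)."""
    scores = []
    for query_id, gold_passages in relevant.items():
        if not gold_passages:
            continue  # skip queries with no relevant passage
        top_k = set(ranked.get(query_id, [])[:k])
        scores.append(len(top_k & gold_passages) / len(gold_passages))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data; the domain labels reuse the names from the description.
dc_predicted = ["news", "parl", "bopv", "news"]
dc_gold = ["news", "parl", "news", "news"]

# Hypothetical retrieval output: query id -> passage ids, best first.
ir_ranked = {"q1": ["p3", "p1", "p7"], "q2": ["p2", "p9", "p4"]}
ir_relevant = {"q1": {"p1"}, "q2": {"p4", "p5"}}

print(accuracy(dc_predicted, dc_gold))           # 0.75
print(recall_at_k(ir_ranked, ir_relevant, k=2))  # 0.5
print(recall_at_k(ir_ranked, ir_relevant, k=3))  # 0.75
```

Reporting recall at several values of k shows whether a retriever misses relevant passages entirely or merely ranks them low, which matters when the retrieved passages feed a RAG system.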
Provided by:
HiTZ



