nadiva1243/wikipediaEs-Ca4RAG
Source: Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/nadiva1243/wikipediaEs-Ca4RAG
Description:
---
license: mit
language:
- es
- ca
pretty_name: Wikipedia ES/CA for RAG evaluation (MonkeyGrab)
tags:
- rag
- retrieval
- wikipedia
---
# Wikipedia ES/CA for RAG evaluation
Merged evaluation set used in the **MonkeyGrab** TFG (bachelor's thesis) project (UPV ETSINF):
Spanish (`es`) and Catalan (`ca`) question–answer pairs, each with a short context
extracted from a Wikipedia article.
## Splits
| Split | Description |
|-------|-------------|
| `train` | All rows from `dataset_eval_es.json` + `dataset_eval_ca.json` (field `language`: `es` or `ca`) |
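
Because both languages share the single `train` split, separating them means grouping on the `language` field. The following is a minimal sketch, not part of the project code: `load_rows` and `split_by_language` are hypothetical helper names, and `load_rows` assumes the `datasets` library is installed and the Hub is reachable.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

DATASET_ID = "nadiva1243/wikipediaEs-Ca4RAG"  # Hub id taken from this card


def load_rows():
    """Load the single `train` split from the Hub (hypothetical helper)."""
    # Imported lazily so split_by_language stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")


def split_by_language(
    rows: Iterable[Mapping[str, str]],
) -> Dict[str, List[Mapping[str, str]]]:
    """Group rows by their `language` field (`es` or `ca`)."""
    groups: Dict[str, List[Mapping[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row["language"]].append(row)
    return dict(groups)
```

Calling `split_by_language(load_rows())` should then give one list of rows per language code. The grouping helper accepts any iterable of dict-like rows, so it can also be run directly on the parsed JSON files.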
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `id` | string | Stable sample id (e.g. `wiki_es_001`) |
| `language` | string | `es` or `ca` |
| `source_url` | string | Wikipedia article URL |
| `context` | string | Retrieved passage used as RAG context |
| `question` | string | User question |
| `ground_truth` | string | Reference answer |
| `source_type` | string | e.g. `wikipedia` |
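
A quick way to check that a row matches the schema above is sketched below. The column names and the two language codes come from the table; `validate_row` is a hypothetical helper, not something shipped with the dataset or the project repository.

```python
from typing import Any, List, Mapping

# Column names copied from the schema table; every column is typed as string.
EXPECTED_COLUMNS = (
    "id",
    "language",
    "source_url",
    "context",
    "question",
    "ground_truth",
    "source_type",
)
ALLOWED_LANGUAGES = {"es", "ca"}


def validate_row(row: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in `row`; an empty list means it looks valid."""
    problems: List[str] = []
    for column in EXPECTED_COLUMNS:
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], str):
            problems.append(f"column {column} is not a string")
    if row.get("language") not in ALLOWED_LANGUAGES:
        problems.append(f"unexpected language: {row.get('language')!r}")
    return problems
```

Running `validate_row` over every row before an evaluation run is a cheap guard against rows with missing fields or an unexpected language code.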
## Source and license
Contexts and questions were built from **Wikipedia** content; each row cites the
article URL in `source_url`. Respect Wikipedia's [Terms of use](https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use)
and [licensing](https://en.wikipedia.org/wiki/Wikipedia:Copyrights) when redistributing or
deriving new works. This repository release is marked **MIT** for the packaging and
metadata; the underlying text remains subject to Wikipedia's CC BY-SA where applicable.
## Project repository
Full source code (RAG pipeline, CLI, training scripts, evaluation workflows):
> **[https://github.com/iDiagoValeta/localOllamaRAG](https://github.com/iDiagoValeta/localOllamaRAG)**
## Citation (project)
If you use this dataset, cite the MonkeyGrab / TFG work and link this dataset on the Hub:
```bibtex
@misc{monkeygrab_wikipedia_es_ca,
title = {Wikipedia ES/CA for RAG evaluation (MonkeyGrab)},
author = {nadiva1243},
year = {2026},
howpublished = {Hugging Face Datasets: \url{https://huggingface.co/datasets/nadiva1243/wikipediaEs-Ca4RAG}},
note = {Source: https://github.com/iDiagoValeta/localOllamaRAG}
}
```
Provider:
nadiva1243



