nadiva1243/wikipediaEs-Ca4RAG

Source: Hugging Face. Updated 2026-04-28. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/nadiva1243/wikipediaEs-Ca4RAG
Resource description:
---
license: mit
language:
- es
- ca
pretty_name: Wikipedia ES/CA for RAG evaluation (MonkeyGrab)
tags:
- rag
- retrieval
- wikipedia
---

# Wikipedia ES/CA for RAG evaluation

Merged evaluation split used in the **MonkeyGrab** TFG project (UPV ETSINF): Spanish (`es`) and Catalan (`ca`) question–answer pairs with short contexts extracted from Wikipedia articles.

## Splits

| Split | Description |
|-------|-------------|
| `train` | All rows from `dataset_eval_es.json` + `dataset_eval_ca.json` (field `language`: `es` or `ca`) |

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `id` | string | Stable sample id (e.g. `wiki_es_001`) |
| `language` | string | `es` or `ca` |
| `source_url` | string | Wikipedia article URL |
| `context` | string | Retrieved passage used as RAG context |
| `question` | string | User question |
| `ground_truth` | string | Reference answer |
| `source_type` | string | e.g. `wikipedia` |

## Source and license

Contexts and questions were built from **Wikipedia** content; each row cites the article URL in `source_url`. Respect Wikipedia's [Terms of use](https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use) and [licensing](https://en.wikipedia.org/wiki/Wikipedia:Copyrights) when redistributing or deriving new works. This repository release is marked **MIT** for the packaging and metadata; the underlying text remains subject to Wikipedia's CC BY-SA where applicable.
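The schema and the single `train` split described above can be exercised with a few lines of Python. The following is a minimal sketch, not the project's own loader: the helper names (`validate_row`, `split_by_language`, `load_rows`) are hypothetical, and it assumes the Hugging Face `datasets` library is installed and the repository id from this card is reachable. Only the column names and the `es`/`ca` values come from the card itself.

```python
from typing import Dict, Iterable, List

# Column names copied from the Schema table of the dataset card.
EXPECTED_COLUMNS = {
    "id", "language", "source_url", "context",
    "question", "ground_truth", "source_type",
}
# Language codes documented for the `language` field.
LANGUAGES = ("es", "ca")


def validate_row(row: dict) -> None:
    """Raise ValueError if a row does not match the documented schema."""
    missing = EXPECTED_COLUMNS - set(row)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if row["language"] not in LANGUAGES:
        raise ValueError(f"unexpected language: {row['language']!r}")


def split_by_language(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Validate each row and group rows by their `language` field."""
    grouped: Dict[str, List[dict]] = {lang: [] for lang in LANGUAGES}
    for row in rows:
        validate_row(row)
        grouped[row["language"]].append(row)
    return grouped


def load_rows():
    """Load the `train` split from the Hub (needs `datasets` and network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("nadiva1243/wikipediaEs-Ca4RAG", split="train")


if __name__ == "__main__":
    grouped = split_by_language(load_rows())
    for lang, items in grouped.items():
        print(lang, len(items))
```

Because the Spanish and Catalan rows are merged into one split, grouping on `language` is one way to report per-language metrics separately; whether the project's evaluation workflow does this is not stated in the card.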
## Project repository

Full source code (RAG pipeline, CLI, training scripts, evaluation workflows):

> **[https://github.com/iDiagoValeta/localOllamaRAG](https://github.com/iDiagoValeta/localOllamaRAG)**

## Citation (project)

If you use this dataset, cite the MonkeyGrab / TFG work and link this dataset on the Hub:

```bibtex
@misc{monkeygrab_wikipedia_es_ca,
  title        = {Wikipedia ES/CA for RAG evaluation (MonkeyGrab)},
  author       = {nadiva1243},
  year         = {2026},
  howpublished = {Hugging Face Datasets: \url{https://huggingface.co/datasets/nadiva1243/wikipediaEs-Ca4RAG}},
  note         = {Source: https://github.com/iDiagoValeta/localOllamaRAG}
}
```
Provided by:
nadiva1243