
Necent/efficientrag-filter-training-data

Source: Hugging Face. Updated 2026-03-26. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Necent/efficientrag-filter-training-data
Resource description:
---
language:
- en
- ru
tags:
- efficientrag
- multi-hop-qa
- token-classification
license: mit
---

# EfficientRAG Filter Training Data

Training data for the **Filter** component of [EfficientRAG](https://arxiv.org/abs/2408.04259).

## Format

JSONL with fields:

- `query_info`: concatenation of original question + extracted useful tokens
- `token_labels`: per-word binary labels (1=keep, 0=discard)

## Statistics

| | Count |
|--|-------|
| Total samples | 5,691 |

## Data Sources

| Source | Language | Samples | Method |
|--------|----------|---------|--------|
| HotpotQA (5K questions) | EN | ~5K | Heuristic labels |
| Dragon-derec multi-hop (690) | RU | ~700 | LLM-synthesized (gpt-4o-mini) |

## Usage

## Related

- Model: [Necent/efficientrag-filter-mdeberta-v3-base](https://huggingface.co/Necent/efficientrag-filter-mdeberta-v3-base)
- Labeler data: [Necent/efficientrag-labeler-training-data](https://huggingface.co/datasets/Necent/efficientrag-labeler-training-data)
- Labeler model: [Necent/efficientrag-labeler-mdeberta-v3-base](https://huggingface.co/Necent/efficientrag-labeler-mdeberta-v3-base)
- Paper: [EfficientRAG (arXiv:2408.04259)](https://arxiv.org/abs/2408.04259)
- Base dataset: [Makson4ic/dragon-derec-dataset](https://huggingface.co/datasets/Makson4ic/dragon-derec-dataset)
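
The JSONL format described in the card (`query_info` plus `token_labels`) could be read with a minimal sketch like the one below. It makes several assumptions that the card does not confirm: that `query_info` is a plain string, that "per-word" means whitespace-separated words, and that `token_labels` is a list of integers of the same length. The names `parse_filter_records`, `kept_words`, and `SAMPLE_LINES` are hypothetical, and the sample row is invented for illustration, not taken from the dataset.

```python
import json


def parse_filter_records(lines):
    """Parse JSONL lines into (words, labels) pairs.

    Assumption: words in `query_info` are whitespace-separated and
    `token_labels` holds one 0/1 label per word.
    """
    records = []
    for line_number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the file
        row = json.loads(raw)
        words = row["query_info"].split()
        labels = row["token_labels"]
        if len(words) != len(labels):
            # The whitespace-split assumption does not hold for this row.
            raise ValueError(
                f"line {line_number}: {len(words)} words but {len(labels)} labels"
            )
        records.append((words, labels))
    return records


def kept_words(words, labels):
    """Return the words whose label is 1 (keep)."""
    return [word for word, label in zip(words, labels) if label == 1]


# Invented sample row, used only to exercise the functions above.
SAMPLE_LINES = [
    json.dumps(
        {
            "query_info": "which river flows through the city delta river",
            "token_labels": [0, 1, 0, 0, 0, 1, 1, 1],
        }
    )
]

records = parse_filter_records(SAMPLE_LINES)
print(kept_words(*records[0]))  # ['river', 'city', 'delta', 'river']
```

For a real file, the same function accepts an open file handle, for example `parse_filter_records(open(path, encoding="utf-8"))`, where `path` points to a locally downloaded JSONL file. If the length check fails on real rows, the whitespace-split assumption is wrong and the tokenization should be revisited.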
Provider:
Necent