MinaGabriel/sentence-relevance-extractor
Source: Hugging Face. Updated 2025-11-20; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/MinaGabriel/sentence-relevance-extractor
Resource description:
---
pretty_name: Sentence Relevance Extractor (SRE)
license: mit
language:
- en
---
# Sentence Relevance Extractor (SRE)
**Sentence Relevance Extractor (SRE)** is a large-scale dataset for **binary evidence selection** in multi-document, multi-hop question answering.
The goal:

> Given a question and a sentence from the context, predict whether this sentence is **relevant evidence** ("Yes") or **irrelevant** ("No").

This dataset is suitable for training:
- **Sentence-level RAG rerankers**
- **Binary relevance classifiers**
- **Optimization-based truth discovery systems**
- **Multi-hop QA evidence selectors**
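
As a minimal sketch of the task framing, one `(question, sentence)` pair can be turned into a binary training example as follows. The `to_binary_example` helper, the `[SEP]` joining convention, and the 0/1 integer mapping are assumptions made for illustration; only the field names and the "Yes"/"No" label strings come from this card.

```python
from typing import Dict, Tuple

# Label strings follow the card; the integer ids are an assumed convention.
LABEL_TO_ID = {"No": 0, "Yes": 1}


def to_binary_example(row: Dict[str, str], sep: str = " [SEP] ") -> Tuple[str, int]:
    """Join question and sentence into one input string and map the label to 0/1."""
    text = row["question"].strip() + sep + row["sentence"].strip()
    return text, LABEL_TO_ID[row["label"]]


# Example row reduced to the three fields the helper reads.
row = {
    "question": "Who is the child of the director of Inquilaab (2002 film)?",
    "sentence": "Inquilaab is a 2002 Bengali action thriller film directed by Anup Sengupta.",
    "label": "Yes",
}
text, y = to_binary_example(row)
```

A cross-encoder style reranker would typically consume `text` (or the two strings as a pair) and be trained against `y`; the separator should be replaced by whatever the chosen tokenizer expects.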
---
## Dataset Statistics
| Split | # Samples |
|-------|------------|
| **Train** | **1,902,056** |
| **Validation** | **211,340** |
| **Test** | **141,726** |
| **Total** | **2,255,122** |
### Dataset Source Summary
- **From HF train splits:** 2,113,396
- **From HF validation/test splits:** 141,726
- After balancing and sampling, these sources yield the final splits above.
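
The reported counts can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers from the tables above; reading the result as "Train and Validation were both drawn from the HF train pool" is an assumption, although the sums are consistent with it.

```python
# Split sizes as reported in the statistics table.
splits = {"train": 1_902_056, "validation": 211_340, "test": 141_726}

# Overall total across the three splits.
total = sum(splits.values())

# Train + Validation, to compare against the "From HF train splits" figure.
from_hf_train = splits["train"] + splits["validation"]

# Share of that pool held out for validation.
val_fraction = splits["validation"] / from_hf_train
```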
---
## Provided Files
- `multihop_sentrel_train.jsonl`
- `multihop_sentrel_val.jsonl`
- `multihop_sentrel_test.jsonl`
Each line corresponds to one `(question, sentence)` relevance judgment.
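
Because each file holds one JSON object per line, it can be streamed without loading everything into memory. The following is a minimal sketch, not an official loader: `iter_rows` and `label_counts` are hypothetical helper names, and the toy rows exist only so the snippet runs without the real download.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Dict, Iterator


def iter_rows(path: str) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def label_counts(path: str) -> Counter:
    """Count how many rows carry each label value."""
    return Counter(row["label"] for row in iter_rows(path))


# Toy stand-in file (made-up rows) so the sketch is runnable on its own.
toy_rows = [
    {"question": "q1", "sentence": "s1", "label": "Yes"},
    {"question": "q1", "sentence": "s2", "label": "No"},
    {"question": "q2", "sentence": "s3", "label": "No"},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for r in toy_rows:
        tmp.write(json.dumps(r) + "\n")
    toy_path = tmp.name

counts = label_counts(toy_path)
os.remove(toy_path)
```

For the real data, pass the local path of a downloaded file such as `multihop_sentrel_train.jsonl` to `label_counts` or iterate over `iter_rows` directly.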
---
## Data Format (JSONL)
Each row:
```json
{
  "dataset": "2wikimultihopqa",
  "source_id": "7f23725...",
  "question": "Who is the child of the director of Inquilaab (2002 film)?",
  "full_context": "Inquilaab ... (titles and sentences)",
  "sentence": "Inquilaab is a 2002 Bengali action thriller film directed by Anup Sengupta.",
  "label": "Yes",
  "title": "Inquilaab (2002 film)",
  "doc_index": 0,
  "sent_index": 0,
  "split": "train"
}
```
Provider: MinaGabriel



