temsa/govie-dsp-rates-reranker-bilingual-v1
Hugging Face · updated 2026-03-25 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/temsa/govie-dsp-rates-reranker-bilingual-v1
---
pretty_name: gov.ie DSP Rates Reranker Bilingual v1
license: other
language:
- en
- ga
task_categories:
- text-ranking
size_categories:
- n<1K
annotations_creators:
- machine-generated
multilinguality: multilingual
tags:
- ireland
- gov-ie
- dsp
- social-protection
- retrieval
- reranking
- welfare-rates
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: validation
    path: validation.jsonl
---
# gov.ie DSP Rates Reranker Bilingual v1
Bilingual `(query, candidate_page)` reranking dataset for `gov.ie` Department of Social Protection payment-rate lookup.
This release is built from a curated DSP scheme catalog grounded in public `gov.ie` service pages and the official `SW19 Rates of Payment 2026` booklet.
## What is in the dataset
- English and Irish Gaelic query variants for DSP allowance, benefit and grant lookup
- exact-name and rate-lookup queries
- page-level positives with hard negatives drawn from confusable DSP pages
- shared-page targets where the public source page covers more than one scheme
## Shared-page targets
- `Electricity Allowance` and `Gas Allowance` resolve to `Household Benefits Package`
- `Half-rate Carer's Allowance` resolves to `Carer's Allowance`
- `Jobseeker's Benefit for the Self-Employed` resolves to `Jobseeker's Benefit`
- `Newborn Baby Grant` resolves to `Child Benefit`
- `Cost of Education Allowance` resolves to `Back to Education Allowance`
- `Island Allowance` resolves to `Increase for Living on a Specified Island`
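The aliases above can be written as a small lookup table. This is an illustrative sketch only: `SHARED_PAGE_TARGETS` and `resolve_target` are hypothetical names that do not ship with the dataset, and returning an unlisted name unchanged is an assumption. The canonical catalog is `metadata/targets.json`.

```python
# Hypothetical lookup table built from the shared-page list in this card.
SHARED_PAGE_TARGETS = {
    "Electricity Allowance": "Household Benefits Package",
    "Gas Allowance": "Household Benefits Package",
    "Half-rate Carer's Allowance": "Carer's Allowance",
    "Jobseeker's Benefit for the Self-Employed": "Jobseeker's Benefit",
    "Newborn Baby Grant": "Child Benefit",
    "Cost of Education Allowance": "Back to Education Allowance",
    "Island Allowance": "Increase for Living on a Specified Island",
}


def resolve_target(scheme_name: str) -> str:
    """Return the page a scheme resolves to.

    Assumption: a scheme that is not listed has its own page, so it
    resolves to itself.
    """
    return SHARED_PAGE_TARGETS.get(scheme_name, scheme_name)
```

A helper like this is useful when grouping evaluation results by page rather than by scheme name.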
## Splits
- `train`: `512`
- `validation`: `240`
Base query ids are split deterministically so English and Irish variants stay together.
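The card does not describe the splitting mechanism. One common way to get this property is to derive the split from a stable hash of the base query id, as in the sketch below. The function name `assign_split`, the id format `q-0001`, and the `validation_fraction` default are all assumptions made for illustration, not the method used to build this release.

```python
import hashlib


def assign_split(base_query_id: str, validation_fraction: float = 0.3) -> str:
    """Pick a split from a stable hash of the base query id (assumed scheme)."""
    digest = hashlib.sha256(base_query_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "validation" if bucket < validation_fraction else "train"


# Hypothetical ids: the English and Irish variants share one base id,
# so hashing the base id alone sends both to the same split.
variants = [("q-0001", "en"), ("q-0001", "ga")]
splits = {assign_split(base_id) for base_id, _lang in variants}
```

Because the language code never enters the hash, no query can appear in `train` in one language and in `validation` in the other.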
## Query policy
- English queries use an ASCII-friendly version of the public scheme title because that is closer to how people typically type searches
- Irish variants use Irish query framing while keeping the public scheme title stable when there is no single official Irish slug to anchor on
- broad catalog queries are included for the 2026 DSP rate book and Budget 2026 overview page
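The card does not say how the ASCII-friendly form of a title is produced. A plausible approach, shown here as a sketch with the hypothetical name `ascii_friendly`, is to straighten typographic quotes and dashes and then strip accents using Unicode decomposition.

```python
import unicodedata

# Typographic punctuation has no ASCII decomposition, so map it explicitly.
_PUNCT = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (curly apostrophe)
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def ascii_friendly(title: str) -> str:
    """Fold a title to ASCII: straighten quotes and dashes, drop accents."""
    # NFKD splits an accented letter into a base letter plus a combining mark.
    folded = unicodedata.normalize("NFKD", title.translate(_PUNCT))
    # Encoding with errors="ignore" drops the combining marks and any
    # other character that has no ASCII form.
    return folded.encode("ascii", "ignore").decode("ascii")
```

Applying the same folding to user queries at inference time keeps them aligned with the query style used in the dataset.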
## Included metadata
- `metadata/meta.json`: packaging metadata and query policy
- `metadata/targets.json`: canonical target catalog and query examples
## Intended use
- train or distill bilingual rerankers for DSP scheme-rate lookup
- benchmark `gov.ie` welfare-rate search quality
- test exact-scheme lookups against confusable DSP candidate pages
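For benchmarking, a page-level metric such as mean reciprocal rank (MRR) fits the page-selection framing. The sketch below assumes a hypothetical result format in which each example holds the positive page and a score per candidate page; the dataset's actual field names may differ, and the scores are made up for illustration.

```python
from typing import Dict, List


def reciprocal_rank(scores: Dict[str, float], positive: str) -> float:
    """Return 1/rank of the positive page when candidates are sorted by score."""
    ranked = sorted(scores, key=lambda page: scores[page], reverse=True)
    return 1.0 / (ranked.index(positive) + 1)


def mean_reciprocal_rank(examples: List[dict]) -> float:
    """Average the reciprocal rank over examples with 'scores' and 'positive' keys."""
    total = sum(reciprocal_rank(e["scores"], e["positive"]) for e in examples)
    return total / len(examples)


# Hypothetical reranker output for two queries; page names come from this card.
examples = [
    {"positive": "Carer's Allowance",
     "scores": {"Carer's Allowance": 0.9, "Household Benefits Package": 0.4}},
    {"positive": "Child Benefit",
     "scores": {"Child Benefit": 0.2, "Back to Education Allowance": 0.7}},
]
mrr = mean_reciprocal_rank(examples)
```

In this toy input the first query ranks its positive page first and the second ranks it second, so the reciprocal ranks are 1 and 0.5.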
## Caveats
- The dataset is optimized for page selection, not for extracting a full structured rate table
- Some target names share a single official page, so multiple labels can point to the same candidate page across different queries
- This dataset reflects public information gathered as of the packaging date and should be refreshed when rates or page content change
## License and attribution
This dataset is a derivative packaging of public `gov.ie` source material and DSP publications.
Preserve attribution to the Government of Ireland and the Department of Social Protection and do not imply official endorsement.
Generated from `release_datasets/govie-dsp-rates-reranker-bilingual-v1`.
Provider: temsa



