
temsa/govie-dsp-rates-reranker-bilingual-v1

Source: Hugging Face. Updated 2026-03-25. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/temsa/govie-dsp-rates-reranker-bilingual-v1
Resource description:
---
pretty_name: gov.ie DSP Rates Reranker Bilingual v1
license: other
language:
- en
- ga
task_categories:
- text-ranking
size_categories:
- n<1K
annotations_creators:
- machine-generated
multilinguality: multilingual
tags:
- ireland
- gov-ie
- dsp
- social-protection
- retrieval
- reranking
- welfare-rates
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: validation
    path: validation.jsonl
---

# gov.ie DSP Rates Reranker Bilingual v1

Bilingual `(query, candidate_page)` reranking dataset for `gov.ie` Department of Social Protection payment-rate lookup.

This release is built from a curated DSP scheme catalog grounded in public `gov.ie` service pages and the official `SW19 Rates of Payment 2026` booklet.

## What is in the dataset

- English and Irish Gaelic query variants for DSP allowance, benefit and grant lookup
- exact-name and rate-lookup queries
- page-level positives with hard negatives drawn from confusable DSP pages
- shared-page targets where the public source page covers more than one scheme

## Shared-page targets

- `Electricity Allowance` and `Gas Allowance` resolve to `Household Benefits Package`
- `Half-rate Carer's Allowance` resolves to `Carer's Allowance`
- `Jobseeker's Benefit for the Self-Employed` resolves to `Jobseeker's Benefit`
- `Newborn Baby Grant` resolves to `Child Benefit`
- `Cost of Education Allowance` resolves to `Back to Education Allowance`
- `Island Allowance` resolves to `Increase for Living on a Specified Island`

## Splits

- `train`: `512`
- `validation`: `240`

Base query ids are split deterministically so English and Irish variants stay together.
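The card does not say how the deterministic split is computed. The following is a minimal sketch of one common approach, hashing the base query id so that every language variant of a query lands in the same split. The field name `base_query_id`, the helper `assign_split`, the sample ids, and the `0.3` validation fraction are all assumptions made for illustration, not details taken from the dataset.

```python
import hashlib


def assign_split(base_query_id: str, validation_fraction: float = 0.3) -> str:
    """Map a base query id to a split using a stable hash (illustrative only)."""
    # SHA-256 is stable across runs and machines, unlike Python's built-in hash().
    digest = hashlib.sha256(base_query_id.encode("utf-8")).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "validation" if bucket < validation_fraction else "train"


# Hypothetical rows: English ("en") and Irish ("ga") variants share a base id.
rows = [
    {"base_query_id": "q-001", "lang": "en"},
    {"base_query_id": "q-001", "lang": "ga"},
    {"base_query_id": "q-002", "lang": "en"},
    {"base_query_id": "q-002", "lang": "ga"},
]

# Collect the splits seen for each base id; each set should hold one split.
splits_by_base_id: dict[str, set[str]] = {}
for row in rows:
    split = assign_split(row["base_query_id"])
    splits_by_base_id.setdefault(row["base_query_id"], set()).add(split)
```

Because the split depends only on the base id, a query and its translation cannot end up on opposite sides of the train/validation boundary, which avoids leaking near-duplicate queries into evaluation.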

## Query policy

- English queries use an ASCII-friendly version of the public scheme title because that is closer to how people typically type searches
- Irish variants use Irish query framing while keeping the public scheme title stable when there is no single official Irish slug to anchor on
- broad catalog queries are included for the 2026 DSP rate book and Budget 2026 overview page

## Included metadata

- `metadata/meta.json`: packaging metadata and query policy
- `metadata/targets.json`: canonical target catalog and query examples

## Intended use

- train or distill bilingual rerankers for DSP scheme-rate lookup
- benchmark `gov.ie` welfare-rate search quality
- test exact-scheme lookups against confusable DSP candidate pages

## Caveats

- The dataset is optimized for page selection, not for extracting a full structured rate table
- Some target names share a single official page, so multiple labels can point to the same candidate page across different queries
- This dataset reflects public information gathered as of the packaging date and should be refreshed when rates or page content change

## License and attribution

This dataset is a derivative packaging of public `gov.ie` source material and DSP publications. Preserve attribution to the Government of Ireland and the Department of Social Protection and do not imply official endorsement.

Generated from `release_datasets/govie-dsp-rates-reranker-bilingual-v1`.
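The caveat about shared pages matters when scoring a reranker: a prediction of `Household Benefits Package` for a `Gas Allowance` query is correct, not a miss. The sketch below encodes the shared-page targets listed in the card as a lookup table. The helper names `resolve_target_page` and `is_page_hit` are hypothetical, and the canonical mapping may live in `metadata/targets.json` in a different shape, so treat this as an illustration rather than the dataset's own logic.

```python
# Scheme name -> page that hosts it, copied from the card's "Shared-page targets" list.
SHARED_PAGE_TARGETS = {
    "Electricity Allowance": "Household Benefits Package",
    "Gas Allowance": "Household Benefits Package",
    "Half-rate Carer's Allowance": "Carer's Allowance",
    "Jobseeker's Benefit for the Self-Employed": "Jobseeker's Benefit",
    "Newborn Baby Grant": "Child Benefit",
    "Cost of Education Allowance": "Back to Education Allowance",
    "Island Allowance": "Increase for Living on a Specified Island",
}


def resolve_target_page(scheme_name: str) -> str:
    """Return the page a scheme resolves to; schemes with their own page map to themselves."""
    return SHARED_PAGE_TARGETS.get(scheme_name, scheme_name)


def is_page_hit(predicted_page: str, scheme_name: str) -> bool:
    """Check a reranker's top page against the resolved target for the queried scheme."""
    return predicted_page == resolve_target_page(scheme_name)


# Two different scheme labels legitimately point at the same candidate page.
electricity_page = resolve_target_page("Electricity Allowance")
gas_page = resolve_target_page("Gas Allowance")
```

Resolving labels to pages before comparing keeps evaluation aligned with the dataset's stated goal of page selection.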
Provider:
temsa