
wrmthorne/ukri-funding-opportunities

Source: Hugging Face. Updated 2026-03-27. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/wrmthorne/ukri-funding-opportunities
Description:
---
license: cc-by-nc-sa-4.0
language:
- en
tags:
- funding
- ukri
- research-funding
- opportunities
- uk-research
pretty_name: UKRI Funding Opportunities
size_categories:
- 1K<n<10K
---

# UKRI Funding Opportunities

A structured dataset of 2,102 funding opportunities published by [UK Research and Innovation (UKRI)](https://www.ukri.org/opportunity/) across all nine constituent bodies (AHRC, BBSRC, EPSRC, ESRC, IUK, MRC, NERC, RE, STFC, and cross-council UKRI calls).

This dataset accompanies the paper *"Demystifying Funding: Reconstructing a Unified Dataset of the UK Funding Lifecycle"* (NSLP 2026) and the [GtR Database repository](https://github.com/wrmthorne/GtR-Extended).

## Background

UKRI publishes funding opportunities on its website but provides no structured, machine-readable export. Key metadata such as award ranges, total fund values, UKRI contribution percentages, and project durations are frequently stated only within prose, distributed across document sections. This dataset makes that information available in a structured format for research and analysis.

## Collection

All pending, open, and closed opportunities were scraped from the [UKRI Opportunity Finder](https://www.ukri.org/opportunity/) between **12 February 2026 and 22 March 2026** using Playwright browser automation. Raw HTML pages were parsed into a tree-structured representation preserving the document hierarchy (summary, accordion sections, subsections).

Because opportunity status and closing dates change over time, this dataset represents a **snapshot** at the time of collection.

## Metadata Extraction

Several fields (`total_funding`, `minimum_award`, `maximum_award`, `funding_percentage`, `minimum_funding_duration`, `maximum_funding_duration`) are not always present in the structured metadata table on the opportunity page. Where absent, they were extracted from the opportunity prose using an LLM-based closed-domain question-answering pipeline. The pipeline uses hierarchical retrieval (hybrid BM25 + dense similarity with branch-deduplication) to select relevant passages, which are then passed to GPT-4o-mini for extraction. Evaluation on 101 manually annotated opportunities (489 question-answer pairs) showed 87.0% overall accuracy. See the accompanying paper for full details.

Fields sourced from the structured metadata table on each opportunity page (`status`, `funders`, `co_funders`, `funding_type`, `publication_date`, `opening_date`, `closing_date`, `href`) are deterministically parsed and are not subject to extraction errors.

## Fields

| Field | Type | Description |
|---|---|---|
| `id` | string | UUID5 identifier derived from the canonical URL |
| `title` | string | Opportunity title |
| `status` | string | Status at time of collection (Closed, Open, Upcoming) |
| `funders` | string | Comma-separated UKRI funder codes (e.g. "EPSRC", "AHRC, ESRC") |
| `co_funders` | string | Comma-separated co-funding organisations, if any |
| `funding_type` | string | Type of funding (e.g. Grant, Fellowship, Studentship) |
| `publication_date` | string | ISO 8601 date the opportunity was published |
| `opening_date` | string | ISO 8601 date the opportunity opened for applications |
| `closing_date` | string | ISO 8601 date the opportunity closed |
| `summary` | string | Plain-text summary of the opportunity |
| `full_text` | string | Full opportunity content as markdown |
| `sections` | list | Hierarchical tree of accordion sections (header, content, children) |
| `updates` | list | Date-text pairs of opportunity updates, if any |
| `href` | string | Canonical URL to the opportunity page |
| `total_funding` | int | Total fund value in GBP (may be LLM-extracted) |
| `minimum_award` | int | Minimum award value in GBP (may be LLM-extracted) |
| `maximum_award` | int | Maximum award value in GBP (may be LLM-extracted) |
| `funding_percentage` | int | UKRI contribution percentage (may be LLM-extracted) |
| `minimum_funding_duration` | int | Minimum project duration in months (may be LLM-extracted) |
| `maximum_funding_duration` | int | Maximum project duration in months (may be LLM-extracted) |

## Limitations

- **Snapshot**: Opportunity statuses and closing dates are as of collection (Feb to Mar 2026) and may have since changed.
- **Coverage**: Not all UKRI-funded activity corresponds to a published opportunity; responsive mode schemes and formula-based allocations are not represented.
- **Extraction accuracy**: LLM-extracted fields have an overall accuracy of 87.0%. The most common errors are field confusion (e.g. returning total funding when asked for maximum award) and hallucination. Maximum award is the hardest field (66.7% accuracy). See the paper for a detailed error analysis.
- **Innovate UK**: IUK opportunities typically publish short summaries linking to the separate Innovation Funding Service, so their `full_text` and `sections` fields contain less detail than other councils.
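Because the `sections` field is a tree and the funding fields may carry extraction errors such as field confusion, a consumer might want small helpers for both. The following is a minimal sketch, not part of the dataset or the authors' pipeline. It assumes each section node is a dict with `header`, `content`, and `children` keys, and that missing numeric fields are stored as `None`. The function names, the specific checks, and the example record are hypothetical.

```python
from typing import Any, Iterator


def iter_sections(sections: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[int, str, str]]:
    """Yield (depth, header, content) for every node in the tree, depth-first."""
    for node in sections or []:
        yield depth, node.get("header", ""), node.get("content", "")
        # Assumption: nested subsections live under the "children" key.
        yield from iter_sections(node.get("children") or [], depth + 1)


def suspicious_funding(record: dict[str, Any]) -> list[str]:
    """Return warnings for values that look like possible extraction errors.

    These are heuristics only: a flagged record is not necessarily wrong.
    """
    warnings: list[str] = []
    lo = record.get("minimum_award")
    hi = record.get("maximum_award")
    total = record.get("total_funding")
    pct = record.get("funding_percentage")
    dur_lo = record.get("minimum_funding_duration")
    dur_hi = record.get("maximum_funding_duration")

    if lo is not None and hi is not None and lo > hi:
        warnings.append("minimum_award exceeds maximum_award")
    if hi is not None and total is not None and hi > total:
        warnings.append("maximum_award exceeds total_funding")
    if hi is not None and total is not None and hi == total:
        # The dataset card names this confusion as a common error.
        warnings.append("maximum_award equals total_funding (possible field confusion)")
    if pct is not None and not 0 <= pct <= 100:
        warnings.append("funding_percentage outside 0-100")
    if dur_lo is not None and dur_hi is not None and dur_lo > dur_hi:
        warnings.append("minimum_funding_duration exceeds maximum_funding_duration")
    return warnings


# Hypothetical record for illustration only; the values are not from the dataset.
example = {
    "title": "Example opportunity",
    "total_funding": 2_000_000,
    "minimum_award": None,
    "maximum_award": 2_000_000,
    "funding_percentage": 80,
    "sections": [
        {"header": "Who can apply", "content": "...", "children": [
            {"header": "Eligibility", "content": "...", "children": []},
        ]},
        {"header": "Scope", "content": "...", "children": []},
    ],
}

for depth, header, _ in iter_sections(example["sections"]):
    print("  " * depth + header)
print(suspicious_funding(example))
```

A flagged record is best treated as a prompt to re-read `full_text`, since a fund that makes a single award can legitimately have equal maximum award and total funding.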
Provider:
wrmthorne