SDG_TargetX_v1: Semantically Enriched SDG Target Dataset for LLM-based Mapping

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/mx69jyjvfb
Description:
The SDG_TargetX_v1 dataset is a semantically enriched representation of the United Nations Sustainable Development Goal (SDG) targets, designed to support general-purpose mapping of textual inputs to sustainability objectives. The dataset is based on the premise that raw SDG target descriptions, while comprehensive, are often difficult to interpret directly in computational systems due to their policy-oriented language and lack of structured semantic features.

To address this, all 169 SDG targets were extracted from the official United Nations framework and transformed into a multi-layer semantic dataset. Each target is represented through multiple attributes, including the original target text, a simplified layman description, extended contextual information, thematic classification, and mapping to one of the five sustainability dimensions (People, Planet, Prosperity, Peace, Partnership). This structured representation enhances interpretability for both human users and machine learning systems, particularly in natural language processing (NLP) and large language model (LLM) applications.

The dataset was constructed through a systematic enrichment process that preserves the original intent of each SDG target while improving clarity and contextual depth. The inclusion of multiple semantic layers reduces ambiguity in policy language and enables more effective alignment with diverse textual inputs. To demonstrate usability, the dataset was applied in a mapping exercise using varied textual inputs such as institutional objectives and domain-specific statements. The results indicate improved alignment with a broader range of SDG targets and better representation across multiple sustainability dimensions.

The data can be interpreted as a semantic bridge between unstructured text and SDG targets. Each record provides both policy-level meaning and structured features, enabling tasks such as semantic similarity matching, SDG classification, sustainability assessment, and alignment analysis. The dataset is domain-agnostic and can be used across applications including policy mapping, sustainability analytics, project evaluation, and AI-driven SDG alignment systems. It is particularly useful for scenarios involving unstructured textual data where direct mapping to SDG targets is challenging without semantic enrichment.
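
To make the record layout and the similarity-matching use case concrete, here is a minimal Python sketch. It is not the dataset authors' method. The field names (`target_id`, `layman_description`, `context`, `theme`, `dimension`) are assumptions chosen to mirror the attributes listed in the description, and the actual column names in the published files may differ. The two sample records use the official UN wording for targets 4.1 and 13.1, but their layman, context, theme, and dimension values are placeholders written for this sketch, not copied from the dataset. Matching is done with a plain bag-of-words cosine similarity as a stand-in; an embedding model or LLM could be swapped in for real use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TargetRecord:
    """One enriched SDG target. Field names are hypothetical."""
    target_id: str
    original_text: str
    layman_description: str
    context: str
    theme: str
    dimension: str  # one of: People, Planet, Prosperity, Peace, Partnership

    def enriched_text(self) -> str:
        # Concatenate the semantic layers so matching sees all of them.
        return " ".join(
            [self.original_text, self.layman_description, self.context, self.theme]
        )


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_targets(query: str, records: list[TargetRecord]):
    """Return (score, record) pairs sorted from best to worst match."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r.enriched_text()))), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative records: enrichment values are placeholders, not dataset content.
RECORDS = [
    TargetRecord(
        target_id="4.1",
        original_text=(
            "By 2030, ensure that all girls and boys complete free, equitable "
            "and quality primary and secondary education leading to relevant "
            "and effective learning outcomes"
        ),
        layman_description="Every child should be able to finish school for free.",
        context="Covers school enrolment, completion rates and learning quality.",
        theme="Education",
        dimension="People",
    ),
    TargetRecord(
        target_id="13.1",
        original_text=(
            "Strengthen resilience and adaptive capacity to climate-related "
            "hazards and natural disasters in all countries"
        ),
        layman_description="Help countries cope with floods, storms and other climate risks.",
        context="Covers disaster preparedness and climate adaptation planning.",
        theme="Climate action",
        dimension="Planet",
    ),
]

if __name__ == "__main__":
    statement = (
        "The school will improve education quality and learning outcomes "
        "for all children"
    )
    for score, record in rank_targets(statement, RECORDS):
        print(f"{record.target_id} ({record.dimension}): {score:.3f}")
```

Concatenating the layers before scoring reflects the idea that the simplified and contextual text gives an input statement more vocabulary to match against than the policy wording alone. A lexical score like this one is sensitive to stopwords and exact word forms, so it should be read as a baseline only.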
Created:
2026-04-06