SDG_TargetX_v1: Semantically Enriched SDG Target Dataset for LLM-based Mapping

Source: NIAID Data Ecosystem (indexed 2026-05-10)

Download link:

https://data.mendeley.com/datasets/mx69jyjvfb

Description:

The SDG_TargetX_v1 dataset is a semantically enriched representation of the United Nations Sustainable Development Goal (SDG) targets, designed to support general-purpose mapping of textual inputs to sustainability objectives. The dataset is based on the premise that raw SDG target descriptions, while comprehensive, are often difficult to interpret directly in computational systems due to their policy-oriented language and lack of structured semantic features.

To address this, all 169 SDG targets were extracted from the official United Nations framework and transformed into a multi-layer semantic dataset. Each target is represented through multiple attributes, including the original target text, a simplified layman description, extended contextual information, thematic classification, and mapping to one of the five sustainability dimensions (People, Planet, Prosperity, Peace, Partnership). This structured representation enhances interpretability for both human users and machine learning systems, particularly in natural language processing (NLP) and large language model (LLM) applications.

The dataset was constructed through a systematic enrichment process that preserves the original intent of each SDG target while improving clarity and contextual depth. The inclusion of multiple semantic layers reduces ambiguity in policy language and enables more effective alignment with diverse textual inputs.

To demonstrate usability, the dataset was applied in a mapping exercise using varied textual inputs such as institutional objectives and domain-specific statements. The results indicate improved alignment with a broader range of SDG targets and better representation across multiple sustainability dimensions.

The data can be interpreted as a semantic bridge between unstructured text and SDG targets.
Each record provides both policy-level meaning and structured features, enabling tasks such as semantic similarity matching, SDG classification, sustainability assessment, and alignment analysis. The dataset is domain-agnostic and can be used across applications including policy mapping, sustainability analytics, project evaluation, and AI-driven SDG alignment systems. It is particularly useful for scenarios involving unstructured textual data where direct mapping to SDG targets is challenging without semantic enrichment.
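
As a rough illustration of the semantic similarity matching mentioned above, the sketch below ranks a few enriched target records against a free-text input. It is only a sketch under stated assumptions: the field names (`target_id`, `original_text`, `layman_description`, `dimension`) are hypothetical and may not match the dataset's actual columns, the plain-language descriptions and dimension labels are illustrative placeholders written for this example, the target texts are abbreviated, and a simple bag-of-words cosine similarity stands in for whatever embedding or LLM method a real pipeline would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TargetRecord:
    # Hypothetical field names; check the dataset's real column names.
    target_id: str
    original_text: str
    layman_description: str
    dimension: str  # People, Planet, Prosperity, Peace, or Partnership

    def enriched_text(self) -> str:
        # Combine the policy wording with the plain-language layer.
        return f"{self.original_text} {self.layman_description}"


def tokenize(text: str) -> Counter:
    # Lowercase word counts; punctuation and digits are dropped.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_targets(query: str, records: list) -> list:
    # Return (score, record) pairs, highest score first.
    q = tokenize(query)
    scored = [(cosine(q, tokenize(r.enriched_text())), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative placeholder records, not rows copied from the dataset.
records = [
    TargetRecord(
        "1.1",
        "By 2030, eradicate extreme poverty for all people everywhere",
        "End extreme poverty so that nobody has to live on very little money.",
        "People",
    ),
    TargetRecord(
        "4.1",
        "By 2030, ensure that all girls and boys complete free, equitable "
        "and quality primary and secondary education",
        "Make sure every child can finish primary and secondary school at no cost.",
        "People",
    ),
    TargetRecord(
        "13.1",
        "Strengthen resilience and adaptive capacity to climate-related "
        "hazards and natural disasters in all countries",
        "Help communities prepare for and cope with floods, storms and "
        "other climate disasters.",
        "Planet",
    ),
]

query = "Our programme helps coastal communities prepare for floods and storms"
ranked = rank_targets(query, records)
for score, record in ranked:
    print(f"{record.target_id} ({record.dimension}): {score:.3f}")
```

In this toy example the input shares words such as "floods" and "storms" only with the plain-language layer of target 13.1, which is the kind of gap the enrichment is meant to close. Word-count matching is sensitive to common words like "and", so a practical system would likely swap `tokenize` and `cosine` for sentence embeddings or an LLM-based scorer.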

Created:

2026-04-06
