SG-NLG (Schema-Guided Natural Language Generation)

Name: SG-NLG (Schema-Guided Natural Language Generation)
Creator: OpenDataLab
Published: 2026-05-24 07:30:10
License: No description available

OpenDataLab. Updated 2026-05-24; indexed 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/SG-NLG

Resource description:

The SG-NLG dataset is a preprocessed version of the DSTC8 Schema-Guided Dialogue (SGD) dataset, designed for data-to-text natural language generation (NLG). The original DSTC8 SGD dataset contains approximately 20,000 dialogues across roughly 20 domains. SG-NLG aims to make NLG experiments on the SGD data easier. It pairs the schema of each system turn with the set of natural language strings that realize it. It also delexicalizes the prompts, replacing the relevant values with fixed placeholder names, to turn them into templates that are more general for use in dialogue systems. The final SG-NLG dataset consists of nearly 4K meaning representations (MRs) and over 140K templates.
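The two preprocessing steps described above, delexicalization and the pairing of each MR with the templates that realize it, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `[slot_name]` placeholder style, the act and slot names, and the example utterances are hypothetical and are not taken from the SG-NLG files, whose actual format may differ.

```python
from collections import defaultdict


def delexicalize(utterance, slots):
    """Replace each slot value in the utterance with a [slot_name] placeholder.

    The placeholder style is an assumption, not the dataset's own convention.
    """
    template = utterance
    # Replace longer values first so a short value cannot overwrite
    # part of a longer one that contains it.
    for name, value in sorted(slots.items(), key=lambda kv: len(kv[1]), reverse=True):
        template = template.replace(value, f"[{name}]")
    return template


def mr_key(act, slots):
    """Build a hashable MR from the act and the sorted slot names (values dropped)."""
    return (act, tuple(sorted(slots)))


def build_pairs(turns):
    """Group delexicalized templates by the MR they realize."""
    pairs = defaultdict(set)
    for act, slots, utterance in turns:
        pairs[mr_key(act, slots)].add(delexicalize(utterance, slots))
    return pairs


# Hypothetical system turns: (act, slot values, surface utterance).
turns = [
    ("OFFER", {"restaurant_name": "Opa", "city": "San Jose"},
     "I found Opa in San Jose."),
    ("OFFER", {"restaurant_name": "Sushi Go", "city": "Oakland"},
     "I found Sushi Go in Oakland."),
    ("OFFER", {"restaurant_name": "Opa", "city": "San Jose"},
     "How about Opa? It is in San Jose."),
]

pairs = build_pairs(turns)
for mr, templates in pairs.items():
    print(mr, sorted(templates))
```

In this sketch the first two utterances collapse into the same template once their values are replaced, which is why a dataset built this way can have far more source utterances than distinct templates per MR. Plain substring replacement is the simplest possible choice here; a real pipeline would likely need to handle casing, partial matches, and values that appear more than once.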

Provider:

OpenDataLab

Created:

2022-05-23

Collection summary

Dataset introduction