Stack Exchange Natural Language Interface to Database (SENLIDB) corpus

Name: Stack Exchange Natural Language Interface to Database (SENLIDB) corpus
Creator: Polytechnic University of Bucharest
Published: 2017-07-11 16:33:55
License: Not specified

Source: arXiv. Updated: 2017-07-11. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/1707.03172v1

Description:

The SENLIDB dataset was created by a research team at the Polytechnic University of Bucharest to provide large-scale training data for natural language interfaces to databases (NLIDB). It contains 24,890 pairs of text descriptions and SQL code snippets sourced from the Stack Exchange Data Explorer website. To build it, the team crawled the site and filtered a large volume of user queries, aiming for a diverse and practically useful collection. The dataset is intended mainly for training and evaluating neural network models that translate natural language into SQL queries, so that non-technical users can interact with databases.
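As a rough illustration of what a collection of description/SQL pairs might look like in code, the sketch below models one pair and applies a simple filter. Everything in it is an assumption made for illustration: the `TextSqlPair` class, the `keep_pair` criteria, and the example records are hypothetical and do not reflect the dataset's actual file format or the authors' filtering rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSqlPair:
    """Hypothetical record: a free-text description and its SQL snippet."""
    description: str
    sql: str


def keep_pair(pair: TextSqlPair, min_words: int = 2) -> bool:
    """Illustrative filter only; not the authors' actual criteria.

    Keeps a pair when the description has at least `min_words` words
    and the SQL snippet contains a SELECT statement.
    """
    has_text = len(pair.description.split()) >= min_words
    has_sql = "select" in pair.sql.lower()
    return has_text and has_sql


# Made-up examples written in the style of Stack Exchange Data Explorer queries.
raw_pairs = [
    TextSqlPair(
        "Top 10 highest scored posts",
        "SELECT TOP 10 Id, Title FROM Posts ORDER BY Score DESC",
    ),
    TextSqlPair("", "SELECT * FROM Users"),  # dropped: no description
    TextSqlPair("test query", ""),           # dropped: no SQL
]

filtered_pairs = [p for p in raw_pairs if keep_pair(p)]
```

A model trained on such data would take `description` as input and `sql` as the target sequence; the filter shown here only stands in for whatever cleaning the dataset creators actually performed.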

Provider:

Polytechnic University of Bucharest

Created:

2017-07-11
