MULTISPIDER
Source: arXiv | Updated: 2022-12-27 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2212.13492v1
Description:
MULTISPIDER is a multilingual text-to-SQL semantic parsing dataset created jointly by Harbin Institute of Technology and Microsoft Research Asia. It covers seven languages: English, German, French, Spanish, Japanese, Chinese, and Vietnamese. Built on the Spider dataset, it contains 9,691 questions and 5,263 SQL queries spanning 166 databases. MULTISPIDER targets the challenges of text-to-SQL parsing in multilingual settings. Data quality is maintained through multiple rounds of translation and validation, and language-specific properties are taken into account so that the questions read naturally and realistically in each language. The dataset is both high in quality and highly challenging for multilingual text-to-SQL parsing, which makes it suitable for developing and testing key components of multilingual interactive systems.
Provider:
Harbin Institute of Technology; Microsoft Research Asia
Created:
2022-12-27



