
StaQC

Source: arXiv. Updated: 2018-03-26. Indexed: 2024-06-21.
Download link:
https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset
Description:
StaQC is a dataset created by researchers at The Ohio State University and collaborating institutions. It contains approximately 148,000 Python question-code pairs and approximately 120,000 SQL question-code pairs. The pairs were mined automatically from Stack Overflow with a bi-view hierarchical neural network framework, with the aim of matching each natural-language question to a code snippet that answers it. Beyond its scale, the dataset is diverse: it includes multiple code solutions to the same question, as well as similar code snippets paired with differently worded questions. StaQC is intended for developing models that relate natural language to programming languages, for tasks such as code retrieval and code generation, and its size and variety are meant to help improve the performance of such models.
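The two kinds of diversity described above (several solutions for one question, and one snippet described by several questions) can be illustrated with a minimal Python sketch. The record layout, field names, and the three toy records are hypothetical and chosen only for illustration; they are not taken from StaQC and do not reflect its actual file format.

```python
from collections import defaultdict

# Hypothetical toy records for illustration only; NOT actual StaQC data
# and NOT the dataset's real schema.
pairs = [
    {"question_id": 1, "title": "How to reverse a list in Python?",
     "code": "my_list[::-1]"},
    {"question_id": 1, "title": "How to reverse a list in Python?",
     "code": "list(reversed(my_list))"},
    {"question_id": 2, "title": "Reverse the order of a list",
     "code": "my_list[::-1]"},
]


def group_by_question(records):
    """Map each question id to every code solution paired with it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["question_id"]].append(record["code"])
    return dict(grouped)


def group_by_code(records):
    """Map each code snippet to the set of question titles describing it."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["code"]].add(record["title"])
    return dict(grouped)


solutions = group_by_question(pairs)
descriptions = group_by_code(pairs)
```

Here `solutions[1]` holds two different snippets for the same question, while `descriptions["my_list[::-1]"]` holds two differently worded titles for the same snippet.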
Provider:
The Ohio State University
Created:
2018-03-26