Search4Code

Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/microsoft/search4code
Overview:
Search4Code is described as the first large-scale dataset of real-world code search queries mined from the Bing web search engine. It covers two programming languages, C# and Java, and contains approximately 4,974 C# queries and 6,596 Java queries. Besides the query text, each entry includes the three most-clicked result URLs for that query and a popularity score. With more than one million queries overall, the dataset supports analysis of both code search and non-code search queries; its target task is code search intent classification.
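As a rough illustration of the per-query information described above (query text, up to three most-clicked URLs, and a popularity score), the sketch below models one entry and ranks code search queries by popularity. The listing does not give the real file format or column names, so every field name, the `is_code_search` label, and the sample records are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryRecord:
    # Hypothetical schema: the field names are assumptions, not the dataset's real columns.
    query: str             # raw search query text
    language: str          # "C#" or "Java"
    top_urls: List[str]    # up to three most-clicked result URLs
    popularity: float      # popularity score of the query
    is_code_search: bool   # assumed intent label (code search vs. non-code search)


def most_popular_code_queries(
    records: List[QueryRecord], language: str, k: int = 3
) -> List[str]:
    """Return the k most popular code search queries for one language."""
    matches = [r for r in records if r.is_code_search and r.language == language]
    matches.sort(key=lambda r: r.popularity, reverse=True)
    return [r.query for r in matches[:k]]


# Made-up records for illustration only; none of them come from the dataset.
SAMPLE = [
    QueryRecord("java read file line by line", "Java", ["https://example.com/a"], 0.90, True),
    QueryRecord("java coffee origin", "Java", [], 0.70, False),
    QueryRecord("c# list to array", "C#", ["https://example.com/b"], 0.80, True),
    QueryRecord("java hashmap iterate", "Java", ["https://example.com/c"], 0.95, True),
]

if __name__ == "__main__":
    print(most_popular_code_queries(SAMPLE, "Java"))
```

When working with the actual files from the repository, the field names and parsing would need to be adapted to whatever format the download provides.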
Provided by:
Authors of the paper