drndr/statcodesearch
Hugging Face | Updated 2024-11-28 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/drndr/statcodesearch
Resource description:
The StatCodeSearch dataset is a benchmark test set of code-comment pairs extracted from R scripts written mostly by researchers. The dataset is sourced from the Open Science Framework (OSF) and contains text and code samples from R projects in social science and psychology, with a focus on the statistical analysis of research data. As part of the GenCodeSearchNet test suite, it can be used to test programming language understanding on a low-resource programming language. It supports two tasks: semantic code search, where comments serve as queries that should return the matching code snippets, and code classification, where labels assign each code snippet to one of four categories: Data Variable, Visualization, Statistical Modeling, and Statistical Test. Each record contains a unique identifier, a comment, the code, a label, the source, and the file name.
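The semantic code search task described above can be illustrated with a minimal Python sketch. It is not the benchmark's evaluation code: the field names (`id`, `comment`, `code`, `label`) are assumed from the description, the four records are invented for illustration rather than taken from the dataset, and simple token-overlap cosine similarity stands in for whatever retrieval model is actually being tested.

```python
import math
import re
from collections import Counter

# Hypothetical records: field names are assumed from the dataset description,
# and the values are invented for illustration (not real dataset rows).
RECORDS = [
    {"id": 1, "comment": "fit a linear regression model",
     "code": "model <- lm(score ~ age, data = df)",
     "label": "Statistical Modeling"},
    {"id": 2, "comment": "plot a histogram of reaction times",
     "code": "ggplot(df, aes(x = rt)) + geom_histogram()",
     "label": "Visualization"},
    {"id": 3, "comment": "run a paired t test",
     "code": "t.test(pre, post, paired = TRUE)",
     "label": "Statistical Test"},
    {"id": 4, "comment": "recode gender as a factor variable",
     "code": "df$gender <- factor(df$gender)",
     "label": "Data Variable"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, records, top_k=1):
    """Rank code snippets by token overlap with a natural-language query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(record["code"]))), record)
        for record in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]


def top1_accuracy(records):
    """Fraction of comments whose top-ranked snippet is their own code."""
    hits = sum(
        search(record["comment"], records)[0]["id"] == record["id"]
        for record in records
    )
    return hits / len(records)
```

Each comment is used as the query and only the code is indexed, which mirrors the search setup described above. A real evaluation would replace `cosine` over token counts with a learned embedding model, since comments and R code often share few literal tokens.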
Provided by:
drndr



