synthetic-code-training/swe_doc_gen_locate_2000

Name: synthetic-code-training/swe_doc_gen_locate_2000
Creator: synthetic-code-training
Published: 2025-12-15 04:43:46
License: not specified

Source: Hugging Face. Updated 2025-12-15; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/synthetic-code-training/swe_doc_gen_locate_2000



Resource overview:

The SWE-Doc-Gen-Locate dataset (2000 entries) is designed to evaluate an agent's ability to locate a target Python function or class from a description of its functionality and then add a docstring to it. The task is: given a Python repository and a description of a function or class (with no name and no file path), the agent must search the codebase to find where the target is defined, read the implementation to understand its behavior, and generate and add an appropriate docstring.

Each entry contains a unique identifier, the GitHub repository (owner/repo), the base commit hash, the target file path (not given to the agent), the module name (not given to the agent in masked mode), the module type ("function" or "class"), the line range of the target, the original docstring (ground truth), a functionality description that omits the function name, parameter details with types and descriptions, the return type and description, call details (functions called and their purposes), and raw AST-extracted information.

The dataset is generated from SWE-Gym-Raw using LLM-based description generation (GPT-4o-mini) and consists of 2000 entries from 243 unique repositories.
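As a rough illustration of the entry layout and the masking rule described above, the following minimal Python sketch models one record and builds the text an agent might see. The field names, the `build_task_prompt` helper, the choice of which non-hidden fields to show, and the sample values are all hypothetical; the dataset's actual column names are not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One dataset record. Field names are illustrative assumptions."""
    instance_id: str
    repo: str          # GitHub repository as "owner/repo"
    base_commit: str   # commit hash the task starts from
    file_path: str     # target file path, never shown to the agent
    module_name: str   # target name, hidden from the agent in masked mode
    module_type: str   # "function" or "class"
    line_start: int    # first line of the target
    line_end: int      # last line of the target
    docstring: str     # original docstring, used as ground truth
    description: str   # functionality description without the name


def build_task_prompt(entry: Entry, masked: bool = True) -> str:
    """Assemble the agent-facing task text (hypothetical helper).

    The file path is never included. The module name is included
    only when masked mode is turned off.
    """
    lines = [
        f"Repository: {entry.repo} at commit {entry.base_commit}",
        f"Target kind: {entry.module_type}",
        f"Description: {entry.description}",
    ]
    if not masked:
        lines.append(f"Target name: {entry.module_name}")
    lines.append(
        "Locate the target, read its implementation, and add a docstring."
    )
    return "\n".join(lines)


# Made-up sample record, not taken from the dataset.
example = Entry(
    instance_id="example-0001",
    repo="example-owner/example-repo",
    base_commit="0123abc",
    file_path="utils/text.py",
    module_name="slugify",
    module_type="function",
    line_start=10,
    line_end=24,
    docstring="Convert a title into a URL-friendly identifier.",
    description=(
        "Converts a title string into a lowercase, "
        "hyphen-separated identifier."
    ),
)

prompt = build_task_prompt(example)
print(prompt)
```

Keeping the hidden fields on the record while leaving them out of the prompt means an evaluation harness could still compare the agent's edit location and docstring against the ground truth.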

Provided by:

synthetic-code-training
