lt-asset/CoRe
Source: Hugging Face. Updated 2025-09-24; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/lt-asset/CoRe
---
license: apache-2.0
task_categories:
- question-answering
language:
- en
tags:
- LLM4code
- code_reasoning
- neurips25
size_categories:
- 10K<n<100K
---
# CoRe: Benchmarking LLMs’ Code Reasoning Capabilities through Static Analysis Tasks
This repository hosts the **CoRe** benchmark, designed to evaluate the reasoning capabilities of large language models on **program analysis tasks** including data dependency, control dependency, and information flow. Each task instance is represented as a structured JSON object with detailed metadata for evaluation and reproduction.
It contains 25k data points (last update: Sep. 24th, 2025).
Each example is a JSON object with the following fields:
```json
{
  "label_file": "codenet_p00496_s700056700_main_12_40.yaml",
  "code_file": "codenet_p00496_s700056700_main_12_40.c",
  "pid": "p00496",
  "sid": "s700056700",
  "funname": "main",
  "start": 12,
  "end": 40,
  "dataset": "codenet",
  "language": "C",
  "src": 30,
  "dst": 33,
  "groundtruth": true,
  "task_id": "control_codenet_p00496_s700056700_main_12_40_k_33_1",
  "prompt": "...",
  "category": "trace | all_source"
}
```
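As a rough illustration of how one such record could be read and sanity-checked, the sketch below parses the example above with Python's standard `json` module and reports any missing fields. The helper `missing_fields` and the constant `EXPECTED_FIELDS` are hypothetical names introduced here, not part of the CoRe tooling, and the `category` value `"trace"` is picked only for illustration.

```python
import json

# Field names taken from the example record above.
EXPECTED_FIELDS = {
    "label_file", "code_file", "pid", "sid", "funname", "start", "end",
    "dataset", "language", "src", "dst", "groundtruth", "task_id",
    "prompt", "category",
}


def missing_fields(record: dict) -> set:
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


# The example record; "category" is set to "trace" purely for illustration.
raw = """
{
  "label_file": "codenet_p00496_s700056700_main_12_40.yaml",
  "code_file": "codenet_p00496_s700056700_main_12_40.c",
  "pid": "p00496",
  "sid": "s700056700",
  "funname": "main",
  "start": 12,
  "end": 40,
  "dataset": "codenet",
  "language": "C",
  "src": 30,
  "dst": 33,
  "groundtruth": true,
  "task_id": "control_codenet_p00496_s700056700_main_12_40_k_33_1",
  "prompt": "...",
  "category": "trace"
}
"""

record = json.loads(raw)
print(missing_fields(record))  # empty set when every field is present
```

A check like this is mainly useful when the records are exported to a different format and a field may have been dropped along the way.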
### 🏷 Category Field
The `category` field specifies the type of prompt associated with each task instance:
* **trace**: The prompt asks the model to produce a dependency trace if the answer is `yes` (e.g., the control or data dependency exists).
* **all\_source**: The prompt asks the model to enumerate all source elements involved in the dependency.
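Since the two prompt types ask for different outputs, it can be convenient to separate records by `category` before running a model. The following is a minimal sketch that assumes the records are already loaded as Python dictionaries; `split_by_category` and the toy task IDs are hypothetical.

```python
from collections import defaultdict

# The two values described above.
VALID_CATEGORIES = {"trace", "all_source"}


def split_by_category(records):
    """Group records by their `category` field, rejecting unknown values."""
    groups = defaultdict(list)
    for rec in records:
        category = rec["category"]
        if category not in VALID_CATEGORIES:
            raise ValueError(f"unexpected category: {category!r}")
        groups[category].append(rec)
    return dict(groups)


# Toy records with made-up task IDs, for illustration only.
toy = [
    {"task_id": "control_example_1", "category": "trace"},
    {"task_id": "data_example_1", "category": "all_source"},
    {"task_id": "data_example_2", "category": "trace"},
]
groups = split_by_category(toy)
print({name: len(items) for name, items in groups.items()})
```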
## 🧩 Field Descriptions
| Field | Description |
|------------------|-------------|
| `label_file` | Path to the YAML file containing ground truth annotations for the current task instance. |
| `code_file` | Path to the corresponding C/Java/Python source code file. |
| `pid` | Problem ID from the original source dataset (e.g., CodeNet or GCJ). |
| `sid` | Solution ID identifying the specific program implementation. |
| `funname` | Name of the target function in which the analysis is conducted. |
| `start`, `end` | Line numbers defining the start and end of the target function. |
| `dataset` | Original dataset source (`codenet` or `gcj`). |
| `language` | Programming language of the source file (`C`, `Java`, `Python`). |
| `src`, `dst` | Defines the two program elements queried in this task. In control dependency, these are line numbers. In data dependency and information flow, they are structured as `["varname", line_no]`, representing variable instances. |
| `groundtruth` | Boolean indicating whether the specified dependency relationship holds (i.e., true if `src` has the given dependency on `dst`). |
| `task_id` | A unique ID for the task instance. The prefix (`control_`, `data_`, `infoflow_`) identifies the task type. |
| `prompt` | The prompt string used in the experiment for this task instance. It includes the instruction, examples, query, and code context provided to the LLM. Content-specific fields (e.g., source/target names, line numbers) are filled into a standardized prompt template. |
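Two of the fields above need a little handling in practice: the task type is encoded in the `task_id` prefix, and `src`/`dst` are either plain line numbers or `["varname", line_no]` pairs. The sketch below shows one way to normalise both. The function names `task_type` and `endpoint` are hypothetical, as is the variable name `"x"` in the usage lines.

```python
from typing import Optional, Tuple

# Prefixes listed in the `task_id` description.
TASK_PREFIXES = ("control", "data", "infoflow")


def task_type(task_id: str) -> str:
    """Extract the task type from the prefix of a task ID."""
    prefix = task_id.split("_", 1)[0]
    if prefix not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix in {task_id!r}")
    return prefix


def endpoint(value) -> Tuple[Optional[str], int]:
    """Normalise `src` or `dst` to a (variable name or None, line number) pair.

    Control-dependency tasks store a bare line number; data-dependency and
    information-flow tasks store ["varname", line_no].
    """
    if isinstance(value, int):
        return (None, value)
    name, line = value
    return (name, int(line))


print(task_type("control_codenet_p00496_s700056700_main_12_40_k_33_1"))
print(endpoint(30))         # line-number form
print(endpoint(["x", 14]))  # variable-instance form; "x" is a made-up name
```

Returning the same tuple shape for both forms lets downstream code treat all three task types uniformly.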
## 📚 Task Types
The benchmark contains three types of program reasoning tasks:
- `control`: Control dependency between lines.
- `data`: Data dependency between variables.
- `infoflow`: Information flow (explicit or implicit) between variables.
Each instance is designed to assess whether an LLM can understand and reason over static semantics in real-world source code.
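To make the yes/no part of the three task types concrete, here is a minimal scoring sketch that compares boolean predictions against `groundtruth` and reports accuracy per task type. It is not the evaluation script from the companion repository. Plain accuracy is an assumption made here for simplicity, and `accuracy_by_task`, the `predictions` mapping, and the toy task IDs are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_task(records, predictions):
    """Compute per-task-type accuracy of boolean predictions.

    `predictions` maps task_id -> bool. A record without a prediction is
    counted as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        kind = rec["task_id"].split("_", 1)[0]  # control / data / infoflow
        total[kind] += 1
        if predictions.get(rec["task_id"]) == rec["groundtruth"]:
            correct[kind] += 1
    return {kind: correct[kind] / total[kind] for kind in total}


# Toy data with made-up task IDs, for illustration only.
toy_records = [
    {"task_id": "control_a", "groundtruth": True},
    {"task_id": "control_b", "groundtruth": False},
    {"task_id": "data_a", "groundtruth": False},
]
toy_predictions = {"control_a": True, "control_b": True, "data_a": False}

scores = accuracy_by_task(toy_records, toy_predictions)
print(scores)
```

Scoring the trace and source-enumeration outputs requires the label files and is left to the evaluation scripts in the companion repository.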
## 🛠 Scripts and Usage
For scripts, evaluation tools, and detailed instructions on running inference over CoRe, please check out our companion GitHub repository:
🔗 Website: [https://corebench.github.io/](https://corebench.github.io/)
🔗 Source code: [https://github.com/CoReBench/CoRe](https://github.com/CoReBench/CoRe)
🔗 Paper: [https://arxiv.org/abs/2507.05269](https://arxiv.org/abs/2507.05269)
The GitHub repository includes:
- Raw annotation data that could be used to generate various static analysis tasks
- Predefined prompts for each task and language
- Scripts for invoking models and parsing responses
- Evaluation scripts for dependency classification, trace generation, and dependency source enumeration
### 📄 License
Apache License 2.0
Provider: lt-asset



