FStarDataSet-V2
收藏魔搭社区2025-12-05 更新2025-07-26 收录
下载链接:
https://modelscope.cn/datasets/microsoft/FStarDataSet-V2
下载链接
链接失效反馈官方服务:
资源简介:
This dataset is the Version 2.0 of [`microsoft/FStarDataSet`](https://huggingface.co/datasets/microsoft/FStarDataSet).
## Primary-Objective
This dataset's primary objective is to train and evaluate Proof-oriented Programming with AI (PoPAI, in short). Given a specification of a program and proof in F*,
the objective of a AI model is to synthesize the implemantation (see [below](#usage) for details about the usage of this dataset, including the input and output).
## Data Format
Each of the examples in this dataset are organized as dictionaries with the following schema
```json
{
"file_name": <str: Name of the file>,
"name": <str: name of the example, can be used to uniquely identify the example>,
"original_source_type": <str: actual source type, to be used for type checking>,
"source_type": <str: modified source type, to be used to formulate prompt>,
"source_definition": <str: target definition>,
"source": <dict: contains metadata about the source of this example, including project_name, git url, git sha, etc.>,
"source_range": <dict: metadata containing start and end lines and columns of this definition in the source file>,
"file_context": <str: extracted file context upto the point of current definition>,
"dependencies": <dict: build dependencies for this file>,
"opens_and_abbrevs": <list[dict]: List of opened modules and abbreviated modules in the file, necessary for evaluation.>,
"vconfig": <dict: SMT solver flags for this definition>,
"interleaved": <bool: whether this definition is interleaved from the interface file>,
"verbose_type": <str: the verbose type of this definition as resolved by the type checker>,
"effect": <str: effect>,
"effect_flags": <list[str]: any effect flags>,
"mutual_with": <list: if this definition is mutually recursive with another, list of those names>,
"ideal_premises": <list[str]: Other definitions that are used in the ground truth definition>,
"proof_features": <list[str]>,
"is_simple_lemma": <bool/null>,
"is_div": <bool: if this definition has the divergent effect>,
"is_proof": <bool>,
"is_simply_typed": <bool>,
"is_type": <bool/null>,
"partial_definition": <str>,
"completed_definiton": <str>,
"isa_cross_project_example": <bool: if this example belongs to the cross-project evaluation set>
}
```
# Usage
To use this dataset with [`datasets`](https://pypi.org/project/datasets/),
```python
from datasets import load_dataset
data = load_dataset("microsoft/FStarDataSet-V2")
train_data = data["train"]
eval_data = data["validation"]
test_data = data["test"]
intra_project_test = test_data.filter(lambda x: x["isa_cross_project_example"] == False)
cross_project_test = test_data.filter(lambda x: x["isa_cross_project_example"] == True)
```
## Input
The primary input for generating F* definition is **`source_type`**.
All other information in an example may be used directly or to derive an input except
**`source_definition`**, **`ideal_premises`**, and **`completed_definiton`**.
## Output
The primary output is **`source_definition`**, which is the ground truth definition, that can be evaluated with the [proof checker](#evaluation-on-this-dataset).
The **`completed_definiton`** may be used as ground truth when a model is used as a text completion setting (though the evaluator does not support evaluation in this setting).
In addition, **`ideal_premises`** may be used for evaluating premise selection models.
# Evaluation on this dataset
Generated F* definitions should be evaluated the proof checker tool from
[https://github.com/FStarLang/fstar_dataset/releases/tag/eval-v2.0](https://github.com/FStarLang/fstar_dataset/releases/tag/eval-v2.0).
Download the source code and the `helpers.zip` file from the release.
## Troubleshooting
The attached binaries in the evaluator (i.e., `fstar.exe` and `z3`) are built on
**`Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-189-generic x86_64)`**, **`gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2)`**, **`OCaml 4.12.0`**.
If any of the binaries do not work properly, build F* from [this commit (10183ea187da8e8c426b799df6c825e24c0767d3)](https://github.com/FStarLang/FStar/commit/10183ea187da8e8c426b799df6c825e24c0767d3)
from the [F* repository](https://github.com/FStarLang/FStar), using the [installation guide](https://github.com/FStarLang/FStar/blob/master/INSTALL.md).
# Data Source
In addition to the eight projects in `microsoft/FStarDataSet`, data from four more projects are included in this version.
1. [Starmada](https://github.com/microsoft/Armada): a framework for doing proofs by stepwise refinement for concurrent programs in a weak memory model. Starmada is an experimental version of Armada implemented in F⋆, relying on various advanced features of F⋆’s dependent type system for more generic and abstract proofs.
2. [Zeta](https://github.com/project-everest/zeta): a high performance, concurrent monitor for stateful services proven correct in F⋆ and its Steel concurrent separation logic
3. [Dice-star](https://github.com/verified-HRoT/dice-star): a verified implementation of the DICE measured boot protocol for embedded devices
4. [Noise-star](https://github.com/Inria-Prosecco/noise-star): a verified compiler for implementations of Noise protocols, a family of key-exchange protocols
# Limitations
**TDB**
# Citation
```
@inproceedings{chakraborty2024towards,
title={Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming},
author={Chakraborty, Saikat and Ebner, Gabriel and Bhat, Siddharth and Fakhoury, Sarah and Fatima, Sakina and Lahiri, Shuvendu and Swamy, Nikhil},
booktitle={Proceedings of the IEEE/ACM 47th International Conference on Software Engineering (To Appear)},
pages={1--12},
year={2025}
}
```
本数据集为 [`microsoft/FStarDataSet`](https://huggingface.co/datasets/microsoft/FStarDataSet) 的2.0版本。
## 核心目标
本数据集的核心目标是训练与评估面向证明的人工智能编程(Proof-oriented Programming with AI,简称PoPAI)。给定F*语言中的程序规约与证明,AI模型的任务是合成对应的实现(关于本数据集的使用细节,包括输入与输出格式,请参见下文[#usage](#usage)章节)。
## 数据格式
本数据集的每个示例均采用如下结构的字典组织:
json
{
"file_name": <字符串: 文件名>,
"name": <字符串: 示例名称,可用于唯一标识该示例>,
"original_source_type": <字符串: 原始源类型,用于类型检查>,
"source_type": <字符串: 修改后的源类型,用于构建提示词>,
"source_definition": <字符串: 目标定义>,
"source": <字典: 包含该示例来源的元数据,如项目名、git地址、git哈希等>,
"source_range": <字典: 包含该定义在源文件中的起止行与列的元数据>,
"file_context": <字符串: 当前定义之前提取的文件上下文>,
"dependencies": <字典: 该文件的构建依赖>,
"opens_and_abbrevs": <列表[字典]: 文件中打开的模块与缩写模块列表,为评估所必需>,
"vconfig": <字典: 该定义的SMT求解器标志>,
"interleaved": <布尔值: 该定义是否来自接口文件的交错定义>,
"verbose_type": <字符串: 类型检查器解析出的该定义的详细类型>,
"effect": <字符串: 效应>,
"effect_flags": <列表[字符串]: 所有效应标志>,
"mutual_with": <列表: 若该定义与其他定义互递归,则列出对应的名称列表>,
"ideal_premises": <列表[字符串]: 真实定义中使用的其他定义>,
"proof_features": <列表[字符串]: 证明特征列表>,
"is_simple_lemma": <布尔值/空值>,
"is_div": <布尔值: 该定义是否具有发散效应>,
"is_proof": <布尔值>,
"is_simply_typed": <布尔值>,
"is_type": <布尔值/空值>,
"partial_definition": <字符串: 部分定义>,
"completed_definiton": <字符串: 完整定义>,
"isa_cross_project_example": <布尔值: 该示例是否属于跨项目评估集>
}
## 使用方法
若要结合[`datasets`](https://pypi.org/project/datasets/)库使用本数据集,可使用如下代码:
python
from datasets import load_dataset
data = load_dataset("microsoft/FStarDataSet-V2")
train_data = data["train"]
eval_data = data["validation"]
test_data = data["test"]
intra_project_test = test_data.filter(lambda x: x["isa_cross_project_example"] == False)
cross_project_test = test_data.filter(lambda x: x["isa_cross_project_example"] == True)
## 输入
生成F*语言定义的核心输入为**`source_type`**。示例中的其余信息均可直接使用或用于推导输入,但需排除**`source_definition`**、**`ideal_premises`**与**`completed_definiton`**。
## 输出
核心输出为**`source_definition`**,即可通过[证明检查器](#evaluation-on-this-dataset)进行评估的真实定义。当模型用于文本补全任务时,**`completed_definiton`**可作为真实定义使用(尽管评估器暂不支持该场景下的评估)。此外,**`ideal_premises`**可用于评估前提选择模型。
## 数据集评估
生成的F*语言定义需通过[https://github.com/FStarLang/fstar_dataset/releases/tag/eval-v2.0](https://github.com/FStarLang/fstar_dataset/releases/tag/eval-v2.0)提供的证明检查器工具进行评估。请从该发布页面下载源代码与`helpers.zip`文件。
### 故障排查
评估器中附带的二进制文件(即`fstar.exe`与`z3`)基于**`Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-189-generic x86_64)`**、**`gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2)`**与**`OCaml 4.12.0`**构建。若二进制文件无法正常运行,请根据[F*仓库](https://github.com/FStarLang/FStar)的[指定提交(10183ea187da8e8c426b799df6c825e24c0767d3)](https://github.com/FStarLang/FStar/commit/10183ea187da8e8c426b799df6c825e24c0767d3),并参照[安装指南](https://github.com/FStarLang/FStar/blob/master/INSTALL.md)自行编译F*。
## 数据来源
相较于`microsoft/FStarDataSet`中的8个项目,本版本额外纳入了4个项目的数据:
1. [Starmada](https://github.com/microsoft/Armada):一种针对弱内存模型下并发程序的逐步精化证明框架。Starmada是用F⋆实现的Armada实验版本,依托F⋆的多种高级依赖类型系统特性,实现更具泛化性与抽象性的证明。
2. [Zeta](https://github.com/project-everest/zeta):面向有状态服务的高性能并发监视器,已通过F⋆及其Steel并发分离逻辑验证其正确性。
3. [Dice-star](https://github.com/verified-HRoT/dice-star):面向嵌入式设备的DICE度量启动协议的可验证实现。
4. [Noise-star](https://github.com/Inria-Prosecco/noise-star):针对Noise协议(一类密钥交换协议)实现的可验证编译器。
## 局限性
**TDB**
## 引用
@inproceedings{chakraborty2024towards,
title={Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming},
author={Chakraborty, Saikat and Ebner, Gabriel and Bhat, Siddharth and Fakhoury, Sarah and Fatima, Sakina and Lahiri, Shuvendu and Swamy, Nikhil},
booktitle={Proceedings of the IEEE/ACM 47th International Conference on Software Engineering (To Appear)},
pages={1--12},
year={2025}
}
提供机构:
maas
创建时间:
2025-07-22



