lizn-zn/algoveri-lean
收藏Hugging Face2026-03-25 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/lizn-zn/algoveri-lean
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
language:
- en
tags:
- lean4
- lean
- formal-verification
- theorem-proving
- algorithms
- code-generation
- benchmark
- vericoding
pretty_name: "AlgoVeri-Lean: Verified Algorithm Specs in Lean 4"
size_categories:
- n<1K
dataset_info:
features:
- name: task_id
dtype: string
- name: lean_code
dtype: string
splits:
- name: train
num_examples: 78
configs:
- config_name: default
data_files:
- split: train
path: data.jsonl
---
<div align="center">
# AlgoVeri-Lean
**77 classical algorithm verification tasks in Lean 4**
[](https://arxiv.org/abs/2602.09464)
[](https://github.com/haoyuzhao123/algoveri)
[](https://opensource.org/licenses/Apache-2.0)
</div>
---
## What is this?
This is the **Lean 4 subset** of the [AlgoVeri](https://github.com/haoyuzhao123/algoveri) benchmark — a cross-language benchmark for *vericoding* (generating formally verified code from specifications).
Each task provides a **Lean 4 specification** that includes:
- **Preconditions** — constraints on valid inputs
- **Function signature** — with a `sorry`'d implementation to be filled in
- **Postconditions** — formal properties the implementation must satisfy
- **Theorem stub** — a `sorry`'d correctness proof to be completed
The goal: implement the algorithm **and** prove the postconditions hold — all in Lean 4.
## Quick numbers
| | |
|---|---|
| Tasks | **77** algorithm problems |
| Files | **78** `.lean` specs (gcd has two variants) |
| Lean toolchain | **4.25.0-rc2** + Mathlib |
| Best model score | **7.8%** pass rate (Gemini-3 Flash) |
## Algorithm categories
| Category | Tasks |
|---|---|
| **Sorting** | bubble_sort, insertion_sort, merge_sort, quick_sort, k_smallest |
| **Search** | binary_search, linear_search, string_search_naive, kmp, ac_automata |
| **Graph** | bfs, dfs, dijkstra, bellman_ford, kruskal, prim, topological_sort, scc_tarjan, cycle_detection, bipartite_check, push_relabel, edmond_karp, max_matching, lca |
| **DP** | coin_change, house_robber, jump_game, knapsack_01, knapsack_unbounded, longest_common_subsequence, longest_increasing_subsequence, longest_palindrome_substring, maximum_subarray_sum, rod_cutting |
| **Data structures** | bst_insert, bst_search, bst_delete, bst_zig, bst_zigzag, bst_zigzig, splaytree_splay, llrbt_insert, llrbt_delete, llrbt_flipcolor, llrbt_rotateleft, llrbt_rotateright, maxheap_push, maxheap_popmax, trie_insert, trie_search, trie_delete, ternarysearchtree_insert, ternarysearchtree_search, ternarysearchtree_delete, segmenttree_build, segmenttree_modify, segmenttree_query, stack_push, stack_pop, queue_enqueue, queue_dequeue, ringbuffer_enqueue, ringbuffer_dequeue, unionfind_find, unionfind_linkroots |
| **Math / number theory** | gcd, fast_exponential, integer_exponential, trial_division_naive, trial_division_optimized, sieve_method, discrete_logarithm |
| **Other** | bracket_matching, matrix_multiplication, linearsys_gf2, polymul_naive, polymul_karatsuba |
## Usage
```python
from datasets import load_dataset
ds = load_dataset("lizn-zn/algoveri-lean", split="train")
print(ds[0]["task_id"], ds[0]["lean_code"][:200])
```
## Spec structure (example: `binary_search`)
Every `.lean` file follows the same pattern:
```lean
import Mathlib
-- Precondition
def binary_search_lower_bound_precond (seq : Array Int) (target : Int) : Prop :=
seq.size ≤ 0x7FFFFFFF ∧
(∀ i j : Nat, i < j ∧ j < seq.size → seq.getD i 0 ≤ seq.getD j 0)
-- Implementation stub (fill this in)
def binary_search_lower_bound (seq : Array Int) (target : Int)
(h_precond : ...) : Nat :=
sorry
-- Postcondition
def binary_search_lower_bound_postcond (seq : Array Int) (target : Int)
(result : Nat) (h_precond : ...) : Prop :=
result ≤ seq.size ∧
(∀ i : Nat, i < result → seq.getD i 0 < target) ∧
(∀ i : Nat, result ≤ i ∧ i < seq.size → seq.getD i 0 ≥ target)
-- Prove correctness
theorem binary_search_lower_bound_postcond_satisfied ... := by
sorry
```
## Citation
```bibtex
@article{zhao2026algoveri,
title = {AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms},
author = {Haoyu Zhao and Ziran Yang and Jiawei Li and Deyuan He and Zenan Li and Chi Jin and Venugopal V. Veeravalli and Aarti Gupta and Sanjeev Arora},
journal = {arXiv preprint arXiv:2602.09464},
year = {2026}
}
```
## License
Apache 2.0 — same as the upstream AlgoVeri repository.
提供机构:
lizn-zn



