five

lizn-zn/algoveri-lean

收藏
Hugging Face2026-03-25 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/lizn-zn/algoveri-lean
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 language: - en tags: - lean4 - lean - formal-verification - theorem-proving - algorithms - code-generation - benchmark - vericoding pretty_name: "AlgoVeri-Lean: Verified Algorithm Specs in Lean 4" size_categories: - n<1K dataset_info: features: - name: task_id dtype: string - name: lean_code dtype: string splits: - name: train num_examples: 78 configs: - config_name: default data_files: - split: train path: data.jsonl --- <div align="center"> # AlgoVeri-Lean **77 classical algorithm verification tasks in Lean 4** [![Paper](https://img.shields.io/badge/arXiv-2602.09464-b31b1b?logo=arxiv)](https://arxiv.org/abs/2602.09464) [![GitHub](https://img.shields.io/badge/GitHub-haoyuzhao123%2Falgoveri-181717?logo=github)](https://github.com/haoyuzhao123/algoveri) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) </div> --- ## What is this? This is the **Lean 4 subset** of the [AlgoVeri](https://github.com/haoyuzhao123/algoveri) benchmark — a cross-language benchmark for *vericoding* (generating formally verified code from specifications). Each task provides a **Lean 4 specification** that includes: - **Preconditions** — constraints on valid inputs - **Function signature** — with a `sorry`'d implementation to be filled in - **Postconditions** — formal properties the implementation must satisfy - **Theorem stub** — a `sorry`'d correctness proof to be completed The goal: implement the algorithm **and** prove the postconditions hold — all in Lean 4. ## Quick numbers | | | |---|---| | Tasks | **77** algorithm problems | | Files | **78** `.lean` specs (gcd has two variants) | | Lean toolchain | **4.25.0-rc2** + Mathlib | | Best model score | **7.8%** pass rate (Gemini-3 Flash) | ## Algorithm categories | Category | Tasks | |---|---| | **Sorting** | bubble_sort, insertion_sort, merge_sort, quick_sort, k_smallest | | **Search** | binary_search, linear_search, string_search_naive, kmp, ac_automata | | **Graph** | bfs, dfs, dijkstra, bellman_ford, kruskal, prim, topological_sort, scc_tarjan, cycle_detection, bipartite_check, push_relabel, edmond_karp, max_matching, lca | | **DP** | coin_change, house_robber, jump_game, knapsack_01, knapsack_unbounded, longest_common_subsequence, longest_increasing_subsequence, longest_palindrome_substring, maximum_subarray_sum, rod_cutting | | **Data structures** | bst_insert, bst_search, bst_delete, bst_zig, bst_zigzag, bst_zigzig, splaytree_splay, llrbt_insert, llrbt_delete, llrbt_flipcolor, llrbt_rotateleft, llrbt_rotateright, maxheap_push, maxheap_popmax, trie_insert, trie_search, trie_delete, ternarysearchtree_insert, ternarysearchtree_search, ternarysearchtree_delete, segmenttree_build, segmenttree_modify, segmenttree_query, stack_push, stack_pop, queue_enqueue, queue_dequeue, ringbuffer_enqueue, ringbuffer_dequeue, unionfind_find, unionfind_linkroots | | **Math / number theory** | gcd, fast_exponential, integer_exponential, trial_division_naive, trial_division_optimized, sieve_method, discrete_logarithm | | **Other** | bracket_matching, matrix_multiplication, linearsys_gf2, polymul_naive, polymul_karatsuba | ## Usage ```python from datasets import load_dataset ds = load_dataset("lizn-zn/algoveri-lean", split="train") print(ds[0]["task_id"], ds[0]["lean_code"][:200]) ``` ## Spec structure (example: `binary_search`) Every `.lean` file follows the same pattern: ```lean import Mathlib -- Precondition def binary_search_lower_bound_precond (seq : Array Int) (target : Int) : Prop := seq.size ≤ 0x7FFFFFFF ∧ (∀ i j : Nat, i < j ∧ j < seq.size → seq.getD i 0 ≤ seq.getD j 0) -- Implementation stub (fill this in) def binary_search_lower_bound (seq : Array Int) (target : Int) (h_precond : ...) : Nat := sorry -- Postcondition def binary_search_lower_bound_postcond (seq : Array Int) (target : Int) (result : Nat) (h_precond : ...) : Prop := result ≤ seq.size ∧ (∀ i : Nat, i < result → seq.getD i 0 < target) ∧ (∀ i : Nat, result ≤ i ∧ i < seq.size → seq.getD i 0 ≥ target) -- Prove correctness theorem binary_search_lower_bound_postcond_satisfied ... := by sorry ``` ## Citation ```bibtex @article{zhao2026algoveri, title = {AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms}, author = {Haoyu Zhao and Ziran Yang and Jiawei Li and Deyuan He and Zenan Li and Chi Jin and Venugopal V. Veeravalli and Aarti Gupta and Sanjeev Arora}, journal = {arXiv preprint arXiv:2602.09464}, year = {2026} } ``` ## License Apache 2.0 — same as the upstream AlgoVeri repository.
提供机构:
lizn-zn
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作