five

elohn/miniCodeProps

收藏
Hugging Face2024-06-13 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/elohn/miniCodeProps
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: bsd-3-clause language: - en tags: - croissant size_categories: - n<1K configs: - config_name: default data_files: - split: test path: codeprops_bench_ps.jsonl --- # Getting Started First install [Lean 4](https://leanprover-community.github.io/get_started.html). Then clone this repo: `git clone --recurse-submodules https://huggingface.co/datasets/elohn/miniCodeProps` The outer LeanSrc folder is a [Lean Project](https://leanprover-community.github.io/install/project.html). You can open that folder directly in VSCode and check that the proofs in `LeanSrc/Sorts.lean` type check after following the instructions for working on an existing lean project in the Lean 4 documentation. The main miniCodeProps folder handles extracting the benchmark and calculating baselines. If anything fails when building Lean or running `lake exe cache get` from LeanSrc, the [Zulip Chat](https://leanprover.zulipchat.com/) is the best resource for troubleshooting. After cloning the repo, you will need to install [Lean REPL](https://github.com/leanprover-community/repl). By default, our scripts expect the `repl` folder to be directly inside the miniCodeProps folder. run `lake build` from within the `repl` folder. The `extract.py` script is used only to create the json-formatted benchmark. The `baseline.py` script contains the code we used to get our baseline results. It shows how to interact with Lean Repl programmatically, although some interactions are still somewhat buggy in that the repl will send i.e. an extra newline or weirdly formatted message that requires our script to restart the repl. Regardless, if you would like to use our setup, We ran our baselines using [LLMStep](https://github.com/wellecks/llmstep). However, our code also includes a natural place to write your own function to generate tactics given the goal and file context (see `get_tactics_llmstep` in `baseline.py`). We [modified the LLMStep server](https://github.com/evanlohn/llmstep) to return average suggestion log-probabilities per suggestion to implement best-first search. # Reproducing Baselines First, ensure that you have installed Lean and Lean REPL as detailed above. Before running `baseline.py` with any arguments, check that your OS has been set at the top of `utils.py`. At the moment we support interacting with Lean in MacOS and Ubuntu (20.04). ## Next-Step Baselines Our experiments were run on an A100 GPU. Smaller GPUs may not be able to run Llemma7B, but will likely work with Pythia and ntp-context. Clone [our fork of LLMStep](https://github.com/evanlohn/llmstep). After following the LLMStep setup instructions, - For Pythia2.8B, run `python3 python/server_vllm.py` (or, if CPU-bound, run `python3 python/server.py`) - For Llemma7B, run `python3 python/server_llemma.py` - For ntp-context-1.3B, run `python3 python/server_context.py` In another terminal, run `python baseline.py --bench_type nextstep` ## Full-Proof Baseline run `export OPENAI_API_KEY=<your key here>`. Then, simply run `python3 baseline.py` You can also specify which openai LLM to use for proof generation via `python3 baseline.py --gpt_model <your model name>` although our tests only used gpt-4-turbo.
提供机构:
elohn
原始信息汇总

数据集概述

基本信息

  • 许可证: BSD-3-Clause
  • 语言: 英语
  • 标签: croissant
  • 大小类别: 小于1KB

配置详情

  • 配置名称: default
  • 数据文件:
    • 分割: test
    • 路径: codeprops_bench_ps.jsonl
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作