
Maniple

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10853003
Resource description:
This repository contains the code, scripts, and data necessary to reproduce the paper "The Fact Selection Problem in LLM-Based Program Repair".

## Installation

Before installing the project, ensure you have the following prerequisites installed on your system:

- Python version 3.10 or higher.

Follow these steps to install and set up the project on your local machine:

    cd maniple
    python3 -m pip install .

## Structure of Directories

The project is organized into several directories, each serving a specific purpose:

    data/                                   # Training and testing datasets
        BGP32.zip/                          # Sampled 32 bugs from the BugsInPy dataset
            black/                          # The bug project folder
                10/                         # The bug ID folder
                    100000001/              # The bitvector used for prompting
                        prompt.md           # The prompt used for this bitvector
                        response_1.md       # The response from the model
                        response_1.json     # The response in JSON format
                        response_1.patch    # The response in patch format
                        result_1.json       # Testing result
                        ...
        BGP32-without-cot.zip               # GPT response for 32 bugs without CoT prompting
        BGP314.zip                          # 314 bugs from the BugsInPy dataset
        BGP157Ply1-llama3-70b.zip           # Experiment with llama3 model on BGP157Ply1 dataset
        BGP32-permutation.zip               # Permutation experiment on BGP32 dataset
    maniple/                                # Scripts for getting facts and generating prompts
        strata_based/                       # Scripts for generating prompts
        utils/                              # Utility functions
        metrics/                            # Scripts for calculating metrics for dataset
    patch_correctness_labelling.xlsx        # The labelling of patch correctness
    experiment.ipynb                        # Jupyter notebook for training models
    experiment-initialization-resources/    # Contains raw facts for each bug
        bug-data/                           # Raw facts for each bug
            ansible/                        # Bug project folder
                5/                          # Bug ID folder
                    bug-info.json           # Metadata for the bug
                    facts_in_prompt.json    # Facts used in the prompt
                    processed_facts.json    # Processed facts
                    external_facts.json     # GitHub issues for this bug
                    static-dynamic-facts.json   # Static and dynamic facts
                    ...
        datasets-list/                      # Subsets from BugsInPy dataset
        strata-bitvector/                   # Debugging information for bitvectors

## Steps to Reproduce the Experiments

Please follow the steps below sequentially to reproduce the experiments on 314 bugs in BugsInPy with our bitvector-based prompts.

### Prepare the Dataset

The CLI scripts under the `maniple` directory provide commands to download and prepare an environment for each bug. To do so, use the `prep` command:

    maniple prep --dataset 314-dataset

This script automatically downloads all 314 bugs from GitHub, creates a virtual environment for each bug, and installs the necessary dependencies.

### Fact Extraction

Next, extract facts from the bug data using the `extract` command:

    maniple extract --dataset 314-dataset --output-dir data/BGP314

This script extracts facts from the bug data and saves them in the specified output directory. You can find all extracted facts under the `experiment-initialization-resources/bug-data` directory.

### Generate Bitvector Specific Prompts and Responses

First, generate the bitvectors for the facts. The 128 bitvectors used in our paper can be generated with the following command:

    python3 -m maniple.strata_based.fact_bitvector_generator

You can also define your own bitvectors; put them under the `experiment-initialization-resources/strata-bitvectors` directory, following the example bitvector format used for our paper.

To reproduce our experiment's prompts and responses, use the commands below, supplying your own API key:

    # On Linux/macOS:
    export OPENAI_API_KEY=
    # On Windows:
    setx OPENAI_API_KEY
    python3 -m maniple.strata_based.prompt_generator --database BGP314 --partition 10 --start_index 1 --trial 15

The commands above are only needed to reproduce our prompts and responses. You can also build your own customized prompts from customized bitvectors using our extracted facts.
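To make the idea of enumerating bitvectors concrete, the sketch below shows one way 9-bit vectors such as `100000001` could yield exactly 128 combinations. It assumes, purely for illustration, that the nine facts are grouped into seven strata and that each stratum is switched on or off as a unit. The grouping `HYPOTHETICAL_STRATA` and the function `enumerate_bitvectors` are invented for this example; the actual design is defined by the repository's `fact_bitvector_generator` module and may differ.

```python
from itertools import product

# Hypothetical strata: each inner list holds the bit positions toggled together.
# This grouping is an illustrative assumption, not the repository's actual table.
HYPOTHETICAL_STRATA = [[0], [1, 2], [3], [4], [5], [6, 7], [8]]


def enumerate_bitvectors(strata, width):
    """Yield one bitvector string per on/off assignment of the strata."""
    for choice in product("01", repeat=len(strata)):
        bits = ["0"] * width
        # Copy each stratum's on/off choice to every bit position it owns.
        for state, positions in zip(choice, strata):
            for pos in positions:
                bits[pos] = state
        yield "".join(bits)


bitvectors = list(enumerate_bitvectors(HYPOTHETICAL_STRATA, width=9))
print(len(bitvectors))            # 2 ** 7 = 128 assignments
print("100000001" in bitvectors)  # True: only the first and last strata are on
```

Under this assumption, adding a stratum doubles the number of prompts per bug, which is worth keeping in mind when designing custom bitvectors.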
The `prompt_generator` script generates prompts and responses for all 314 bugs in the dataset by enumerating all possible bitvectors according to the current strata design specified in `maniple/strata_based/fact_strata_table.json`. With `--trial 15`, the script generates 15 responses for each prompt. With `--partition 10`, it starts 10 threads to speed up the process.

### Testing Generated Patches

Use the following command:

    maniple validate --output-dir data/BGP314

This script validates the generated patches and saves the results in the specified output directory. The tests come from the developer's fix commit.

## Contributing

Contributions to this project are welcome! Please submit a PR if you find any bugs or have any suggestions.

## License

This project is licensed under the MIT License. See the LICENSE file for details.
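After generating responses and before validating them, it can be useful to check that every bitvector directory holds the expected number of responses. The following is a minimal sketch, not part of the repository. It assumes the output directory follows the `<project>/<bug_id>/<bitvector>/` layout shown for `BGP32` in the directory structure, with one `response_N.json` file per trial. The function name `incomplete_bitvector_dirs` is hypothetical.

```python
from pathlib import Path


def incomplete_bitvector_dirs(output_dir, expected_trials=15):
    """Map each bitvector directory with too few responses to its response count."""
    root = Path(output_dir)
    incomplete = {}
    # Assumed layout: <output_dir>/<project>/<bug_id>/<bitvector>/prompt.md
    for prompt in sorted(root.glob("*/*/*/prompt.md")):
        bitvector_dir = prompt.parent
        count = len(list(bitvector_dir.glob("response_*.json")))
        if count < expected_trials:
            incomplete[bitvector_dir.relative_to(root).as_posix()] = count
    return incomplete


if __name__ == "__main__":
    # Prints nothing if the directory is missing or every directory is complete.
    for directory, count in incomplete_bitvector_dirs("data/BGP314").items():
        print(f"{directory}: {count} responses")
```

The default of 15 mirrors the `--trial 15` setting in the reproduction command; pass a different `expected_trials` value if you ran fewer trials.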
Created: 2024-08-03