LAIL

Name: LAIL
Creator: figshare
Published: 2024-07-30 09:03:41
License: Not specified

Source: DataCite Commons. Updated 2024-07-30; indexed 2024-08-19.

Download link:

https://figshare.com/articles/dataset/LAIL/22014596/1

Description:

<pre># LAIL

LAIL is a Large language model-Aware selection approach for In-context-Learning-based code generation. LAIL uses LLMs themselves to select examples: it asks the LLM to label each candidate example as a positive or a negative example for a requirement.

## Requirements

- openai
- tqdm
- java

We also provide a script (`/Evaluation/evaluation_setup.sh`) to help set up the programming language dependencies used in evaluation.

```bash
bash evaluation_setup.sh
```

## Dataset

The datasets are DevEval, MBJP, MBPP, MBCPP, and HumanEval. DevEval is a repository-level code generation dataset collected from real-world code repositories, and it aligns with those repositories in multiple dimensions. We therefore use DevEval as the example to demonstrate how a dataset is processed.

Take `../Dataset/DevEval` as an example.

`train.jsonl` and `test.jsonl`:

(1) We randomly select two domains to evaluate LAIL and the baselines: the scientific engineering domain and the text processing domain.

(2) We randomly split the tasks of the two domains into a training set and a test set, which gives 101 examples in the training set and 49 examples in the test set.

(3) Given a requirement from a repository, we use tree-sitter to parse the repository and extract all of its functions.

(4) We treat the functions contained in the repository as the candidate pool. LAIL and the baselines then retrieve a few functions from the candidate pool as demonstration examples.

The `source data` and `test_source data` folders contain the original code repositories collected from GitHub.

The `estimate_prompt` folder contains the constructed prompts used to estimate candidate examples.

The `generation_prompt` folder contains the constructed prompts whose demonstration examples are selected by LAIL and the different baselines. For example:

(1) The `ICL_LAIL` folder provides the IDs of the examples selected by LAIL in `LAIL_id`.
Developers can use these provided prompts directly with `codellama_completion.py` to generate programs.

(2) After generating programs, developers need to process the generated programs with `process_generation.py`.

(3) Finally, developers evaluate the generated programs with the source code in the `Evaluation` folder.

## LAIL

### Estimate candidate examples by LLMs themselves

We leverage the LLMs themselves to estimate candidate examples. The code is stored in the `LAIL/estimate_examples` package. Take `DevEval` as an example:

(1) The `/Dataset/DevEval/estimate_prompt` folder contains the constructed prompts used to estimate candidate examples.

(2) Developers run the following command to estimate candidate examples with CodeLlama-7B:

```bash
bash make_estimation_prompt.sh ../Dataset/DevEval/estimation_prompt
```

(3) According to the probability feedback of the LLMs, we obtain the positive and negative examples.

### Train a neural retriever

(1) We use the labeled positive and negative examples to train a neural retriever with contrastive learning. The code is stored in the `/LAIL/LAIL/retriever/train` folder.

```bash
export CUDA_VISIBLE_DEVICES=0
nohup python run.py \
    --output_dir=/saved_models \
    --model_type=roberta \
    --config_name=microsoft/graphcodebert-base \
    --model_name_or_path=microsoft/graphcodebert-base \
    --tokenizer_name=microsoft/graphcodebert-base \
    --do_train \
    --train_data_file=/id.jsonl \
    --epoch 100 \
    --block_size 128 \
    --train_batch_size 16 \
    --learning_rate 1e-4 \
    --max_grad_norm 1.0 \
    --seed 123456 > mbpp.txt 2>&1 &
```

### Select a few demonstration examples using the trained retriever

(2) Given a test requirement, developers use the trained retriever to select a few demonstration examples. The code is stored in the `/LAIL/LAIL/retriever/train` folder.
```bash
bash run_inference.sh ../Dataset/DevEval
```

## Code Generation

(1) After acquiring the prompt context, which consists of a few selected examples, developers feed a test requirement and the prompt context into an LLM to obtain the desired programs. For example, developers use CodeLlama (`../LAIL/ICL_LAIL/codellama_completion.py`) to generate programs:

```bash
export CUDA_VISIBLE_DEVICES=0
torchrun --nproc_per_node=1 --master_port=16665 codellama_completion.py \
    Salesforce/CodeLlama-7b ../Dataset/DevEval/prompt_LAIL.jsonl \
    --temperature=0.8 --max_batch_size=4 --output_base=output_random --get_logits=False
```

(2) After generating programs, developers need to process the generated programs with `../LAIL/ICL_LAIL/process_generation.py`.

```bash
python process_generation.py
```

## Baselines

This paper includes seven baselines that use different approaches to select demonstration examples for ICL-based code generation.

(1) The source code is in the `baselines` folder, and each baseline is in an individual folder. Developers can obtain the selected examples of all baselines by running the source code as follows:

```bash
python baselines.py
```

(2) Then, developers use `/baselines/make_prompt.py` to construct a prompt context from the selected candidate examples as follows:

```bash
python make_prompt.py ICLCoder ICLCoder -1
```

## Evaluation

In this paper, we use Pass@k to evaluate the performance of LAIL and the baselines with the source code in `LAIL/Evaluation`. Because DevEval is a repository-level code generation dataset that is complex to evaluate, developers can use the pipeline in `/LAIL/Evaluation/` to evaluate the different approaches.

## Citation

If you have any questions or suggestions, please email us at `lijiaa@pku.edu.cn`.
</pre>
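The Evaluation section above names Pass@k as the metric but does not say how it is computed. The sketch below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of programs generated for a task and c is the number that pass the tests. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the `Evaluation` folder, so treat this as an illustration rather than the repository's implementation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task (assumed unbiased estimator).

    n: number of programs generated for the task
    c: number of those programs that pass all tests
    k: number of programs a user is allowed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing programs: every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that all k drawn programs fail, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry of results is (n, c) for one task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three tasks, 10 generated programs each.
    per_task = [(10, 3), (10, 0), (10, 10)]
    for k in (1, 3, 5):
        print(f"pass@{k} = {mean_pass_at_k(per_task, k):.4f}")
```

Averaging the per-task estimate, rather than pooling all programs, keeps a task with many passing samples from masking tasks with none.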

Provider: figshare

Created: 2024-07-30
