Benchmarking LLMs for Self-healing Code
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/14871705
下载链接
链接失效反馈官方服务:
资源简介:
FILES OVERVIEW:
JSON-files following the naming convetion below contain answers/responses generated by the LLMs.
[modelname]_[prog_language]_fixed_solutions.json
cpp.json - contains the content for each C++ solutions such as the desc, erroneous solutions, difficulty levels and so on.
java.json - contains the content for each Java solutions such as the desc, erroneous solutions, difficulty levels and so on.
cpp_prompts.json - contains the prompts for the C++ solutions sent to each model.
java_prompts.json - contains the prompts for the Java solutions sent to each model.
gpt_pipeline.py - A Python application for sending the C++ and Java prompts to GPT-models using the OpenAI API.
openrouter_pipeline.py - A Python application for sending the C++ and Java prompts to all other models using the OpenRouter API.
KEY POINTERS WHEN RE-DOING THE TESTS:
Ignore extraneous and still test the solutions, don't outright consider it as a failure.
Redefinitions (the struct given at times by the models for some problems) can be tested but likely to cause compile-time errors.
The prompts are to be combined with sys_prompts, a variable within gpt_pipeline.py and openrouter_pipeline.py. This combination is then sent to the models and the responses stored.
Keep the cpp_prompts.json and java_prompts.json in the directory.
STEPS TO REPLICATE MY PROCESS:
OpenAI process (DONT FORGET TO ADD AN API_KEY)
1. To run the GPT-models, use gpt_pipeline.py. Modify the variable model to either gpt-4o or gpt-4o-mini, as both need to be tested.
2. If starting with gpt-4o, adjust the "with open"-statement in the code accordingly. Repeat the same process for gpt-4o-mini.
3. After making the neccessary changes, execute the Python application.
4. Once execution is complete, two new files will be generated. Repeat the process for the other model (gpt-4o or gpt-4o-mini). In total, four new files will be created.
OpenRouter process (DONT FORGET TO ADD AN API_KEY)
1. To run the other models, use the openrouter_pipeline.py. Modify the variable model to one of these models (cohere/command-r-plus-08-2024, meta-llama/llama-3.2-3b-instruct, google/gemini-flash-1.5-8b, x-ai/grok-beta, anthropic/claude-3.5-haiku, anthropic/claude-3.5-sonnet, mistralai/mistral-nemo).
2. Change the name of the variables at the top cpp_filename and java_filename to the models names.
3. Then execute the openrouter app and change for each of the eight different model. This will result in the creation of 16 different files containing the responses of the models.
TESTING THE FIXED SOLUTIONS:
1. Find the programming problems corresponding to the fixed solutions on LeetCode.
2. If the solution contains extraneous text, remove it to ensure it can be tested properly.
3. Copy and paste the entire solution into LeetCode's built-in environment.
4. Submit the solution.
5. If the submission is successful (no run-time errors or compile-time errors), document the outcome in a document of your choice. I used Google Sheets to track the results for each model.
If the submission fails, document that as well.
6. Repeat this process for all 20 different files.
创建时间:
2025-02-16



