
zjunlp/PredictBeforeExecute

Source: Hugging Face · Updated 2026-03-10 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/zjunlp/PredictBeforeExecute
Resource description:
# Can We Predict Before Executing Machine Learning Agents? -- Data

<h4 align="center">
  <a href="https://arxiv.org/abs/2601.05930" target="_blank">📄Paper</a> •
  <a href="https://arxiv.org/abs/2601.05930" target="_blank">🛠️Code</a> •
  <a href="https://huggingface.co/papers/2601.05930" target="_blank">🤗HFPaper</a> •
  <a href="https://drive.google.com/drive/folders/1rn3GuRcl-BrnPG2xUJYCOJB-BwGp7bp0?usp=sharing" target="_blank">📦Data & Runtime (GoogleDrive)</a> •
  <a href="https://x.com/zxlzr/status/2010603724931285141" target="_blank">𝕏Blog</a> •
  <a href="http://xhslink.com/o/8Ac0jDoHeyw" target="_blank">📕小红书</a>
</h4>

This project studies **Data-centric Solution Preference**: predicting which ML solution will perform better *before* executing it, by leveraging data analysis context and LLM reasoning. The repository provides curated solution corpora, task resources, agent-run outputs, and analysis artifacts to support both the main evaluation and follow-up studies.

This directory is the **central data workspace** for the project. It contains the full solution corpus, experiment subsets, agent-run outputs, analysis artifacts, task resources, and a cached Docker image.

---

## Top-level contents

- [solutions_all/](solutions_all/)
  The **full solution corpus** provided by us (all available solutions). This is the source pool from which all subsets are sampled.

- [solutions_subset_50/](solutions_subset_50/)
  The **main-experiment subset**, capped at **50 solutions per task** (used in the paper’s primary experiments).

- [solutions_subset_15/](solutions_subset_15/)
  The **analysis subset**, sampled from `solutions_subset_50/`, capped at **15 solutions per task** (used for downstream analysis experiments).

- [agent_runs/](agent_runs/)
  Outputs from agent executions. Subfolders include:
  - [agent_runs/AIDE/](agent_runs/AIDE/): AIDE-generated runs (task-name + UUID per run).
  - [agent_runs/ForeAgent/](agent_runs/ForeAgent/): ForeAgent-generated runs (task-name + UUID per run).
  - For detailed per-run structure and where to find trajectories/logs, see [agent_runs/README.md](agent_runs/README.md).

- [analysis_exp/](analysis_exp/)
  Analysis experiment artifacts (RQ1–RQ4). See its README for details.

- [tasks/](tasks/)
  The **shared data hub** for competitions, prepared data, task descriptions, data analysis reports, and task lists. See its README for details.

- [docker_images/](docker_images/)
  Cached Docker images used by the execution pipeline.
  - [docker_images/predict-before-execute.tar](docker_images/predict-before-execute.tar): a prebuilt image archive matching the base image referenced in the Dockerfile.

- [2601.05930v1.pdf](2601.05930v1.pdf)
  A local copy of the paper PDF.

---

## Docker image note (for the execution pipeline)

The Dockerfile in prepare_bench_subset/env/Dockerfile uses:

- `FROM johnsonzheng03/predict-before-execute`

If pulling this base image directly is slow or unstable, you can load the cached image tarball from `docker_images/` instead. This extraction may take **a long time** depending on disk and Docker performance.

Command:

```
docker load -i path/to/predict-before-execute.tar
```

---

## Solutions directories (shared layout)

The three `solutions_*` directories share the **same internal layout**. Each task folder typically looks like:

```
solutions_root/
  <task_name>/
    annotation/
      annotations_semantic.json
      keywords_by_rank.json
    code/
      solution_*.py
      submission_solution_*/
        eval_output.json
        exec_output.txt
        submission.csv
    ground_truth/
      groups_<task_name>_n*.json
    output/
      output_*.txt
    report/
      alignment_*.json
      grade_report_*.txt
```

Each task folder contains:

- `annotation/`
  - `annotations_semantic.json`: per-solution semantic labels used for subset sampling and analysis.
  - `keywords_by_rank.json`: aggregated keyword statistics by rank.
- `code/`
  - `solution_*.py`: the runnable solution files.
  - `submission_solution_*/`: execution artifacts for each solution (created after running).
    - `submission.csv`: the model’s predicted submission.
    - `exec_output.txt`: execution logs / stdout+stderr.
    - `eval_output.json`: grading results (if already evaluated).
- `ground_truth/`
  - `groups_<task_name>_n*.json`: ground-truth comparison groups for evaluation.
- `output/`
  - `output_*.txt`: optional runtime or extraction logs.
- `report/`
  - `grade_report_*.txt`: human-readable grading reports.
  - `alignment_*.json`: alignment artifacts derived from reports.

This is the canonical layout used by our preparation, grading, and analysis scripts.

---

## Where to find what (quick locator)

- **Main experiment solutions/logs**: [solutions_subset_50/](solutions_subset_50/)
- **Analysis experiment solutions/logs**: [solutions_subset_15/](solutions_subset_15/)
- **Full corpus (all solutions)**: [solutions_all/](solutions_all/)
- **Agent trajectories and logs**: [agent_runs/](agent_runs/) (details in [agent_runs/README.md](agent_runs/README.md))
- **Analysis experiment artifacts (RQ1–RQ4)**: [analysis_exp/](analysis_exp/)
- **Task resources (competition configs, prepared data, descriptions, data analysis)**: [tasks/](tasks/)
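
---

## Example: walking the shared layout (illustrative)

The shared `solutions_*` layout described above can be traversed with a few lines of Python. This is a minimal sketch, not part of the project's own scripts. The function name `collect_solution_artifacts` is hypothetical. It assumes that each `solution_X.py` is paired with a sibling folder named `submission_solution_X/`, which the layout suggests but does not state. The schema of `eval_output.json` is not documented here, so the parsed JSON is returned as-is.

```python
import json
from pathlib import Path


def collect_solution_artifacts(solutions_root):
    """List per-solution artifacts under a solutions_* directory.

    Assumes <task_name>/code/solution_*.py, each optionally paired with a
    sibling submission_solution_*/ folder (assumed name matching).
    """
    root = Path(solutions_root)
    records = []
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        code_dir = task_dir / "code"
        if not code_dir.is_dir():
            continue  # not a task folder in the expected layout
        for solution in sorted(code_dir.glob("solution_*.py")):
            # Assumption: solution_X.py -> submission_solution_X/
            run_dir = code_dir / f"submission_{solution.stem}"
            eval_path = run_dir / "eval_output.json"
            eval_result = None
            if eval_path.is_file():
                # Schema is undocumented here, so keep the parsed JSON opaque.
                eval_result = json.loads(eval_path.read_text(encoding="utf-8"))
            records.append({
                "task": task_dir.name,
                "solution": solution.name,
                "executed": run_dir.is_dir(),
                "has_submission": (run_dir / "submission.csv").is_file(),
                "eval": eval_result,
            })
    return records
```

Calling `collect_solution_artifacts("solutions_subset_50")` would return one record per solution file. Solutions that have not been run yet show `executed` as `False` and `eval` as `None`, since `submission_solution_*/` is only created after running.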
Provider:
zjunlp