CO-Bench
CO-Bench Dataset Overview
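Basic Information
- Name: CO-Bench
- Type: Benchmark dataset for evaluating language model agents on algorithm search for combinatorial optimization
- Paper: CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization
- Data location: CO-Bench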
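Data Download
- Download method: use the `snapshot_download` function from the `huggingface_hub` library.
- Download code example:
```python
from huggingface_hub import snapshot_download

# Download the CO-Bench dataset repository into the local "data" directory
snapshot_download(
    repo_id="CO-Bench/CO-Bench",
    repo_type="dataset",
    local_dir="data",
)
```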
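Evaluation
- Supported agents: GreedyRefine, DirectAnswer, FunSearch, AIDE, and others.
- Evaluation workflow:
  - Load the data
  - Define the agent
  - Run the iterative evaluation
  - Obtain the final solution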
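- Evaluation code example:
```python
from agents import GreedyRefine, DirectAnswer, FunSearch, AIDE
from evaluation import Evaluator, get_data

# Load the "Aircraft landing" problem from the downloaded data directory
data = get_data("Aircraft landing", src_dir="data")

# Define the agent
agent = GreedyRefine(
    problem_description=data.problem_description,
    timeout=10,
    model="openai/o3-mini",
)

evaluator = Evaluator(data, timeout=10)

# Iterative evaluation: the agent proposes code and receives dev-set feedback
for it in range(64):
    code = agent.step()
    if code is None:  # stop if the agent returns no new code
        break
    feedback = evaluator.evaluate(code)
    agent.feedback(feedback.dev_score, feedback.dev_feedback)

# Obtain the final solution and report its test-set feedback
code = agent.finalize()
feedback = evaluator.evaluate(code)
print(feedback.test_feedback)
```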
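To dry-run the loop without calling a model, a hypothetical stand-in exposing the same `step()`, `feedback()`, and `finalize()` method names can take the place of `GreedyRefine`. `ToyAgent` and `toy_evaluate` are illustrative assumptions, not part of CO-Bench. The toy agent replays a fixed list of candidate programs and keeps the best-scoring one. The sketch makes no claim about how CO-Bench's agents work internally.

```python
from typing import List, Optional, Tuple


class ToyAgent:
    """Hypothetical stand-in with step/feedback/finalize methods; it replays
    a fixed list of candidate programs instead of calling a language model."""

    def __init__(self, candidates: List[str]) -> None:
        self._pending = list(candidates)
        self._last: Optional[str] = None
        self._best: Optional[Tuple[float, str]] = None  # (dev_score, code)

    def step(self) -> Optional[str]:
        # Return the next candidate, or None when nothing is left to try
        if not self._pending:
            return None
        self._last = self._pending.pop(0)
        return self._last

    def feedback(self, dev_score: float, dev_feedback: str) -> None:
        # Keep the best-scoring candidate so far (dev_feedback is ignored here)
        if self._last is not None and (self._best is None or dev_score > self._best[0]):
            self._best = (dev_score, self._last)

    def finalize(self) -> Optional[str]:
        return None if self._best is None else self._best[1]


def toy_evaluate(code: str) -> Tuple[float, str]:
    """Hypothetical scorer: the candidate must define solve() returning a number.
    Runs the code with exec() and no sandboxing, so use trusted input only."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        score = float(namespace["solve"]())
    except Exception as exc:
        return 0.0, f"error: {exc!r}"
    return score, f"score {score}"


agent = ToyAgent([
    "def solve():\n    return 3",
    "def solve():\n    return 7",
    "def solve():\n    raise ValueError('bad')",
])
for it in range(64):
    code = agent.step()
    if code is None:
        break
    dev_score, dev_feedback = toy_evaluate(code)
    agent.feedback(dev_score, dev_feedback)

final_code = agent.finalize()
print(final_code)
```

Swapping `ToyAgent` back for a real agent and `toy_evaluate` for `evaluator.evaluate` restores the structure of the evaluation example.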
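Using Custom Problems
- Steps:
  - Provide a problem description and a solution template
  - Define the agent
  - Define an evaluation function and run the loop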
- Code example:
```python
from agents import GreedyRefine, DirectAnswer, FunSearch, AIDE

# Problem description, including a solution template (elided here)
problem_description = "The Traveling Salesman Problem (TSP)..."

agent = GreedyRefine(
    problem_description=problem_description,
    timeout=10,
    model="openai/o3-mini",
)

evaluate = ...  # Define evaluate() to return score (float) and feedback (str)

for it in range(64):
    code = agent.step()
    if code is None:  # stop if the agent returns no new code
        break
    dev_score, dev_feedback = evaluate(code)
    agent.feedback(dev_score, dev_feedback)

code = agent.finalize()
print(code)
```
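The custom-problem example leaves `evaluate` for the user to define. The sketch below shows one way to write it, under a hypothetical contract: the candidate code must define `solve(points)` and return a tour as a list of city indices. The function returns a float score and a feedback string. `DEV_INSTANCES`, `solve`, `tour_length`, and the `1 / (1 + length)` scoring rule are all illustrative assumptions, not CO-Bench's scoring.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical dev instances; a real setup would load its own problem data
DEV_INSTANCES: List[List[Point]] = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)],
    [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 3.0)],
]


def tour_length(points: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour, returning to the starting city."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def evaluate(code: str) -> Tuple[float, str]:
    """Hypothetical evaluator returning (score, feedback).
    Runs candidate code with exec() and no sandboxing, so use trusted input only."""
    namespace: Dict[str, object] = {}
    try:
        exec(code, namespace)
        solve = namespace["solve"]
    except Exception as exc:
        return 0.0, f"Code failed to load: {exc!r}"

    scores: List[float] = []
    notes: List[str] = []
    for idx, points in enumerate(DEV_INSTANCES):
        try:
            tour = list(solve(points))
        except Exception as exc:
            scores.append(0.0)
            notes.append(f"instance {idx}: error {exc!r}")
            continue
        # A valid tour visits every city exactly once
        if sorted(tour) != list(range(len(points))):
            scores.append(0.0)
            notes.append(f"instance {idx}: invalid tour")
            continue
        length = tour_length(points, tour)
        scores.append(1.0 / (1.0 + length))  # shorter tours score higher
        notes.append(f"instance {idx}: length {length:.3f}")
    return sum(scores) / len(scores), "\n".join(notes)


# Usage: an identity-order "solver" as a trivial candidate
score, feedback = evaluate("def solve(points):\n    return list(range(len(points)))\n")
print(score)
print(feedback)
```

Returning the per-instance notes as `dev_feedback` gives the agent concrete signals, such as invalid tours or errors, to refine against. The scalar score alone would hide those details.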
Citation
```bibtex
@article{Sun2025COBench,
  title={CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization},
  author={Weiwei Sun and Shengyu Feng and Shanda Li and Yiming Yang},
  journal={ArXiv},
  year={2025},
  volume={abs/2504.04310},
  url={https://arxiv.org/abs/2504.04310},
}
```




