
zacCMU/dbench_opt_chall

Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/zacCMU/dbench_opt_chall
Description:
This dataset contains mechanical engineering coding challenges that test the ability of large language models (LLMs) to generate functional Python code for specific physics and engineering problems. Each row represents a unique problem, with fields such as domain (the specific area of mechanical engineering, e.g., linkages, truss design), problem (a detailed description of the Python function the LLM must implement), difficulty (the estimated difficulty level, e.g., easy, medium, hard, expert), and test_suite_code (a complete, valid Python unittest suite for evaluating the LLM-generated code). It is intended as a benchmark in the style of HumanEval or MBPP, but focused on mechanical engineering logic, math, and optimization problems. The problems and their test suites were generated by advanced LLMs via OpenRouter.
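As a rough illustration of how a row might be scored, the sketch below runs a candidate solution against a row's test_suite_code using the standard unittest module. The row shown is a made-up toy example, not taken from the dataset, and `run_test_suite`, `axial_stress`, and the convention that the test suite calls the target function by name from a shared namespace are all assumptions; the actual suites may import the solution differently.

```python
import unittest


def run_test_suite(candidate_code: str, test_suite_code: str) -> unittest.TestResult:
    """Run every TestCase defined in test_suite_code against candidate_code.

    Assumption: the suite refers to the candidate function by name, so both
    strings are executed in one shared namespace. Only run trusted code this
    way; exec() gives the code full access to the interpreter.
    """
    # A non-"__main__" name keeps any `unittest.main()` guard from firing.
    namespace = {"__name__": "dbench_eval"}
    exec(candidate_code, namespace)
    exec(test_suite_code, namespace)

    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for obj in list(namespace.values()):
        if (
            isinstance(obj, type)
            and issubclass(obj, unittest.TestCase)
            and obj is not unittest.TestCase
        ):
            suite.addTests(loader.loadTestsFromTestCase(obj))

    result = unittest.TestResult()
    suite.run(result)
    return result


# Hypothetical row mirroring the documented fields (values are invented).
row = {
    "domain": "truss design",
    "difficulty": "easy",
    "problem": "Implement axial_stress(force_n, area_m2) returning stress in Pa.",
    "test_suite_code": (
        "import unittest\n"
        "class TestAxialStress(unittest.TestCase):\n"
        "    def test_basic(self):\n"
        "        self.assertAlmostEqual(axial_stress(1000.0, 0.01), 100000.0)\n"
        "    def test_zero_area(self):\n"
        "        with self.assertRaises(ValueError):\n"
        "            axial_stress(1000.0, 0.0)\n"
    ),
}

# Stand-in for LLM-generated code answering row["problem"].
candidate = (
    "def axial_stress(force_n, area_m2):\n"
    "    if area_m2 <= 0:\n"
    "        raise ValueError('area must be positive')\n"
    "    return force_n / area_m2\n"
)

result = run_test_suite(candidate, row["test_suite_code"])
passed = result.testsRun - len(result.failures) - len(result.errors)
print(f"{passed}/{result.testsRun} tests passed")
```

In a real evaluation the candidate string would come from the model under test, and running it in a separate process or sandbox with a timeout would be the safer choice.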
Provided by:
zacCMU