NPHardEval
Source: arXiv | Updated: 2024-02-13 | Indexed: 2024-06-21
Download link:
https://github.com/casmlab/NPHardEval
Summary:
NPHardEval is a dataset developed by the School of Information, University of Michigan. It contains 900 algorithmic problems spanning complexity classes from P to NP-hard, curated to serve as a rigorous benchmark of the reasoning capabilities of large language models (LLMs). The dataset is updated monthly, which reduces the risk of models overfitting to the benchmark and supports more accurate and reliable evaluation of LLM reasoning. It targets reasoning in optimization and high-level decision-making scenarios, where complex problem solving is required.
Provided by:
School of Information, University of Michigan
Created:
2023-12-23



