inverse-scaling-ttc/inverse-scaling-ttc-main
Source: Hugging Face. Updated 2025-07-23; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/inverse-scaling-ttc/inverse-scaling-ttc-main
Resource summary:
The Inverse Scaling in Test-Time Compute dataset evaluates Large Reasoning Models (LRMs) across different reasoning lengths. It includes simple counting tasks, regression tasks, deduction tasks, and tasks related to advanced AI risk. The dataset is designed to test whether model performance degrades as reasoning time increases, and it reveals five distinct failure modes that can occur when models reason for longer.
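The core question behind the dataset, whether accuracy falls as reasoning length grows, can be sketched as a simple trend check. This is a minimal illustration under stated assumptions, not the dataset's official evaluation code: the `(reasoning_budget, is_correct)` record format, the helper names, the log2 budget axis, and the least-squares slope criterion are all hypothetical choices, and the toy records are invented for demonstration rather than taken from the dataset.

```python
import math
from statistics import mean


def accuracy_by_budget(records):
    """Group hypothetical (reasoning_budget, is_correct) pairs and return mean accuracy per budget."""
    buckets = {}
    for budget, correct in records:
        buckets.setdefault(budget, []).append(1.0 if correct else 0.0)
    return {budget: mean(scores) for budget, scores in sorted(buckets.items())}


def scaling_slope(acc_by_budget):
    """Least-squares slope of accuracy against log2(reasoning budget).

    A negative slope means accuracy tends to drop as the reasoning budget grows,
    which is the inverse-scaling pattern (assumed criterion, for illustration only).
    """
    if len(acc_by_budget) < 2:
        raise ValueError("need at least two distinct reasoning budgets")
    xs = [math.log2(budget) for budget in acc_by_budget]
    ys = list(acc_by_budget.values())
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def shows_inverse_scaling(records):
    """True if accuracy trends downward as the (hypothetical) reasoning budget increases."""
    return scaling_slope(accuracy_by_budget(records)) < 0


# Toy, made-up records: (reasoning budget in tokens, whether the answer was correct).
example_records = [
    (1024, True), (1024, True), (1024, True), (1024, False),
    (4096, True), (4096, True), (4096, False), (4096, False),
    (16384, True), (16384, False), (16384, False), (16384, False),
]

if __name__ == "__main__":
    acc = accuracy_by_budget(example_records)
    print(acc)
    print(scaling_slope(acc), shows_inverse_scaling(example_records))
```

A log2 budget axis is used here because reasoning budgets are often varied geometrically; with a linear axis the same check works, only the slope's magnitude changes. A real analysis would also want per-task breakdowns and some measure of uncertainty rather than a single sign test.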
Provided by:
inverse-scaling-ttc



