Test of Time: A Benchmark Dataset for the Temporal Reasoning Ability of Large Models

超神经 · Updated 2024-07-16 · Indexed 2024-07-13

Download link:

https://hyper.ai/cn/datasets/32816

Resource overview:

Test of Time (ToT) is a benchmark designed specifically to evaluate the temporal reasoning capabilities of large language models (LLMs). Developed by researchers at Google DeepMind and released in 2024, it assesses LLMs' temporal understanding and arithmetic ability along two independent dimensions. The associated paper is "Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning".

Created:

2024-07-09

Collected summary

Dataset introduction

Background and challenges

Background overview

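Test of Time (ToT) is a benchmark dataset released by Google DeepMind for evaluating the temporal reasoning ability of large language models. It contains three subsets (ToT-semantic, ToT-arithmetic, and ToT-semantic-large), which examine temporal understanding and arithmetic ability. The dataset is intended for testing only; using it as training data is prohibited.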

The content above was collected and summarized by 遇见数据集.