Time referenced List based Question Answering (TLQA)

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/elixir-research-group/TLQA

Overview:

TLQA is a benchmark that requires answers to be presented as lists, with each item aligned to its corresponding time period. It evaluates the list construction and temporal understanding abilities of large language models (LLMs). The benchmark covers three evaluation settings (gold evidence, closed-book, and open-domain) and focuses on how different LLMs perform.
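To make the idea of a time-aligned list answer concrete, the following is a minimal sketch of how such an answer might be represented and compared against a reference. Everything in it is an assumption for illustration: the `TimedItem` schema, the `item_f1` function, and the placeholder entries are hypothetical and are not taken from the TLQA repository or its official evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedItem:
    """One list entry paired with the period it applies to (hypothetical schema)."""
    entity: str
    start_year: int
    end_year: int


def item_f1(predicted: set[TimedItem], gold: set[TimedItem]) -> float:
    """Exact-match F1 over (entity, period) items; an illustrative metric only."""
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder data, not taken from the benchmark.
gold = {TimedItem("Team A", 2001, 2004), TimedItem("Team B", 2004, 2010)}
predicted = {TimedItem("Team A", 2001, 2004), TimedItem("Team B", 2005, 2010)}
score = item_f1(predicted, gold)
```

In this sketch an item counts as correct only when both the entity and its period match exactly, so a right entity with a wrong start year is scored as a miss. A real evaluation may well score list completeness and temporal overlap separately.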