umd-zhou-lab/TSRBench
Source: Hugging Face. Updated 2026-01-27. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/umd-zhou-lab/TSRBench
Description:
TSRBench is a large-scale, comprehensive benchmark designed to stress-test the time series understanding and reasoning capabilities of generalist models (LLMs, VLMs, and TSLLMs). Time series data pervades real-world environments and underpins decision-making in high-stakes domains such as finance, healthcare, and industrial systems. However, existing benchmarks often treat time series as isolated numerical sequences, stripping away the semantic context essential for complex problem-solving, or focus solely on surface-level pattern recognition. Beyond serving as a benchmark, TSRBench is a multifaceted, standardized evaluation platform that both exposes current challenges in time series reasoning and provides actionable insights for advancing it. It contains more than 4,000 time series-text questions covering diverse scenarios and practical challenges, organized into 4 categories (Perception, Reasoning, Prediction, and Decision-Making) and 15 tasks. Reasoning tasks include Abductive Reasoning, Numerical Reasoning, Deductive Reasoning, and more; Prediction covers Time Series Forecasting and Event Prediction; Decision-Making covers Qualitative Decision-Making and Quantitative Decision-Making. The dataset fields are question, answer, domain, name_of_series, timeseries, and choices.
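As a rough illustration of how the listed fields might fit together, the sketch below renders one record as a multiple-choice prompt and checks a reply against the stored answer. Only the six field names come from the description above. The value types (a list of series names, a list of numeric lists, a letter answer), the sample record, and the helper names `build_prompt` and `is_correct` are assumptions made for this example and may not match the actual dataset schema.

```python
from string import ascii_uppercase

# Hypothetical record: the field names come from the dataset description,
# but the value types and contents are assumptions for illustration only.
example_record = {
    "question": "Which description best matches the overall trend?",
    "answer": "B",
    "domain": "finance",
    "name_of_series": ["daily_close"],
    "timeseries": [[101.2, 102.5, 104.1, 107.8]],
    "choices": ["Steady decline", "Steady rise", "No clear trend"],
}


def build_prompt(record):
    """Render one record as a plain-text multiple-choice prompt."""
    lines = [f"Domain: {record['domain']}"]
    # Pair each series name with its values and print them on one line.
    for name, values in zip(record["name_of_series"], record["timeseries"]):
        series_text = ", ".join(f"{v:g}" for v in values)
        lines.append(f"{name}: {series_text}")
    lines.append(f"Question: {record['question']}")
    # Label the choices A, B, C, ... in order.
    for letter, choice in zip(ascii_uppercase, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def is_correct(record, model_reply):
    """Compare the first character of the reply with the stored answer letter."""
    return model_reply.strip().upper()[:1] == record["answer"].strip().upper()


print(build_prompt(example_record))
```

Serializing the series as text is only one option; a vision-language model would more likely receive a rendered plot, so the prompt builder here should be read as a placeholder for whatever input format a given model expects.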
Provider:
umd-zhou-lab



