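JIUTIAN-TReB | Table Reasoning Dataset | Natural Language Processing Dataset

ModelScope Community | Updated 2025-09-15 | Indexed 2025-06-28

Table reasoning

Natural language processing

Download link: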

https://modelscope.cn/datasets/JiuTian-AI/JIUTIAN-TReB

Resource description:

# Dataset Summary

TReB is a comprehensive, multi-dimensional and hierarchical evaluation dataset designed to evaluate the performance of large models in table reasoning, comprehension and processing. It contains 7,790 high-quality test cases, spanning the complete capability spectrum from fundamental language understanding to advanced data analysis, including 6 core skills with 26 subtasks.

**We recommend reading the [paper](http://arxiv.org/abs/2506.18421) for more background on task significance.**

# Dataset Description

## Supported Tasks

TReB covers 6 core skills:

* **Natural Language Understanding (NLU)**: Evaluates foundational language capabilities across 6 subtasks, including Understanding, Instruction Following, Hallucination Evaluation, Robustness Evaluation, Code Generation and Mathematical Reasoning.
* **Table Understanding (TU)**: Assesses the ability to parse table structures and comprehend full or partial table content across 6 subtasks, including Table Retrieval, Table Summary, Table Column Naming, Table Title Naming, Table Fact Checking and Table Plausibility Verification.
* **Table Basic Operation (TBO)**: Measures the accuracy of mapping natural language intents to fundamental table operations, with 2 subtasks: Table Query and Table Selection.
* **Table Computational Operation (TCO)**: Tests the ability to execute complex computational procedures in table reasoning scenarios, through 2 subtasks: Table General Operations and Table Domain-specific Operations.
* **Data Analysis (DA)**: Focuses on statistical analysis and pattern recognition in table reasoning scenarios across 4 subtasks, including Table Outlier Detection, Table Correlation Analysis, Table Hypothesis Testing, and Table Distribution Testing.
* **Advanced Data Analysis (ADA)**: Targets multi-step analytical reasoning (≥3 steps) across 6 subtasks, including Multi-step Retrieval, Multi-step Fact Checking, Multi-step Operations, Multi-step Correlation Analysis, Multi-step Hypothesis Testing and Multi-step Conditional Calculation.

## Languages

The Code Generation and Instruction Following tasks are available only in English. All other tasks in TReB are provided in both Chinese and English, organized within their respective directories.

# Dataset Configurations

## Data Fields

TReB consists of two folders, Chinese and English, each containing 26 JSON files named after their corresponding subtasks. All files follow a unified schema with consistent field definitions as described below.

| Field | Type | Description |
| :------------: | :----------: | :----------------------------------------------------------: |
| id | String | Task name |
| file_path | List[String] | Relative file path |
| title | List[String] | Table title |
| columnslable | List[String] | Number of rows in the table header (e.g., 1 for a single-row header, 2 for a two-row header, etc.) |
| Table_markdown | List[String] | Table in Markdown format |
| Table_html | List[String] | Table in HTML format |
| instruction | String | Instruction used to prompt the LLM |
| question | String | Question |
| answer | String | Answer |
| number_answer | Float | The answer as a number, for answers that can be represented by a single numerical value |

## Data Example

```json
{
  "id": "Multi-step_Operations|e2df19fcaf6d405",
  "file_path": [
    "./CSV/298f1e69f367431.csv"
  ],
  "instruction": "Please answer user questions based on the table.",
  "question": "What is the total sum of the insurance fees in the records where the insurance name contains \"Liability Insurance\", the insurer is \"PICC Property & Casualty\" or \"Ping An Insurance\", the premium exceeds 800 CNY, and the policy effective date is on or after January 1, 2022?",
  "answer": "3000",
  "title": [
    "The title of this table is \"Vehicle mileage Vehicle mileage\"."
  ],
  "columnslable": [
    1
  ],
  "Table_markdown": [
    "||车辆行驶里程|保险名称|保险公司|保险费用|保险起期|保险止期|\n|---:|---------:|:---------|:-------|-------:|:-----------|:-----------|\n|0|10000|车损险|平安保险|2000|2022-01-01|2023-01-01|\n|1|20000|第三者责任险|中国人保|1000|2022-01-01|2023-01-01|\n|2|30000|全车盗抢险|太平洋保险|500|2022-01-01|2023-01-01|\n|3|40000|玻璃单独破碎险|太平保险|300|2022-01-01|2023-01-01|\n|4|50000|车上人员责任险|中国平安|400|2022-01-01|2023-01-01|\n|5|60000|不计免赔险|中国太平|600|2022-01-01|2023-01-01|\n|6|70000|自燃损失险|人保财险|200|2022-01-01|2023-01-01|\n|7|80000|发动机特别损失险|阳光保险|400|2022-01-01|2023-01-01|\n|8|90000|车身划痕损失险|泰山保险|300|2022-01-01|2023-01-01|\n|9|100000|不计免赔特约险|中华保险|600|2022-01-01|2023-01-01|\n|10|110000|车损险|平安保险|2000|2022-01-01|2023-01-01|\n|11|120000|第三者责任险|中国人保|1000|2022-01-01|2023-01-01|\n|12|130000|全车盗抢险|太平洋保险|500|2022-01-01|2023-01-01|\n|13|140000|玻璃单独破碎险|太平保险|300|2022-01-01|2023-01-01|\n|14|150000|车上人员责任险|中国平安|400|2022-01-01|2023-01-01|\n|15|160000|不计免赔险|中国太平|600|2022-01-01|2023-01-01|\n|16|170000|自燃损失险|人保财险|200|2022-01-01|2023-01-01|\n|17|180000|发动机特别损失险|阳光保险|400|2022-01-01|2023-01-01|\n|18|190000|车身划痕损失险|泰山保险|300|2022-01-01|2023-01-01|\n|19|200000|不计免赔特约险|中华保险|600|2022-01-01|2023-01-01|\n|20|210000|车损险|平安保险|2000|2022-01-01|2023-01-01|\n|21|220000|第三者责任险|中国人保|1000|2022-01-01|2023-01-01|\n|22|230000|全车盗抢险|太平洋保险|500|2022-01-01|2023-01-01|\n|23|240000|玻璃单独破碎险|太平保险|300|2022-01-01|2023-01-01|\n|24|250000|车上人员责任险|中国平安|400|2022-01-01|2023-01-01|\n|25|260000|不计免赔险|中国太平|600|2022-01-01|2023-01-01|\n|26|270000|自燃损失险|人保财险|200|2022-01-01|2023-01-01|\n|27|280000|发动机特别损失险|阳光保险|400|2022-01-01|2023-01-01|\n|28|290000|车身划痕损失险|泰山保险|300|2022-01-01|2023-01-01|\n|29|300000|不计免赔特约险|中华保险|600|2022-01-01|2023-01-01|"
  ],
  "Table_html": [
    "车辆行驶里程保险名称保险公司保险费用保险起期保险止期010000车损险平安保险20002022-01-012023-01-01120000第三者责任险中国人保10002022-01-012023-01-01230000全车盗抢险太平洋保险5002022-01-012023-01-01340000玻璃单独破碎险太平保险3002022-01-012023-01-01450000车上人员责任险中国平安4002022-01-012023-01-01560000不计免赔险中国太平6002022-01-012023-01-01670000自燃损失险人保财险2002022-01-012023-01-01780000发动机特别损失险阳光保险4002022-01-012023-01-01890000车身划痕损失险泰山保险3002022-01-012023-01-019100000不计免赔特约险中华保险6002022-01-012023-01-0110110000车损险平安保险20002022-01-012023-01-0111120000第三者责任险中国人保10002022-01-012023-01-0112130000全车盗抢险太平洋保险5002022-01-012023-01-0113140000玻璃单独破碎险太平保险3002022-01-012023-01-0114150000车上人员责任险中国平安4002022-01-012023-01-0115160000不计免赔险中国太平6002022-01-012023-01-0116170000自燃损失险人保财险2002022-01-012023-01-0117180000发动机特别损失险阳光保险4002022-01-012023-01-0118190000车身划痕损失险泰山保险3002022-01-012023-01-0119200000不计免赔特约险中华保险6002022-01-012023-01-0120210000车损险平安保险20002022-01-012023-01-0121220000第三者责任险中国人保10002022-01-012023-01-0122230000全车盗抢险太平洋保险5002022-01-012023-01-0123240000玻璃单独破碎险太平保险3002022-01-012023-01-0124250000车上人员责任险中国平安4002022-01-012023-01-0125260000不计免赔险中国太平6002022-01-012023-01-0126270000自燃损失险人保财险2002022-01-012023-01-0127280000发动机特别损失险阳光保险4002022-01-012023-01-0128290000车身划痕损失险泰山保险3002022-01-012023-01-0129300000不计免赔特约险中华保险6002022-01-012023-01-01"
  ],
  "number_answer": 3000.0
}
```

## Data Usage

To evaluate your method on TReB, you can use the evaluation tools provided in our [Gitee repository](https://gitee.com/CMCC-jiutian/jiutian-treb).

# Dataset Creation

## Curation Rationale

TReB is designed to facilitate rigorous evaluation of large models' table reasoning and analysis capabilities.

## Source Data

TReB data is drawn from two sources. The first source is a collection of 39 existing datasets. Most of these datasets do not meet the evaluation requirements in terms of format and quality, so we perform table cleaning, question-answer pair cleaning, and classification on the collected data.
The second source comprises previously unpublished datasets, primarily generated by professionals through three data augmentation strategies: (1) Rule-based Data Generation, (2) End-to-End Generation with LLMs, and (3) Multi-Turn Adversarial Generation with LLMs. Please see the paper for more details.

## Who are the source language producers?

TReB data was human-generated; however, demographic information about the contributors is unavailable.

# Citation

```
@misc{li2025trebcomprehensivebenchmarkevaluating,
      title={TReB: A Comprehensive Benchmark for Evaluating Table Reasoning Capabilities of Large Language Models},
      author={Ce Li and Xiaofan Liu and Zhiyan Song and Ce Chi and Chen Zhao and Jingjing Yang and Zhendong Wang and Kexin Yang and Boshen Shi and Xing Wang and Chao Deng and Junlan Feng},
      year={2025},
      eprint={2506.18421},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.18421},
}
```
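
As a rough illustration of the schema described under Data Fields, the sketch below turns the `Table_markdown` string of one record into row dictionaries and compares a recomputed value with `number_answer`. It is not the official evaluation tooling from the Gitee repository. The `record` dictionary is a hypothetical, shortened stand-in for a real TReB record, and the helper names `split_row`, `parse_markdown_table` and `numeric_answer` are made up for this example. The parser assumes a single header row (a `columnslable` value of 1) and cells that contain no escaped pipe characters.

```python
from typing import Dict, List, Optional


def split_row(line: str) -> List[str]:
    """Split one pipe-delimited markdown row into trimmed cell strings."""
    line = line.strip()
    # Drop exactly one leading and one trailing pipe so that an empty
    # first header cell (as in "||col_a|col_b|") is preserved.
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Turn a markdown table string into a list of {column: value} dicts.

    Assumes a single header row followed by one alignment row.
    """
    lines = [ln for ln in markdown.split("\n") if ln.strip()]
    header = split_row(lines[0])
    # lines[1] is the alignment row (|---:|:---|...), so data starts at index 2.
    return [dict(zip(header, split_row(ln))) for ln in lines[2:]]


def numeric_answer(record: dict) -> Optional[float]:
    """Return number_answer as a float, or None when the field is absent."""
    value = record.get("number_answer")
    return None if value is None else float(value)


# Hypothetical, shortened record that mirrors the documented schema.
record = {
    "id": "Multi-step_Operations|example",
    "columnslable": [1],
    "Table_markdown": [
        "||保险名称|保险费用|\n"
        "|---:|:---|---:|\n"
        "|0|车损险|2000|\n"
        "|1|第三者责任险|1000|"
    ],
    "question": "What is the total of all insurance fees?",
    "answer": "3000",
    "number_answer": 3000.0,
}

rows = parse_markdown_table(record["Table_markdown"][0])
total_fee = sum(float(row["保险费用"]) for row in rows)
print(rows[1]["保险名称"], total_fee == numeric_answer(record))
```

Removing only one pipe from each end matters here because the header in the data example starts with an empty index column; stripping every leading pipe would shift the column names by one position.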

Provider:

maas

Created:

2025-06-19
