TableBench
Dataset Card for TableBench
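Dataset Summary
TableBench is a dataset that covers 4 major categories and 18 subcategories, focusing on the multi-dimensional capabilities of table question answering.
Data Fields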
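| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| qtype | string | Question type (FactChecking, NumericalReasoning, DataAnalysis, Visualization) |
| qsubtype | string | Question subtype |
| instruction | string | Instruction used to prompt the LLM |
| instruction_type | string | One of the three instruction types in TableBench: TCoT (Textual Chain of Thought), SCoT (Symbolic Chain of Thought), and PoT (Program of Thoughts) |
| table | string | Table |
| question | string | Question |
| answer | string | Answer |
| answer_formatter | string | Constraints on the answer output format |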
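Because `table` is stored as a string, it has to be decoded before it can be used. The sketch below assumes the string is JSON with `columns` and `data` keys, as in the validation example in this card; the helper name `table_to_rows` is hypothetical and not part of TableBench.

```python
import json


def table_to_rows(table_str):
    """Decode a TableBench-style `table` string into a list of row dicts.

    Assumes the string is JSON shaped like
    {"columns": [...], "data": [[...], ...]}.
    """
    table = json.loads(table_str)
    columns = table["columns"]
    # Pair each cell with its column name, one dict per row.
    return [dict(zip(columns, row)) for row in table["data"]]


# Shortened illustrative table (three columns, two rows of the card's example).
table_str = (
    '{"columns": ["rank", "circuit", "screens"], '
    '"data": [[1, "regal entertainment group", 7367], '
    '[2, "amc entertainment inc", 5894]]}'
)
rows = table_to_rows(table_str)
print(rows[0]["circuit"])  # → regal entertainment group
```

The same decoded `columns` and `data` lists could also be passed to a dataframe library if one is available.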
Data Example
A validation example looks as follows:

```json
{
  "id": "60670a8d9b1e39dd845fb1639d0d8b86",
  "qtype": "DataAnalysis",
  "qsubtype": "StatisticalAnalysis",
  "instruction": "You are a data analyst proficient in Python ...",
  "instruction_type": "PoT",
  "table": "{\"columns\": [\"rank\", \"circuit\", \"headquarters\", \"screens\", \"sites\"], \"data\": [[1, \"regal entertainment group\", \"knoxville , tn\", 7367, 580], [2, \"amc entertainment inc\", \"kansas city , mo\", 5894, 483], [3, \"cinemark theatres\", \"plano , tx\", 3895, 298], [4, \"carmike cinemas , inc\", \"columbus , ga\", 2242, 232], [5, \"cineplex entertainment\", \"toronto , on\", 1438, 133], [6, \"rave motion pictures\", \"dallas , tx\", 939, 62], [7, \"marcus theatres\", \"milwaukee , wi\", 687, 55], [8, \"national amusements\", \"dedham , ma\", 450, 34], [9, \"empire theatres\", \"stellarton , ns\", 438, 53]]}",
  "question": "Can you calculate the standard deviation of the number of screens operated by the top 5 movie theater chains?",
  "answer": "2472.33",
  "answer_formatter": "The generated Python code should follow the format below, and ensure the first two code lines is exactly the same with the following code block:\n[Python Code Format]\n```python\nimport pandas as pd\ndf = pd.read_csv('table.csv')\n...\nprint(f'Final Answer: {answer}')\n```\n\nEnsure the final answer is the last line in python code and can only be in the \"print(f'Final Answer: {answer}')\" form, no other from. Ensure variable \"answer\" can only be \"AnswerName1, AnswerName2...\" form, no other form, and \"AnswerName\" can only be a number or entity name, as short as possible, without any explanation."
}
```
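The reference answer in this example can be checked by hand. The sketch below uses only the Python standard library, not the pandas workflow that the prompt requests from the model. It suggests that `2472.33` corresponds to the sample standard deviation (n - 1 denominator) of the top 5 `screens` values; the population form (n denominator) gives a smaller number.

```python
import statistics

# `screens` values for ranks 1-5, copied from the example table above.
top5_screens = [7367, 5894, 3895, 2242, 1438]

sample_sd = statistics.stdev(top5_screens)       # divides by n - 1
population_sd = statistics.pstdev(top5_screens)  # divides by n

print(f"Final Answer: {sample_sd:.2f}")  # → Final Answer: 2472.33
```

When writing a custom evaluator for statistical questions, it may be worth checking which of the two conventions a reference answer follows before comparing numbers.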
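Data Usage
- If you want to evaluate the capabilities of LLMs on tabular data directly, you can use TableBench-PoT, TableBench-SCoT, and TableBench-TCoT as they are.
- If you want to customize the prompting method used for evaluation, follow the specification in answer_formatter to reduce evaluation errors caused by inconsistent free-form answers.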
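With a custom prompt, scoring usually starts by pulling the final answer out of the model output. The following is a minimal sketch, assuming the output contains a `Final Answer: ...` line in the comma-separated form described by the example `answer_formatter`. The helper `extract_final_answer` is hypothetical and is not the official TableBench evaluation code.

```python
import re

# Captures everything after "Final Answer:" up to the end of that line.
FINAL_ANSWER_RE = re.compile(r"Final Answer:\s*(.+)")


def extract_final_answer(output):
    """Return the answer items from the last 'Final Answer:' line, or []."""
    matches = FINAL_ANSWER_RE.findall(output)
    if not matches:
        return []
    # Use the last match, since the final answer is expected on the last line.
    return [item.strip() for item in matches[-1].split(",") if item.strip()]


model_output = "The top 5 chains vary widely.\nFinal Answer: 2472.33"
print(extract_final_answer(model_output))  # → ['2472.33']
```

Splitting on commas is a simplification: it would break answers that contain commas themselves, such as numbers written with thousands separators, so a real evaluator would need a stricter rule.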




