Multilingual-Multimodal-NLP/TableBench
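Dataset Overview
Dataset Summary
TableBench is a dataset covering 4 major categories and 18 subcategories, focusing on the multi-dimensional capabilities of table question answering.
Data Fields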
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| qtype | string | Question type (FactChecking, NumericalReasoning, DataAnalysis, Visualization) |
| qsubtype | string | Question subtype |
| instruction | string | Instruction used to prompt the LLM |
| instruction_type | string | One of the three instruction types in TableBench: TCoT (Textual Chain of Thought), SCoT (Symbolic Chain of Thought), and PoT (Program of Thought) |
| table | string | Table |
| question | string | Question |
| answer | string | Answer |
| answer_formatter | string | Constraints on the answer output format |
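
The field list above can be turned into a small schema check. The following is a minimal sketch and not part of TableBench itself: the names `REQUIRED_FIELDS`, `QTYPES`, `INSTRUCTION_TYPES`, and `validate_record` are hypothetical, and the allowed values are taken only from the table above.

```python
from typing import Any

# Field names from the table above; every field is documented as a string.
REQUIRED_FIELDS = (
    "id", "qtype", "qsubtype", "instruction", "instruction_type",
    "table", "question", "answer", "answer_formatter",
)
# Allowed values as listed in the field descriptions (assumed exhaustive).
QTYPES = {"FactChecking", "NumericalReasoning", "DataAnalysis", "Visualization"}
INSTRUCTION_TYPES = {"TCoT", "SCoT", "PoT"}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], str):
            problems.append(f"field {name} is not a string")

    qtype = record.get("qtype")
    if isinstance(qtype, str) and qtype not in QTYPES:
        problems.append(f"unexpected qtype: {qtype!r}")

    itype = record.get("instruction_type")
    if isinstance(itype, str) and itype not in INSTRUCTION_TYPES:
        problems.append(f"unexpected instruction_type: {itype!r}")
    return problems
```

A record that passes returns an empty list, so the check can be used as a simple filter before running an evaluation.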
Data Example
An example from the validation set looks as follows:
```json
{
  "id": "60670a8d9b1e39dd845fb1639d0d8b86",
  "qtype": "DataAnalysis",
  "qsubtype": "StatisticalAnalysis",
  "instruction": "You are a data analyst proficient in Python ...",
  "instruction_type": "PoT",
  "table": "{\"columns\": [\"rank\", \"circuit\", \"headquarters\", \"screens\", \"sites\"], \"data\": [[1, \"regal entertainment group\", \"knoxville , tn\", 7367, 580], [2, \"amc entertainment inc\", \"kansas city , mo\", 5894, 483], [3, \"cinemark theatres\", \"plano , tx\", 3895, 298], [4, \"carmike cinemas , inc\", \"columbus , ga\", 2242, 232], [5, \"cineplex entertainment\", \"toronto , on\", 1438, 133], [6, \"rave motion pictures\", \"dallas , tx\", 939, 62], [7, \"marcus theatres\", \"milwaukee , wi\", 687, 55], [8, \"national amusements\", \"dedham , ma\", 450, 34], [9, \"empire theatres\", \"stellarton , ns\", 438, 53]]}",
  "question": "Can you calculate the standard deviation of the number of screens operated by the top 5 movie theater chains?",
  "answer": "2472.33",
  "answer_formatter": "The generated Python code should follow the format below, and ensure the first two code lines is exactly the same with the following code block:\n[Python Code Format]\n```python\nimport pandas as pd\ndf = pd.read_csv('table.csv')\n...\nprint(f'Final Answer: {{answer}}')\n```\n\nEnsure the final answer is the last line in python code and can only be in the \"print(f'Final Answer: {{answer}}')\" form, no other from. Ensure variable \"answer\" can only be \"AnswerName1, AnswerName2...\" form, no other form, and \"AnswerName\" can only be a number or entity name, as short as possible, without any explanation."
}
```
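
To make the example concrete, the sketch below decodes the `table` string and recomputes the answer using only the Python standard library. It assumes that "top 5" means the five rows with the lowest `rank` value and that the expected answer uses the sample standard deviation (n - 1 denominator), which is also the default of pandas' `Series.std()`; under those assumptions it reproduces the recorded answer of 2472.33.

```python
import json
import statistics

# The `table` field of the example record, decoded from its JSON string form.
table_json = (
    '{"columns": ["rank", "circuit", "headquarters", "screens", "sites"], '
    '"data": [[1, "regal entertainment group", "knoxville , tn", 7367, 580], '
    '[2, "amc entertainment inc", "kansas city , mo", 5894, 483], '
    '[3, "cinemark theatres", "plano , tx", 3895, 298], '
    '[4, "carmike cinemas , inc", "columbus , ga", 2242, 232], '
    '[5, "cineplex entertainment", "toronto , on", 1438, 133], '
    '[6, "rave motion pictures", "dallas , tx", 939, 62], '
    '[7, "marcus theatres", "milwaukee , wi", 687, 55], '
    '[8, "national amusements", "dedham , ma", 450, 34], '
    '[9, "empire theatres", "stellarton , ns", 438, 53]]}'
)

table = json.loads(table_json)
rank_col = table["columns"].index("rank")
screens_col = table["columns"].index("screens")

# Assumption: "top 5" is read as the five rows with the lowest rank value.
top5 = sorted(table["data"], key=lambda row: row[rank_col])[:5]
screens = [row[screens_col] for row in top5]

# Sample standard deviation (n - 1 denominator), rounded to two decimals.
answer = round(statistics.stdev(screens), 2)
print(f"Final Answer: {answer}")  # prints "Final Answer: 2472.33"
```

The population standard deviation (`statistics.pstdev`) gives a different value, so the choice of denominator matters when comparing against the recorded answer.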
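Data Usage
- If you want to directly evaluate the capabilities of LLMs on tabular data, you can use `TableBench-PoT`, `TableBench-SCoT`, and `TableBench-TCoT`.
- If you prefer to customize the prompting method for evaluation, follow the specification in `answer_formatter` to reduce evaluation errors caused by inconsistent free-form answers.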
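
When a custom prompt follows the `answer_formatter` convention shown in the example record, the final answer appears on a line of the form `Final Answer: ...`. The sketch below shows one way to pull that value out of a model's output or a program's stdout. It is an illustration only and not the official TableBench evaluation code; the names `FINAL_ANSWER_RE` and `extract_final_answer` are hypothetical.

```python
import re
from typing import Optional

# Assumed pattern: a line that starts with "Final Answer:" followed by the value.
FINAL_ANSWER_RE = re.compile(r"^Final Answer:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_final_answer(output: str) -> Optional[str]:
    """Return the text after the last 'Final Answer:' line, or None if absent."""
    matches = FINAL_ANSWER_RE.findall(output)
    return matches[-1] if matches else None


# Usage: compare the extracted value with the record's `answer` field.
output = "Computing the standard deviation...\nFinal Answer: 2472.33\n"
print(extract_final_answer(output) == "2472.33")  # prints "True"
```

Taking the last match mirrors the requirement that the final answer be the last line of the output; outputs without such a line return `None` and can be counted as format errors.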




