TACT

Name: TACT
Creator: maas
Published: 2025-12-05 12:14:07
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2025-04-26.

Download link:

https://modelscope.cn/datasets/google/TACT


Resource description:

# TACT: A Complex Numerical Reasoning Benchmark

## [Paper - TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools](https://arxiv.org/abs/2406.03618)

Website: https://tact-benchmark.github.io

__Abstract:__ Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT (Text And Calculations through Tables), a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution. Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.

### Usage

Run the following code to load the TACT dataset. Before executing this code, ensure that you are logged in using your Hugging Face access token.

```python
# The lines starting with "!" are notebook shell escapes.
!pip install datasets

from datasets import load_dataset
import json
import pandas as pd

# First, download the original InstructIE test set and load it into a DataFrame:
!wget https://raw.githubusercontent.com/yzjiao/On-Demand-IE/main/dataset/test_data.json
file_path = 'test_data.json'
with open(file_path, 'r') as file:
    data = json.load(file)
instructIE_df = pd.json_normalize(data)[['text', 'table']]

# Now load the TACT dataset:
tact_df = load_dataset("google/TACT")['test'].to_pandas()

# Merge and prepare the final eval DataFrame:
filtered_instructIE = instructIE_df[instructIE_df.index.isin(tact_df['InstructIE_index'])]
tact_df.set_index('InstructIE_index', inplace=True)
merged_tact_df = filtered_instructIE.merge(tact_df, left_index=True, right_index=True, how='inner')\
    [['instruction', 'text', 'table', 'query_over_the_table', 'pandas_command', 'result']]
```

#### Fields Descriptions

* **instruction**: The TACT numerical instruction
* **text**: The accompanying text from the source InstructIE dataset
* **table**: The accompanying table from the source InstructIE dataset
* **query_over_the_table**: The natural language query over the table, yielding the instruction result
* **pandas_command**: The TACT Pandas command that fits the table and the instruction (the translation of the query_over_the_table)
* **result**: The expected result of applying the Pandas command over the table

The following code may be used to convert the tables into Pandas DataFrames:

```python
def markdown2dic(input_str):
    # Parse a Markdown table into a list of row dicts keyed by column name.
    input_str = str(input_str)
    rows = input_str.split("\n")
    rows = [row.strip() for row in rows if row]
    keys = [key.strip() for key in rows[0].split("|") if key]
    data = []
    for row in rows[2:]:  # rows[1] is the header separator line
        values = [value.strip() for value in row.split("|") if value]
        data.append(dict(zip(keys, values)))
    return data, len(rows)-2, len(keys)

def get_df_from_markdown(input_str):
    data, l1, l2 = markdown2dic(input_str)
    df = pd.DataFrame(data)
    df.replace("N/A", None, inplace=True)
    return df
```
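As a quick illustration of what the parsing helper returns, the sketch below runs `markdown2dic` on a small table. The table contents are invented for this example and are not taken from TACT or InstructIE; the function body repeats the README logic only so that the block runs on its own.

```python
def markdown2dic(input_str):
    # Same logic as the README helper, repeated so this block is self-contained.
    rows = [row.strip() for row in str(input_str).split("\n") if row]
    keys = [key.strip() for key in rows[0].split("|") if key]
    data = []
    for row in rows[2:]:  # rows[1] is the |---|---| separator line
        values = [value.strip() for value in row.split("|") if value]
        data.append(dict(zip(keys, values)))
    return data, len(rows) - 2, len(keys)

# Hypothetical table, invented for illustration only.
table_md = (
    "| City | Population |\n"
    "| --- | --- |\n"
    "| Springfield | 1200 |\n"
    "| Shelbyville | N/A |"
)

records, n_rows, n_cols = markdown2dic(table_md)
print(records)         # list of row dicts keyed by the header cells
print(n_rows, n_cols)  # number of data rows and number of columns
```

Note that every parsed cell is a string (for example `'1200'`), so a numeric column likely needs an explicit conversion such as `pd.to_numeric` before arithmetic is applied to the resulting DataFrame.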
Then run the following:

```python
merged_tact_df['tabel_pandas_df'] = merged_tact_df['table'].apply(get_df_from_markdown)
```

### **Evaluation Benchmark Notice**

This dataset is intended solely for evaluation purposes and must not be used in the training of NLP models. Please ensure that the dataset is not redistributed without adequate measures to prevent indexing by web-crawlers. To aid in the detection of potential data contamination in web-crawled corpora, each dataset instance includes a unique 64-character identifier string. The string identifier for this dataset is:

TACT:QZHVnrtoCTsS6jgz0lplZqvnS2ISxhmEbUMjYAN9KdgTIMkIxsu0llvvQjE2VPAS
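One way the identifier could be used for contamination checks is a plain substring scan over candidate training documents. The following is a minimal sketch under that assumption; the function name `find_contaminated` and the sample corpus are hypothetical and are not part of the TACT release.

```python
# Identifier string copied from the notice above.
CANARY = "TACT:QZHVnrtoCTsS6jgz0lplZqvnS2ISxhmEbUMjYAN9KdgTIMkIxsu0llvvQjE2VPAS"

def find_contaminated(docs):
    """Return the indices of documents that contain the TACT identifier."""
    return [i for i, doc in enumerate(docs) if CANARY in doc]

# Hypothetical corpus, invented for illustration only.
corpus = [
    "An unrelated web page.",
    "A scraped benchmark row followed by " + CANARY,
    "Another unrelated page.",
]

print(find_contaminated(corpus))
```

Any document flagged this way would be a candidate for removal from a training corpus before it is used.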


Provider:

maas

Created:

2025-04-21
