next-tat/tat-llm-instructions

Name: next-tat/tat-llm-instructions
Creator: next-tat
Published: 2024-02-23 04:42:36
License: No description available

Source: Hugging Face. Updated 2024-02-23. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/next-tat/tat-llm-instructions

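Overview:

The TAT-LLM-Instructions dataset is an instruction set designed for the finance domain. It combines tabular and textual data and is intended to improve the performance of large language models (LLMs) and external executors. It consolidates three public tabular-and-text question answering datasets (FinQA, TAT-QA, and TAT-DQA) and uses dedicated templates to convert them into prompts suitable for LLMs. The dataset has a training split of 32,555 examples and a validation split of 4,136 examples, for a total size of 186,799,526 bytes. Its task categories are text generation, question answering, and table question answering, and it is tagged "finance".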

Provider:

next-tat

Summary of original information

TAT-LLM-Instructions dataset overview

Dataset information

Features:
- resp: string
- id: string
- user_prompt: string
Splits:
- train: 165619445 bytes, 32555 examples
- validation: 21180081 bytes, 4136 examples
Download size: 37315773 bytes
Dataset size: 186799526 bytes
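As a quick sanity check on the figures above, the two split sizes should add up to the reported dataset size. The sketch below only re-adds the listed numbers. Reading the gap between download size and dataset size as on-disk compression is an assumption, not something the listing states.

```python
# Figures copied from the listing above.
splits = {
    "train": {"num_bytes": 165_619_445, "num_examples": 32_555},
    "validation": {"num_bytes": 21_180_081, "num_examples": 4_136},
}
dataset_size = 186_799_526   # bytes, as reported
download_size = 37_315_773   # bytes, as reported

total_bytes = sum(s["num_bytes"] for s in splits.values())
total_examples = sum(s["num_examples"] for s in splits.values())

# The per-split byte counts add up exactly to the reported dataset size.
assert total_bytes == dataset_size

print(total_examples)                  # 36691 examples across both splits
print(dataset_size / download_size)    # roughly 5x (assumed: compressed download)
```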

Configuration

Default configuration:
- train: data/train-*
- validation: data/validation-*
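The glob patterns above map shard files to splits. A minimal sketch of how they might be used follows, assuming the Hugging Face `datasets` package is installed and the Hub is reachable. `split_for` and `load_splits` are illustrative helper names, and the shard filename in the comment is hypothetical.

```python
from fnmatch import fnmatch

REPO_ID = "next-tat/tat-llm-instructions"

# Split-to-glob mapping copied from the default configuration above.
DATA_FILES = {"train": "data/train-*", "validation": "data/validation-*"}


def split_for(path):
    """Return the split whose glob matches `path`, or None if no pattern matches."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_splits():
    """Load both splits from the Hub (needs network access and `datasets`)."""
    from datasets import load_dataset  # lazy import keeps this module usable offline

    return load_dataset(REPO_ID, data_files=DATA_FILES)


# Hypothetical shard name, used only to illustrate the pattern match.
print(split_for("data/train-00000-of-00001.parquet"))
```

Calling `load_splits()["train"][0]["user_prompt"]` would then return the filled-in prompt of the first training example, and `["resp"]` its target response.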

Task categories

Text generation
Question answering
Table question answering

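Dataset description

TAT-LLM-Instructions is a curated financial dataset structured as instructions. It aggregates information from three publicly available tabular and textual question answering datasets (FinQA, TAT-QA, TAT-DQA). Using specialized templates, it converts the original datasets into prompts optimized for large language models (LLMs) and external executors, with the aim of significantly improving their performance.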

Templates

FinQA instruction template

Below is an instruction that describes a question answering task in the finance domain, paired with an input table and its relevant text that provide further context. The given question is relevant to the table and text. Generate an appropriate answer to the given question.

Instruction:

Given a table and a list of texts in the following, what is the answer to the question? Please complete the task in three steps:

In the first step, extract the relevant numerical values from the provided table or texts. Store these in the variable ‘{evidence}‘. If there are multiple values, separate them using the ’#’ symbol.
In the second step, generate an equation using the extracted numerical values. Store this equation in the variable ‘{equation}‘.
In the third step, calculate the answer based on the equation and store it in the variable ‘{answer}‘.

Please organize the results in the following table:

| step | output |
| 1 | {evidence} |
| 2 | {equation} |
| 3 | {answer} |

Finally, present the calculated answer in the format: "The answer is: {answer}"

Table {table}

Text {text}

Question {question}

Response

|step | output|
|1 | {gold_evidence} |
|2 | {gold_equation} |
|3 | {gold_answer} |

The answer is: {gold_answer}
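The FinQA template above mixes two kinds of brace fields: `{table}`, `{text}`, and `{question}` are filled in per example, while `{evidence}`, `{equation}`, and `{answer}` must reach the model literally. The sketch below shows one way to fill only the first kind. `fill_template` is an illustrative helper, not part of the dataset's tooling, the template string is an abbreviated excerpt, and the table, text, and question values are made up.

```python
import re


def fill_template(template, **fields):
    """Replace only the named {placeholders}; leave every other brace pair untouched."""
    if not fields:
        return template
    # A single regex pass, so braces inside inserted values are never re-expanded.
    pattern = re.compile(r"\{(" + "|".join(map(re.escape, fields)) + r")\}")
    return pattern.sub(lambda m: fields[m.group(1)], template)


# Abbreviated excerpt of the FinQA template; the full instruction text is omitted.
FINQA_EXCERPT = (
    "Please organize the results in the following table:\n"
    "| step | output |\n"
    "| 1 | {evidence} |\n"
    "| 2 | {equation} |\n"
    "| 3 | {answer} |\n"
    'Finally, present the calculated answer in the format: "The answer is: {answer}"\n\n'
    "Table {table}\n\n"
    "Text {text}\n\n"
    "Question {question}\n\n"
    "Response\n"
)

prompt = fill_template(
    FINQA_EXCERPT,
    table="| year | revenue |\n| 2019 | 120 |\n| 2018 | 100 |",  # made-up values
    text="Revenue is reported in millions.",
    question="What was the change in revenue from 2018 to 2019?",
)
print(prompt)
```

Plain `str.format` would not work here without escaping, because it would raise `KeyError` on `{evidence}`; a targeted substitution avoids having to double every literal brace.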

TAT-QA instruction template

Instruction

Given a table and a list of texts in the following, answer the question posed using the following five-step process:

Step 1: Predict the type of question being asked. Store this prediction in the variable ‘{question_type}‘. The value of ‘{question_type}‘ can be one of the following:‘Single span‘, ‘Multiple spans‘, ‘Count‘, or ‘Arithmetic‘.
Step 2: Extract the relevant strings or numerical values from the provided table or texts. Store these pieces of evidence in the variable ‘{evidence}‘. If there are multiple pieces of evidence, separate them using the ’#’ symbol.
Step 3: if the ‘{question_type}‘ is ‘Arithmetic‘, formulate an equation using values stored in ‘{evidence}‘. Store this equation in the variable ‘{equation}‘. For all other question types, set the value of {equation} to ’N.A.’.
Step 4: Predict or calculate the answer based on the question type, evidence and equation. Store it in the variable ‘{answer}‘. If there are multiple values, separate them using the ’#’ symbol.
Step 5: If the value of the ‘{answer}‘ is numerical, predict its scale and store it in a variable named ‘{scale}‘. The value of ‘{scale}‘ can be one of the following: ‘none‘, ‘percent‘, ‘thousand‘, ‘million‘, or ‘billion‘. For non-numerical values, set the value of ‘{scale}‘ to ’none’.

Please organize the results in the following table:

| step | output |
| 1 | {question_type} |
| 2 | {evidence} |
| 3 | {equation} |
| 4 | {answer} |
| 5 | {scale} |

Finally, present the final answer in the format: "The answer is: {answer} #### and its corresponding scale is: {scale}"

Table {table}

Text {text}

Question {question}

Response

| step | output |
| 1 | {gold_question_type} |
| 2 | {gold_evidence} |
| 3 | {gold_equation} |
| 4 | {gold_answer} |
| 5 | {gold_scale} |

The answer is: {gold_answer} #### and its corresponding scale is: {gold_scale}
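A response in this five-step format is regular enough to parse mechanically. The sketch below pulls the five step values and the final answer line out of a response, then re-computes an arithmetic equation without `eval`. Treating this re-computation as the job of the "external executor" mentioned in the description is an assumption. `parse_response` and `evaluate` are illustrative names, the sample response is made up, and the evaluator handles only `+`, `-`, `*`, `/`, and parentheses.

```python
import ast
import operator
import re

FIELDS = {1: "question_type", 2: "evidence", 3: "equation", 4: "answer", 5: "scale"}
ROW_RE = re.compile(r"^\|[ \t]*(\d+)[ \t]*\|[ \t]*(.*?)[ \t]*\|[ \t]*$", re.MULTILINE)
ANSWER_RE = re.compile(
    r"The answer is:\s*(?P<answer>.*?)\s*#### and its corresponding scale is:\s*(?P<scale>\w+)"
)
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def parse_response(resp):
    """Map each numbered table row to its field name and capture the final line."""
    out = {FIELDS[int(n)]: v for n, v in ROW_RE.findall(resp) if int(n) in FIELDS}
    final = ANSWER_RE.search(resp)
    if final:
        out["final_answer"] = final.group("answer")
        out["final_scale"] = final.group("scale")
    return out


def evaluate(equation):
    """Evaluate basic arithmetic by walking the syntax tree instead of calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {equation!r}")
    return walk(ast.parse(equation, mode="eval").body)


# Made-up response in the TAT-QA output format.
resp = (
    "| step | output |\n"
    "| 1 | Arithmetic |\n"
    "| 2 | 120#100 |\n"
    "| 3 | 120 - 100 |\n"
    "| 4 | 20 |\n"
    "| 5 | million |\n"
    "The answer is: 20 #### and its corresponding scale is: million"
)
parsed = parse_response(resp)
if parsed["question_type"] == "Arithmetic":
    print(evaluate(parsed["equation"]), parsed["final_scale"])
```

Splitting `parsed["evidence"]` on `#` recovers the individual evidence values, following the separator convention stated in the template.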

TAT-DQA instruction template

Below is an instruction that describes a question answering task in the finance domain, paired with an input document that has one or multiple pages that provide further context. The given question is relevant to the document. Generate an appropriate answer to the given question.

Instruction

Given a document that has one or multiple pages in the following, answer the question posed using the following five-step process:

Step 1: Predict the type of question being asked. Store this prediction in the variable ‘{question_type}‘. The value of ‘{question_type}‘ can be one of the following:‘Single span‘, ‘Multiple spans‘, ‘Count‘, or ‘Arithmetic‘.
Step 2: Extract the relevant strings or numerical values from the provided document. Store these pieces of evidence in the variable ‘{evidence}‘. If there are multiple pieces of evidence, separate them using the ’#’ symbol.
Step 3: if the ‘{question_type}‘ is ‘Arithmetic‘, formulate an equation using values stored in ‘{evidence}‘. Store this equation in the variable ‘{equation}‘. For all other question types, set the value of {equation} to ’N.A.’.
Step 4: Predict or calculate the answer based on the question type, evidence and equation. Store it in the variable ‘{answer}‘. If there are multiple values, separate them using the ’#’ symbol.
Step 5: If the value of the ‘{answer}‘ is numerical, predict its scale and store it in a variable named ‘{scale}‘. The value of ‘{scale}‘ can be one of the following: ‘none‘, ‘percent‘, ‘thousand‘, ‘million‘, or ‘billion‘. For non-numerical values, set the value of ‘{scale}‘ to ’none’.

Please organize the results in the following table:

| step | output |
| 1 | {question_type} |
| 2 | {evidence} |
| 3 | {equation} |
| 4 | {answer} |
| 5 | {scale} |

Finally, present the final answer in the format: "The answer is: {answer} #### and its corresponding scale is: {scale}"

Text {pages}

Question {question}

Response
