The Financial Document Causality Detection Shared Task (FinCausal 2026): Dataset

Name: The Financial Document Causality Detection Shared Task (FinCausal 2026): Dataset
Creator: e-cienciaDatos
Published: 2026-01-30 09:29:58
License: No description available

DataCite Commons, updated 2026-01-30, indexed 2026-04-25

Download link:

https://edatos.consorciomadrono.es/citation?persistentId=doi:10.21950/H7RKHH

Description:

The Financial Document Causality Detection Shared Task (FinCausal 2026) aims to improve causality identification in financial-domain texts. The task focuses on determining the causality associated with both events and quantified facts. Here, a cause can be the justification of a statement or the reason explaining an outcome, so this is a relation detection task.

The 2026 edition introduces several changes relative to the 2025 edition: an exhaustive review of the datasets to eliminate ambiguities; an expansion of the corpus with more than 500 new fragments per language featuring complex causal structures, such as chains of three or more elements; and the reformulation of the abstractive questions in 10% of the cases to require advanced reasoning. In addition, a new evaluation metric based on "LLM-as-a-judge" has been implemented to assess the adequacy of the answers, in line with current state-of-the-art practice. An LLM-as-a-judge is a language model specifically instructed to produce ratings from 1 to 5 according to a specific set of criteria, which to some extent mimics human evaluation.

Given a context and an abstractive question, participants must extract from the context the literal answer to that question. The questions target causal relationships and may ask for either the cause or the effect. The dataset for the Spanish subtask is drawn from a corpus of Spanish annual financial reports from 2014 to 2018 (FinT-esp), while the English subtask uses the English version of the 2018 bilingual Spanish-English corpus of these reports, along with several annual financial reports from the Lancaster UCREL research team corpus. Participants receive a CSV file with the following fields: ID; Text; Question; Answer.
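Because answers must be literal extracts from the context, a quick sanity check on the CSV layout listed above can be useful. The following is a minimal sketch, not the organizers' tooling: the `;` delimiter is an assumption based on how the fields are listed, the helper names `load_rows` and `non_literal_ids` are hypothetical, and the two sample rows are invented for illustration and do not come from the dataset.

```python
import csv
import io

# Invented sample in the described layout; NOT taken from the real dataset.
# Assumption: fields are separated by ";" (the actual files may differ).
SAMPLE_CSV = (
    "ID;Text;Question;Answer\n"
    "1;Revenue fell because demand weakened in Europe.;"
    "Why did revenue fall?;demand weakened in Europe\n"
    "2;Costs rose due to higher energy prices.;"
    "Why did costs rise?;energy got more expensive\n"
)


def load_rows(handle, delimiter=";"):
    """Read rows into dicts keyed by the header names (ID, Text, Question, Answer)."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def non_literal_ids(rows):
    """Return the IDs of rows whose Answer is not a verbatim substring of Text."""
    return [row["ID"] for row in rows if row["Answer"].strip() not in row["Text"]]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(non_literal_ids(rows))  # row 2 paraphrases the context instead of quoting it
```

To run the same check on a real split, an opened file handle can be passed to `load_rows` in place of the `io.StringIO` object. Rows with an empty Answer, as in the test split, pass the check trivially.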
The conventional way to participate is to fine-tune a model on data annotated by linguists (including Inter-Annotator Agreement, IAA) and then use the fine-tuned model to predict the "Answer" field of the test set. This publication contains the competition dataset: the training split with its answers and the test split without answers, since the latter is what gets evaluated. There are 2,000 training samples per language, 500 test samples for English, and 503 test samples for Spanish.

The dataset is designed for participants to fine-tune their models and complete the task with the highest possible similarity to the gold standard, according to the established metrics. It consists of texts annotated by linguists; each sample provides a context, an abstractive question, and the corresponding extractive answer, which addresses the causal nature of the question. Two versions are available: one in English and one in Spanish.
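The LLM-as-a-judge metric mentioned in the description rates each answer from 1 to 5. The sketch below shows one way such a rating could be requested and parsed. It is an illustration under stated assumptions only: the prompt wording, the function names `build_judge_prompt`, `parse_rating`, and `mean_rating`, and the example replies are all hypothetical, and the official criteria and judge model are not given in this listing. No model is called; the replies are hard-coded strings.

```python
import re


def build_judge_prompt(context, question, answer):
    """Assemble a judging prompt. The wording is hypothetical, not the official one."""
    return (
        "Rate from 1 to 5 how adequately the answer responds to the question, "
        "given the context. Reply with a single integer.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer: {answer}\n"
    )


def parse_rating(reply):
    """Return the first standalone integer 1-5 found in the reply, else None."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


def mean_rating(replies):
    """Average the parsed ratings, skipping replies with no valid rating."""
    scores = [s for s in map(parse_rating, replies) if s is not None]
    return sum(scores) / len(scores) if scores else None


# Hard-coded stand-ins for judge-model replies (invented for illustration).
example_replies = ["Rating: 4", "5", "I cannot rate this."]
print(mean_rating(example_replies))
```

Skipping unparseable replies is a design choice made here for simplicity; a stricter setup could instead retry the judge or count such replies as the lowest rating.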

Provider:

e-cienciaDatos

Created:

2026-01-23
