Datasets for "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships"
Source: NIAID Data Ecosystem. Indexed 2026-05-02.
Download link:
https://zenodo.org/record/15002370
Description:
ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships
These are the datasets for the paper ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships.
Dataset dictionary
This repository contains the splits that resulted from the research project "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships". All the splits are in JSONL format and have the same fields per example:
sentence_1: First sentence of the pair.
sentence_2: Second sentence of the pair.
connector: Linking phrase used to extract the pair.
connector_type: NLI label, one of "contrasting", "entailment", "reasoning" or "neutral".
extraction_strategy: "linking_phrase" for pairs labeled "contrasting", "entailment" or "reasoning"; "none" for pairs labeled "neutral".
distance: Number of sentences before the connector at which sentence_1 appears.
sentence_1_position: Sentence number of sentence_1 in the source document.
sentence_1_paragraph: Paragraph number of sentence_1 in the source document.
sentence_2_position: Sentence number of sentence_2 in the source document.
sentence_2_paragraph: Paragraph number of sentence_2 in the source document.
id: Unique identifier for the example.
dataset: Source corpus of the pair. Corpus metadata, including the source, can be found in dataset_metadata.xlsx.
genre: Writing genre of the dataset.
domain: Domain of the dataset.
Example:
{"sentence_1":"sefior Bcajavides no es moderado, tampoco lo convertirse e\u00f1 declarada divergencia de miras polileido en griego","sentence_2":"era mayor claricomentarios, as\u00ed de los peri\u00f3dicos como de los homes dado \u00e1 la voluntad de los hombres, sin que sobreticas","connector":"por consiguiente,","connector_type":"reasoning","extraction_strategy":"linking_phrase","distance":1.0,"sentence_1_paragraph":4,"sentence_1_position":86,"sentence_2_paragraph":4,"sentence_2_position":87,"id":"esnews__spanish_pd_news__531537","dataset":"esnews__spanish_pd_news","genre":"news","domain":"spanish_public_domain_news"}
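Because every split shares the fields listed above, a record can be checked against that dictionary while it is read. The following is a minimal sketch, not code from the paper: the names `parse_example`, `strategy_matches_label` and `read_split` are hypothetical helpers, and the path passed to `read_split` is whatever location the split was extracted to.

```python
import json

# Field names as listed in the dataset dictionary above.
EXPECTED_FIELDS = {
    "sentence_1", "sentence_2", "connector", "connector_type",
    "extraction_strategy", "distance",
    "sentence_1_position", "sentence_1_paragraph",
    "sentence_2_position", "sentence_2_paragraph",
    "id", "dataset", "genre", "domain",
}
VALID_LABELS = {"contrasting", "entailment", "reasoning", "neutral"}


def parse_example(line: str) -> dict:
    """Parse one JSONL line and check it against the documented fields."""
    record = json.loads(line)
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if record["connector_type"] not in VALID_LABELS:
        raise ValueError(f"unexpected label: {record['connector_type']!r}")
    return record


def strategy_matches_label(record: dict) -> bool:
    """Check the documented rule: "none" for neutral, "linking_phrase" otherwise."""
    expected = "none" if record["connector_type"] == "neutral" else "linking_phrase"
    return record["extraction_strategy"] == expected


def read_split(path: str):
    """Yield validated records from a JSONL split, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_example(line)
```

The strategy check is kept separate from parsing so that a record that deviates from the documented rule can be reported rather than rejected.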
Dataset files
ESNLIR_datasets.zip: Contains the splits used for BERT-based model training, validation and testing, including stress test splits.
labeled_final_dataset.jsonl: The final test dataset, with 974 examples whose human majority label matches the original linking-phrase label.
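A quick way to inspect any of these files is to count examples per label. This is an illustrative sketch that assumes the file carries the `connector_type` field described in the dataset dictionary; `label_counts` is a hypothetical helper name, and the file location in the commented usage is an assumption about where the download was saved.

```python
import json
from collections import Counter


def label_counts(lines):
    """Count connector_type labels over an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        if line.strip():  # ignore blank lines
            counts[json.loads(line)["connector_type"]] += 1
    return counts


# Usage, assuming the file sits in the working directory:
# with open("labeled_final_dataset.jsonl", encoding="utf-8") as handle:
#     print(label_counts(handle))
```

Summing the counts also gives the number of examples in the file, which can be compared against the size stated above.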
Created:
2025-03-13



