
TECA: Textual Entailment Catalan dataset

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4593271
Description:
If you use this resource in your work, please cite our latest paper:

@inproceedings{armengol-estape-etal-2021-multilingual,
    title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
    author = "Armengol-Estap{\'e}, Jordi and
      Carrino, Casimiro Pio and
      Rodriguez-Penagos, Carlos and
      de Gibert Bonet, Ona and
      Armentano-Oller, Carme and
      Gonzalez-Agirre, Aitor and
      Melero, Maite and
      Villegas, Marta",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.437",
    doi = "10.18653/v1/2021.findings-acl.437",
    pages = "4933--4946",
}

TECa contains two Catalan textual entailment (TE) sub-datasets, catalan_TE1 and vilaweb_TE, with 14997 and 6166 premise-hypothesis pairs respectively. Each pair is annotated with the inference relation that holds between premise and hypothesis (entailment, contradiction, or neutral).

"Textual entailment (TE) in natural language processing is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text. In the TE framework, the entailing and entailed texts are termed text (t) and hypothesis (h), respectively." (From Wikipedia.)

In the TECa datasets, each sentence has three hypotheses, annotated as follows:

* "0": positive TE (inference: the text entails the hypothesis)
* "1": non-TE (neutral: the text neither entails nor contradicts the hypothesis)
* "2": negative TE (contradiction: the text contradicts the hypothesis)

Source sentences are extracted from the Catalan Textual Corpus (https://doi.org/10.5281/zenodo.4519349) and from Vilaweb newswire. Both sub-datasets are released under the CC BY 4.0 licence.
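The label scheme described above can be made concrete with a small helper. The following is a minimal sketch, not part of the dataset release: the record field names (`premise`, `hypothesis`, `label`) and the sample sentences are illustrative assumptions, since the description does not specify the file format. Only the three label codes and their meanings come from the description.

```python
from collections import Counter

# Label codes as stated in the dataset description.
TECA_LABELS = {
    "0": "entailment",     # positive TE: text entails hypothesis
    "1": "neutral",        # non-TE: neither entailed nor contradicted
    "2": "contradiction",  # negative TE: text contradicts hypothesis
}


def label_name(label):
    """Map a raw TECa label (string or int) to a readable name."""
    return TECA_LABELS[str(label)]


def label_distribution(examples, label_key="label"):
    """Count how many examples fall under each readable label name.

    `label_key` is an assumed field name; adjust it to the actual files.
    """
    return Counter(label_name(ex[label_key]) for ex in examples)


# Invented records for illustration only (NOT taken from TECa).
# One premise paired with three hypotheses, one per label.
sample = [
    {"premise": "The cat sleeps on the sofa.",
     "hypothesis": "An animal is resting.", "label": "0"},
    {"premise": "The cat sleeps on the sofa.",
     "hypothesis": "The sofa is red.", "label": "1"},
    {"premise": "The cat sleeps on the sofa.",
     "hypothesis": "The cat is running outside.", "label": "2"},
]

distribution = label_distribution(sample)
for name, count in sorted(distribution.items()):
    print(f"{name}: {count}")
```

Because each source sentence is described as having three hypotheses, a count like this is a quick sanity check once the real files are loaded: if every sentence carries one hypothesis per label, the three classes should come out roughly balanced.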
Created:
2021-08-02