
VilaQuAD: an extractive QA dataset from Catalan newswire

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4562337
Description:
If you use this resource in your work, please cite our latest paper:

@inproceedings{armengol-estape-etal-2021-multilingual,
    title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
    author = "Armengol-Estap{\'e}, Jordi and
      Carrino, Casimiro Pio and
      Rodriguez-Penagos, Carlos and
      de Gibert Bonet, Ona and
      Armentano-Oller, Carme and
      Gonzalez-Agirre, Aitor and
      Melero, Maite and
      Villegas, Marta",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.437",
    doi = "10.18653/v1/2021.findings-acl.437",
    pages = "4933--4946",
}

VilaQuAD is an extractive QA dataset with 6282 question-answer pairs developed from paragraphs of the online newspaper Vilaweb (https://www.vilaweb.cat), used under a CC-BY-NC-ND 4.0 licence.

This dataset contains 2095 Catalan-language news articles, each accompanied by 1 to 5 questions referring to the fragment (or context). VilaQuAD articles are extracted from the daily newspaper Vilaweb (www.vilaweb.cat) and used under a CC BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/3.0/deed.ca). The dataset can be used to build extractive QA systems and language models.

Funded by the Generalitat de Catalunya, Departament de Polítiques Digitals i Administració Pública (AINA), MT4ALL and Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).
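The description says each context carries 1 to 5 questions whose answers are extracted from the text. The following is a minimal sketch of how such records could be counted and sanity-checked, assuming the files follow the SQuAD v1.1 JSON layout (`data` > `paragraphs` > `qas` > `answers` with `answer_start`). That layout is an assumption, not something this listing states, and the `SAMPLE` record is an invented illustration rather than an entry from VilaQuAD.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical toy record in the assumed SQuAD v1.1 layout (not from VilaQuAD).
_CONTEXT = "Vilaweb és un diari en línia en català."
_ANSWER = "un diari en línia"
SAMPLE = {
    "data": [
        {
            "title": "exemple",
            "paragraphs": [
                {
                    "context": _CONTEXT,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Què és Vilaweb?",
                            "answers": [
                                {
                                    "text": _ANSWER,
                                    # Character offset of the answer in the context.
                                    "answer_start": _CONTEXT.index(_ANSWER),
                                }
                            ],
                        }
                    ],
                }
            ],
        }
    ]
}


def iter_qas(squad: Dict) -> Iterator[Tuple[str, Dict]]:
    """Yield (context, qa) pairs from a SQuAD-style dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def span_matches(context: str, answer: Dict) -> bool:
    """Return True if the answer text sits at its stated offset in the context."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])] == answer["text"]


def summarize(squad: Dict) -> Dict[str, int]:
    """Count contexts, questions, and answers whose span does not match."""
    contexts = sum(len(article["paragraphs"]) for article in squad["data"])
    questions = sum(1 for _ in iter_qas(squad))
    bad_spans = sum(
        1
        for context, qa in iter_qas(squad)
        for answer in qa["answers"]
        if not span_matches(context, answer)
    )
    return {"contexts": contexts, "questions": questions, "bad_spans": bad_spans}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

If the downloaded files do use this layout, parsing one with `json.load` and passing the result to `summarize` would give counts that can be compared against the figures stated in the description.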
Created:
2021-08-02