QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility
Source: Figshare. Updated: 2022-12-19. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/QADO_An_RDF_Representation_of_Question_Answering_Datasets_and_their_Analyses_for_Improving_Reproducibility/21750029
Description:
Measuring the quality of Question Answering (QA) systems is crucial for validating the results of novel approaches. However, there are already indicators of a reproducibility crisis: many published systems have used outdated datasets or only subsets of QA benchmarks, which makes results hard to compare. We identified the following core problems. First, there is no standard data format; instead, the different and partly inconsistent datasets use proprietary data representations. Second, the characteristics of the datasets are typically not reflected on by either the dataset maintainers or the system publishers. To overcome these problems, we established an ontology, the Question Answering Dataset Ontology (QADO), for representing QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, the LC-QuAD series, the RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we analyzed the datasets intensively to identify their characteristics, making it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of its research and support the reproducibility of experiments. This record provides the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file. Up-to-date statistics can be created automatically with the script provided in the corresponding QADO GitHub RDFizer repository.
Created:
2022-12-19



