PRELEARN Dataset
Source: DataCite Commons. Updated: 2022-06-01. Indexed: 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/8084
Resource description:
The PRELEARN dataset contains 6607 concept pairs and a "Wikipedia pages file" with the raw text of the Wikipedia pages for the extracted concepts (obtained by running WikiExtractor on a Wikipedia dump from January 2020). The dataset was used for the PRELEARN shared task (https://sites.google.com/view/prelearn20/), organised as part of the Evalita 2020 evaluation campaign (http://www.evalita.it/2020). It was extracted from the ITA-PREREQ dataset (Miaschi et al., 2019), itself built upon the AL-CPL dataset (Liang et al., 2018), a collection of binary-labelled concept pairs extracted from textbooks in four domains: data mining, geometry, physics and pre-calculus.

Each concept pair (A, B) consists of a target concept A and a candidate prerequisite concept B, labelled as follows:
1 if B is a prerequisite of A;
0 in all other cases.

Domain experts manually annotated whether each pair of concepts showed a prerequisite relation. The dataset is split into a training set (5908 pairs) and a test set (699 pairs). Prerequisite and non-prerequisite labels are balanced (50/50) within each domain only in the test data; the training data is not balanced in this way.
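The labelling scheme above can be illustrated with a minimal Python sketch. The on-disk layout is an assumption: the description does not state the file format, so the tab-separated "A, B, label" lines, the `ConceptPair` type, and the `parse_pairs` and `label_share` helpers are all hypothetical. The two sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import Counter
from typing import NamedTuple

# Split sizes as stated in the dataset description.
TRAIN_SIZE = 5908
TEST_SIZE = 699
TOTAL_PAIRS = 6607


class ConceptPair(NamedTuple):
    target: str        # concept A
    prerequisite: str  # concept B, the candidate prerequisite of A
    label: int         # 1 if B is a prerequisite of A, 0 otherwise


def parse_pairs(lines):
    """Parse tab-separated 'A<TAB>B<TAB>label' lines (an assumed layout)."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        target, prerequisite, label = line.split("\t")
        pairs.append(ConceptPair(target, prerequisite, int(label)))
    return pairs


def label_share(pairs):
    """Return the fraction of pairs carrying each label (0 and 1)."""
    counts = Counter(p.label for p in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in (0, 1)}


# Made-up rows, NOT from the dataset: one positive and one negative pair.
sample = parse_pairs([
    "Derivative\tLimit\t1",
    "Limit\tDerivative\t0",
])
print(label_share(sample))
```

Running `label_share` separately on each domain's test pairs would be one way to check the 50/50 balance described above, once the real file layout is known.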
Provider:
ELG
Created:
2022-06-01



