PRELEARN Dataset

Source: DataCite Commons. Updated: 2022-06-01. Indexed: 2024-07-13.

Download link:

https://live.european-language-grid.eu/catalogue/corpus/8084

Description:

The PRELEARN dataset contains 6607 concept pairs and a “Wikipedia pages file” with the raw text of the Wikipedia pages for the concepts involved (extracted with WikiExtractor from a Wikipedia dump of Jan. 2020). The dataset was used for the PRELEARN shared task (https://sites.google.com/view/prelearn20/), organised as part of the Evalita 2020 evaluation campaign (http://www.evalita.it/2020). It was extracted from the ITA-PREREQ dataset (Miaschi et al., 2019), built upon the AL-CPL dataset (Liang et al., 2018), a collection of binary-labelled concept pairs extracted from textbooks on four domains: data mining, geometry, physics and pre-calculus. Each concept pair consists of a target concept A and a candidate prerequisite concept B, written (A, B), and is labelled 1 if B is a prerequisite of A and 0 in all other cases. Domain experts manually annotated whether each pair of concepts showed a prerequisite relation. The dataset is split into a training set (5908 pairs) and a test set (699 pairs). Prerequisite and non-prerequisite labels are balanced (50/50) within each domain only in the test sets.
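
The labelling scheme and split sizes described above can be sketched in Python. This is only an illustration: the description does not specify the file layout, so the `ConceptPair` class, its field names, the `label_balance` helper and the sample rows are all hypothetical and not taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptPair:
    """Hypothetical in-memory form of one labelled pair (A, B)."""
    domain: str        # one of the four domains, e.g. "geometry"
    target: str        # concept A
    prerequisite: str  # candidate prerequisite concept B
    label: int         # 1 if B is a prerequisite of A, 0 otherwise


def label_balance(pairs):
    """Return {domain: fraction of pairs in that domain labelled 1}."""
    totals = Counter(p.domain for p in pairs)
    positives = Counter(p.domain for p in pairs if p.label == 1)
    return {domain: positives[domain] / totals[domain] for domain in totals}


# Made-up placeholder rows for illustration only.
sample = [
    ConceptPair("geometry", "A1", "B1", 1),
    ConceptPair("geometry", "A2", "B2", 0),
    ConceptPair("physics", "A3", "B3", 1),
    ConceptPair("physics", "A4", "B4", 0),
]
balance = label_balance(sample)

# Split sizes as stated in the description; together they give all 6607 pairs.
TRAIN_SIZE, TEST_SIZE = 5908, 699
TOTAL_SIZE = TRAIN_SIZE + TEST_SIZE
```

Running `label_balance` on the real test pairs, once loaded into this form, would be one way to check the stated 50/50 balance per domain; the training set is not expected to pass the same check.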

Provider:

ELG

Created:

2022-06-01
