Elgold intermediate: raw texts
Source: DataCite Commons. Updated: 2026-04-30. Indexed: 2024-07-13.
Download link:
https://mostwiedzy.pl/en/open-research-data/elgold-intermediate-raw-texts,628102859659161-0
Description:
The dataset contains raw texts scraped from various internet sources, which were used to create the Elgold dataset.
The texts were collected from seven main categories: "News", "Job offers", "Movie reviews", "Automotive blogs", "Amazon product reviews", "Scientific papers abstracts", and "Historic blogs". The "Scientific papers abstracts" category was further divided into five subcategories: "Biomedicine", "Life Sciences", "Mathematics", "Medicine & Public Health", and "Science, Humanities and Social Sciences, multidisciplinary".
The raw texts were collected from publicly available internet sources by a group of 14 participants. Two to three participants were assigned to each category.
The dataset consists of approximately 100 texts for each category (and subcategory in the case of "Scientific papers abstracts").
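The figures above imply a rough overall size, which the following minimal sketch works out. It assumes that the five subcategories stand in for the "Scientific papers abstracts" category when counting (so that category is not counted a second time) and that each group holds exactly 100 texts, whereas the description only says "approximately". The function and constant names are illustrative and are not part of the dataset.

```python
# Category and subcategory names are taken from the dataset description.
CATEGORIES = [
    "News",
    "Job offers",
    "Movie reviews",
    "Automotive blogs",
    "Amazon product reviews",
    "Scientific papers abstracts",
    "Historic blogs",
]
SCIENTIFIC_SUBCATEGORIES = [
    "Biomedicine",
    "Life Sciences",
    "Mathematics",
    "Medicine & Public Health",
    "Science, Humanities and Social Sciences, multidisciplinary",
]
SUBDIVIDED = "Scientific papers abstracts"
TEXTS_PER_GROUP = 100  # assumption: "approximately 100" treated as exactly 100


def collection_groups():
    """Return the groups that each hold roughly TEXTS_PER_GROUP texts.

    Assumption: the subdivided category is represented only by its
    subcategories, not counted as a separate group.
    """
    groups = [c for c in CATEGORIES if c != SUBDIVIDED]
    groups += [f"{SUBDIVIDED} / {s}" for s in SCIENTIFIC_SUBCATEGORIES]
    return groups


def estimated_total():
    """Estimate the total number of raw texts under the assumptions above."""
    return len(collection_groups()) * TEXTS_PER_GROUP


if __name__ == "__main__":
    print(len(collection_groups()), "groups,", estimated_total(), "texts (estimate)")
```

Under these assumptions there are 6 + 5 = 11 groups, so the estimate is about 1,100 texts. The actual count in the published files may differ.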
Provider:
Gdańsk University of Technology
Created:
2024-06-28



