PADI-web corpus used for the EpidBioELECTRA approach
Source: DataCite Commons. Updated: 2023-10-20. Indexed: 2025-04-09.
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/WD1UC2
Description:
This dataset contains English-language news articles related to animal disease outbreaks that were used to train and evaluate the EpidBioELECTRA epidemiological classifier and explainer. It comprises 70,707 articles in CSV format, split across folders: the relevant folder contains 34,015 news articles labelled relevant, and the irrelevant folder contains 36,692 articles labelled irrelevant. Each article comes with information about the article itself (publication date, title, content, URL, etc.). The thematic feature folder contains the thematic features (disease, host, location, cases, etc.) labelled relevant or irrelevant, as found in the relevant and irrelevant documents, indexed by sentence ID and organized by the year and month of the article. These labels were machine-generated by the PADI-web classifier.
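The folder layout described above can be read into labelled records with a few lines of Python. This is a minimal sketch, not the dataset authors' tooling: the folder names `relevant` and `irrelevant` come from the description, while the nesting of CSV files inside them, the column names, the UTF-8 encoding, the 1/0 label encoding, and the helper names `load_articles` and `count_by_label` are assumptions made for illustration.

```python
import csv
from pathlib import Path

# Folder names are taken from the dataset description. The numeric labels
# (1 = relevant, 0 = irrelevant) are an assumption made for this sketch.
LABEL_FOLDERS = {"relevant": 1, "irrelevant": 0}


def load_articles(root):
    """Yield one dict per CSV row, adding 'label' and 'source_file' keys.

    Column names are whatever the CSV headers provide; none are assumed here.
    """
    root = Path(root)
    for folder, label in LABEL_FOLDERS.items():
        # rglob also finds CSV files nested in subfolders, if there are any.
        for csv_path in sorted((root / folder).rglob("*.csv")):
            # UTF-8 is assumed; adjust if the files use another encoding.
            with csv_path.open(newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    row["label"] = label
                    row["source_file"] = str(csv_path)
                    yield row


def count_by_label(root):
    """Count rows per label, e.g. to compare against the published totals."""
    counts = {label: 0 for label in LABEL_FOLDERS.values()}
    for row in load_articles(root):
        counts[row["label"]] += 1
    return counts
```

If the files hold one article per row, `count_by_label` on the unpacked dataset root should give counts comparable to the 34,015 relevant and 36,692 irrelevant articles stated in the description. Very long article bodies may exceed the `csv` module's default field size, in which case `csv.field_size_limit` can be raised before reading.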
Provider:
CIRAD Dataverse
Created:
2023-04-25



