
PADI-web corpus used for the EpidBioELECTRA approach

Source: DataCite Commons. Updated: 2023-10-20. Indexed: 2025-04-09.
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/WD1UC2
Description:
This dataset contains English-language news articles related to animal disease outbreaks that were used to train and evaluate the EpidBioELECTRA epidemiological classifier and explainer. It comprises 70,707 articles in CSV format, split across folders: the relevant folder contains 34,015 news articles labelled relevant, and the irrelevant folder contains 36,692 articles labelled irrelevant. Each record carries information about the article itself (publication date, title, content, URL, etc.). The thematic feature folder contains thematic features (disease, host, location, cases, etc.) labelled relevant or irrelevant, as they occur in the relevant and irrelevant documents, indexed by sentence ID and organized by the year and month of the article. These labels were machine-generated by the PADI-web classifier.
Provider:
CIRAD Dataverse
Created:
2023-04-25
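
The folder layout in the description (one folder of relevant articles, one of irrelevant articles, each holding CSV files) suggests a simple way to build a labelled corpus. The following is a minimal sketch, not the authors' loading code: the helper name `load_labelled_articles`, the folder names `relevant` and `irrelevant` as exact directory names, the `.csv` extension, and the UTF-8 encoding are all assumptions to check against the downloaded archive. No column names are assumed; each row is returned with whatever headers the CSV files provide, plus an added `label` key.

```python
import csv
from pathlib import Path


def load_labelled_articles(root, folders=("relevant", "irrelevant")):
    """Yield one dict per CSV row, tagged with the folder it came from.

    `root` is the directory where the dataset was unpacked (hypothetical path).
    The folder names are assumed from the dataset description.
    """
    root = Path(root)
    for label in folders:
        # rglob also picks up CSV files nested in subfolders (e.g. by year).
        for csv_path in sorted((root / label).rglob("*.csv")):
            with csv_path.open(newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    row["label"] = label  # added key, not part of the source files
                    yield row
```

A quick sanity check after downloading would be to count rows per label, for example with `collections.Counter(row["label"] for row in load_labelled_articles("path/to/dataset"))`, and compare the totals with the 34,015 and 36,692 figures given in the description.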