Annotation of epidemiological information in animal disease-related news articles: guidelines and manually labelled corpus

DataCite Commons. Updated: 2023-10-23. Indexed: 2025-04-09
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/YGAKNB
Description:
This dataset contains two files:

(i) An annotated corpus ("epi_info_corpus.xlsx") containing 486 manually annotated sentences extracted from 32 animal disease-related news articles. These news articles were obtained from the database of PADI-web (https://padi-web.cirad.fr/en/), an event-based biosurveillance system dedicated to animal health surveillance.

The first sheet (‘article_metadata’) provides metadata about the news articles: (1) id_article, the unique id of a news article; (2) title, the title of the news article; (3) source, the name of the news article website; (4) publication_date, the publication date of the news article (mm-dd-yyyy); and (5) URL, the web URL of the news article.

The second sheet (‘annot_sentences’) contains the annotated sentences: each row corresponds to one sentence from a news article. Each sentence has two distinct labels, Event type and Information type. The columns are: (1) id_article, the id of the news article to which the sentence belongs; (2) id_sentence, the unique id of the sentence, indicating its position in the news content (an integer ranging from 1 to n, n being the total number of sentences in the news article); (3) sentence_text, the textual content of the sentence; (4) event_type, the Event type label; and (5) information_type, the Information type label.

Event type labels indicate the relation between the sentence and the epidemiological context, i.e. current event (CE), risk event (RE), old event (OE), general (G) and irrelevant (IR). Information type labels indicate the type of epidemiological information, i.e. descriptive epidemiology (DE), distribution (DI), preventive and control measures (PCM), economic and political consequences (EPC), transmission pathway (TP), concern and risk factors (CRF), general epidemiology (GE) and irrelevant (IR).

(ii) The annotation guidelines ("epi_info_guidelines.doc"), which provide a detailed description of each category.
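As an illustration of how the ‘annot_sentences’ schema might be checked after loading, the following is a minimal sketch rather than tooling shipped with the dataset. It assumes the label cells store the abbreviations listed above (CE, DE, and so on) rather than the full category names, which the description does not confirm. The helper names `validate_rows` and `label_counts` are hypothetical, and the sample rows are made up for the example.

```python
from collections import Counter

# Label sets as listed in the dataset description.
EVENT_TYPES = {"CE", "RE", "OE", "G", "IR"}
INFORMATION_TYPES = {"DE", "DI", "PCM", "EPC", "TP", "CRF", "GE", "IR"}


def validate_rows(rows):
    """Return (row_index, column, value) for every label outside the documented sets.

    `rows` is an iterable of dicts keyed by the column names of the
    'annot_sentences' sheet.
    """
    problems = []
    for i, row in enumerate(rows):
        if row["event_type"] not in EVENT_TYPES:
            problems.append((i, "event_type", row["event_type"]))
        if row["information_type"] not in INFORMATION_TYPES:
            problems.append((i, "information_type", row["information_type"]))
    return problems


def label_counts(rows, column):
    """Count how often each label occurs in the given column."""
    return Counter(row[column] for row in rows)


# Made-up rows for illustration only; they are NOT taken from the corpus.
sample_rows = [
    {"id_article": "A1", "id_sentence": 1, "sentence_text": "Example sentence one.",
     "event_type": "CE", "information_type": "DE"},
    {"id_article": "A1", "id_sentence": 2, "sentence_text": "Example sentence two.",
     "event_type": "G", "information_type": "PCM"},
    {"id_article": "A1", "id_sentence": 3, "sentence_text": "Example sentence three.",
     "event_type": "XX", "information_type": "IR"},  # deliberately invalid event type
]

if __name__ == "__main__":
    print(validate_rows(sample_rows))
    print(label_counts(sample_rows, "information_type"))
```

With pandas and an Excel engine installed, the real sheet could presumably be read with `pandas.read_excel("epi_info_corpus.xlsx", sheet_name="annot_sentences")` and passed in as `df.to_dict("records")`.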
Provider:
CIRAD Dataverse
Created:
2019-12-04