Supporting data for "CoVEffect: Interactive System for Mining the Effects of SARS-CoV-2 Mutations and Variants Based on Deep Learning"
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7817519
Description:
This repository contains the datasets created and extracted for the paper:
Giuseppe Serna García, Ruba Al Khalaf, Francesco Invernici, Stefano Ceri, and Anna Bernasconi. 2022.
"CoVEffect: Interactive System for Mining the Effects of SARS-CoV-2 Mutations and Variants Based on Deep Learning". (Available online at http://gmql.eu/coveffect)
--------------------------------------------------------------------------------
LIST OF FILES WITH DESCRIPTION:
--------------------------------------------------------------------------------
AdditionalFile1-effects-taxonomy:
Descriptions of legal values for the 'Effect' field, based on a categorized taxonomy.
AdditionalFile2-levels-taxonomy:
Descriptions of legal values for the 'Level' field.
AdditionalFile3-training_dataset_target:
List of manually annotated target tuples for the 221 abstracts used to train the model. For each abstract, target tuples follow the schema: ID, DOI, title, entity, effect, level, type (mutation or variant), tuples_count (>1 when an effect/level is shared by multiple entities, #abstracts containing the same effect described in the tuple).
AdditionalFile4-validation_dataset_target:
List of target tuples (manually annotated) of 50 abstracts considered for validating the prepared prediction model.
For each abstract, target tuples follow the schema defined for AdditionalFile3.
AdditionalFile5-validation_dataset_highlighted:
Textual abstracts of the 50 manuscripts considered for validation; the text used to support the manual target annotations has been highlighted in yellow.
AdditionalFile6-validation_dataset_prediction:
List of predicted annotations for the 50 abstracts considered for validating the prepared prediction model. The file is split into four TSV files, one each for entity (a), effect (b), level (c), and whole-tuple predictions (d).
AdditionalFile7-keywords_query_list:
Keyword-based search run on the CORD-19 dataset to extract a relevant subset of abstracts regarding the scope of interest of CoVEffect. The Boolean logic used to combine keywords is explained in the section 'Annotations of the biology-related CORD-19 cluster'.
AdditionalFile8-CORD-19_batch_dataset_metadata:
Metadata of the 7,230 papers extracted by the keyword-based query in AdditionalFile7.
These abstracts have been annotated by the prediction framework.
AdditionalFile9-CORD-19_batch_dataset_prediction:
List of predicted annotations of 7,230 abstracts extracted from the biology-related cluster of CORD-19.
AdditionalFile10-test_dataset_target:
List of target tuples (manually annotated) of 100 abstracts randomly selected from the 7,230 extracted as in AdditionalFile8.
For each abstract, target tuples follow the schema defined for AdditionalFile3.
AdditionalFile11-test_dataset_prediction:
List of predicted annotations for the 100 abstracts considered for testing the prediction model on a subset of the CORD-19 biology-related cluster. Like AdditionalFile6, it is split into four TSV files, one each for entity (a), effect (b), level (c), and whole-tuple predictions (d).
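
The target and prediction files share the tuple schema given for AdditionalFile3, so one way to work with them is to group (entity, effect, level) tuples by abstract and compare the two sets. The following is a minimal Python sketch under several assumptions that this description does not confirm: that the files are tab-separated without a header row, that the columns appear in the order listed for AdditionalFile3, and that exact whole-tuple matching is a useful comparison. The function names, the sample rows, and the precision/recall measure are hypothetical illustrations, not the evaluation used in the paper.

```python
import csv
import io
from collections import defaultdict

# Column order copied from the schema described for AdditionalFile3.
# Assumption: the real files are tab-separated and have no header row.
COLUMNS = ["ID", "DOI", "title", "entity", "effect", "level", "type", "tuples_count"]


def load_tuples(handle):
    """Group (entity, effect, level) tuples by abstract ID."""
    by_abstract = defaultdict(set)
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        by_abstract[row["ID"]].add((row["entity"], row["effect"], row["level"]))
    return dict(by_abstract)


def exact_match_scores(target, predicted):
    """Precision and recall over exact whole-tuple matches (hypothetical metric)."""
    abstract_ids = target.keys() | predicted.keys()
    true_positives = sum(
        len(target.get(i, set()) & predicted.get(i, set())) for i in abstract_ids
    )
    n_predicted = sum(len(tuples) for tuples in predicted.values())
    n_target = sum(len(tuples) for tuples in target.values())
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_target if n_target else 0.0
    return precision, recall


# Placeholder rows for illustration only; none of these values come from the dataset.
SAMPLE_TARGET = (
    "abs1\tdoi-1\tTitle 1\tE1\teffect_x\tlevel_y\tmutation\t1\n"
    "abs1\tdoi-1\tTitle 1\tE2\teffect_x\tlevel_y\tmutation\t1\n"
    "abs2\tdoi-2\tTitle 2\tE3\teffect_z\tlevel_y\tvariant\t1\n"
)
SAMPLE_PREDICTED = (
    "abs1\tdoi-1\tTitle 1\tE1\teffect_x\tlevel_y\tmutation\t1\n"
    "abs2\tdoi-2\tTitle 2\tE3\teffect_z\tlevel_w\tvariant\t1\n"
)

if __name__ == "__main__":
    target = load_tuples(io.StringIO(SAMPLE_TARGET))
    predicted = load_tuples(io.StringIO(SAMPLE_PREDICTED))
    precision, recall = exact_match_scores(target, predicted)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

To apply this to the actual files, replace the `io.StringIO` handles with `open(path, newline="", encoding="utf-8")` and adjust `COLUMNS` (or skip a header row) after inspecting the downloaded files.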
Created:
2023-04-20



