orkg-R0: A Dataset of Structured Summaries for the R0 estimate of Infectious Diseases from Complex Scientific Abstracts

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8068441
Description:
This is a curated dataset obtained by filtering, cleaning, and manually annotating the metadata file of the CORD-19 dataset (https://allenai.org/data/cord-19). It contains structured summaries of the R0 estimate of infectious diseases, extracted from scientific abstracts.

The main data directory contains two subdirectories. The "raw" subdirectory holds the train, test, and dev splits of the annotated data. The "processed" subdirectory contains train, test, and dev JSON files in which the data has been filled into a sub-selection of the "Templates for FLAN" prompts. The sub-selection consists of the Drop and Squad_v2 templates. Template number 8 from Drop and template number 3 from Squad_v2 are excluded from all splits. Templates 9 and 10 from Drop are used only in the training sets.

The repository includes two dataset types: Text_based and Json_based. The "dev_templated_files" subdirectory contains two subdirectories, "text" and "json". The "text" subdirectory contains the raw "dev" split filled into all suitable templates for dev, with responses in the defined structured text_based format. The "json" subdirectory contains the same split and templates, with responses in the defined structured json_based format.

The "test_templated_files" subdirectory likewise contains two subdirectories, "text" and "json". The "text" subdirectory contains the raw "test" split filled into all suitable templates for dev, with responses in the defined structured text_based format. The "json" subdirectory contains the same split and templates, with responses in the defined structured json_based format.

The "train_templated_files" subdirectory contains one subdirectory per training dataset, each obtained using a specific set of templates. There are 20 different training sets, each provided in a json_based and a text_based version, for a total of 40 training sets.
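As a rough illustration of the directory layout described above, the following Python sketch indexes the JSON files found under each top-level subdirectory of a local copy. The subdirectory names come from the description; the helper names `index_json_files` and `load_json`, the `.json` file extension, and the assumption that each file is a single JSON document (rather than JSON Lines) are hypothetical and not confirmed by the dataset documentation.

```python
import json
from pathlib import Path

# Top-level subdirectory names as given in the dataset description.
SUBDIRS = [
    "raw",
    "processed",
    "dev_templated_files",
    "test_templated_files",
    "train_templated_files",
]


def index_json_files(data_root):
    """Map each known subdirectory to a sorted list of the .json files under it.

    Assumption: data files carry a .json extension. Subdirectories that are
    missing from the local copy map to an empty list.
    """
    root = Path(data_root)
    index = {}
    for name in SUBDIRS:
        folder = root / name
        if folder.is_dir():
            # rglob also descends into nested folders such as "text" and "json".
            index[name] = sorted(p for p in folder.rglob("*.json") if p.is_file())
        else:
            index[name] = []
    return index


def load_json(path):
    """Load one file, assuming it holds a single JSON document."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `index_json_files` on the unpacked data directory returns a dictionary keyed by subdirectory name, which makes it easy to check how many files each part of the layout actually contains before loading any of them.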
Created:
2023-06-25