Linked Avian Influenza Epidemiological and Genomic Data for Epidemic Intelligence (2012-2021)
Source: DataCite Commons. Updated: 2025-05-15. Indexed: 2025-04-16.
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/JNA7N9
Description:
We release a new Avian Influenza (AI) epidemiological events dataset covering 2012 to 2021, in which the epidemiological events in EMPRES-i [1] are enriched with genome sequence data of Avian Influenza cases, publicly provided by the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) [2]. The association between EMPRES-i and BV-BRC is obtained through an automatic task described in [3]. The resulting dataset therefore consists of "putatively" associated events between EMPRES-i and BV-BRC.

This dataset contributes to the available resources in the field of Avian Influenza surveillance and epidemic intelligence. It can be useful to epidemiologists and computer scientists studying AI transmission dynamics. The dataset is produced by our publicly available source code on GitHub: https://github.com/arinik9/AIAGIS

The dataset comprises the following files:

raw_intput_files.zip: Raw input files from EMPRES-i and BV-BRC. The collected data is structured by nature, but it needs to be preprocessed and normalized for high-quality data linkage.
  BVBRC_genome_events.csv: Raw genome events in BV-BRC.
  BVBRC_genome_sequences.csv: Raw genome sequences in BV-BRC.
  EMPRES-i_events.csv: Raw EMPRES-i events.

result_files.zip:
  doc_events_empres-i_strategy=1-to-1.csv: EMPRES-i dataset enriched with genetic information based on the 1-to-1 linking strategy (Section 5.2.1 in [3]).
  doc_events_empres-i_strategy=1-to-many.csv: EMPRES-i dataset enriched with genetic information based on the 1-to-many linking strategy (Section 5.2.2 in [3]).

data_files_for_missing_genetic_information.zip: Average isolate similarity scores for handling missing genetic information in EMPRES-i. These scores are meant to be used when computing the isolate similarity between a pair of EMPRES-i events of which at least one has missing genetic information (Section 6 in [3]).
  genome_sim_summary_by_country_and_genome_name.csv: Used when only one of the two events has isolate information (1st strategy in [3]).
  genome_sim_summary_by_genome_name.csv: Used when only one of the two events has isolate information (2nd strategy in [3]).
  genome_sim_summary_by_country.csv: Used when neither of the two events has isolate information (3rd strategy in [3]).
  genome_sim_summary.csv: Used when neither of the two events has isolate information (4th strategy in [3]).

[1] FAO. 2021. EMPRES Global Animal Disease Information System (EMPRES-i). Accessed on 23 October 2024, http://empres-i.fao.org/empres-i, licence CC-BY-4.0.
[2] Olson RD, Assaf R, Brettin T, Conrad N, Cucinell C, Davis JJ, Dempsey DM, Dickerman A, Dietrich EM, Kenyon RW, Kuscuoglu M, Lefkowitz EJ, Lu J, Machi D, Macken C, Mao C, Niewiadomska A, Nguyen M, Olsen GJ, Overbeek JC, Parrello B, Parrello V, Porter JS, Pusch GD, Shukla M, Singh I, Stewart L, Tan G, Thomas C, VanOeffelen M, Vonstein V, Wallace ZS, Warren AS, Wattam AR, Xia F, Yoo H, Zhang Y, Zmasek CM, Scheuermann RH, Stevens RL. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 2022 Nov 9:gkac1003. doi: 10.1093/nar/gkac1003
[3] Nejat Arınık, Roberto Interdonato, Mathieu Roche, Maguelonne Teisseire (2024). An Improved Avian Influenza Surveillance Dataset with Genome Sequences (submitted).
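The description of data_files_for_missing_genetic_information.zip maps each of its four summary files to a case: only one event has isolate information, or neither does. The following is a minimal sketch of how a consumer of the dataset might pick the candidate files for a pair of events. The function name, the use of None to mark missing isolate information, and the idea of trying the more specific file before the more general one are assumptions made for illustration; they are not confirmed by the dataset description or by [3].

```python
from typing import Optional, Tuple

# File names come from the dataset description.
# The "specific first, general second" ordering is an assumption.
ONE_EVENT_HAS_ISOLATE: Tuple[str, ...] = (
    "genome_sim_summary_by_country_and_genome_name.csv",  # 1st strategy in [3]
    "genome_sim_summary_by_genome_name.csv",              # 2nd strategy in [3]
)
NO_EVENT_HAS_ISOLATE: Tuple[str, ...] = (
    "genome_sim_summary_by_country.csv",  # 3rd strategy in [3]
    "genome_sim_summary.csv",             # 4th strategy in [3]
)


def candidate_summary_files(
    isolate_a: Optional[str], isolate_b: Optional[str]
) -> Tuple[str, ...]:
    """Return the fallback files to consult for a pair of EMPRES-i events.

    Hypothetical helper: None stands for missing isolate information.
    """
    known = sum(isolate is not None for isolate in (isolate_a, isolate_b))
    if known == 2:
        # Both events carry isolate information, so no fallback is needed.
        return ()
    if known == 1:
        return ONE_EVENT_HAS_ISOLATE
    return NO_EVENT_HAS_ISOLATE
```

A caller would then look up the average similarity score in the first returned file that contains a matching entry. The column layout of these CSV files is not described here, so that lookup is left out.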
Provider:
Recherche Data Gouv
Created:
2024-10-24



