Thesis on Global Patterns of Sampling Bias in Molecular Sequences of Vertebrate Viruses: Supplementary Data and Code

Name: Thesis on Global Patterns of Sampling Bias in Molecular Sequences of Vertebrate Viruses: Supplementary Data and Code
Creator: Zenodo
Published: 2026-05-05 14:30:57
License: No description available

Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-07.

Download link:

https://zenodo.org/doi/10.5281/zenodo.20039252

Description:

Supplementary data and code from my thesis on "Global Patterns of Sampling Bias in Molecular Sequences of Vertebrate Viruses".

Data sources: virus family data were obtained from NCBI Virus, GDP and population size data from the World Bank, and the total number of animal species (used as a biodiversity proxy) from the IUCN Red List.

Data files: NCBI Virus information for each family is provided as .csv files named in the format "NCBI_VirusFamily_09042026.csv". Data for Coronaviridae, Orthomyxoviridae and Retroviridae are provided as .fst files, which can be read into R. They contain the same information as the files for the other virus families, stored in a different format for more efficient processing; how these files were processed is shown in the code. GDP, population and biodiversity (IUCN_species_info) data are saved as .csv files.

Code files:
- loading_NCBI_Virus_datasets: must be run first; once it has run, the other code files can be run.
- explorations_of_geographic_bias: code for maps, country-level data, log-log models and k-means clustering.
- explorations_of_taxonomic_bias: code for the virus family bar chart, distinct vertebrate host species information and the phylogenetic heatmap.
- explorations_of_temporal_bias: code for cumulative discovery curves, discovery rates and Kruskal-Wallis tests.

Taxonomy pipeline: the pipeline used to obtain taxonomy information via Taxonkit is documented in "Taxonkit Taxonomic Information.txt", which contains the lines of code used and a short description of each.
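
The file-naming convention described above can be used to take a quick inventory of the per-family CSV files after download. The following is a minimal Python sketch and is not part of the thesis code, which is written in R. It assumes that the middle part of each file name is the virus family and that the files sit in a single directory; the function names are hypothetical. No column names are assumed, since the description does not list them, and the .fst files are not covered here.

```python
import csv
import re
from pathlib import Path

# Pattern taken from the description: "NCBI_VirusFamily_09042026.csv".
# Assumption: the middle token is the virus family name and the last
# token is an eight-digit date stamp.
FAMILY_FILE = re.compile(r"^NCBI_(?P<family>[A-Za-z]+)_(?P<stamp>\d{8})\.csv$")


def find_family_files(data_dir):
    """Return {family name: path} for every matching CSV in data_dir."""
    found = {}
    for path in sorted(Path(data_dir).iterdir()):
        match = FAMILY_FILE.match(path.name)
        if match:
            found[match.group("family")] = path
    return found


def count_records(path):
    """Count data rows (header excluded) without assuming column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def summarize(data_dir):
    """Return {family name: number of records} for all per-family CSVs."""
    return {
        family: count_records(path)
        for family, path in find_family_files(data_dir).items()
    }
```

Calling `summarize("path/to/unpacked/record")`, where the path is a placeholder for wherever the archive was extracted, returns one record count per family file. Other CSV files in the same directory, such as the GDP, population and IUCN_species_info tables, are skipped because they do not match the pattern.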

Provider:

Zenodo

Created:

2026-05-05