A global dataset of viral sequence, diversity, and distribution of arboviruses and arthropod-specific viruses

Name: A global dataset of viral sequence, diversity, and distribution of arboviruses and arthropod-specific viruses
Creator: figshare
Published: 2023-02-24 16:34:20
License: Not provided

DataCite Commons, updated 2023-02-24, indexed 2024-08-18

Download link:

https://figshare.com/articles/dataset/A_global_dataset_of_viral_sequence_diversity_and_distribution_of_arboviruses_and_arthropod-specific_viruses/22154573/2

Resource summary:

We built a comprehensive dataset of arboviruses and arthropod-specific viruses by curating worldwide data from the Arbovirus Catalog, Section VIII-F of Biosafety in Microbiological and Biomedical Laboratories (BMBL, 6th edition), the Virus Metadata Resource of the International Committee on Taxonomy of Viruses (ICTV), and GenBank. The dataset provides viral taxonomy, biological characteristics, vectors and vertebrate hosts, geographic distribution, recommended biosafety levels, genome segments, and nucleotide and amino acid sequences. It is intended to support research on arboviruses and arthropod-specific viruses, including vector/host prediction, disease-outbreak risk warning, arbovirus/arthropod-specific virus interactions, phylogenetic and evolutionary analysis, and biosafety risk assessment.

The dataset comprises a viral metadata file (.csv), a nucleotide sequence file (.fna), and an amino acid sequence file (.faa), all accessible from figshare (ref).

The columns of the viral metadata file (.csv) are as follows:

Taxonomy Information
1. Virus_Group: viruses in the database are divided into two groups, arbovirus and arthropod-specific virus. Arboviruses have both vertebrate and arthropod hosts; arthropod-specific viruses have only arthropod hosts.
2. Name: the virus name; each name represents a distinct virus.
3. Acronym: acronym of the virus name.
4. NCBI_Taxonomy_ID: taxonomy identifier of the virus in the NCBI Taxonomy Database.
5. Isolate: isolate of the virus, as recorded in NCBI GenBank.
6. Unified_Isolate_Number: renumbering of the Isolate field; each isolate of a given virus is assigned its own number.
7. Species: species to which the virus belongs. A virus's species name usually differs from its virus name.
8. Genus: genus to which the virus belongs.
9. Family: family to which the virus belongs.

Genome Information
10. Segmented: whether the virus genome is unsegmented (recorded as “no”) or segmented (recorded as “yes”); viruses with an unknown number of segments are recorded as “NAV”.
11. Number_of_Segments: the theoretical number of genome segments of the virus.
12. Molecule_Type: molecule type of the virus genome, e.g., ssRNA(+), ssRNA(-), ssRNA(+/-), dsRNA, RNA, ssDNA(+/-), or dsDNA.

Sequence Information
13. Accession: NCBI GenBank accession of the nucleotide sequence.
14. Locus: locus name of the nucleotide sequence.
15. SRA_Accession: NCBI SRA accession of the nucleotide sequence.
16. Submitters: submitters of the nucleotide sequence.
17. Sequence_Type: whether the nucleotide sequence is a reference sequence (recorded as “RefSeq”) or a non-reference sequence (recorded as “GenBank”).
18. BioSample: NCBI BioSample accession of the nucleotide sequence.
19. GenBank_Title: the DEFINITION field of the sequence's NCBI GenBank record.
20. Genotype: genotype of the nucleotide sequence.
21. Segment: segment identifier of the nucleotide sequence.
22. Unified_Segment_Number: renumbering of the Segment field. Segments are numbered starting from 1; the single segment of an unsegmented virus is assigned 1.

Host Information
23. Host_Species: species of the dead-end host of the virus.
24. Host_Genus: genus of the dead-end host of the virus.
25. Host_Family: family of the dead-end host of the virus.
26. Host: the host field from the NCBI GenBank record, which may denote either a dead-end host or a vector.

Biosafety Information
27. Recommended_BSL: biosafety level recommended for laboratory research on the virus.
28. Basis_of_Rating: the risk assessment on which the virus's rating is based.
29. Antigenic_Group: antigenic group of the virus.
30. Isolated: whether the virus has been isolated (“Yes”, “No”, or “NAV”).

Source Information
31. Latitude_and_Longitude: geographic coordinates (latitude and longitude) of the virus isolation source.
32. State_or_Province: state- or province-level administrative unit of the virus source.
33. Geo_Location: geographic location of the virus source.
34. Country_or_Region: country or region of the virus source.
35. Isolation_Source: the organism from which the virus was collected.
36. Collection_Date: date on which the virus was collected.
37. Submit_Date: date on which the virus was submitted.
38. Release_Date: date on which the virus was released or last modified.

References
39. Publications: number of publications covering research on the virus.

The nucleotide and amino acid sequence files are standard FASTA files. Each record consists of two lines: a header and the sequence. The header contains two fields, locus and accession, separated by '|'; the following line holds the nucleotide or amino acid sequence. The header fields are defined as follows:
1. Locus: NCBI GenBank LOCUS ID of the nucleotide sequence.
2. Accession: NCBI GenBank accession of the nucleotide sequence.
Protein_id (amino acid sequence file only): protein sequence identifier.
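As an illustration of working with the metadata columns, the sketch below reads the .csv file with Python's standard csv module and counts distinct virus names per Virus_Group. It is a minimal sketch under stated assumptions, not the authors' tooling: it assumes the header row uses the column names listed above, that the two group labels are spelled as in the Virus_Group description, and that "NAV" (documented for Segmented and Isolated) or an empty cell marks a missing value in any column. The file name in the comment and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumption: "NAV" and empty cells mark unavailable values in every column,
# not only in Segmented and Isolated where the description documents "NAV".
MISSING = {"NAV", ""}


def load_metadata(handle):
    """Read the viral metadata CSV, replacing missing-value markers with None.

    Assumes well-formed rows (no more fields than header columns).
    """
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            key: None if value is None or value.strip() in MISSING else value.strip()
            for key, value in row.items()
        })
    return rows


def viruses_per_group(rows):
    """Count distinct virus names (Name column) within each Virus_Group."""
    groups = defaultdict(set)
    for row in rows:
        groups[row["Virus_Group"]].add(row["Name"])
    return {group: len(names) for group, names in groups.items()}


# Hypothetical toy rows standing in for the real file; only a few columns are shown.
# For the real file (hypothetical name):
#   with open("metadata.csv", newline="", encoding="utf-8") as f:
#       rows = load_metadata(f)
SAMPLE = """Virus_Group,Name,Accession,Segmented,Recommended_BSL
arbovirus,Virus A,XX000001,no,3
arbovirus,Virus A,XX000002,no,3
arbovirus,Virus B,XX000003,yes,NAV
arthropod-specific virus,Virus C,XX000004,NAV,2
"""

rows = load_metadata(io.StringIO(SAMPLE))
print(viruses_per_group(rows))  # {'arbovirus': 2, 'arthropod-specific virus': 1}
```

Counting distinct Name values rather than rows matters because the file holds one row per sequence, so a virus with many isolates or segments appears many times.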
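The Segmented, Number_of_Segments, Unified_Isolate_Number, and Unified_Segment_Number columns together allow a completeness check for segmented viruses. The sketch below flags isolates that lack some of their theoretical segments. It assumes, without confirmation from the source, that each row describes one segment sequence and that Unified_Segment_Number runs from 1 to Number_of_Segments; rows with "no" or "NAV" in Segmented, or with non-integer counts, are skipped. The toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def incomplete_isolates(handle):
    """Map (Name, Unified_Isolate_Number) -> sorted missing segment numbers
    for segmented viruses whose isolate lacks some theoretical segments.

    Assumption: Unified_Segment_Number ranges over 1..Number_of_Segments.
    """
    present = defaultdict(set)
    expected = {}
    for row in csv.DictReader(handle):
        # Skip unsegmented ("no") and unknown ("NAV") genomes.
        if row["Segmented"] != "yes":
            continue
        # Skip rows whose segment counts are not plain integers.
        if not (row["Number_of_Segments"].isdigit()
                and row["Unified_Segment_Number"].isdigit()):
            continue
        key = (row["Name"], row["Unified_Isolate_Number"])
        present[key].add(int(row["Unified_Segment_Number"]))
        expected[key] = int(row["Number_of_Segments"])

    report = {}
    for key, segments in present.items():
        missing = sorted(set(range(1, expected[key] + 1)) - segments)
        if missing:
            report[key] = missing
    return report


# Hypothetical toy rows: isolate 2 of "Virus B" lacks segment 2.
SAMPLE = """Name,Unified_Isolate_Number,Segmented,Number_of_Segments,Unified_Segment_Number
Virus B,1,yes,3,1
Virus B,1,yes,3,2
Virus B,1,yes,3,3
Virus B,2,yes,3,1
Virus B,2,yes,3,3
Virus A,1,no,1,1
"""

print(incomplete_isolates(io.StringIO(SAMPLE)))  # {('Virus B', '2'): [2]}
```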
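The FASTA headers can be split on '|' to recover the Locus and Accession fields, which gives a key for joining sequences back to the Accession column of the metadata file. The sketch below is a minimal reader written for illustration; it is not part of the dataset. The position of Protein_id in .faa headers is not specified above, so treating it as an optional third '|'-separated field is an assumption. Although the description states that each record spans two lines, the reader also tolerates sequences wrapped over several lines. The sample records are hypothetical.

```python
import io


def parse_fasta(handle):
    """Yield (header_fields, sequence) pairs from a FASTA file.

    Header fields are split on '|': Locus, Accession, and (assumption)
    an optional third field taken to be Protein_id in the .faa file.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split("|")
            header = {
                "Locus": parts[0],
                "Accession": parts[1] if len(parts) > 1 else None,
            }
            if len(parts) > 2:
                header["Protein_id"] = parts[2]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records in the documented "locus|accession" header layout.
SAMPLE = """>LOCUS0001|XX000001
ACGTACGT
>LOCUS0002|XX000002
TTGGCCAA
"""

by_accession = {h["Accession"]: seq for h, seq in parse_fasta(io.StringIO(SAMPLE))}
print(by_accession["XX000002"])  # TTGGCCAA
```

Before joining on accession, it is worth checking that the .csv and FASTA files write accessions in the same form (for example, with or without a version suffix).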

Provider:

figshare

Created:

2023-02-24