
Table6_Exploring Virome Diversity in Public Data in South America as an Approach for Detecting Viral Sources From Potentially Emerging Viruses.csv

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://figshare.com/articles/dataset/Table6_Exploring_Virome_Diversity_in_Public_Data_in_South_America_as_an_Approach_for_Detecting_Viral_Sources_From_Potentially_Emerging_Viruses_csv/18857213
Description:
The South American continent presents a great diversity of biomes, whose ecosystems are constantly threatened by the expansion of human activity. The emergence and re-emergence of viral populations with impact on the human population and ecosystems have increased in recent decades. Given the growing accumulation of genomic data, we explore the potential of South American-related public databases to detect signals that contribute to virosphere research. Our study therefore aims to investigate public databases with emphasis on the surveillance of viruses with medical and ecological relevance. Herein, we profiled 120 “sequence read archive” metagenomes from 19 independent projects from the last decade. In a coarse view, our analyses identified only 0.38% of the total number of sequences as viral, with a higher proportion of RNA viruses. The metagenomes with the most important viral sequences in the analyzed environmental models were 1) aquatic samples from the Amazon River, 2) sewage from Brasilia, and 3) soil from the state of São Paulo, while the models of animal transmission were detected in mosquitoes from Rio de Janeiro and bats from Amazonia. Also, the classification of viral signals into operational taxonomic units (OTUs) at the family level allowed us to infer from metadata a probable host range for the virome detected in each sample analyzed. Further, several motifs and viral sequences are related to specific viruses with emergence potential from the Togaviridae, Arenaviridae, and Flaviviridae families. In this context, the exploration of public databases allowed us to evaluate the scope and informative capacity of sequences from third-party public databases and to detect signals related to viruses of clinical or environmental importance, which allowed us to infer traits associated with probable transmission routes or signals of ecological disequilibrium.
The evaluation of our results showed that in most cases the size and type of the reference database, the guanine–cytosine (GC) percentage, and the length of the query sequences greatly influence the taxonomic classification of the sequences. In sum, our findings describe how the exploration of public genomic data can be exploited as an approach for epidemiological surveillance and the understanding of the virosphere.

Created:
2022-01-21