Hepatitis B Virus (HBV) Genotype and Subtype Reference Sequences
Source: DataCite Commons. Updated: 2020-08-26. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/Hepatitis_B_Virus_HBV_Genotype_and_Subtype_Reference_Sequences/8851946
Description:
The files in this dataset support an analysis of hepatitis B virus (HBV) diversity and the classification of genotypes and subtypes. We set out to produce a resource comprising alignments of published full-length HBV genetic sequences for different genotypes. These were downloaded from the Hepatitis B Virus Database (https://hbvdb.ibcp.fr/HBVdb/) in January 2019. For each genotype or subtype, the proposed reference sequence is the biological sequence closest to the consensus that has an associated publication and the fewest deletions/insertions relative to a typical isolate of the subtype. Nucleotide sequence alignments are all provided in FASTA format. Maximum-likelihood tree files are provided unrooted in Newick format and were generated with RAxML (https://cme.h-its.org/exelixis/software.html).
Further details on the study are available in this bioRxiv publication: https://doi.org/10.1101/831891
Uploaded files:
1. **Alignment of all HBV sequences.** Alignment of full-length HBV sequences (n = 6412) downloaded in January 2019 from HBVdb, supplemented with 71 sequences identified in GenBank: genotype I sequences and genotype C and D sequences that are under-represented in HBVdb.
2. **Alignment of all HBV sequences (stripped).** Alignment of retained sequences (n = 2839) after removal of identical and highly similar sequences (pairwise distance ≤1%). Within each group of similar sequences, those with the fewest ambiguous sites were retained for the analysis.
3. **Alignment of HBV reference sequences.** Alignment of representative reference sequences selected for the HBV genotypes and subtypes. The genotype A strain X02763 (widely used as a numbering reference) and the genotype D isolate NC_003977.2 (the current NCBI HBV reference strain) are included in the alignment. A table detailing the accession numbers and associated publications of the selected references is available in the bioRxiv publication linked above.
4. **Alignment of HBV genotype reference sequences.** Alignment of selected genotype reference sequences, also including the genotype A strain X02763 (widely used as a numbering reference) and the genotype D isolate NC_003977.2 (the current NCBI HBV reference strain). A table detailing the accession numbers of the selected references is available in the bioRxiv publication linked above.
5. **Maximum-likelihood phylogenetic trees for HBV genotypes A-F, H and I.** Only three sequences of genotype G remained after excluding similar sequences, and there is only a single known isolate of (putative) genotype J, so both genotypes were excluded from the analysis. All trees were generated with 1000 bootstrap replicates.
a. **All genotypes tree.** Tree of all sequences retained after removal of highly similar sequences (the sequences given in alignment 2).
b. **Genotype A tree**
c. **Genotype B tree**
d. **Genotype C tree**
e. **Genotype D tree**
f. **Genotype E tree**
g. **Genotype F tree**
h. **Genotype H tree**
i. **Genotype I tree**
j. **Reference sequences tree.** Tree of representative reference sequences of HBV genotypes and subtypes.
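The filtering described for alignment 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the distance definition (proportion of mismatches over columns where both sequences carry A, C, G or T), and the greedy keep-or-drop order are all assumptions. The dataset description states only the ≤1% threshold and the preference for sequences with the fewest ambiguous sites.

```python
UNAMBIGUOUS = set("ACGT")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into {header: uppercase sequence}."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def ambiguous_sites(seq):
    """Count sites that are neither A/C/G/T nor an alignment gap ('-')."""
    return sum(1 for base in seq if base not in UNAMBIGUOUS and base != "-")


def p_distance(a, b):
    """Proportion of mismatches among columns where both bases are A/C/G/T.

    Assumed definition; the source does not say how distance was computed.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            differing += x != y
    return differing / compared if compared else 1.0


def strip_similar(records, threshold=0.01):
    """Greedy filter (an assumption, not the published procedure).

    Sequences are visited from fewest to most ambiguous sites; one is kept
    only if it lies more than `threshold` from every sequence already kept.
    """
    order = sorted(records, key=lambda n: (ambiguous_sites(records[n]), n))
    kept = []
    for name in order:
        if all(p_distance(records[name], records[k]) > threshold for k in kept):
            kept.append(name)
    return kept
```

Each candidate is compared against every sequence already kept, so the cost grows quadratically with the number of sequences. For an alignment the size of the full dataset (n = 6412), a dedicated clustering tool would likely be more practical than this pure-Python loop.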
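The idea of picking the biological sequence closest to the consensus can be illustrated with a small sketch. It assumes a majority-rule consensus per alignment column and a simple mismatch count as the distance; both are assumptions, as are the function names. The additional criteria named in the description (an associated publication, fewest insertions/deletions relative to a typical isolate) are not modelled here.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences.

    Ties within a column go to the character seen first in that column,
    which is an arbitrary choice made for this sketch.
    """
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def closest_to_consensus(aligned):
    """Return (name, consensus) for the sequence with fewest mismatches.

    Ties between sequences are broken alphabetically by name.
    """
    cons = consensus(aligned)

    def mismatches(name):
        return sum(x != y for x, y in zip(aligned[name], cons))

    best = min(aligned, key=lambda name: (mismatches(name), name))
    return best, cons


if __name__ == "__main__":
    # Toy alignment; the names and sequences are made up for illustration.
    toy = {"a": "ACGTAC", "b": "ACGTTC", "c": "ACGTAC", "d": "TCGTAG"}
    print(closest_to_consensus(toy))
```

In practice this would be run separately on the sequences of each genotype or subtype, since the consensus is defined per group.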
Provider:
figshare
Created:
2019-11-06
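Dataset introduction
Background overview
This dataset contains reference sequence alignments and phylogenetic tree files for the different genotypes and subtypes of hepatitis B virus (HBV), intended for studying HBV genetic diversity and classification. The data come from HBVdb and GenBank and were filtered to remove highly similar sequences. The dataset also points to details of the representative reference sequences and links to the related publication.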



