Hepatitis B Virus (HBV) Genotype and Subtype Reference Sequences
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Hepatitis_B_Virus_HBV_Genotype_and_Subtype_Reference_Sequences/8851946
Description:
The files in this dataset support an analysis of hepatitis B virus (HBV) diversity and classification of genotypes and subtypes.
We set out to produce a resource comprising alignments of published full-length HBV genome sequences for the different genotypes. The sequences were downloaded from the Hepatitis B Virus Database (HBVdb, https://hbvdb.ibcp.fr/HBVdb/) in January 2019. For each genotype or subtype, the proposed reference sequence is the biological sequence closest to the consensus that has an associated publication and the fewest insertions/deletions relative to a typical isolate of that subtype.
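The "closest to the consensus" step of the selection criterion above can be illustrated with a minimal Python sketch. It assumes a simple majority-rule consensus and an uncorrected p-distance, neither of which is specified here. It covers only the distance step. The requirements for an associated publication and for minimal insertions/deletions would have to be applied separately. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def majority_consensus(alignment: Dict[str, str]) -> str:
    """Column-wise majority-rule consensus; a gap ('-') counts as a state."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # most_common(1) returns the most frequent character in the column;
    # ties go to the character encountered first.
    return "".join(
        Counter(s[col] for s in seqs).most_common(1)[0][0] for col in range(length)
    )


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, counting only columns where both sequences have A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0


def closest_to_consensus(alignment: Dict[str, str]) -> str:
    """Return the ID of the sequence with the smallest p-distance to the consensus."""
    consensus = majority_consensus(alignment)
    return min(alignment, key=lambda name: p_distance(alignment[name], consensus))
```

For example, for the toy alignment `{"s1": "ACGTAC", "s2": "ACGTTC", "s3": "ACCTAC"}` the consensus is `ACGTAC`, so `closest_to_consensus` returns `"s1"`.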
All nucleotide sequence alignments are provided in FASTA format. Maximum-likelihood trees were generated with RAxML (https://cme.h-its.org/exelixis/software.html) and are provided as unrooted trees in Newick format.
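Every alignment is a FASTA file, so a useful starting point is a short reader that also checks that all records have the same length, as an alignment must. This is a minimal sketch using only the Python standard library. It assumes the sequence ID is the first whitespace-delimited word of each header, and the file name in the comment is hypothetical. Established parsers such as Biopython's `Bio.AlignIO` handle more edge cases.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta_alignment(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA records into {id: sequence} and check they are equally long."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            current = fields[0]  # ID = first word of the header
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line)  # sequences may wrap over several lines
    alignment = {name: "".join(parts) for name, parts in records.items()}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return alignment


if __name__ == "__main__":
    # With a real file: with open("hbv_alignment.fasta") as fh: aln = read_fasta_alignment(fh)
    demo = ">X02763 genotype A\nACGT\nAC\n>NC_003977.2\nAC-TAC\n"
    print(read_fasta_alignment(io.StringIO(demo)))
```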
Further details of the study are available in the bioRxiv preprint: https://doi.org/10.1101/831891
Uploaded files:
1. Alignment of all HBV sequences. Alignment of full-length HBV sequences (n = 6412) downloaded from HBVdb in January 2019. These are supplemented with 71 sequences identified in GenBank, comprising genotype I sequences and additional genotype C and D sequences, which are under-represented in HBVdb.
2. Alignment of all HBV sequences (stripped). Alignment of the sequences retained (n = 2839) after identical and highly similar sequences (pairwise distance ≤1%) were removed. Where sequences were redundant, the one with the fewest ambiguous sites was retained.
3. Alignment of HBV reference sequences. Alignment of the representative reference sequences selected for the HBV genotypes and subtypes. The alignment also includes the genotype A strain X02763 (widely used as a numbering reference) and the genotype D isolate NC_003977.2 (the current NCBI HBV reference strain). A table listing the accession numbers and associated publications of the selected references is available in the bioRxiv preprint cited above.
4. Alignment of HBV genotype reference sequences. Alignment of the selected genotype reference sequences, again including the genotype A strain X02763 (widely used as a numbering reference) and the genotype D isolate NC_003977.2 (the current NCBI HBV reference strain). A table listing the accession numbers of the selected references is available in the bioRxiv preprint cited above.
5. Maximum-likelihood phylogenetic trees for HBV genotypes A-F, H and I. Genotypes G and J were excluded from this analysis. Only three genotype G sequences remained after similar sequences were removed, and only a single isolate of (putative) genotype J is known. All trees were generated with 1000 bootstrap replicates.
a. All-genotypes tree. Tree of all sequences retained after removal of highly similar sequences (the sequences in alignment 2).
b. Genotype A tree
c. Genotype B tree
d. Genotype C tree
e. Genotype D tree
f. Genotype E tree
g. Genotype F tree
h. Genotype H tree
i. Genotype I tree
j. Reference sequences tree. Tree of the representative reference sequences of the HBV genotypes and subtypes.
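The redundancy filter used for alignment 2 can be approximated by a greedy pass. That filter removes sequences within a pairwise distance of ≤1% and keeps the one with the fewest ambiguous sites. This is a hypothetical sketch and not necessarily the authors' procedure. The greedy ordering, the uncorrected p-distance and the exclusion of gap or ambiguous columns from the comparison are all assumptions. Each candidate is compared with every retained sequence, so pure Python will be slow on thousands of ~3.2 kb HBV genomes. Dedicated clustering tools are more practical at that scale.

```python
from typing import Dict, List


def count_ambiguous(seq: str) -> int:
    """Count characters other than A/C/G/T and the gap '-' (e.g. IUPAC codes R, Y, N)."""
    return sum(1 for c in seq.upper() if c not in "ACGT-")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, counting only columns where both sequences have A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0


def reduce_redundancy(alignment: Dict[str, str], threshold: float = 0.01) -> List[str]:
    """Greedily keep sequences that are more than `threshold` away from every kept one."""
    # Visit sequences from fewest to most ambiguous sites so that, within a group of
    # near-identical sequences, the least ambiguous one is the one retained.
    order = sorted(alignment, key=lambda name: count_ambiguous(alignment[name]))
    kept: List[str] = []
    for name in order:
        if all(p_distance(alignment[name], alignment[k]) > threshold for k in kept):
            kept.append(name)
    return kept
```

A sequence at exactly 1% distance from a retained one is treated as redundant. This matches the "≤1%" wording.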
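Alignments 3 and 4 include X02763 as a numbering reference. Positions in any other aligned sequence can therefore be expressed in X02763 coordinates by walking the alignment columns. The following is a minimal sketch. It assumes gaps are written as '-' and that the first non-gap base of X02763 in the alignment is position 1 of the numbering. Columns where X02763 has a gap (insertions relative to it) map to `None`. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def reference_numbering(alignment: Dict[str, str], ref_id: str = "X02763") -> List[Optional[int]]:
    """For each alignment column, the 1-based position in the ungapped reference,
    or None where the reference has a gap (an insertion relative to the reference)."""
    numbering: List[Optional[int]] = []
    pos = 0
    for ch in alignment[ref_id]:
        if ch == "-":
            numbering.append(None)
        else:
            pos += 1
            numbering.append(pos)
    return numbering


def to_reference_position(alignment: Dict[str, str], seq_id: str, seq_pos: int,
                          ref_id: str = "X02763") -> Optional[int]:
    """Translate a 1-based position in sequence `seq_id` into reference coordinates."""
    numbering = reference_numbering(alignment, ref_id)
    count = 0
    for col, ch in enumerate(alignment[seq_id]):
        if ch != "-":
            count += 1
            if count == seq_pos:
                return numbering[col]
    raise IndexError(f"{seq_id} has fewer than {seq_pos} residues")
```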
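To inspect the trees, the tip labels and the bootstrap support on internal nodes can be read from the Newick strings. The sketch below is a minimal, hypothetical tokenizer rather than a full Newick parser. It assumes unquoted labels without spaces or bracketed comments. It also assumes that support values appear as internal-node labels, as in RAxML's bipartitions output; this is an assumption about the files here. A library parser such as Biopython's `Bio.Phylo` is more robust.

```python
import re
from typing import List, Tuple

# Single punctuation characters, or runs of any other non-whitespace characters.
TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")
PUNCT = set("(),:;")


def parse_newick_labels(newick: str) -> Tuple[List[str], List[float]]:
    """Return (tip labels, numeric internal-node labels) from a Newick string."""
    tips: List[str] = []
    supports: List[float] = []
    prev = None
    for tok in TOKEN.findall(newick):
        if tok not in PUNCT:
            if prev == ":":
                pass  # branch length: not needed here
            elif prev == ")":
                # A label right after ')' belongs to an internal node; in trees with
                # bootstrap support it usually holds the support value.
                try:
                    supports.append(float(tok))
                except ValueError:
                    pass  # non-numeric internal node name
            else:
                tips.append(tok)  # a label after '(' or ',' (or at the start) is a tip
        prev = tok
    return tips, supports
```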
Created: 2019-11-06



