MOESM1 of Classification and characterization of human endogenous retroviruses; mosaic forms are common
DataCite Commons. Updated 2024-12-17. Indexed 2024-07-25.
Download link:
https://springernature.figshare.com/articles/dataset/MOESM1_of_Classification_and_characterization_of_human_endogenous_retroviruses_mosaic_forms_are_common/4370294/1
Description:
Additional file 1: Table S1. Excel table generated from the master.dbf table. Field names that need an explanation and are not explained in the main text are listed below. The PBS fields are described in the main text.

“Subgenes”: presence of motif hits belonging to portions of LTRs and the four major genes (from ReTe).
“Chainscore”: weighted sum of motif hits calculated by ReTe, ranging from 300 to 2500.
“Breaks”: ReTe detected two proviral portions that seemingly belong together but are separated by a longer than normal distance, and therefore disregarded the intervening sequence; the start and stop of that sequence are shown in this field.
“Gagscore” (and the corresponding scores for the other three major genes): degree of fit of the putein to the reference proteins in the best fitting genus-specific alignment included in ReTe.
“Tperc”, “Aperc” etc.: percentage of each nucleotide in the chain (from ReTe).
“Bestrefrv”: best fitting nucleotide sequence out of a set of reference retroviral nucleic acid sequences, together with % identity and total length of the reference sequence (from ReTe).
“Polclass”: best fitting reference Pol amino acid sequence, with the score of the reference sequence to itself/score of the query sequence to the reference sequence (from ReTe).
The five “idpc” fields: % identity to the group consensus for DNA and the four major proteins.
“Envidpc2”: % identity to the Env subgroup2 consensus.
“Repantist”: antisense portions of a repeatmasker Simage.
The five Simage field collections: nucleic acid Simages for repeatmasker (rep), HML (hml), reference sequence collection (ref), and the first (con1) and second [con2, this paper (Additional file 3: list S3), including best representative (bre) noncanonical or single canonical sequences] consensus collections, respectively. For each Simage, the fields also show a quantification (-simgst), a list (-simgls) explaining the letter symbols, and the sense relative to the chain sense (-simgse).
“con2simgtg” (con2 set only): presence of LTR (5 and 3), Gag (G), Pro (R), Pol (P) and Env (E) in each chain twentieth.
The “twomost” fields: the two most frequent AutoFrame hits (with number of hits out of twenty) for each of the four major genes. The ensuing Simages show the distribution of AutoFrame hits per putein for each gene, followed by a hit list and a quantification as for the nucleotide Simages.
“Isd”: the “immunosuppressive domain”, calculated by the envelope evaluation program henzyscore or identified manually.
“Envhpoints”: score from henzyscore.
“Envgroup2”: envelope subgroup (like “HERVT_A”, in the main text often shown as “hervta”).
“Envqscore”: output from the envelope quality control program EnvQual.
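As an illustration of the per-nucleotide percentage fields (“Tperc”, “Aperc” etc.), the following is a minimal sketch of how such values could be recomputed from a chain's nucleotide sequence. It is not ReTe's implementation: the function name `nucleotide_percentages` and the example sequence are hypothetical, and the choice to count every character of the sequence (including any ambiguous bases) in the denominator is an assumption, since the description does not say how ReTe handles them.

```python
from collections import Counter


def nucleotide_percentages(chain: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T in a nucleotide string.

    Assumption: the denominator is the full sequence length, so any
    non-ACGT symbols lower the four percentages rather than being skipped.
    """
    seq = chain.upper()
    total = len(seq)
    if total == 0:
        # An empty chain has no composition; report zeros instead of dividing by 0.
        return {base: 0.0 for base in "ACGT"}
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Hypothetical 8-nucleotide example: 3 A, 1 C, 1 G, 3 T.
percentages = nucleotide_percentages("ACGTTTAA")
print(percentages)
```

Values computed this way may differ slightly from the table if ReTe excludes ambiguous positions or rounds differently.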
Provider:
Figshare
Created:
2017-12-19



