Supplementary Datasets for "Phylogenetic relatedness rather than aquatic habitat fosters horizontal transfer of transposable elements in animals"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14514501
Description:
Dataset S1. Metadata on the 247 assemblies of the study. Information on species names, GenBank IDs, taxonomic group as defined in this study, habitat, the BUSCO database used to run BUSCO, genome size, N50, the percentage of complete BUSCO genes recovered, and statistics on TE content.
Dataset S2. Phylogenetic tree of the 247 species in newick format.
Dataset S3. TE-TE hits clustered per clade. This table is the final output of script 12. Each line corresponds to one TE-TE hit, and only hits that passed the three filters in Figure S2 are included. “copy1” and “copy2” indicate the names of the two copies involved in the hit, while “ID.1” and “ID.2” correspond to their numerical identifiers. “TEconsensus.1” and “TEconsensus.2” are the names of the consensus sequences given by RepeatModeler. “superfamily” indicates the superfamily of both TE copies. “species.1” and “species.2” indicate the species names of the two host genomes. “divTime” is the additive divergence time between the two species and “mrca” indicates their clade identifier as in the newick tree. “pID”, “length”, “qStart”, and “sEnd” are the output of the second similarity search (dc-megablast). “dS” and “dN” indicate the synonymous and the non-synonymous distance, respectively. “length.aa” indicates the length of the coding region. All hits of the same community get the same value in “community” and those of the same hit group get the same value in “hitGroup”. Thus, all TE copies involved in the same transfer can be recovered by selecting lines that share the same “hitGroup” value. “independent” indicates whether this hit can be explained by another one (FALSE) or not (TRUE). “subclass” indicates whether the TE superfamily is a Class 1 (RNA) or Class 2 (DNA) TE. The number of independent horizontal transfer events across the dataset corresponds to the number of unique “hitGroup” values for which “independent” equals TRUE.
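The counting rule described for Dataset S3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file is assumed to be tab-separated, the "independent" column is assumed to hold the strings TRUE/FALSE, and the rows below are made-up toy values rather than data from the record. The helper names (count_independent_transfers, copies_in_hit_group) are hypothetical.

```python
import csv
import io

def count_independent_transfers(rows):
    """Count unique hitGroup values among rows flagged independent == TRUE."""
    groups = set()
    for row in rows:
        if row["independent"].strip().upper() == "TRUE":
            groups.add(row["hitGroup"])
    return len(groups)

def copies_in_hit_group(rows, hit_group):
    """Collect every TE copy name appearing in hits of one hit group."""
    copies = set()
    for row in rows:
        if row["hitGroup"] == hit_group:
            copies.update((row["copy1"], row["copy2"]))
    return copies

# Toy stand-in for Dataset S3 (assumed tab-separated; values are invented).
SAMPLE = (
    "copy1\tcopy2\thitGroup\tindependent\n"
    "a1\tb1\t1\tTRUE\n"
    "a2\tb2\t1\tTRUE\n"
    "a3\tc1\t2\tFALSE\n"
    "d1\te1\t3\tTRUE\n"
)

s3_rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
n_events = count_independent_transfers(s3_rows)
group_1_copies = copies_in_hit_group(s3_rows, "1")
```

To apply this to the real table, the in-memory sample would be replaced by an open file handle; the actual delimiter and the spelling of the logical values should be checked against the downloaded file first.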
Dataset S4. TE-TE hits clustered per pair of species. Each line corresponds to one TE-TE hit, and only hits that passed the three filters in Figure S2 are included. All columns are the same as in Dataset S3, except that “community” and “hitGroup” correspond to clustering per pair of species. Here, the count of hit groups between two species is not affected by the other species composing the dataset. However, a single horizontal transfer event can be counted several times, in different pairs of related species.
Dataset S5. Number of horizontal transfers we recovered in each pair of species. From Dataset S4, we counted the total number of horizontal transfers (column “n”) in which each pair of species is involved. The table contains all pairs of species for which we could look for horizontal transfers, even those involved in no horizontal transfer. Species of the same species unit have the same “spClade”. A species that is alone in its species unit takes its own species name as “spClade”; otherwise, “spClade” is the “mrca” value of the last common ancestor of the species unit. “pairClade” indicates the pair of species units corresponding to the pair of species.
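One plausible reading of how a per-pair count like column “n” could be derived from Dataset S4 is sketched below. It assumes that a transfer corresponds to a distinct “hitGroup” value within a species pair, and that the list of searchable pairs is supplied separately so that pairs with no transfer still appear with a count of 0. Whether the authors additionally filtered on “independent” is not stated here, so no such filter is applied. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict

def transfers_per_pair(hits, candidate_pairs):
    """Return {(species_a, species_b): number of distinct hit groups}."""
    groups = defaultdict(set)
    for h in hits:
        # Treat the pair as unordered so A-B and B-A land in the same bucket.
        pair = tuple(sorted((h["species.1"], h["species.2"])))
        groups[pair].add(h["hitGroup"])
    counts = {}
    for a, b in candidate_pairs:
        pair = tuple(sorted((a, b)))
        # Pairs without any hit are kept with a count of zero.
        counts[pair] = len(groups[pair]) if pair in groups else 0
    return counts

# Invented rows standing in for Dataset S4.
pair_hits = [
    {"species.1": "sp_A", "species.2": "sp_B", "hitGroup": "1"},
    {"species.1": "sp_B", "species.2": "sp_A", "hitGroup": "1"},
    {"species.1": "sp_A", "species.2": "sp_B", "hitGroup": "2"},
    {"species.1": "sp_A", "species.2": "sp_C", "hitGroup": "1"},
]
pairs = [("sp_A", "sp_B"), ("sp_A", "sp_C"), ("sp_B", "sp_C")]
n_per_pair = transfers_per_pair(pair_hits, pairs)
```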
Dataset S6. Total number and total predicted number of horizontal transfers. From Dataset S4, we counted the total number of horizontal transfers in which each species is involved, separately for TEs of Class 1 and Class 2. Predicted numbers result from the Bayesian modeling, in which additive divergence time was fixed at 500 Myr and habitat was assumed to be shared.
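The observed (not the predicted) per-species totals might be tallied along the following lines. This is a rough sketch, not the authors' script: it assumes that hit group identifiers in Dataset S4 are only unique within a species pair, so an event is keyed on the pair plus the “hitGroup” value, and that each event counts once for both species involved. The labels used in “subclass” are placeholders, and the Bayesian predictions are not reproduced.

```python
from collections import defaultdict

def transfers_per_species(hits):
    """Map (species, subclass) to the number of distinct events involving it."""
    seen = defaultdict(set)
    for h in hits:
        pair = tuple(sorted((h["species.1"], h["species.2"])))
        # Assumption: hitGroup values are only unique within a species pair.
        event = (pair, h["hitGroup"])
        for species in pair:
            seen[(species, h["subclass"])].add(event)
    return {key: len(events) for key, events in seen.items()}

# Invented rows; the subclass labels are placeholders.
toy_hits = [
    {"species.1": "sp_A", "species.2": "sp_B", "hitGroup": "1", "subclass": "Class 1"},
    {"species.1": "sp_B", "species.2": "sp_A", "hitGroup": "1", "subclass": "Class 1"},
    {"species.1": "sp_A", "species.2": "sp_C", "hitGroup": "1", "subclass": "Class 2"},
    {"species.1": "sp_A", "species.2": "sp_C", "hitGroup": "2", "subclass": "Class 1"},
]
per_species = transfers_per_species(toy_hits)
```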
Created:
2024-12-18



