Transposable element libraries from 101 fish

Indexed in the NIAID Data Ecosystem on 2026-05-01

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.4xgxd25g9

Description:

Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated whether TE content correlates with family-level net diversification rates and found support for a weak negative correlation. Further, we found that TE proportion correlates with genome size but not with the proportion of short tandem repeats (STRs), which implies that the two follow independent evolutionary paths. Marine and freshwater fish differ substantially in STR content. The most extreme STR propagation was found in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity, as seen in codfishes and herring.

Methods

For TE annotation, we used a variant of the computational pipeline described in more detail by Tørresen et al. (2017), available at https://github.com/uio-cels/Repeats. The pipeline includes multiple TE detection steps using different tools, steps for removing non-TEs from the detected sequences, and steps for classifying the elements. For the initial detection step, we used RepeatModeler (v. 1.0.8) (Smit & Hubley 2008-2015) and LTRharvest (part of GenomeTools v. 1.5.7) (Ellinghaus et al. 2008). RepeatModeler detects a broad range of repetitive sequences, whereas LTRharvest is specialized for detecting LTR retrotransposons (LTR-RTs). Candidate TEs whose sequences matched known non-TE proteins in UniProtKB/Swiss-Prot were identified with BLASTX and removed. To classify the TEs, we used RepeatClassifier, which is part of the RepeatModeler software. Because RepeatClassifier did not classify all of the remaining sequences, we performed additional similarity searches between these sequences and a curated library of TE sequences (RepBase v. 20150807) using nucleotide BLAST.
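The non-TE removal step can be illustrated with a minimal sketch. It assumes BLASTX was run with tabular output (`-outfmt 6`) against a protein set from which TE-derived proteins were already excluded, and it treats any hit at or below a hypothetical e-value cutoff (`EVALUE_CUTOFF = 1e-5`) as evidence that a candidate is not a TE. The function names (`non_te_ids`, `parse_fasta`, `filter_library`) are illustrative only and are not taken from the uio-cels/Repeats pipeline, whose actual filtering criteria may differ.

```python
"""Sketch: drop candidate TE consensus sequences that BLASTX matched to non-TE proteins.

Assumptions (not from the original pipeline): BLASTX output is in tabular
format (-outfmt 6), the protein database already excludes TE-derived proteins,
and any hit with e-value <= EVALUE_CUTOFF marks the query as a non-TE.
"""
from typing import Dict, Iterable, List, Optional, Set

EVALUE_CUTOFF = 1e-5  # hypothetical threshold, not taken from the source


def non_te_ids(blast_lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Collect query IDs that have at least one hit at or below the e-value cutoff."""
    flagged: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        # outfmt 6 default columns: index 0 = qseqid, index 10 = evalue
        qseqid, evalue = fields[0], float(fields[10])
        if evalue <= cutoff:
            flagged.add(qseqid)
    return flagged


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {ID: sequence}, using the first word of each header as the ID."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def filter_library(library: Dict[str, str], blast_lines: Iterable[str],
                   cutoff: float = EVALUE_CUTOFF) -> Dict[str, str]:
    """Return the library without the sequences flagged as non-TEs."""
    drop = non_te_ids(blast_lines, cutoff)
    return {seq_id: seq for seq_id, seq in library.items() if seq_id not in drop}


if __name__ == "__main__":
    library = parse_fasta([">fam1", "ACGT", ">fam2", "GGCC"])
    # One made-up outfmt 6 row: fam2 hits a hypothetical protein with e-value 1e-30.
    blast = ["fam2\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t200"]
    print(filter_library(library, blast))  # {'fam1': 'ACGT'}
```

Matching on the first word of each FASTA header mirrors how BLAST+ reports `qseqid` by default, so the IDs in the tabular output line up with the library keys without extra parsing.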
Finally, we built hidden Markov model (HMM) profiles from the detected sequences using HMMER (v. 3.1b1) (Wheeler & Eddy 2013) and compared them with HMM profiles from databases downloaded from GyDB.org (Llorens et al. 2011) and dfam.org (Hubley et al. 2016), using the nhmmer program included in HMMER. This step classified additional sequences at the class and subclass level. The pipeline produced one de novo library per assembly, containing the consensus sequences of the interspersed repeats detected in that assembly.
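To see how far classification got in a given library, one option is to count consensus sequences per class. The sketch below assumes the library FASTA headers follow the RepeatMasker-style `ID#Class/Subclass` convention used by RepeatModeler output (for example `>rnd-1_family-5#LTR/Gypsy`); this convention is not confirmed for the files in this dataset, and treating a missing tag as `Unknown` is likewise an assumption. The function name `classification_counts` is hypothetical.

```python
"""Sketch: tally classification labels in a repeat library FASTA.

Assumption: headers follow the RepeatMasker-style "ID#Class/Subclass" convention
(e.g. ">rnd-1_family-5#LTR/Gypsy"); not confirmed for the files in this dataset.
"""
from collections import Counter
from typing import Iterable, Tuple


def classification_counts(fasta_lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (counts per top-level class, counts per full Class/Subclass label)."""
    by_class: Counter = Counter()
    by_label: Counter = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no labels; only headers are inspected
        header_id = line[1:].split()[0]     # first word, e.g. "rnd-1_family-5#LTR/Gypsy"
        _, sep, label = header_id.partition("#")
        if not sep or not label:
            label = "Unknown"               # assumption: a missing tag means unclassified
        by_class[label.split("/")[0]] += 1  # class = text before the first "/"
        by_label[label] += 1
    return by_class, by_label


if __name__ == "__main__":
    demo = [
        ">rnd-1_family-1#LINE/L2 (example description)",
        "ACGT",
        ">rnd-1_family-2#LTR/Gypsy",
        "ACGT",
        ">rnd-1_family-3#LTR",
        "ACGT",
        ">seq_without_tag",
        "ACGT",
    ]
    classes, labels = classification_counts(demo)
    print(dict(classes))  # {'LINE': 1, 'LTR': 2, 'Unknown': 1}
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, e.g. `classification_counts(open("library.fa"))`, where `library.fa` stands in for one of the per-assembly libraries. Sequences classified only to class level (such as `#LTR`) appear in both tallies under the class name alone.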

Created:

2023-09-28