Additional file 2 of Evolutionary divergent clusters of transcribed extinct truncated retroposons drive low mRNA expression and developmental regulation in the protozoan Leishmania
Source: DataCite Commons. Updated 2024-10-29; indexed 2024-11-05.
Download link:
https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Evolutionary_divergent_clusters_of_transcribed_extinct_truncated_retroposons_drive_low_mRNA_expression_and_developmental_regulation_in_the_protozoan_Leishmania/27322690
Resource description:
Additional file 2: Figs. S1-S7.

Fig. S1. Nucleotide composition of the SIDER2 consensus and its hallmark 79-nt signature II sequence. Jalview alignment was performed on the SIDER2 consensus sequence encompassing 536 bp. Within sense SIDER2s, a conserved 79-nt signature (Signature II/SII) resides at the 5'-end, while an A-rich tail is typically found at the 3'-end (A left). The antisense SIDER2 is represented as the reverse complement of the sense element, featuring a T-rich stretch at the 3'-end and the SII at the 5'-end (A right). B-C Multiple sequence alignments of SIDER2 sequences to the SII consensus, generated with hmmalign and displayed in Jalview. Sequences aligned to the Signature II consensus or its reverse complement (RC) are shown. The black consensus bars below the alignments show the frequency of the most conserved base, which is indicated underneath. The five SIDER2 sequences from each HMMER group (Groups 1-5) (B) and the 25 antisense SIDER2 sequences (C) shown here were randomly selected.

Fig. S2. Phylogenetic and CD-Hit-Est analyses of L. infantum SIDER2 sequences. A The unrooted maximum likelihood phylogenetic tree was built with IQ-TREE (v.2.1.2), model GTR + I + G1, from 1448 SIDER2 sequences aligned with MAFFT (v.7.471). SIDER2 sequences are colored by the chromosome on which they are located. Three sections are enlarged for better resolution. Tips in the enlarged boxes are named by chromosome followed by the start position of the SIDER2. B Concordance between the phylogenetic and CD-Hit-Est analyses. An unrooted maximum likelihood tree, generated as indicated in (A), depicts 189 SIDER2 sequences from five chromosomes (LinJ.14, LinJ.16, LinJ.20, LinJ.22 and LinJ.27) aligned with MAFFT (v.7.471). Tips on the tree are color-coded according to their respective chromosomes. Each SIDER2 element is labeled with its corresponding CD-Hit-Est cluster (see Methods). The same tree is represented in Fig. 2B, but with the tips labeled by the LinJ transcripts harboring SIDER2 elements.

Fig. S3. Heatmap of regions with ≥ 80% sequence identity shared between 189 SIDER2 sequences from five L. infantum chromosomes. SIDER2s from chromosomes 14, 16, 20, 22 and 27 were organized by genomic position and aligned using BLASTn (-task "blastn" -perc_identity 80, v.2.9.0+). The percentage of query coverage per subject with ≥ 80% identity was obtained with the outfmt option "qcovs". The darker the blue, the higher the query coverage. Dashed gray lines indicate the strand-switch regions. The heatmap of 189 SIDER2 sequences shows that i) SIDER2 sequences on the same chromosome share more homology with each other than with SIDER2s on different chromosomes and ii) within the same chromosome, SIDER2 sequences can belong to different clusters that do not necessarily share regions of homology.

Fig. S4. SIDER2-bearing transcripts generally have much longer 3'UTRs than non-SIDER2 transcripts. The length of 3'UTRs predicted by PRED-A-TERM was compared between SIDER2-containing (N = 1127; 409 low, 527 medium-to-low, and 191 highly expressed) and non-SIDER2 transcripts (N = 7207; 1540 low, 3726 medium, and 1941 highly expressed). SIDER2-bearing transcripts have significantly longer (> 3 times) 3'UTRs than non-SIDER2 transcripts, regardless of their expression levels. Statistical significance was assessed using the Kruskal-Wallis test.

Fig. S5. Expression levels of transcripts harboring sense vs. antisense SIDER2 sequences. Bar graph representing the expression levels in L. infantum (Li) promastigotes (Pro) of sense (n = 855) and antisense (n = 153) SIDER2-containing transcripts. Only transcripts harboring a single SIDER2 element within their 3'UTRs were analyzed, to avoid using the expression of the same gene more than once. Significance was assessed using the Wilcoxon rank-sum test, which yielded a p-value of 0.5588.

Fig. S6. Comparative genomic sequencing vs. RNA-sequencing analysis illustrates that SIDER2-containing mRNAs are largely less expressed than non-SIDER2 transcripts. The dot chart illustrates FPKM (Fragments Per Kilobase of transcript per Million mapped reads) levels obtained from Illumina DNA sequencing (A) and Illumina RNA sequencing (B). SIDER2-containing mRNAs are denoted in red, while non-SIDER2 mRNAs are represented in grey. The green line represents the mean, and the blue line shows the median of mRNA expression levels. Notably, most of the SIDER2-containing mRNAs exhibit expression levels below the median expression of the L. infantum transcriptome.

Fig. S7. Schematic representation of the different steps used in GO analysis. A total of 1127 SIDER2-containing transcripts were included in the GO analysis. A The FASTA sequences of the ORFs were exported from TriTrypDB (https://tritrypdb.org/tritrypdb/app) and imported into OmicsBox version 1.2 (BioBam) (www.biobam.com/omicsbox). Coding regions were aligned to the NCBI database using a BLASTX search (E-value ≤ 1.0 × 10⁻³). Subsequent GO mapping was performed using the Blast2GO mapping against the latest version of the GO database to obtain the functional labels. Sequences that shared significant similarity with known proteins in BLASTX searches (E < 1e-10) were identified using the online tool InterProScan 5.0. Next, the appropriate GO term was allocated to its respective predicted function using an e-value cut-off of 1.0 × 10⁻⁶ and an annotation cut-off of 55, with the evidence code set to 0.8 for the different categories as implemented in OmicsBox. B The final analysis found 799 (70.9%) genes with complete GO annotation. Pie graphs of the enriched GO terms were created for three categories (biological process, cellular component, and molecular function) for the SIDER2-containing transcripts and their subclusters.
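As an illustration of the FPKM measure named in the Fig. S6 legend, the following Python sketch computes FPKM from raw counts and the fraction of a transcript subset that falls below the overall median. The function names, arguments, and any numbers passed to them are hypothetical; nothing here is taken from the dataset or from the authors' pipeline.

```python
from statistics import median


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments assigned to one transcript
    length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) is the same as
    # multiplying the count by 1e9 and dividing by both raw quantities.
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def fraction_below_median(all_values, subset_values):
    """Share of subset_values lying strictly below the median of all_values."""
    if not subset_values:
        raise ValueError("subset must not be empty")
    cutoff = median(all_values)
    below = sum(1 for value in subset_values if value < cutoff)
    return below / len(subset_values)
```

With real data, `all_values` would hold the FPKM of every transcript and `subset_values` the FPKM of the SIDER2-containing transcripts only; the comparison against the median mirrors the blue line described in the legend.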
Provider:
figshare
Created:
2024-10-29



