Reconstruction of full-length LINE-1 progenitors from ancestral genomes (Supplementary Data)

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6338535
Resource description:
Web Supplementary Files

Web Supplementary File 1 - FASTA files containing full-length reconstruction input sequences: full_length_reconstruction_input_sequence_fastas.zip
Web Supplementary File 2 - FASTA files containing Muscle alignments of the full-length reconstruction input sequences: full_length_reconstruction_input_sequence_alns.zip
Web Supplementary File 3 - FASTA file of full-length reconstructed sequences: full_length_reconstructions.fa
Web Supplementary File 4 - Table of full-length reconstruction statistics: full_length_reconstruction_stats.csv
Web Supplementary File 5 - FASTA files containing ORF reconstruction input sequences: orf_fastas.zip
Web Supplementary File 6 - FASTA files containing Macse alignments of the ORF reconstruction input sequences: ORF_reconstruction_input_sequence_alns.zip
Web Supplementary File 7 - FASTA file of reconstructed ORF sequences: ORF_reconstructions.fa
Web Supplementary File 8 - Table of ORF reconstruction statistics: ORF_reconstruction_stats.csv
Web Supplementary File 9 - Table of Composite Sequences: bestfl_selection_fixed_CS_seqs.csv
Web Supplementary File 10 - Database of gold standards: L1_goldstandards.csv

Data Underlying Figures

RepeatMasker scans of hg38 and ancestral genomes: anc_gen_RM_out_files.zip

Figure 4
  4A
    Source alignment of 54 composite sequences: 220121_dropped12+L1ME3A_muscle.nt.afa
    Tree produced using the alignment and FastTree: 220121_dropped12+L1ME3A.tree
  4B
    Source alignment of 67 Dfam L1 subfamily 3’ end models: 200123_dfam_3ends.fa.muscle.aln
    Tree produced using the alignment: 200123_dfam_3ends.fa.muscle.aln.tree

Figure 5
  KZFP-TE enrichment p-values (from Barazandeh et al. 2018): TE_KZFP_enrichment_pvals.xlsx
  KZFP-TE top 500 peak overlap (from Barazandeh et al. 2018): top500_peak_overlap.xlsx

Figure 6
  RepeatMasker .out file for the Composite Sequence custom library queried against hg38: CS_RM_hg38.fa.out.gz

Figure S2
  In each file below, the CG-corrected Kimura divergence values are in the last column.
  RepeatMasker scan .out file of hg38: hg38+KimDiv_RM.out
  RepeatMasker scan .out file of the Progressive Cactus eutherian ancestral genome: Progressive_Cactus_Euth+KimDiv_RM.out
  RepeatMasker scan .out file of the Ancestors 1.1 eutherian ancestral genome: Ancestors_Euth+KimDiv_RM.out

Figure S5
  RepeatMasker scan .out files for Progressive Cactus simian and primate reconstructed ancestral genomes: progCactus_RM_outfiles.zip
  S5A
    FASTA files containing Cactus genome-derived reconstructed sequences equivalent to the L1MA2, L1MA4, and L1MD1-3 best full-length sequences: progCactus_reconstruction_bestFL_equivalents.zip
  S5B
    FASTA files containing Muscle alignments of Cactus genome-derived full-length reconstruction input sequences: progCactus_reconstruction_input_sequence_alns.zip

Figure S6
  S6A
    Results of Conserved Domain scans of Cactus genome-derived full-length reconstructed sequences: CD_search_results_short_nms.txt
  S6B-D
    Character posterior probabilities of “best” full-length reconstructed sequences: best_fl_post_probs.zip

Figure S7
  S7B-C
    Results of Conserved Domain scans of translated initial full-length reconstructed sequences: initial_recons_all_3frametrans_CD-search.txt
    Results of Conserved Domain scans of translated reconstructed ORFs: recons_ORF1-2_all_3frametrans_CD-search.csv

Figure S15
  S15A
    Source alignment of 67 composite sequences: bestfl_selection_fixed_CS_seqs_muscle.nt.afa
    Tree produced using the alignment: bestfl_selection_fixed_CS_seqs_muscle.nt.afa.tree
  S15B-E
    Source Muscle alignments for phylogenetic trees of reconstructed sequence components:
      ORF2: ORF2_keep54_muscle.nt.afa
      5’ UTR: 5utr_keep54_muscle.nt.afa
      ORF1: ORF1_keep54_muscle.nt.afa
      3’ UTR: 3utr_keep54_muscle.nt.afa
    Trees produced using the above alignments:
      ORF2: ORF2_keep54_muscle.nt.afa.tree
      5’ UTR: 5utr_keep54_muscle.nt.afa.tree
      ORF1: ORF1_keep54_muscle.nt.afa.tree
      3’ UTR: 3utr_keep54_muscle.nt.afa.tree

Figure S17
  Unfiltered BLAST results of Composite Sequences queried against hg38: CS_hg38_blastn.csv.zip
  BED file of L1 instances annotated using BLAST pipeline: BLAST_L1_hits.bed
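For readers who want to work with the Figure S2 files, the following is a minimal Python sketch of reading a RepeatMasker .out file in which the Kimura divergence has been appended as the last column, as the description states. It assumes the standard whitespace-separated RepeatMasker .out layout (15 fields ending in the ID) followed by one extra numeric field; the exact layout of the deposited files has not been checked here, and the names RMHit and parse_rm_out_with_kimura are hypothetical, introduced only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class RMHit(NamedTuple):
    """One RepeatMasker annotation plus its appended Kimura divergence."""
    query: str
    start: int
    end: int
    strand: str
    repeat_name: str
    repeat_class: str
    kimura_div: float


def parse_rm_out_with_kimura(lines: Iterable[str]) -> Iterator[RMHit]:
    """Yield hits from RepeatMasker .out lines.

    Assumption: each data line has the 15 standard fields and then the
    Kimura divergence as the final whitespace-separated field.
    """
    for line in lines:
        fields = line.split()
        # Header lines do not start with an integer score; lines with
        # fewer than 16 fields carry no appended divergence column.
        if len(fields) < 16 or not fields[0].isdigit():
            continue
        try:
            kimura = float(fields[-1])
        except ValueError:
            # Last field is not numeric (for example a trailing "*").
            continue
        yield RMHit(
            query=fields[4],
            start=int(fields[5]),
            end=int(fields[6]),
            # RepeatMasker marks the complement strand with "C".
            strand="-" if fields[8] == "C" else "+",
            repeat_name=fields[9],
            repeat_class=fields[10],
            kimura_div=kimura,
        )
```

A typical use would be to open hg38+KimDiv_RM.out, pass the file object to parse_rm_out_with_kimura, and keep only hits whose repeat_class is LINE/L1 before summarizing kimura_div per repeat_name.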
Created:
2022-05-23