Reconstruction of full-length LINE-1 progenitors from ancestral genomes (Supplementary Data)
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6338535
Description:
Web Supplementary Files
Web Supplementary File 1 - FASTA files containing full-length reconstruction input sequences: full_length_reconstruction_input_sequence_fastas.zip
Web Supplementary File 2 - FASTA files containing Muscle alignments of the full-length reconstruction input sequences: full_length_reconstruction_input_sequence_alns.zip
Web Supplementary File 3 - FASTA file of full-length reconstructed sequences: full_length_reconstructions.fa
Web Supplementary File 4 - Table of full-length reconstruction statistics: full_length_reconstruction_stats.csv
Web Supplementary File 5 - FASTA files containing ORF reconstruction input sequences: orf_fastas.zip
Web Supplementary File 6 - FASTA files containing Macse alignments of the ORF reconstruction input sequences: ORF_reconstruction_input_sequence_alns.zip
Web Supplementary File 7 - FASTA file of reconstructed ORF sequences: ORF_reconstructions.fa
Web Supplementary File 8 - Table of ORF reconstruction statistics: ORF_reconstruction_stats.csv
Web Supplementary File 9 - Table of Composite Sequences: bestfl_selection_fixed_CS_seqs.csv
Web Supplementary File 10 - Database of gold standards: L1_goldstandards.csv
Data Underlying Figures
RepeatMasker scans of hg38 and ancestral genomes: anc_gen_RM_out_files.zip
Figure 4
4A
Source alignment of 54 composite sequences: 220121_dropped12+L1ME3A_muscle.nt.afa
Tree produced using the alignment and FastTree: 220121_dropped12+L1ME3A.tree
4B
Source alignment of 67 Dfam L1 subfamily 3’ end models: 200123_dfam_3ends.fa.muscle.aln
Tree produced using the alignment: 200123_dfam_3ends.fa.muscle.aln.tree
Figure 5
KZFP-TE enrichment p-values (from Barazandeh et al. 2018): TE_KZFP_enrichment_pvals.xlsx
KZFP-TE top 500 peak overlap (from Barazandeh et al. 2018): top500_peak_overlap.xlsx
Figure 6
RepeatMasker .out file for the Composite Sequence custom library queried against hg38: CS_RM_hg38.fa.out.gz
Figure S2
RepeatMasker scan .out file of hg38 (CG corrected Kimura Divergence values are in last column): hg38+KimDiv_RM.out
RepeatMasker scan .out file of the Progressive Cactus eutherian ancestral genome (CG corrected Kimura Divergence values are in last column): Progressive_Cactus_Euth+KimDiv_RM.out
RepeatMasker scan .out file of the Ancestors 1.1 eutherian ancestral genome (CG corrected Kimura Divergence values are in last column): Ancestors_Euth+KimDiv_RM.out
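The three Figure S2 files are described as RepeatMasker .out files with a CG-corrected Kimura divergence value appended as the last column. A minimal sketch of reading such a file is given below. It assumes the standard whitespace-delimited RepeatMasker .out layout (15 fields per hit, with the query name, start, end, repeat name, and class/family at fields 5, 6, 7, 10, and 11) plus one extra numeric field at the end; the exact layout of these particular files has not been verified here, and the names `RMHit` and `parse_rm_out` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class RMHit(NamedTuple):
    """One RepeatMasker hit with its appended Kimura divergence (assumed layout)."""
    query: str
    start: int
    end: int
    repeat: str
    family: str
    kimura: float


def parse_rm_out(lines: Iterable[str]) -> Iterator[RMHit]:
    """Yield hits from RepeatMasker .out lines that carry an extra last column.

    Header and blank lines are skipped because they do not start with an
    integer Smith-Waterman score. Lines whose last field is not numeric
    are skipped as well (assumption: every annotated hit has a value).
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 16 or not fields[0].isdigit():
            continue
        try:
            kimura = float(fields[-1])
        except ValueError:
            continue
        yield RMHit(
            query=fields[4],
            start=int(fields[5]),
            end=int(fields[6]),
            repeat=fields[9],
            family=fields[10],
            kimura=kimura,
        )
```

For a real file, the function can be fed an open file handle, for example `with open("hg38+KimDiv_RM.out") as fh: hits = list(parse_rm_out(fh))`, and the hits can then be filtered on `family == "LINE/L1"` before summarising the divergence values.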
Figure S5
RepeatMasker scan .out files for Progressive Cactus simian and primate reconstructed ancestral genomes: progCactus_RM_outfiles.zip
S5A
FASTA files containing Cactus genome-derived reconstructed sequences equivalent to the L1MA2, L1MA4, and L1MD1-3 best full-length sequences: progCactus_reconstruction_bestFL_equivalents.zip
S5B
FASTA files containing Muscle alignments of Cactus genome-derived full-length reconstruction input sequences: progCactus_reconstruction_input_sequence_alns.zip
Figure S6
S6A
Results of Conserved Domain scans of Cactus genome-derived full-length reconstructed sequences: CD_search_results_short_nms.txt
S6B-D
Character posterior probabilities of “best” full-length reconstructed sequences: best_fl_post_probs.zip
Figure S7
S7B-C
Results of Conserved Domain scans of translated initial full-length reconstructed sequences: initial_recons_all_3frametrans_CD-search.txt
Results of Conserved Domain scans of translated reconstructed ORFs: recons_ORF1-2_all_3frametrans_CD-search.csv
Figure S15
S15A
Source alignment of 67 composite sequences: bestfl_selection_fixed_CS_seqs_muscle.nt.afa
Tree produced using the alignment: bestfl_selection_fixed_CS_seqs_muscle.nt.afa.tree
S15B-E
Source Muscle alignments for phylogenetic trees of reconstructed sequence components:
ORF2: ORF2_keep54_muscle.nt.afa
5’ UTR: 5utr_keep54_muscle.nt.afa
ORF1: ORF1_keep54_muscle.nt.afa
3’ UTR: 3utr_keep54_muscle.nt.afa
Trees produced using above alignments:
ORF2: ORF2_keep54_muscle.nt.afa.tree
5’ UTR: 5utr_keep54_muscle.nt.afa.tree
ORF1: ORF1_keep54_muscle.nt.afa.tree
3’ UTR: 3utr_keep54_muscle.nt.afa.tree
Figure S17
Unfiltered BLAST results of Composite Sequences queried against hg38: CS_hg38_blastn.csv.zip
BED file of L1 instances annotated using BLAST pipeline: BLAST_L1_hits.bed
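As a hedged illustration of how the BED file of L1 instances might be inspected, the sketch below reads the first three BED columns and sums the interval lengths. It relies only on the general BED convention that coordinates are 0-based and half-open, so an interval's length is end minus start; the number and meaning of any further columns in BLAST_L1_hits.bed are not assumed, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def bed_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from BED lines, ignoring extra columns."""
    for line in lines:
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def total_interval_length(lines: Iterable[str]) -> int:
    """Sum end - start over all intervals (overlapping intervals count twice)."""
    return sum(end - start for _, start, end in bed_intervals(lines))
```

Because overlapping intervals are not merged, the total is an upper bound on the number of distinct bases covered.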
Created:
2022-05-23



