Data for Genomes of nitrogen-fixing eukaryotes reveal an alternate path for organellogenesis
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/rr9t3ccbc5
Resource description:
Raw data files used in the analyses for the in-preparation 2024 manuscript "Genomes of nitrogen-fixing eukaryotes reveal a non-canonical model of organellogenesis". A summary of the files follows.
File descriptions:
Genome files:
Ep = E. pelagica; Ec = E. clementina. The E. pelagica genome assembly was previously published and is available at GCA_946965045.2; all E. pelagica files are compatible with the NCBI files. The diazoplast genome of E. clementina was previously published at GCA_029919255.1.
[diatom].gtf de novo annotation feature file
[diatom].faa protein sequences translated from the annotation feature file
Ec.nuc.fasta nuclear genome of E. clementina
Ec.mito.fasta mitochondrial genome of E. clementina
Ec.chlo.fasta chloroplast genome of E. clementina
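The genome files above are plain FASTA (.fasta, .faa) and GTF (.gtf). As a hedged starting point for inspecting them, the sketch below uses only the Python standard library to read FASTA records and to tally GTF records by feature type. It assumes the standard 9-column, tab-separated GTF layout with "#" comment lines; the feature types actually present in the [diatom].gtf files are not described here, so the toy records are illustrative only and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the first token after '>'."""
    header: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        else:
            # Sequence lines seen before any header are dropped (header is None).
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_gtf_features(lines: Iterable[str]) -> Dict[str, int]:
    """Count GTF records by feature type (column 3 of 9 tab-separated columns)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # skip malformed lines rather than guess
            counts[fields[2]] += 1
    return dict(counts)


if __name__ == "__main__":
    toy_fasta = [">contig_1 toy record", "ACGT", "AC", "", ">contig_2", "GG"]
    print({rid: len(seq) for rid, seq in read_fasta(toy_fasta)})
    # {'contig_1': 6, 'contig_2': 2}
    toy_gtf = [
        "# toy annotation",
        'ctg1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
        'ctg1\tsrc\tCDS\t1\t50\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
        'ctg1\tsrc\tCDS\t60\t100\t.\t+\t1\tgene_id "g1"; transcript_id "g1.t1";',
    ]
    print(count_gtf_features(toy_gtf))  # {'gene': 1, 'CDS': 2}
```

To run it on a downloaded file, pass an open file handle, for example `with open("Ec.nuc.fasta") as fh: lengths = {rid: len(seq) for rid, seq in read_fasta(fh)}`.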
Figure 1:
1A_SpeciesAccessions.xlsx Accessions for all sequences used
1C_[diatom].masked.comparative.summary.tbl Output tables from the de novo RepeatMasker run
1D_Pairwise percent identity.csv All pairwise % amino acid identity values of protein orthologues between species pairs
1E_DiatomOrthogroups.xlsx Orthogroups and the genes within them (OrthoFinder Orthogroups.tsv file)
1E_Orthogroup_overlap_and_percent_identity_statistics.csv Pairwise % identity of protein orthologues and Jaccard index of the overlap between orthogroups
1E_Individual_orthogroup_membership.csv Orthogroups and the number of genes from each species contained within them (OrthoFinder Orthogroups.GeneCount.tsv file)
1E_HGTRawResults.xlsx Output of the HGT search for all diatoms. The result fields include HGT confidence based on tree topology, AI score, and top BLAST hit taxa
1E_Ec_HGT_trees.tar.gz Raw phylogenetic trees used to identify HGTs and summary of HGT calls for E. clementina
1E_Ep_HGT_trees.tar.gz As above, for E. pelagica
1E_StudyGeneCodesID.xlsx Table associating the simple gene codes used in the orthogroup analyses in this study to their full corresponding gene IDs on NCBI and other databases
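The file list does not state exactly how the Jaccard index in 1E_Orthogroup_overlap_and_percent_identity_statistics.csv was computed. A minimal sketch, assuming it reflects presence/absence of each species in each orthogroup: read an OrthoFinder-style gene-count table (first column Orthogroup, one count column per species, optional Total column), treat a species as present in an orthogroup when its count is at least 1, and take the size of the intersection over the size of the union. The species column names Ec and Ep in the toy table, and the function names, are hypothetical.

```python
import csv
import io
from typing import Dict, Set


def orthogroup_sets(text: str, delimiter: str = ",") -> Dict[str, Set[str]]:
    """Map each species column to the set of orthogroups where it has >= 1 gene.

    Assumes an OrthoFinder-style gene-count table: a first column named
    'Orthogroup', one integer count column per species, and an optional
    trailing 'Total' column, which is ignored.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    species = [c for c in (reader.fieldnames or []) if c not in ("Orthogroup", "Total")]
    sets: Dict[str, Set[str]] = {sp: set() for sp in species}
    for row in reader:
        for sp in species:
            if int(row[sp]) > 0:  # presence/absence only; copy number is ignored
                sets[sp].add(row["Orthogroup"])
    return sets


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy table; 'Ec' and 'Ep' are hypothetical column names.
    toy = (
        "Orthogroup,Ec,Ep,Total\n"
        "OG0000000,3,2,5\n"
        "OG0000001,1,0,1\n"
        "OG0000002,0,4,4\n"
        "OG0000003,2,2,4\n"
    )
    s = orthogroup_sets(toy)
    print(jaccard(s["Ec"], s["Ep"]))  # 0.5
```

OrthoFinder writes its own Orthogroups.GeneCount.tsv tab-separated, so pass `delimiter="\t"` when reading that file directly rather than the .csv copy.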
Figure 2:
2A_Ec.masked.regions.gff Repeat regions used for genome tracks
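Repeat features in a GFF file can overlap, so summing feature lengths directly may double-count bases. A minimal sketch, assuming the standard 9-column GFF layout with 1-based, inclusive start and end coordinates in columns 4 and 5, that merges overlapping intervals per sequence before totalling masked bases. The function name and the toy records (sequence IDs, source, and feature type) are hypothetical and not taken from 2A_Ec.masked.regions.gff.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def masked_bases_per_seq(lines: Iterable[str]) -> Dict[str, int]:
    """Sum bases covered by GFF features per sequence, counting each base once.

    GFF start/end (columns 4 and 5) are 1-based and inclusive, so a feature
    spans end - start + 1 bases. Overlapping intervals are merged first.
    """
    intervals: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        intervals[fields[0]].append((int(fields[3]), int(fields[4])))

    totals: Dict[str, int] = {}
    for seqid, ivs in intervals.items():
        ivs.sort()
        total = 0
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        totals[seqid] = total
    return totals


if __name__ == "__main__":
    toy = [
        "##gff-version 3",
        "ctg1\tsrc\trepeat_region\t1\t10\t.\t+\t.\tID=r1",
        "ctg1\tsrc\trepeat_region\t5\t20\t.\t+\t.\tID=r2",
        "ctg1\tsrc\trepeat_region\t31\t40\t.\t-\t.\tID=r3",
        "ctg2\tsrc\trepeat_region\t100\t100\t.\t+\t.\tID=r4",
    ]
    print(masked_bases_per_seq(toy))  # {'ctg1': 30, 'ctg2': 1}
```

Dividing these totals by the sequence lengths from Ec.nuc.fasta would give a per-contig masked fraction, which can be compared against the RepeatMasker summary table as a rough consistency check.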
Figure 4:
4B_4C_ProcessedProteomicsData.xlsx Processed and unprocessed proteomics tables
4B_4C_20230228_orbitrapqexactive_lysate.raw Raw proteomics data from whole-cell lysate used for the analysis in Figure 4B and 4C, run on the Orbitrap Q Exactive
4B_4C_20231030_orbitrapeclipse_lysate.raw Whole-cell lysate, run on the Orbitrap Eclipse
4B_4C_20230421_orbitrapeclipse_lysate.raw Whole-cell lysate, run on the Orbitrap Eclipse
4B_4C_20231030_orbitrapeclipse_diazo.raw Diazoplast fraction, run on the Orbitrap Eclipse
4B_4C_20230421_orbitrapeclipse_diazo.raw Diazoplast fraction, run on the Orbitrap Eclipse
4B_4C_20230228_orbitrapqexactive_diazo.raw Diazoplast fraction, run on the Orbitrap Q Exactive
Created:
2025-07-21



