Data for Genomes of nitrogen-fixing eukaryotes reveal an alternate path for organellogenesis

NIAID Data Ecosystem. Indexed 2026-05-02

Download link:

https://data.mendeley.com/datasets/rr9t3ccbc5

Description:

Raw data files used in the analyses for the in-prep 2024 manuscript "Genomes of nitrogen-fixing eukaryotes reveal a non-canonical model of organellogenesis". A summary of the files follows.

File descriptions:

Genome files:
Ep = E. pelagica, Ec = E. clementina. The E. pelagica genome assembly was previously published and is available at GCA_946965045.2; all E. pelagica files are compatible with the NCBI files. The diazoplast genome of E. clementina was previously published at GCA_029919255.1.
  [diatom].gtf: de novo annotation feature file
  [diatom].faa: translation of the annotation feature file
  Ec.nuc.fasta: nuclear genome of E. clementina
  Ec.mito.fasta: mitochondrial genome of E. clementina
  Ec.chlo.fasta: chloroplast genome of E. clementina

Figure 1:
  1A_SpeciesAccessions.xlsx: All accessions for sequences
  1C_[diatom].masked.comparative.summary.tbl: Output tables from the de novo RepeatMasker run
  1D_Pairwise percent identity.csv: All pairwise % amino acid identity values of protein orthologues between species pairs
  1E_DiatomOrthogroups.xlsx: Orthogroups and the genes within them (OrthoFinder Orthogroups.tsv file)
  1E_Orthogroup_overlap_and_percent_identity_statistics.csv: Pairwise % identity of protein orthologues and Jaccard index of the overlap between orthogroups
  1E_Individual_orthogroup_membership.csv: Orthogroups and the number of genes from each species contained within them (OrthoFinder Orthogroups.GeneCount.tsv file)
  1E_HGTRawResults.xlsx: Output of the HGT search for all diatoms. The result fields include HGT confidence based on tree topology, AI score, and top BLAST hit taxa
  1E_Ec_HGT_trees.tar.gz: Raw phylogenetic trees used to identify HGTs and summary of HGT calls for E. clementina
  1E_Ep_HGT_trees.tar.gz: As above, for E. pelagica
  1E_StudyGeneCodesID.xlsx: Table associating the simple gene codes used in the orthogroup analyses in this study with their full corresponding gene IDs on NCBI and other databases

Figure 2:
  2A_Ec.masked.regions.gff: Repeat regions used for genome tracks

Figure 4:
  4B_4C_ProcessedProteomicsData.xlsx: Processed and unprocessed proteomics tables
  4B_4C_20230228_orbitrapqexactive_lysate.raw: Raw proteomics data from whole cell lysate used for the analysis in Figure 5B and C, run on the Orbitrap Q Exactive
  4B_4C_20231030_orbitrapeclipse_lysate.raw: Raw proteomics data for whole cell lysate, run on the Orbitrap Eclipse
  4B_4C_20230421_orbitrapeclipse_lysate.raw: Raw proteomics data for whole cell lysate, run on the Orbitrap Eclipse
  4B_4C_20231030_orbitrapeclipse_diazo.raw: Raw proteomics data for the diazoplast fraction, run on the Orbitrap Eclipse
  4B_4C_20230421_orbitrapeclipse_diazo.raw: Raw proteomics data for the diazoplast fraction, run on the Orbitrap Eclipse
  4B_4C_20230228_orbitrapqexactive_diazo.raw: Raw proteomics data for the diazoplast fraction, run on the Orbitrap Q Exactive
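
The description mentions a Jaccard index of the overlap between orthogroups. As a minimal sketch of how such a value could be computed from an OrthoFinder-style gene count table, the Python below reads a tab-separated table and compares two species. It is not the authors' pipeline. The layout (an "Orthogroup" column, one count column per species, a "Total" column), the rule that an orthogroup counts as present when its gene count is above zero, the sample rows, and the function names are all assumptions made for illustration.

```python
import csv
import io


def orthogroup_sets(tsv_text):
    """Map each species column to the set of orthogroups holding >= 1 gene.

    Assumes an OrthoFinder-like layout: an 'Orthogroup' column, one integer
    count column per species, and an optional 'Total' column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    present = {s: set() for s in species}
    for row in reader:
        for s in species:
            if int(row[s]) > 0:
                present[s].add(row["Orthogroup"])
    return present


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample rows, not taken from the dataset.
SAMPLE = (
    "Orthogroup\tEc\tEp\tTotal\n"
    "OG0000000\t3\t2\t5\n"
    "OG0000001\t1\t0\t1\n"
    "OG0000002\t0\t4\t4\n"
    "OG0000003\t2\t1\t3\n"
)

sets = orthogroup_sets(SAMPLE)
print(jaccard(sets["Ec"], sets["Ep"]))  # 2 shared / 4 total -> 0.5
```

To apply the same idea to a real file, the text of the table would be read from disk and passed to `orthogroup_sets`; the column names in the actual CSV may differ from the ones assumed here.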

Created:

2025-07-21