Gene Annotations of 49 Bacillariophyta Genome Assemblies
Source: Zenodo (updated 2025-05-02)
Download link:
https://zenodo.org/doi/10.5281/zenodo.15327098
Description:
Contact: katharina.hoff@uni-greifswald.de.
Manuscript
The data hosted here is associated with the preprint https://doi.org/10.48550/arXiv.2410.05467
Files
The following GFF3 files with structural and functional genome annotation are included in the compressed archive Bacillariophyta_annotations.tar.gz:
Asterionella_formosa.gff3
Asterionellopsis_glacialis.gff3
Bacterosira_constricta.gff3
Chaetoceros_muellerii.gff3
concatenated_output.gff3
Conticribra_guillardii.gff3
Conticribra_weissflogii.gff3
Craspedostauros_australis.gff3
Cyclostephanos_invisitatus.gff3
Cyclostephanos_tholiformis.gff3
Cyclotella_atomus.gff3
Cyclotella_baltica.gff3
Cyclotella_choctawhatcheeana.gff3
Cyclotella_cryptica.gff3
Cylindrotheca_fusiformis.gff3
Detonula_confervacea.gff3
Discostella_pseudostelligera.gff3
Discostella_stelligera.gff3
Discostella_stelligeroides.gff3
Epithemia_pelagica.gff3
Fistulifera_pelliculosa.gff3
Fistulifera_solaris.gff3
Fragilaria_radians.gff3
Fragilariopsis_cylindrus.gff3
Licmophora_abbreviata.gff3
Mediolabrus_comicus.gff3
Nitzschia_palea.gff3
Nitzschia_putrida.gff3
Porosira_glacialis.gff3
Psammoneis_japonica.gff3
Pseudo-nitzschia_multiseries.gff3
Pseudo-nitzschia_pungens.gff3
Skeletonema_costatum.gff3
Skeletonema_marinoi.gff3
Skeletonema_menzelii.gff3
Skeletonema_potamos.gff3
Skeletonema_tropicum.gff3
Stephanocyclus_meneghinianus.gff3
Stephanodiscus_minutulus.gff3
Stephanodiscus_triporus.gff3
Thalassiosira_allenii.gff3
Thalassiosira_delicatula.gff3
Thalassiosira_exigua.gff3
Thalassiosira_gravida.gff3
Thalassiosira_livingstoniorum.gff3
Thalassiosira_mediterranea.gff3
Thalassiosira_oceanica.gff3
Thalassiosira_ordinaria.gff3
Thalassiosira_pacifica.gff3
Thalassiosira_profunda.gff3
To extract the dataset, execute the following command:
tar -xvf Bacillariophyta_annotations.tar.gz
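Before or after extracting, it can be useful to check that the archive holds the expected files. The following is a minimal sketch, not part of the dataset: list_gff3 and count_gff3 are hypothetical helper names, and the sketch assumes a tar implementation that detects gzip compression automatically when reading (GNU tar and bsdtar both do). The file listing above names 50 GFF3 files (49 species plus concatenated_output.gff3), so a count of 50 would be expected if the archive contains exactly those files.

```shell
# List the GFF3 members of a tar archive without extracting it.
# Matching on the suffix also works if members carry a directory prefix.
list_gff3() {
    tar -tf "$1" | grep '\.gff3$' | sort
}

# Count the GFF3 members (hypothetical helper, not shipped with the dataset).
count_gff3() {
    list_gff3 "$1" | wc -l | tr -d ' '
}

# Example (not run here):
#   count_gff3 Bacillariophyta_annotations.tar.gz
```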
Genome Assemblies
The files in this folder pertain to genome assemblies that are publicly available at NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/). We used the following versions:
Asterionella formosa GCA_002256025.1
Asterionellopsis glacialis GCA_014885115.2
Bacterosira constricta GCA_037356235.1
Chaetoceros muellerii GCA_019693545.1
Conticribra guillardii GCA_036939335.1
Conticribra weissflogii GCA_036940025.1
Craspedostauros australis GCA_026770025.1
Cyclostephanos invisitatus GCA_036939675.1
Cyclostephanos tholiformis GCA_036939975.1
Cyclotella atomus GCA_036939935.1
Cyclotella baltica GCA_036939635.1
Cyclotella choctawhatcheeana GCA_036939855.1
Cyclotella cryptica GCA_013187285.1
Cylindrotheca fusiformis GCA_019693525.1
Detonula confervacea GCA_036939415.1
Discostella pseudostelligera GCA_036940085.1
Discostella stelligera GCA_036939735.1
Discostella stelligeroides GCA_036939555.1
Epithemia pelagica GCA_946965045.2
Fistulifera pelliculosa GCA_026008555.1
Fistulifera solaris GCA_030295235.1
Fragilaria radians GCA_900642245.1
Fragilariopsis cylindrus GCA_900095095.1
Licmophora abbreviata GCA_900291995.1
Mediolabrus comicus GCA_036940125.1
Nitzschia palea GCA_019593585.1
Nitzschia putrida GCA_016586335.1
Porosira glacialis GCA_036939395.1
Psammoneis japonica GCA_008632985.1
Pseudo-nitzschia multiseries GCA_037355745.1
Pseudo-nitzschia pungens GCA_037355855.1
Skeletonema costatum GCA_018806925.1
Skeletonema marinoi GCA_030544225.1
Skeletonema menzelii GCA_036940005.1
Skeletonema potamos GCA_036940105.1
Skeletonema tropicum GCA_037178625.1
Stephanocyclus meneghinianus GCA_036940045.1
Stephanodiscus minutulus GCA_036939435.1
Stephanodiscus triporus GCA_036939755.1
Thalassiosira allenii GCA_036939655.1
Thalassiosira delicatula GCA_036939835.1
Thalassiosira exigua GCA_036939895.1
Thalassiosira gravida GCA_037356215.1
Thalassiosira livingstoniorum GCA_036939595.1
Thalassiosira mediterranea GCA_036939795.1
Thalassiosira oceanica GCA_019693575.1
Thalassiosira ordinaria GCA_036939695.1
Thalassiosira pacifica GCA_036939875.1
Thalassiosira profunda GCA_036939355.1
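The species names in this list correspond to the GFF3 file names in the archive, with the space replaced by an underscore. The sketch below shows that mapping and one possible way to fetch an assembly by accession. It assumes the NCBI Datasets command-line tool (datasets) is installed and on the PATH; gff3_name and fetch_assembly are hypothetical helper names, and the output file name is arbitrary.

```shell
# Map a species name as written in the list above to the GFF3 file name
# used in the archive (spaces become underscores).
gff3_name() {
    printf '%s.gff3\n' "$(printf '%s' "$1" | tr ' ' '_')"
}

# Download the genome FASTA of one assembly as a zip file.
# Assumption: the NCBI Datasets CLI ("datasets") is available.
fetch_assembly() {
    datasets download genome accession "$1" --include genome --filename "$1.zip"
}

# Example (not run here):
#   fetch_assembly GCA_002256025.1
#   gff3_name "Asterionella formosa"
```

The downloaded zip file still has to be unpacked to obtain the genome FASTA file used in the conversion step.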
Converting to Protein FASTA and Coding Sequences FASTA
To save storage space on Zenodo, we did not upload the protein FASTA and coding sequence FASTA files. They can easily be generated from the genome FASTA file in combination with the respective GFF3 file. To do this, you can use the following commands:
# assume that genome.fasta is your respective genome FASTA file downloaded from NCBI Datasets
# shorten each header to the text before the first space (the sequence ID)
sed '/^>/ s/ .*//' genome.fasta > genome_short_headers.fasta
# assume that file.gff is the respective GFF3 file
getAnnoFastaFromJoingenes.py -g genome_short_headers.fasta -3 file.gff -o nameStem
This will produce the following files: nameStem.aa (protein FASTA file) and nameStem.codingseq (coding sequence FASTA file).
The getAnnoFastaFromJoingenes.py script is available at https://raw.githubusercontent.com/Gaius-Augustus/Augustus/master/scripts/getAnnoFastaFromJoingenes.py . It is part of the AUGUSTUS software package.
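The header-shortening step matters because the sequence IDs in column 1 of the GFF3 file have to match the FASTA headers exactly. The following sketch is a simple sanity check under that assumption: it prints every sequence ID that occurs in the GFF3 file but has no matching FASTA record. missing_seqids is a hypothetical helper name and not part of AUGUSTUS.

```shell
# Print sequence IDs that occur in the GFF3 file ($2) but not among the
# headers of the FASTA file ($1). Empty output means all IDs match.
missing_seqids() {
    fasta_ids=$(mktemp)
    gff_ids=$(mktemp)
    # FASTA IDs: header lines without the leading ">"
    grep '^>' "$1" | sed 's/^>//' | sort -u > "$fasta_ids"
    # GFF3 sequence IDs: column 1 of tab-separated, non-comment lines
    grep -v '^#' "$2" | awk -F'\t' 'NF > 1 { print $1 }' | sort -u > "$gff_ids"
    # comm -13 keeps only the lines unique to the second (GFF3) list
    comm -13 "$fasta_ids" "$gff_ids"
    rm -f "$fasta_ids" "$gff_ids"
}

# Example (not run here):
#   missing_seqids genome_short_headers.fasta file.gff
```

If the check prints any IDs, the FASTA file and the GFF3 file probably do not belong to the same assembly version, or the headers were not shortened.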
Release notes
In this release, single-exon genes were filtered using the results of an OrthoFinder run that did not include genes on contigs suspected to be contaminants or horizontal gene transfer candidates. As a result, the gene and transcript counts changed compared to the previous release.
License
The genome annotation files are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Provider:
Zenodo
Created:
2025-05-02



