Support Data for Bombus flavifrons and Bombus fervidus genome assemblies
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Support_Data_for_Bombus_flavifrons_and_Bombus_fervidus_genome_assemblies/29840165
Resource description:
Directory "mitochondrial_genomes": contains FASTA files of putative mitochondrial genomes detected with MitoFinder. Note that two essentially identical scaffolds were assembled for B. fervidus (they differ by a small number of repeats in an AT-rich region); these can likely be collapsed into a single sequence for downstream use.
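Whether the two B. fervidus scaffolds are close enough to collapse can be checked quickly without an aligner. The sketch below is not part of the deposited data: it compares canonical k-mer sets (Jaccard similarity, so a reverse-complemented copy still matches) and keeps the longest sequence as the representative. The k-mer size, the keep-the-longest rule, and the script name are illustrative assumptions.

```python
import sys

# Complement table for reverse-complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Parse FASTA lines into an ordered {id: sequence} dict (id = first word)."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def canonical_kmers(seq, k=21):
    """Set of canonical k-mers: the smaller of each k-mer and its reverse complement."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def kmer_jaccard(a, b, k=21):
    """Jaccard similarity of canonical k-mer sets; 1.0 means identical k-mer content."""
    ka, kb = canonical_kmers(a, k), canonical_kmers(b, k)
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)


def pick_representative(records):
    """Keep the longest sequence as the single representative of a collapsed set."""
    name = max(records, key=lambda n: len(records[n]))
    return name, records[name]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python compare_mito.py <mito.fasta>  (hypothetical script name)
    with open(sys.argv[1]) as handle:
        recs = read_fasta(handle)
    ids = list(recs)
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            print(f"{x}\t{y}\tlengths={len(recs[x])}/{len(recs[y])}\t"
                  f"jaccard={kmer_jaccard(recs[x], recs[y]):.4f}")
    print("representative:", pick_representative(recs)[0])
```

A Jaccard value close to 1.0 with only a small length difference would be consistent with the repeat-count difference described above.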
Directory "mtDNA_alignments_species_mitocheck": contains two FASTA-formatted alignments, one for B. flavifrons and one for B. fervidus, used to confirm the species identity of the specimens used for assembly, since both species belong to cryptic species complexes. Each alignment contains the relevant regions of the putative mitochondrial scaffolds from the "mitochondrial_genomes" directory, aligned to GenBank mitochondrial sequences for the target species and for taxa with which it might be confused. The names of the GenBank-derived sequences include the accession or the sample ID from the original study. The assembled scaffolds cluster with the correct mitochondrial lineage, indicating that the assemblies were built from correctly identified specimens.
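The clustering method behind the species check is not stated here. As a rough stand-in, not the authors' method, a minimal sketch can rank the reference sequences in one of these alignments by uncorrected p-distance to the assembled scaffold, ignoring gaps and ambiguous bases; the nearest references would be expected to come from the correct species. The query ID is whatever name the assembled scaffold carries in the alignment, and the script name is hypothetical.

```python
import math
import sys


def read_fasta(lines):
    """Parse FASTA lines into an ordered {id: sequence} dict (id = first word)."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def p_distance(a, b):
    """Uncorrected p-distance over aligned columns where both bases are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else math.nan


def rank_references(alignment, query):
    """(distance, id) for every other sequence, closest first; skips no-overlap pairs."""
    q = alignment[query]
    pairs = [(p_distance(q, seq), name)
             for name, seq in alignment.items() if name != query]
    return sorted(p for p in pairs if not math.isnan(p[0]))


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python nearest_ref.py <alignment.fasta> <query_id>
    with open(sys.argv[1]) as handle:
        aln = read_fasta(handle)
    for dist, name in rank_references(aln, sys.argv[2])[:10]:
        print(f"{dist:.4f}\t{name}")
```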
Directory "non-bee-scaffolds-figshare" contains 4 files. The .fasta file contains the assembled scaffold sequences that were removed after being identified as non-Bombus, and the .tsv file contains the BlobTools assignment of each scaffold to a taxonomic class.
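To summarise the removed scaffolds by assigned class, a short sketch can tally the .tsv rows. The column names of the deposited table are not given here, so the class and (optional) length columns are passed in by name; the header handling ("##" comment lines, a header line optionally prefixed with "# ") is an assumption modelled on BlobTools v1 table output.

```python
import csv
import sys
from collections import Counter


def read_blob_table(lines):
    """Parse a tab-separated table into row dicts.

    Assumption: '##' lines are comments and the first remaining line is the
    header, possibly prefixed with '# ' (as in BlobTools v1 table output).
    """
    data = [ln.rstrip("\n") for ln in lines
            if ln.strip() and not ln.startswith("##")]
    if not data:
        return []
    data[0] = data[0].lstrip("# ")
    return list(csv.DictReader(data, delimiter="\t"))


def tally_classes(rows, class_column, length_column=None):
    """Count scaffolds (and optionally sum lengths) per assigned class."""
    counts, lengths = Counter(), Counter()
    for row in rows:
        label = row[class_column]
        counts[label] += 1
        if length_column is not None:
            lengths[label] += int(row[length_column])
    return counts, lengths


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python tally_blob.py <table.tsv> <class_column> [length_column]
    # Column names are hypothetical; take them from the file's header line.
    with open(sys.argv[1]) as handle:
        table = read_blob_table(handle)
    length_col = sys.argv[3] if len(sys.argv) > 3 else None
    counts, lengths = tally_classes(table, sys.argv[2], length_col)
    for label, n in counts.most_common():
        extra = f"\t{lengths[label]} bp" if length_col else ""
        print(f"{label}\t{n} scaffolds{extra}")
```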
Directory "repeatmasker_figshare" contains two main subdirectories, "apidaeDfam_and_rmodeler" and "no_apidaeDfam_rmonly", each of which contains subdirectories for B. flavifrons and B. fervidus. "apidaeDfam_and_rmodeler" contains RepeatMasker results obtained with a species-specific RepeatModeler library combined with the Apidae Dfam library (see below for the code used to retrieve it), while "no_apidaeDfam_rmonly" contains RepeatMasker results obtained with the species-specific RepeatModeler libraries alone. Each subdirectory contains the relevant repeat family library (.fa), a summary table of identified repeats (.tbl), and a detailed GFF file giving the locations of the repeats identified in each genome (.gff).
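The .gff files can be used to compute how many bases were masked on each scaffold. The minimal sketch below reads only the standard GFF columns (sequence ID, start, end; 1-based and inclusive) and merges overlapping or adjacent repeat hits so that no base is counted twice; it ignores the attribute column. The script name is hypothetical.

```python
import sys
from collections import defaultdict


def read_gff_intervals(lines):
    """Group (start, end) feature coordinates by sequence ID (column 1)."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature line
        intervals[fields[0]].append((int(fields[3]), int(fields[4])))
    return intervals


def merged_length(spans):
    """Bases covered by 1-based, inclusive spans, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend
            continue
        if cur_end is not None:
            total += cur_end - cur_start + 1
        cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python masked_bp.py <repeatmasker.gff>
    with open(sys.argv[1]) as handle:
        by_seq = read_gff_intervals(handle)
    grand_total = 0
    for seqid in sorted(by_seq):
        covered = merged_length(by_seq[seqid])
        grand_total += covered
        print(f"{seqid}\t{covered}")
    print(f"total\t{grand_total}")
```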
Code to run RepeatModeler and RepeatMasker on B. flavifrons (using dfam-tetools-latest.sif), applied to a genome .fa file containing only the 18 identified chromosomal scaffolds:
BuildDatabase -name flavifrons GCF_040668555.1_iyBomFlav1_principal_genomic_18scaff.fa
RepeatModeler -database flavifrons -threads 20 -LTRStruct
# Note: use flavifrons-families.fa instead of Apidae_Bflav-families-rm.fa to use only the B. flavifrons-specific RepeatModeler library, without the Apidae Dfam models.
RepeatMasker -lib Apidae_Bflav-families-rm.fa -pa 8 -gff GCF_040668555.1_iyBomFlav1_principal_genomic_18scaff.fa
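The commands above do not show how Apidae_Bflav-families-rm.fa (or Apidae_Bferv-families-rm.fa) was built. Its name suggests the Apidae Dfam library (Apidae-rm.fa) concatenated with the RepeatModeler output (flavifrons-families.fa); assuming that, the sketch below performs the concatenation and reports family IDs that occur more than once across the inputs, which are worth checking before running RepeatMasker. The script name is hypothetical.

```python
import sys


def merge_library_lines(libraries):
    """Concatenate FASTA libraries; return (merged lines, duplicated IDs).

    Each line is normalised to end with a newline so a file lacking a final
    newline cannot fuse its last line with the next file's first header.
    """
    merged, seen, duplicates = [], set(), []
    for library in libraries:
        for line in library:
            if not line.strip():
                continue
            line = line.rstrip("\n") + "\n"
            if line.startswith(">"):
                ident = line[1:].split()[0]
                if ident in seen:
                    duplicates.append(ident)
                seen.add(ident)
            merged.append(line)
    return merged, duplicates


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: python merge_libs.py <out.fa> <lib1.fa> <lib2.fa> [...]
    # e.g. out = Apidae_Bflav-families-rm.fa, inputs = Apidae-rm.fa flavifrons-families.fa
    handles = [open(path) for path in sys.argv[2:]]
    try:
        merged, dups = merge_library_lines(handles)
    finally:
        for handle in handles:
            handle.close()
    with open(sys.argv[1], "w") as out:
        out.writelines(merged)
    if dups:
        print("duplicate family IDs:", ", ".join(sorted(set(dups))))
```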
Code to run RepeatModeler and RepeatMasker on B. fervidus (using dfam-tetools-latest.sif), applied to a genome .fa file containing only the 19 identified chromosomal scaffolds:
BuildDatabase -name fervidus GCF_041682495.2_iyBomFerv1_genomic_chr1-19.fa
RepeatModeler -database fervidus -threads 18 -LTRStruct
# Note: use fervidus-families.fa instead of Apidae_Bferv-families-rm.fa to use only the B. fervidus-specific RepeatModeler library, without the Apidae Dfam models.
RepeatMasker -lib Apidae_Bferv-families-rm.fa -pa 8 -gff GCF_041682495.2_iyBomFerv1_genomic_chr1-19.fa
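To compare the two library configurations ("apidaeDfam_and_rmodeler" vs. "no_apidaeDfam_rmonly"), the overall masked fraction can be read from each run's .tbl summary. This sketch assumes the usual RepeatMasker summary line of the form "bases masked: N bp ( P %)"; confirm the pattern against the deposited files. The script name is hypothetical.

```python
import re
import sys

# Assumed form of the summary line in a RepeatMasker .tbl file, e.g.
# "bases masked:   12345678 bp ( 4.56 %)"
MASKED_RE = re.compile(r"bases masked:\s*(\d+)\s*bp\s*\(\s*([\d.]+)\s*%\s*\)")


def bases_masked(tbl_text):
    """Return (masked bp, masked percent) parsed from .tbl text."""
    match = MASKED_RE.search(tbl_text)
    if match is None:
        raise ValueError("no 'bases masked' line found")
    return int(match.group(1)), float(match.group(2))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python compare_tbl.py <run1.tbl> <run2.tbl> ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            bp, pct = bases_masked(handle.read())
        print(f"{path}\t{bp} bp\t{pct:.2f} %")
```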
Code to extract the Apidae Dfam library (taxon ID 7458, including ancestral and descendant families) with famdb.py:
famdb.py -i Libraries families --format fasta_name --include-class-in-name --ancestors --descendants 7458 > Apidae-rm.fa
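Both Apidae-rm.fa and the RepeatModeler libraries label families as name#Class/Subclass (famdb.py does so when --include-class-in-name is given, as in the command above). Assuming that header convention, the sketch below counts families per repeat class, giving a quick view of what the Apidae models add to a species-specific library. The script name is hypothetical.

```python
import sys
from collections import Counter


def family_class(header):
    """'>name#Class/Subclass extra' -> 'Class/Subclass'; 'Unknown' if no '#'."""
    parts = header.lstrip(">").split()
    if not parts:
        return "Unknown"
    ident = parts[0]
    return ident.split("#", 1)[1] if "#" in ident else "Unknown"


def class_counts(lines, top_level_only=False):
    """Count library families per repeat class, optionally collapsing subclasses."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            label = family_class(line)
            if top_level_only:
                label = label.split("/")[0]
            counts[label] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python class_counts.py Apidae-rm.fa flavifrons-families.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            counts = class_counts(handle, top_level_only=True)
        print(path)
        for label, n in counts.most_common():
            print(f"  {label}\t{n}")
```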
File "genespace_figshare.R" contains R code for running synteny analyses with GENESPACE. It requires GENESPACE and its dependencies to be installed (https://github.com/jtlovell/GENESPACE), and the *_genomic.gff and *_translated_cds.faa files for the relevant assemblies must be downloaded into the directory "genespace_genomes", placed in subdirectories named "affinis", "ferv", "flav", "huntii", and "pens". Results are written to "genespace_working".
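Before running the R script, it may help to confirm the input layout. This Python sketch (not part of the deposit) checks that each expected subdirectory of genespace_genomes exists and holds exactly one file matching each pattern. The glob patterns are assumptions: they expect uncompressed files with lowercase extensions, whereas NCBI often distributes these files gzipped, so adjust them if your downloads differ.

```python
import sys
from pathlib import Path

# Subdirectory names as listed in the description; patterns are assumptions.
SPECIES_DIRS = ("affinis", "ferv", "flav", "huntii", "pens")
REQUIRED_PATTERNS = ("*_genomic.gff", "*_translated_cds.faa")


def check_layout(root, species=SPECIES_DIRS, patterns=REQUIRED_PATTERNS):
    """Return a list of problems: missing subdirectories or != 1 match per pattern."""
    problems = []
    root = Path(root)
    for sp in species:
        sub = root / sp
        if not sub.is_dir():
            problems.append(f"missing directory: {sub}")
            continue
        for pattern in patterns:
            hits = sorted(sub.glob(pattern))
            if len(hits) != 1:
                problems.append(
                    f"{sub}: expected 1 file matching {pattern}, found {len(hits)}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_genespace_inputs.py genespace_genomes
    issues = check_layout(sys.argv[1])
    print("\n".join(issues) if issues else "layout looks complete")
```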
Created: 2026-02-02



