
Genomic insights into the chromosomal elongation in a family of Collembola

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.70rxwdc48
Resource description:
Collembola is a highly diverse and abundant group of soil arthropods with chromosome numbers ranging from 5 to 11. Previous karyotype studies indicated that the family Tomoceridae possesses an exceptionally long chromosome. To better understand chromosome size evolution in Collembola, we obtained a chromosome-level genome of Yoshiicerus persimilis with a size of 334.44 Mb and BUSCO completeness of 97.0% (n = 1,013). The genomes of both Y. persimilis and the recently published Tomocerus qinae contain an exceptionally large chromosome (ElChr, >100 Mb) that accounts for nearly one-third of the genome. Comparative genomic analyses suggest that chromosomal elongation occurred independently in the two species approximately 10 million years ago, rather than in the ancestor of Tomoceridae. ElChr elongation was caused by large tandem and segmental duplications, as well as transposon proliferation, and genes in these regions experienced weaker purifying selection (higher dN/dS) than genes in conserved regions. Moreover, inter-genomic synteny analyses indicated that chromosomal fission/fusion events played a crucial role in the evolution of chromosome numbers (ranging from 5 to 7) within Entomobryomorpha. This study provides a valuable resource for investigating chromosome evolution in Collembola.

Methods

Genome assembly
De novo assembly of PacBio long reads was performed with Raven v. 1.6.0. The assembly was then polished with one round of long reads using Flye v. 2.8.3 and two rounds of Illumina short reads using NextPolish v. 1.3.1. Primary contigs were anchored into chromosomes using 3D-DNA v. 180922.

Genome annotation
We used MAKER v. 3.01.03, which integrates ab initio, RNA-seq, and protein homology evidence, to predict protein-coding genes (PCGs). BRAKER v. 2.1.6 and GeMoMa v. 1.7.1 predictions combining protein and transcriptome evidence were integrated as the ab initio input passed to MAKER. BRAKER trained Augustus v. 3.3.4 and GeneMark-ES/ET/EP 4.68_lic, integrating evidence from the OrthoDB10 v1 database. GeMoMa, run with parameters “GeMoMa.c = 0.3 GeMoMa.p = 12”, used eight species (Daphnia magna, Cloeon dipterum, Zootermopsis nevadensis, Drosophila melanogaster, Rhopalosiphum maidis, Tribolium castaneum, Sinella curviseta, and FCSH) as the protein homology-based reference. RNA-seq alignments were produced using HISAT2 v. 2.2.0, and the RNA-seq data were further assembled into transcripts with the genome-guided assembler StringTie v. 2.1.6. MAKER used the protein sequences from the same eight species as protein homology evidence. PCGs were functionally annotated by aligning their protein sequences to the UniProtKB database using Diamond v. 2.0.8 with an e-value threshold of 1e-5. Furthermore, protein domains were predicted by InterProScan 5.48–83.0 based on five public databases: Pfam, SMART, Superfamily, Gene3D, and CDD. EggNOG-mapper v. 2.1.5 was also employed for functional category annotation based on the eggNOG v. 5.0.2 database.
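As an illustration of the Diamond annotation step described above, the following is a minimal Python sketch of how hits might be filtered at the stated e-value threshold of 1e-5. It assumes the alignments were written in DIAMOND's default 12-column tabular format; the description does not say which output format was used or how a single hit per gene was chosen, so keeping the highest-bitscore hit is an assumption made here. The function name `best_hits` and the example gene and hit identifiers are hypothetical.

```python
import io

# Column order of DIAMOND's default tabular output (BLAST-style, 12 columns).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff.

    Assumption: when a query has several passing hits, the one with the
    highest bitscore is kept. The dataset description does not state this.
    """
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rec = dict(zip(COLUMNS, line.split("\t")))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # fails the e-value threshold
        current = best.get(rec["qseqid"])
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best


# Made-up rows for illustration only; these are not taken from the dataset.
EXAMPLE = "\n".join([
    "gene1\thitA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200.0",
    "gene1\thitB\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t120.0",
    "gene2\thitC\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t40.0",
    "gene3\thitD\t55.0\t80\t36\t0\t1\t80\t1\t80\t1e-6\t55.0",
])

hits = best_hits(io.StringIO(EXAMPLE))
for gene, (subject, evalue, bitscore) in sorted(hits.items()):
    print(gene, subject, evalue, bitscore)
```

In this toy input, gene2 is dropped because its only hit has an e-value above the cutoff, and gene1 keeps the stronger of its two hits.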
Created:
2024-01-02