Supporting data for "Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome"
DataCite Commons. Updated 2025-07-22; indexed 2025-04-15.
Download link:
http://gigadb.org/dataset/100376
Description:
Fireflies (family Lampyridae) are beetles in the order Coleoptera and are among the best-known and best-loved insects because of their bioluminescence. However, fireflies are in danger of extinction because of the large-scale destruction of their habitat. To improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis.
Here, we developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883) (Coleoptera: Lampyridae) using single-molecule real-time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4 Mb genome, which is close to the estimated genome size and covers 98.7% complete and 0.7% partial insect BUSCOs (Benchmarking Universal Single-Copy Orthologs). K-mer analysis showed that the genome is highly heterozygous. Nevertheless, the long-read assembly shows good contiguity, with a contig N50 of 3.04 Mb and a longest contig of 13.69 Mb. Furthermore, 135,589 simple sequence repeats (SSRs) and 341 Mb of repeat sequences were detected. A total of 23,092 genes were predicted, of which 88.44% were annotated with one or more functions.
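For readers unfamiliar with the contiguity metric quoted above, the following is a minimal Python sketch of how a contig N50 is conventionally computed, together with the approximate read depth implied by the reported totals. The function name `contig_n50` and the toy contig lengths are illustrative assumptions; they are not taken from the dataset or from the authors' pipeline.

```python
def contig_n50(lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths (hypothetical values, not from the dataset).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = contig_n50(toy_lengths)
print("toy N50:", toy_n50)

# Approximate read depth from the figures reported in the description:
# 57.8 Gb of long reads over a 760.4 Mb assembly.
reads_gb = 57.8
assembly_mb = 760.4
depth = reads_gb * 1000 / assembly_mb
print(f"approximate depth: {depth:.1f}x")
```

On the real assembly, the same function would be applied to the contig lengths read from the FASTA file in the download; the depth figure is only a rough ratio of total read bases to assembly size.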
We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies, but also offer a wealth of information for studying the mechanisms of their sexual communication, bioluminescence and evolution.
Provider:
GigaScience Database
Created:
2017-11-13



