
Deep mining of phage lysins from human microbiome (DeepMineLys)

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1135322
Description:
Our study introduces DeepMineLys, a two-track CNN-based model designed to mine phage lysins from human microbiome datasets. Human microbiome data, including the Gut Virome dataset (GutV), offer a rich source for discovering phage lysins. Two samples from the GutV dataset, collected from healthy individuals, were sequenced in-house. Viral DNA from each sample was used for library construction, with whole-genome amplification (WGA) performed to obtain sufficient nucleic acid. Paired-end (2 x 150 bp) metagenomic sequencing was conducted on an Illumina HiSeq 2500 short-read platform with an expected sequencing depth of 6 Gb per library (MAGIGENE, Guangzhou, China). The shotgun Illumina paired-end reads from the GutV dataset were assessed for quality using FastQC v0.11.7 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Based on the quality statistics, low-quality bases were trimmed using Trimmomatic v0.36 with the options SLIDINGWINDOW:4:20 MINLEN:50. Human contamination was removed by aligning the reads against the human genome GRCh38 with BWA MEM v0.7.17-r1188 (default options). The cleaned paired-end reads were then assembled into metagenomic contigs using MEGAHIT v1.1.3 with the option '--k-list 31,51,71,91,111'.
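
The processing steps described above can be sketched as a small Python helper that only builds the command lines, without running them. This is an illustrative sketch, not the authors' actual scripts: all file names and the output directory are hypothetical, the `trimmomatic` launcher is assumed to be on the PATH (as a conda install provides; otherwise call the jar via `java -jar`), and GRCh38 is assumed to have been indexed beforehand with `bwa index`. The description does not say how reads that did not map to the human genome were extracted from the BWA output, so that step is left as a placeholder rather than guessed.

```python
import shlex
from typing import List


def build_pipeline(r1: str, r2: str, human_ref: str, outdir: str = "out") -> List[List[str]]:
    """Return the four commands, in order, as argument lists.

    Tool options come from the dataset description; every path is a
    hypothetical placeholder chosen for illustration.
    """
    # Trimmomatic PE writes paired and unpaired output for each mate.
    p1, u1 = f"{outdir}/trim_R1.paired.fq.gz", f"{outdir}/trim_R1.unpaired.fq.gz"
    p2, u2 = f"{outdir}/trim_R2.paired.fq.gz", f"{outdir}/trim_R2.unpaired.fq.gz"

    # Placeholder names for reads left after human-read removal. How they are
    # extracted from the alignment is NOT stated in the source.
    clean1, clean2 = f"{outdir}/clean_R1.fq.gz", f"{outdir}/clean_R2.fq.gz"

    return [
        # 1. Quality assessment (FastQC v0.11.7 in the source).
        ["fastqc", "-o", f"{outdir}/fastqc", r1, r2],
        # 2. Quality trimming (Trimmomatic v0.36 in the source).
        ["trimmomatic", "PE", r1, r2, p1, u1, p2, u2,
         "SLIDINGWINDOW:4:20", "MINLEN:50"],
        # 3. Alignment to GRCh38 with default options; bwa mem writes SAM to
        #    stdout, so a real run would redirect or pipe this output.
        ["bwa", "mem", human_ref, p1, p2],
        # 4. Assembly of the cleaned reads (MEGAHIT v1.1.3 in the source).
        ["megahit", "-1", clean1, "-2", clean2,
         "--k-list", "31,51,71,91,111", "-o", f"{outdir}/megahit"],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("sample_R1.fq.gz", "sample_R2.fq.gz", "GRCh38.fa"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to review the options against the description before anything is run.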

Created:
2024-07-13