Deep mining of phage lysins from human microbiome (DeepMineLys)

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1135322

Description:

Our study introduces DeepMineLys, a two-track CNN-based model designed to mine phage lysins from human microbiome datasets. The human microbiome, including the Gut Virome dataset (GutV), offers a rich source for discovering phage lysins. Two samples from the GutV dataset, collected from healthy individuals, were sequenced in-house. Viral DNA from each sample was used for library construction, with whole genome amplification (WGA) performed to obtain sufficient nucleic acids. Paired-end (2 x 150 bp reads) metagenomic sequencing was conducted on an Illumina HiSeq 2500 short-read platform with an expected sequencing depth of 6 Gb per library (MAGIGENE, Guangzhou, China). The shotgun Illumina paired-end reads from the GutV dataset were assessed for quality using FastQC v0.11.7 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Based on the quality statistics, low-quality bases were trimmed using Trimmomatic v0.36 with the options: SLIDINGWINDOW:4:20 MINLEN:50. Human contamination was removed by aligning the reads with BWA MEM v0.7.17-r1188 (default options) against the human genome GRCh38. The cleaned paired-end reads were then assembled into metagenomic contigs using MEGAHIT v1.1.3 with the option '--k-list 31,51,71,91,111'.
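The read-processing steps described above can be sketched as a list of command lines. This is a minimal illustration, not the authors' actual scripts: the tool names, versions' options (SLIDINGWINDOW:4:20, MINLEN:50, --k-list 31,51,71,91,111) and the GRCh38 reference come from the description, while the function name, file names, output layout, and the use of a `trimmomatic` wrapper executable are assumptions. The description does not say how non-human reads were extracted from the BWA alignment, so that step is left as a placeholder.

```python
import shlex


def build_commands(sample, r1, r2, human_ref, outdir="out"):
    """Build argument lists for one paired-end library (hypothetical layout).

    Nothing is executed here; the caller decides how to run each command.
    """
    # Hypothetical output names for Trimmomatic's paired/unpaired files.
    t1p = f"{outdir}/{sample}_R1.paired.fastq.gz"
    t1u = f"{outdir}/{sample}_R1.unpaired.fastq.gz"
    t2p = f"{outdir}/{sample}_R2.paired.fastq.gz"
    t2u = f"{outdir}/{sample}_R2.unpaired.fastq.gz"

    # Placeholders for the reads left after human-read removal. The source
    # does not describe how unmapped reads were extracted from the alignment,
    # so these files are assumed to come from a step not shown here.
    clean1 = f"{outdir}/{sample}_R1.clean.fastq.gz"
    clean2 = f"{outdir}/{sample}_R2.clean.fastq.gz"

    return {
        # Quality assessment of the raw reads.
        "qc": ["fastqc", r1, r2, "-o", f"{outdir}/fastqc"],
        # Trimming with the options given in the description.
        # Assumes a `trimmomatic` wrapper is on PATH (e.g. a conda install).
        "trim": [
            "trimmomatic", "PE", r1, r2, t1p, t1u, t2p, t2u,
            "SLIDINGWINDOW:4:20", "MINLEN:50",
        ],
        # Alignment against the human genome with default options.
        # BWA writes SAM to stdout; redirecting it is up to the caller.
        "host_align": ["bwa", "mem", human_ref, t1p, t2p],
        # Assembly of the cleaned reads with the stated k-mer list.
        "assemble": [
            "megahit", "-1", clean1, "-2", clean2,
            "--k-list", "31,51,71,91,111",
            "-o", f"{outdir}/megahit",
        ],
    }


if __name__ == "__main__":
    cmds = build_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "GRCh38.fa")
    for step, argv in cmds.items():
        print(f"# {step}")
        print(shlex.join(argv))
```

Running the script prints one shell-quoted command per step, which makes it easy to check the options against the description before running anything.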

Created:

2024-07-13
