
Infant nasal gene catalogue

DataCite Commons | Updated 2025-10-03 | Indexed 2026-04-25
Download link:
https://figshare.com/articles/dataset/Infant_nasal_gene_catalogue/30227902
Description:
Functional studies of how early-life interventions shape the airway microbiome remain scarce. Here, we performed metagenomic sequencing of 706 longitudinal nasal swabs from infants with and without cystic fibrosis (CF) to construct and characterize the largest non-redundant gene atlas of the infant nasal microbiome to date.

This study includes data from infants in two prospective birth cohort studies: the Swiss Cystic Fibrosis Infant Lung Development (SCILD) cohort and the Basel Bern Infant Lung Development (BILD) cohort. After detailed instruction and demonstration by a study nurse, parents of study participants collected biweekly anterior nasal swabs from the fifth week of life until the end of the first year of life. Swabs (FLOQSwabs® 516CS01; Copan, Brescia, Italy) were placed in medium (UTM-RT™ in Screw-Cap Tube; Copan, Brescia, Italy) and, on average within 3 days, pipetted into micro-screw tubes (Sarstedt, Nümbrecht, Germany) and stored in a −80 °C freezer until further processing.

Metagenomic library preparation and sequencing were performed by Novogene. The genomic DNA was randomly sheared into short fragments. The fragments were end-repaired, A-tailed, and ligated with Illumina adapters. The adapter-ligated fragments were PCR amplified, size selected, and purified. Libraries were quantified with Qubit and real-time PCR, and their size distribution was checked on a bioanalyzer. Quantified libraries were pooled and sequenced on the Illumina NovaSeq 6000 platform. Raw reads were quality controlled: reads with low quality scores, adapter sequences, or more than 10% unidentified nucleotides were removed. We obtained approximately 40,000,000 raw reads per sample (mean 5.8 G raw data per sample).

Generation and annotation of microbial gene catalog

Human DNA contamination was removed by mapping MGX reads to the GRCh38 human reference genome with BBmap (v38.84) (sourceforge.net/projects/bbmap/) in two rounds. The first round removed the majority of human reads (parameters: mode fast = true, minratio = 0.9, maxindel = 3, minhits = 2, kmer length = 14). The second round used default parameters to increase sensitivity for residual host contamination. De novo individual assembly was then performed as previously described. In brief, for every sample, reads were assembled into contigs with MegaHIT (v1.2.9, presets meta-sensitive), and full-length coding sequences (CDS) were obtained with Prodigal (default, -p meta). Coding sequences were merged across samples and clustered at 95% sequence identity and 80% target coverage with MMseqs2 [59] to generate a non-redundant gene atlas.

CDS within the atlas were taxonomically annotated using three complementary strategies: (i) MMseqs2 (version 2fad714b525f1975b62c2d2b5aff28274ad57466) easy-taxonomy against the Genome Taxonomy Database (--lca-mode 4 (e-value < 0.00001); --tax-lineage 1), (ii) sequence alignment against the human airway reference genome database (blastn; e-value < 0.001, sequence identity >= 0.95, max_target_seqs 5), and (iii) alignment of CDS against a fungal database (FunOMIC.T.v1) with diamond blastx (--query-cover 95 --id 99). For functional annotation, we ran eggNOG-mapper (emapper-2.1.12).
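The host-removal, assembly, gene-calling, and clustering steps described above can be sketched as command lines. This is only an illustrative reconstruction, not the authors' actual invocations, which the description does not give. The tool names and parameter values come from the text. The following are assumptions: all file names, a single interleaved FASTQ file as input, the choice of `mmseqs easy-cluster` as the clustering workflow, and `--cov-mode 1` as the way to express "target coverage". The code only builds argument lists and runs nothing.

```python
from typing import Dict, List


def build_sample_commands(sample: str, host_ref: str = "GRCh38.fa") -> Dict[str, List[str]]:
    """Build per-sample argv lists (hypothetical file names; nothing is executed)."""
    reads = f"{sample}.fastq.gz"            # assumed: interleaved paired reads
    pass1 = f"{sample}.pass1.fastq.gz"      # reads left unmapped after round 1
    clean = f"{sample}.clean.fastq.gz"      # reads left unmapped after round 2
    asm_dir = f"{sample}_megahit"
    contigs = f"{asm_dir}/final.contigs.fa"  # MEGAHIT's default contig file name
    cds = f"{sample}.cds.fna"

    return {
        # Round 1: fast, strict mapping with the parameters quoted in the text.
        "host_round1": [
            "bbmap.sh", f"ref={host_ref}", f"in={reads}", f"outu={pass1}",
            "fast=t", "minratio=0.9", "maxindel=3", "minhits=2", "k=14",
        ],
        # Round 2: BBMap defaults, to catch residual host reads.
        "host_round2": [
            "bbmap.sh", f"ref={host_ref}", f"in={pass1}", f"outu={clean}",
        ],
        # Per-sample assembly with the meta-sensitive preset.
        "assembly": [
            "megahit", "--12", clean, "--presets", "meta-sensitive", "-o", asm_dir,
        ],
        # Gene calling in metagenome mode; -d writes nucleotide CDS.
        "gene_calling": ["prodigal", "-i", contigs, "-d", cds, "-p", "meta"],
    }


def build_clustering_command(merged_cds: str = "all_samples.cds.fna") -> List[str]:
    """Cluster merged CDS at 95% identity and 80% target coverage (assumed flags)."""
    return [
        "mmseqs", "easy-cluster", merged_cds, "gene_atlas", "tmp",
        "--min-seq-id", "0.95", "-c", "0.8", "--cov-mode", "1",
    ]


if __name__ == "__main__":
    for step, argv in build_sample_commands("sample01").items():
        print(step, "->", " ".join(argv))
    print("clustering ->", " ".join(build_clustering_command()))
```

Keeping each step as an argument list makes it straightforward to pass the commands to `subprocess.run` once the tools and real file paths are available, and to inspect the exact parameters before anything runs.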

Provider:
figshare
Created:
2025-10-03