Table 1_Neonatal gut microbiota stratification and identification of SCFA-associated microbial subgroups using unsupervised clustering and machine learning classification.docx
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Table_1_Neonatal_gut_microbiota_stratification_and_identification_of_SCFA-associated_microbial_subgroups_using_unsupervised_clustering_and_machine_learning_classification_docx/30783989
Description:
Background: The neonatal gut microbiome plays a critical role in infant health through the production of short-chain fatty acids (SCFAs). However, the organization of SCFA-producing microbial communities in neonates remains poorly characterized. This study applied unsupervised clustering and machine learning to classify microbial subgroups associated with SCFA production, providing insight into their composition and metabolic potential.
Methods: This study recruited 71 mother-infant pairs from Kangwon National University Hospital and Bundang CHA Hospital and collected meconium samples within five days postpartum. Microbial diversity was analyzed at the genus level by 16S rRNA gene sequencing (V3–V4 region), and SCFAs were quantified from the same samples. To identify functionally distinct microbial subgroups, K-Means, Agglomerative, Spectral, and Gaussian Mixture Model clustering were applied. Clustering validity was assessed using the Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index, and Prediction Strength Validation, with t-distributed Stochastic Neighbor Embedding (t-SNE) visualization used to evaluate cluster separation. SCFA distributions were compared across clusters, and random forest (RF) and logistic regression models were used to classify SCFA-associated microbial clusters, with performance evaluated using receiver operating characteristic (ROC) curves.
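The clustering comparison described in the Methods can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' pipeline: the data are synthetic (`make_blobs` standing in for a 71-sample genus-level abundance table), the helper names `cluster_labels` and `validity_table` are hypothetical, and all hyperparameters (two clusters, diagonal GMM covariance, nearest-neighbor spectral affinity) are assumptions. Prediction Strength Validation and t-SNE are omitted here; scikit-learn has no built-in prediction strength function.

```python
import math

from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the abundance table (NOT the study data).
X, _ = make_blobs(n_samples=71, n_features=10, centers=2, random_state=0)


def cluster_labels(X, n_clusters=2, seed=0):
    """Fit the four clustering methods named in the abstract; return labels."""
    models = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
        "Spectral": SpectralClustering(
            n_clusters=n_clusters,
            affinity="nearest_neighbors",  # assumed; the source does not say
            n_neighbors=10,
            random_state=seed,
        ),
        "GMM": GaussianMixture(
            n_components=n_clusters,
            covariance_type="diag",  # assumed; keeps estimation stable at n=71
            random_state=seed,
        ),
    }
    return {name: model.fit_predict(X) for name, model in models.items()}


def validity_table(X, n_clusters=2, seed=0):
    """Compute three internal validity indices for each method."""
    table = {}
    for name, labels in cluster_labels(X, n_clusters, seed).items():
        if len(set(labels)) < 2:
            # The indices are undefined when only one cluster is found.
            table[name] = {k: math.nan for k in
                           ("silhouette", "calinski_harabasz", "davies_bouldin")}
            continue
        table[name] = {
            "silhouette": float(silhouette_score(X, labels)),  # higher is better
            "calinski_harabasz": float(calinski_harabasz_score(X, labels)),  # higher is better
            "davies_bouldin": float(davies_bouldin_score(X, labels)),  # lower is better
        }
    return table


if __name__ == "__main__":
    for name, scores in validity_table(X).items():
        print(name, scores)
```

The three indices point in different directions (Silhouette and Calinski-Harabasz are maximized, Davies-Bouldin is minimized), so comparing methods means reading them together rather than ranking on a single number.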
Results: The clustering analysis identified distinct microbial subgroups linked to SCFA production, with Agglomerative clustering outperforming K-Means in capturing functionally relevant structures. Cluster 1 had higher SCFA levels and was enriched in Bacteroides, Prevotella, and Enterococcus, while Cluster 2 exhibited lower SCFA concentrations and a more heterogeneous composition. Introducing a third cluster in the multi-class analysis revealed an intermediate metabolic profile, suggesting a continuum in microbial metabolic function. Classification analysis confirmed the superiority of the RF model, which achieved ROC scores of 91.05% (Agglomerative) and 87.74% (K-Means) in binary classification and 92.98% (Agglomerative) and 89.84% (K-Means) in multi-class classification, demonstrating RF's strong predictive ability for SCFA-based clusters.
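The classifier comparison could look something like the sketch below. It assumes that the reported "ROC score" refers to area under the ROC curve and that evaluation used cross-validation; neither detail is stated in the description. The data are synthetic (`make_classification`, with `y` standing in for binary cluster labels), and `compare_auc` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 71 samples, 10 features, binary "cluster" labels
# (NOT the study data).
X, y = make_classification(
    n_samples=71, n_features=10, n_informative=4, random_state=0
)


def compare_auc(X, y, seed=0):
    """Return mean 5-fold cross-validated ROC AUC for RF and logistic regression."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=seed
        ),
        # Scaling inside the pipeline avoids leaking test-fold statistics.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, auc in compare_auc(X, y).items():
        print(f"{name}: mean ROC AUC = {auc:.3f}")
```

For a three-cluster (multi-class) setting, one option is `scoring="roc_auc_ovr"`, which averages one-vs-rest AUC across classes.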
Conclusion: Unsupervised clustering combined with classification analysis effectively predicts SCFA-associated subgroups, paving the way for future research on longitudinal tracking and functional genomic integration in early-life metabolic health.
Created:
2025-12-04



