Molecular and Pathological Characterization of Cervicovaginal Samples Reveals Protective Effects of Lactobacillus species Against HPV Infections, Bacterial Vaginosis, and Epithelial Cell Abnormalities
Source: Mendeley Data. Indexed 2026-04-18.
Download link:
https://data.mendeley.com/datasets/pn7h6yjt6k
Description:
This dataset accompanies the manuscript “Molecular and pathological characterization of cervicovaginal samples reveals associations between Lactobacillus species, HPV infections, bacterial vaginosis, and cytological abnormalities” and comprises the raw, intermediate, and processed data used for epidemiological profiling, molecular analysis, statistical testing, interaction effect analysis, PCA, clustering, and machine learning modeling. It includes >15,000 de-identified cervicovaginal samples collected by a CLIA-certified diagnostic laboratory in the United States between March 2023 and February 2024.
Each sample was tested using qPCR-based multiplex panels targeting Lactobacillus species, BV-associated bacteria, high-risk HPV genotypes, and other vaginal pathogens. Additional data include cytological diagnoses (Pap smear), patient demographics (age, gender, state of provider), and real-time qPCR concentration values. BV status was molecularly classified into BV-positive, BV-negative, and transitional BV based on concentration thresholds and combined bacterial profiles.
The dataset is organized as follows:
Dataset 1 – Epidemiology Stats: Aggregated demographics, HPV prevalence, cytology outcomes, and BV status distributions.
Dataset 2 – Association Analysis: Detailed qPCR results per pathogen with annotations for outcome classifications; includes derived prevalence and co-occurrence matrices.
Dataset 3 – Cytology & BV Outcomes: Binary and ordinal encoding of cytology and BV outcomes, filtered and pre-processed for regression and chi-square tests.
Dataset 4 – PCA, Clustering & ML: Scaled datasets used for principal component analysis, k-means clustering, feature importance extraction, and SHAP analysis across four classifiers (XGBoost, Random Forest, Decision Tree, Logistic Regression).
Supplementary Dataset 5 – Interaction Effects: Output tables from interaction term analysis (e.g., HPV–HPV, HPV–bacteria, Lactobacillus–bacteria) showing significant pairwise effects and q-values across multiple outcomes.
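For readers who want to recompute q-values from the pairwise p-values in the interaction tables, the following is a minimal sketch of the Benjamini-Hochberg adjustment. The description does not state which multiple-testing correction the authors applied, so the choice of Benjamini-Hochberg is an assumption, and the p-values below are made up purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Assumption: the dataset description does not name the correction
    used for its q-values; BH is shown only as a common choice.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for four pairwise interaction terms.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
```

The backward pass keeps a running minimum so that a test with a smaller p-value never receives a larger q-value than a test ranked after it.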
These datasets enable exploration of:
Microbial composition and distribution across patient subgroups.
Correlation and interaction between Lactobacillus spp., BV pathogens, HPV types, and cytological abnormalities.
Predictive power of microbial and demographic features for diagnosing BV and cytological changes.
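As a starting point for the co-occurrence questions listed above, the sketch below counts how often pairs of targets are detected in the same sample. The target names and the four sample records are hypothetical placeholders, not values taken from the dataset, and the real files may encode detections differently (for example, as qPCR concentration columns that would first need thresholding).

```python
from itertools import combinations


def co_occurrence(samples, targets):
    """Count single and pairwise detections across samples.

    samples: iterable of sets, each holding the targets detected in one sample.
    targets: list of target names to tabulate.
    Returns a nested dict where counts[a][b] is the number of samples in
    which both a and b were detected (the diagonal holds per-target totals).
    """
    counts = {t: {u: 0 for u in targets} for t in targets}
    for detected in samples:
        present = [t for t in targets if t in detected]
        for t in present:
            counts[t][t] += 1
        for t, u in combinations(present, 2):
            counts[t][u] += 1
            counts[u][t] += 1
    return counts


# Hypothetical target names and made-up detections, for illustration only.
targets = ["L. crispatus", "G. vaginalis", "HPV16"]
samples = [
    {"L. crispatus"},
    {"G. vaginalis", "HPV16"},
    {"G. vaginalis"},
    {"L. crispatus", "HPV16"},
]
matrix = co_occurrence(samples, targets)
```

Dividing each off-diagonal count by the number of samples gives a co-occurrence prevalence, which can then be compared across patient subgroups.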
The dataset is fully anonymized and compliant with HIPAA guidelines. It supports reproducibility and benchmarking of machine learning models in vaginal microbiome diagnostics and cervical cancer screening.
Created:
2025-06-27



