
Feature selection on microbial profiles of CRC samples with chopin2 (powered by hdlib)

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/6467875
Description:
This Zenodo entry contains the results of the feature selection algorithm in chopin2 (powered by hdlib), applied to MetaPhlAn3 microbial profiles from a public dataset of metagenomic stool samples collected from patients with colorectal cancer (CRC) and from healthy individuals. The microbial profiles were extracted with the curatedMetagenomicData package for R under the IDs ThomasAM_2018a, ThomasAM_2018b, and ThomasAM_2019_a. The feature selection algorithm is implemented as a backward variable elimination method and makes use of the vector-symbolic architecture described in Cumbo F 2020.

The deposited data are described below:
datasets.tar.gz: the datasets used as input to chopin2, obtained by merging the three relative-abundance datasets mentioned above and also stratified by age and sex (with prefix RA). The same datasets have also been binarized (with prefix BIN).
hd-models.tar.gz: the output of the feature selection performed with chopin2 (powered by hdlib) on the datasets with both relative abundance and binary profiles (RA and BIN).
ml-models.tar.gz: the results of the feature selection produced with classical wrapper-based techniques (i.e., Random Forest, Decision Tree, Support Vector Machine, Logistic Regression, and Extreme Gradient Boosting), together with a Python 3.8 script to reproduce the results.

Please note that the datasets RA__ThomasAM__species.csv and BIN__ThomasAM__species.csv are also included in the datasets.tar.gz archive.
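To illustrate the general idea of backward variable elimination and of binarizing abundance profiles, here is a minimal sketch in plain Python. It is not the chopin2 or hdlib implementation: a leave-one-out nearest-centroid classifier stands in for the hyperdimensional model, and every name (`binarize`, `loo_centroid_accuracy`, `backward_elimination`), the presence threshold, the stopping rule, and the tiny synthetic matrix are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Scorer = Callable[[Matrix, Sequence[str], List[int]], float]


def binarize(rows: Matrix, threshold: float = 0.0) -> List[List[int]]:
    """Turn abundances into presence/absence (assumed rule: value > threshold)."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]


def loo_centroid_accuracy(rows: Matrix, labels: Sequence[str], features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Stand-in scorer; chopin2 scores feature subsets with a hyperdimensional model.
    """
    correct = 0
    for held_out, (row, label) in enumerate(zip(rows, labels)):
        sums = defaultdict(lambda: [0.0] * len(features))
        counts = defaultdict(int)
        for i, (other, other_label) in enumerate(zip(rows, labels)):
            if i == held_out:
                continue
            counts[other_label] += 1
            for k, f in enumerate(features):
                sums[other_label][k] += other[f]

        def distance(cls: str) -> float:
            # Squared Euclidean distance from the held-out sample to a class centroid.
            return sum((row[f] - sums[cls][k] / counts[cls]) ** 2
                       for k, f in enumerate(features))

        if min(counts, key=distance) == label:
            correct += 1
    return correct / len(rows)


def backward_elimination(rows: Matrix, labels: Sequence[str], score: Scorer,
                         min_features: int = 1) -> Tuple[List[int], float]:
    """Start from all features and repeatedly drop the one whose removal scores best.

    Stops when every possible removal would lower the score (assumed stopping rule).
    """
    selected = list(range(len(rows[0])))
    best = score(rows, labels, selected)
    while len(selected) > min_features:
        candidates = [(score(rows, labels, [f for f in selected if f != drop]), drop)
                      for drop in selected]
        cand_score, drop = max(candidates, key=lambda c: c[0])
        if cand_score < best:
            break
        best = cand_score
        selected.remove(drop)
    return selected, best


# Synthetic toy matrix (not real data): column 0 separates the classes, 1 and 2 are noise.
rows = [
    [0.9, 0.2, 0.5],
    [0.8, 0.7, 0.1],
    [0.7, 0.4, 0.9],
    [0.1, 0.3, 0.4],
    [0.2, 0.8, 0.2],
    [0.3, 0.5, 0.8],
]
labels = ["CRC", "CRC", "CRC", "control", "control", "control"]

selected, best = backward_elimination(rows, labels, loo_centroid_accuracy)
print(selected, best)  # on this toy data only feature index 0 is kept
```

The scorer is passed in as a function so that any classifier can be swapped in without changing the elimination loop. Running the same loop on `binarize(rows)` would mirror the RA versus BIN comparison described above, again only as an illustration.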
Created:
2024-05-30