Replication Data for: An Efficient and Interpretable Machine Learning Model for Predicting Breast Cancer Subtypes using Gene Expression Profiles

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://doi.org/10.7910/DVN/WLEHDV

Description:

This dataset contains gene expression data for breast invasive carcinoma (BRCA) samples extracted from The Cancer Genome Atlas (TCGA) repository on February 25, 2023. All non-tumor samples were removed, leaving 1,099 tumor samples that represent five BRCA subtypes: Basal-like, Her-2, Luminal-A, Luminal-B, and Normal-like. Features with a mean expression value below 0.04 were also removed, leaving 34,395 of the original 60,600 features. Finally, the gene expression values were log2-transformed. The resulting dataset was vertically partitioned by feature type into four datasets, each containing the expression values of one distinct feature type.

This dataset contains six CSV files:
1. dataL2.csv: 1,099 tumor samples and 34,395 features.
2. dataL2-mRNA.csv: 1,099 tumor samples and 16,967 features (mRNA).
3. dataL2-mRNA.csv: 1,099 tumor samples and 624 features (miRNA).
4. dataL2-mRNA.csv: 1,099 tumor samples and 8,291 features (lncRNA).
5. dataL2-mRNA.csv: 1,099 tumor samples and 8,516 features (otherRNA).
6. labels.csv: 1,099 tumor samples with subtype class labels (Label 0: Basal-like, 194 samples; Label 1: Her-2, 82 samples; Label 2: Luminal-A, 569 samples; Label 3: Luminal-B, 209 samples; Normal-like, 40 samples).
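The preprocessing steps described above (mean-expression filter, log2 transformation, vertical split by feature type) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the in-memory layout (feature name mapped to per-sample values), the helper names `filter_low_mean`, `log2_transform` and `partition_by_type`, the toy data, and the log2(x + 1) pseudocount are all hypothetical. The description states only that a log2 transformation was applied, not how zero values were handled.

```python
import math
from typing import Dict, List

# Hypothetical layout: each feature name maps to its expression values
# across samples (one value per sample, same sample order for every feature).
Matrix = Dict[str, List[float]]


def filter_low_mean(matrix: Matrix, threshold: float = 0.04) -> Matrix:
    """Drop features whose mean expression across samples is below `threshold`."""
    return {
        name: values
        for name, values in matrix.items()
        if sum(values) / len(values) >= threshold
    }


def log2_transform(matrix: Matrix, pseudocount: float = 1.0) -> Matrix:
    """Apply log2(x + pseudocount); the pseudocount is an assumption to avoid log2(0)."""
    return {
        name: [math.log2(v + pseudocount) for v in values]
        for name, values in matrix.items()
    }


def partition_by_type(matrix: Matrix, feature_type: Dict[str, str]) -> Dict[str, Matrix]:
    """Vertically split the matrix into one sub-matrix per feature type."""
    parts: Dict[str, Matrix] = {}
    for name, values in matrix.items():
        parts.setdefault(feature_type[name], {})[name] = values
    return parts


# Toy example with three features and three samples (hypothetical data).
raw = {
    "GENE_A": [0.0, 3.0, 5.0],    # mean ~2.67, kept
    "GENE_B": [0.0, 0.03, 0.06],  # mean ~0.03, dropped (below 0.04)
    "MIR_C": [1.0, 1.0, 1.0],     # mean 1.0, kept
}
types = {"GENE_A": "mRNA", "GENE_B": "lncRNA", "MIR_C": "miRNA"}

filtered = filter_low_mean(raw)          # filter first, as in the description
logged = log2_transform(filtered)        # then log2-transform
parts = partition_by_type(logged, types)  # then split by feature type

print(sorted(parts))          # ['mRNA', 'miRNA']
print(logged["GENE_A"][:2])   # [0.0, 2.0]
```

The filter runs before the log transform to follow the order given in the description; applying the 0.04 threshold to log-scaled values would select a different set of features.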

Created:

2025-04-26
