
Replication Data for: An Efficient and Interpretable Machine Learning Model for Predicting Breast Cancer Subtypes using Gene Expression Profiles

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://doi.org/10.7910/DVN/WLEHDV
Description:
This dataset contains gene expression data for breast carcinoma (BRCA) samples extracted from The Cancer Genome Atlas (TCGA) repository on February 25, 2023. All non-tumor samples were removed from the dataset, leaving a total of 1,099 tumor samples representing 5 BRCA subtypes: Basal-Like, Her-2, Luminal-A, Luminal-B, and Normal-Like. Features with a mean expression value of less than 0.04 were also removed, leaving 34,395 of the original 60,600 features. Finally, a log2 transformation was applied to the gene expression values. The dataset was vertically partitioned by feature type into four datasets, each containing the expression values of a distinct feature type. The dataset contains the following CSV files:
1. dataL2.csv: 1,099 tumor samples and 34,395 features.
2. dataL2-mRNA.csv: 1,099 tumor samples and 16,967 features (mRNA).
3. dataL2-mRNA.csv: 1,099 tumor samples and 624 features (miRNA).
4. dataL2-mRNA.csv: 1,099 tumor samples and 8,291 features (lncRNA).
5. dataL2-mRNA.csv: 1,099 tumor samples and 8,516 features (otherRNA).
6. labels.csv: 1,099 tumor samples with subtype class labels (Label 0: Basal-Like, 194 samples; Label 1: Her-2, 82 samples; Label 2: Luminal-A, 569 samples; Label 3: Luminal-B, 209 samples; Normal-Like, 40 samples).
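The preprocessing steps described above (dropping non-tumor samples, removing low-expression features, applying a log2 transformation) can be sketched roughly as follows. This is a minimal, dependency-free illustration on toy data, not the authors' actual pipeline. The function name `preprocess`, the `"tumor"` sample-type tag, and the toy gene names are hypothetical. Two details are assumptions because the description does not specify them: that the mean is computed over tumor samples only, and that a pseudocount of 1 is added before taking log2 so that zero values remain defined.

```python
import math
from statistics import fmean

MIN_MEAN_EXPRESSION = 0.04  # threshold stated in the dataset description


def preprocess(expression, sample_types, pseudocount=1.0):
    """Filter and transform a {feature: [value per sample]} table.

    `sample_types` has one entry per sample; only "tumor" samples are kept.
    ASSUMPTION: the pseudocount is illustrative. The description does not
    say how zero values were handled before the log2 transformation.
    """
    # Step 1: keep only the columns that belong to tumor samples.
    tumor_idx = [i for i, kind in enumerate(sample_types) if kind == "tumor"]
    result = {}
    for feature, values in expression.items():
        tumor_values = [values[i] for i in tumor_idx]
        # Step 2: drop features whose mean expression is below the threshold.
        # ASSUMPTION: the mean is taken over tumor samples only.
        if fmean(tumor_values) < MIN_MEAN_EXPRESSION:
            continue
        # Step 3: log2-transform the remaining expression values.
        result[feature] = [math.log2(v + pseudocount) for v in tumor_values]
    return result


# Toy input: three hypothetical features measured on three samples.
toy_expression = {
    "gene_a": [3.0, 7.0, 1.0],
    "gene_b": [0.0, 0.02, 0.05],
    "gene_c": [0.0, 1.0, 15.0],
}
toy_sample_types = ["tumor", "normal", "tumor"]

processed = preprocess(toy_expression, toy_sample_types)
print(processed)
```

On the toy input, `gene_b` is dropped because its mean over the two tumor samples (0.025) falls below 0.04, while the other two features are kept and transformed. For the real files, a table library such as pandas would likely be more practical than plain dictionaries.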
Created:
2025-04-26