Dataset60: High-dimensional Datasets for Feature Selection
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10471604
Description:
Dataset60 comprises 60 high-dimensional datasets sourced from open repositories, systematically curated and formatted to establish a new benchmark for evaluating feature selection algorithms.
A summary of each dataset, including its name, number of samples (n_sample), features (n_feature), classes (n_class), and quantity/proportion for each label (label_distribution), is available in the "summary_dataset60.csv" file.
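As an illustration of what these summary fields mean, the sketch below computes them for a single dataset that is already in the f1, f2, …, label layout. The function name `summarize`, the `name` key, the toy data, and the exact shape of `label_distribution` (count and proportion per label) are assumptions made for this example; the actual column layout of "summary_dataset60.csv" may differ.

```python
import pandas as pd


def summarize(df: pd.DataFrame, name: str) -> dict:
    """Compute summary fields for one dataset in the f1..fN + label layout.

    Hypothetical helper: the real summary file may store these fields differently.
    """
    labels = df["label"]
    n_sample = len(df)
    counts = labels.value_counts().sort_index()
    # Assumed format: {label: (count, proportion)}
    distribution = {
        int(k): (int(v), round(int(v) / n_sample, 3)) for k, v in counts.items()
    }
    return {
        "name": name,
        "n_sample": n_sample,
        "n_feature": df.shape[1] - 1,  # every column except "label"
        "n_class": int(labels.nunique()),
        "label_distribution": distribution,
    }


# Toy data for illustration only, not taken from Dataset60
toy = pd.DataFrame(
    {
        "f1": [0.1, 0.4, 0.3, 0.9],
        "f2": [1.0, 0.0, 2.0, 1.5],
        "label": [0, 1, 0, 0],
    }
)
summary = summarize(toy, "toy")
print(summary)
```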
The datasets were obtained from the following sources:
https://jundongl.github.io/scikit-feature/datasets.html
https://zenodo.org/records/2709491
https://archive.ics.uci.edu/datasets
https://data.mendeley.com/datasets/fhx5zgx2zj/1
https://ckzixf.github.io/dataset.html
The preprocessing steps include:
Removing an index/id column if present.
Encoding labels numerically, starting from 0 (0/1/2/…).
Renaming headers to f{feature_index} and label (f1, f2, …, label).
Handling missing values: following the instructions in the README/Dataset Info if available; otherwise, filling missing values with 0.
Important note: The data has NOT undergone standardization.
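The preprocessing steps listed above can be approximated with the following pandas sketch. It is not the curators' actual script: the function names `preprocess` and `standardize`, the parameters `label_col` and `id_col`, the sorted-order label encoding, and the toy data are all assumptions for illustration. The sketch covers only the default case of filling missing values with 0, not any dataset-specific README instructions. Because the published data is not standardized, an optional z-score step is shown separately.

```python
from typing import Optional

import pandas as pd


def preprocess(
    raw: pd.DataFrame, label_col: str, id_col: Optional[str] = None
) -> pd.DataFrame:
    """Hypothetical re-creation of the listed preprocessing steps."""
    df = raw.copy()
    # 1. Remove an index/id column if present
    if id_col is not None and id_col in df.columns:
        df = df.drop(columns=[id_col])
    # 2. Encode labels as 0, 1, 2, ... (assumed: sorted order of original values)
    codes, _ = pd.factorize(df[label_col], sort=True)
    # 3. Rename feature headers to f1, f2, ...
    features = df.drop(columns=[label_col])
    features.columns = [f"f{i}" for i in range(1, features.shape[1] + 1)]
    # 4. Default missing-value handling: fill with 0
    features = features.fillna(0)
    features["label"] = codes
    return features


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Optional z-score scaling of feature columns; the label is left untouched."""
    X = df.drop(columns=["label"])
    std = X.std(ddof=0).replace(0, 1.0)  # avoid division by zero for constant features
    Z = (X - X.mean()) / std
    Z["label"] = df["label"]
    return Z


# Toy data for illustration only, not taken from Dataset60
raw = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "gene_a": [1.5, None, 0.2],
        "gene_b": [3.0, 4.0, None],
        "class": ["tumor", "normal", "tumor"],
    }
)
out = preprocess(raw, label_col="class", id_col="id")
scaled = standardize(out)
print(out)
```

In a real evaluation, the scaling statistics would normally be computed on the training split only and then applied to the test split, so that no information leaks from test data into feature selection.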
Created:
2024-04-12



