stepp1/is_sparse_5d

Name: stepp1/is_sparse_5d
Creator: stepp1
Published: 2025-11-18 14:45:02
License: MIT

Source: Hugging Face. Updated 2025-11-18; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/stepp1/is_sparse_5d


Resource description:

---
license: mit
task_categories:
- tabular-classification
language:
- en
tags:
- synthetic
- sparse-learning
- classification
size_categories:
- 100K<n<1M
---

# is_sparse/sparse5d

## Dataset Description

This is a synthetic 5-dimensional classification dataset designed for sparse learning research. The dataset contains 3 classes and is specifically designed to have sparse optimal representations, where only a subset of features are informative for the classification task.

### Dataset Summary

- **Variant**: sparse5d
- **Features**: 5 continuous features
- **Classes**: 3
- **Entropy(Y)**: 1.4855
- **Mutual Information (joint)**: 1.1819
- **Maximum Achievable Accuracy**: 0.8967

## Dataset Structure

### Data Instances

Each instance consists of:

- `data`: A 5-dimensional feature vector (float32)
- `label`: An integer class label (0, 1, or 2)

### Data Splits

| Split | Number of Instances |
|-------|---------------------|
| Train | Variable (see below) |
| Test | Variable (see below) |

## Dataset Creation

This dataset was synthetically generated for research on sparse learning and optimal feature selection. The mutual information values between feature subsets and labels are provided in the metadata.

### Mutual Information Structure

The dataset includes ground-truth mutual information values for various feature subsets, enabling:

- Feature importance analysis
- Information-theoretic learning algorithms
- Benchmarking of MI estimation methods

Key MI values:

- joint: 1.1819
- 1: 0.3273
- 1-2: 0.3273
- 1-2-3: 0.6634
- 1-2-3-4: 0.6634
- 1-2-3-4-5: 1.1819
- 1-2-3-5: 1.1819
- 1-2-4: 0.3273
- 1-2-4-5: 1.0492
- 1-2-5: 1.0492

## Citation

If you use this dataset, please cite the associated research paper (to be added).

## License

MIT License
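## Example: Reading the Key MI Values

The key MI values listed under "Mutual Information Structure" can be read as a small lookup table from feature subsets to information about the label. The sketch below is not part of the dataset or its tooling. It copies the values from the card and shows one way to derive per-feature marginal gains and the smallest listed subset that reaches the joint MI. The function names (`marginal_gains`, `redundant_features`, `smallest_sufficient_subsets`) are hypothetical. Treating the values as bits is also an assumption: the card does not state units, although an entropy of 1.4855 exceeds ln 3 ≈ 1.0986, so it cannot be in nats for a 3-class label.

```python
import math

# Key MI values copied from the dataset card; keys are 1-based feature indices.
KEY_MI = {
    (1,): 0.3273,
    (1, 2): 0.3273,
    (1, 2, 3): 0.6634,
    (1, 2, 3, 4): 0.6634,
    (1, 2, 3, 4, 5): 1.1819,
    (1, 2, 3, 5): 1.1819,
    (1, 2, 4): 0.3273,
    (1, 2, 4, 5): 1.0492,
    (1, 2, 5): 1.0492,
}
ENTROPY_Y = 1.4855  # Entropy(Y) from the card
JOINT_MI = 1.1819   # MI of all five features with the label


def marginal_gains(mi):
    """Map (parent_subset, added_feature) to the MI gained by adding that feature.

    Only pairs where both the subset and its parent are listed are reported.
    """
    gains = {}
    for subset, value in mi.items():
        for feature in subset:
            parent = tuple(f for f in subset if f != feature)
            if parent in mi:
                gains[(parent, feature)] = round(value - mi[parent], 4)
    return gains


def redundant_features(gains, tol=1e-4):
    """Features whose every listed marginal gain is (numerically) zero."""
    by_feature = {}
    for (_, feature), gain in gains.items():
        by_feature.setdefault(feature, []).append(gain)
    return sorted(f for f, gs in by_feature.items() if all(abs(g) <= tol for g in gs))


def smallest_sufficient_subsets(mi, target, tol=1e-4):
    """Smallest listed subsets whose MI matches the target value."""
    hits = [s for s, v in mi.items() if abs(v - target) <= tol]
    size = min(len(s) for s in hits)
    return sorted(s for s in hits if len(s) == size)


if __name__ == "__main__":
    gains = marginal_gains(KEY_MI)
    for (parent, feature), gain in sorted(gains.items()):
        print(f"adding feature {feature} to {parent}: +{gain:.4f}")
    print("redundant (in the listed pairs):", redundant_features(gains))
    print("smallest sufficient listed subset:", smallest_sufficient_subsets(KEY_MI, JOINT_MI))
    # Label uncertainty left after observing all features: H(Y) - I(X; Y).
    print("residual H(Y|X):", round(ENTROPY_Y - JOINT_MI, 4))
    # Sanity check on units: H(Y) must not exceed the 3-class maximum.
    print("max entropy, 3 classes (bits):", round(math.log2(3), 4))
```

On the listed values, features 2 and 4 add nothing wherever a comparison is available, and the smallest listed subset reaching the joint MI is 1-2-3-5. Every listed subset contains feature 1, and all but one contain feature 2, so the table alone does not show whether a smaller subset such as 1-3-5 would also reach 1.1819.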


Provider:

stepp1
