
Algorithms for Sparse Support Vector Machines

DataCite Commons, updated 2022-12-13, indexed 2024-07-29
Download link:
https://tandf.figshare.com/articles/dataset/Algorithms_for_Sparse_Support_Vector_Machines/21554661
Description:
Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by $\ell_1$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current article presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $L(\beta)$ and adds the penalty $\frac{\rho}{2}\,\mathrm{dist}(\beta, S_k)^2$ capturing the squared Euclidean distance of the parameter vector $\beta$ to the sparsity set $S_k$ where at most $k$ components of $\beta$ are nonzero. If $\beta_\rho$ represents the minimum of the objective $f_\rho(\beta) = L(\beta) + \frac{\rho}{2}\,\mathrm{dist}(\beta, S_k)^2$, then $\beta_\rho$ tends to the constrained minimum of $L(\beta)$ over $S_k$ as $\rho$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power. Supplementary materials for this article are available online.
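
The distance penalty in the description can be illustrated with a small sketch. It is not the article's algorithm: the function names are hypothetical, and the loss is a toy quadratic, 0.5 * ||beta - b||^2, chosen only because each majorization step then has a closed form. The sketch relies on the standard fact that the Euclidean projection onto the sparsity set keeps the k largest-magnitude components and zeroes the rest.

```python
from typing import List


def project_sparse(beta: List[float], k: int) -> List[float]:
    """Project beta onto S_k: keep the k largest-magnitude entries, zero the rest."""
    if k <= 0:
        return [0.0] * len(beta)
    # Indices ordered by decreasing magnitude; sorted() is stable, so ties
    # are resolved in favor of the lower index.
    order = sorted(range(len(beta)), key=lambda i: -abs(beta[i]))
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def distance_penalty(beta: List[float], k: int, rho: float) -> float:
    """Return (rho / 2) * dist(beta, S_k)^2 using the projection above."""
    proj = project_sparse(beta, k)
    return 0.5 * rho * sum((b - p) ** 2 for b, p in zip(beta, proj))


def toy_minimizer(b: List[float], k: int, rho: float, iters: int = 200) -> List[float]:
    """Minimize 0.5*||beta - b||^2 + (rho/2)*dist(beta, S_k)^2 (toy loss, an assumption).

    Each step replaces dist(beta, S_k)^2 with the squared distance to the
    projection of the current iterate, which gives a closed-form update.
    """
    beta = list(b)
    for _ in range(iters):
        proj = project_sparse(beta, k)
        beta = [(bi + rho * pi) / (1.0 + rho) for bi, pi in zip(b, proj)]
    return beta


if __name__ == "__main__":
    b = [3.0, -0.5, 2.0, 0.1]
    for rho in (1.0, 10.0, 1000.0):
        print(rho, toy_minimizer(b, k=2, rho=rho))
```

For this toy loss, the components outside the selected support shrink like b_i / (1 + rho), so the iterate approaches the projection of b onto the sparsity set as rho grows, mirroring the limiting behavior stated in the description.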

Provider:
Taylor & Francis
Created:
2022-11-14