
Cost-based Feature Selection for Network Model Choice

Source: DataCite Commons. Updated 2023-01-20; indexed 2024-09-03.
Download link:
https://tandf.figshare.com/articles/dataset/Cost-based_feature_selection_for_network_model_choice/21648369/2
Description:
Selecting a small set of informative features from a large number of possibly noisy candidates is a challenging problem with many applications in machine learning and approximate Bayesian computation. In practice, the cost of computing informative features also needs to be considered. This is particularly important for networks because the computational costs of individual features can span several orders of magnitude. We addressed this issue for the network model selection problem using two approaches. First, we adapted nine feature selection methods to account for the cost of features. We show for two classes of network models that the cost can be reduced by two orders of magnitude without considerably affecting classification accuracy (proportion of correctly identified models). Second, we selected features using pilot simulations with smaller networks. This approach reduced the computational cost by a factor of 50 without affecting classification accuracy. To demonstrate the utility of our approach, we applied it to three different yeast protein interaction networks and identified the best-fitting duplication divergence model. Supplementary materials, including computer code to reproduce our results, are available online.
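As a rough illustration of what accounting for feature cost can look like, the sketch below ranks features by a relevance score minus a cost penalty. It is not one of the nine adapted methods from the paper: the relevance measure (absolute Pearson correlation with a binary model label), the penalty weight `lam`, the function names, and the toy data are all assumptions chosen for illustration only.

```python
import math


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / math.sqrt(vx * vy)


def cost_penalized_ranking(features, labels, costs, lam):
    """Rank feature names by relevance minus lam * (cost / max cost).

    Hypothetical scoring rule: lam = 0 ignores cost entirely, while
    larger lam pushes expensive features down the ranking.
    """
    max_cost = max(costs.values())
    scores = {
        name: abs_correlation(values, labels) - lam * costs[name] / max_cost
        for name, values in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented): labels stand for two candidate network models,
# and each feature is a summary statistic computed on simulated networks.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "cheap_informative": [1.0, 1.2, 0.9, 2.0, 2.1, 1.9],
    "costly_informative": [0.0, 0.1, 0.0, 1.0, 1.1, 0.9],
    "cheap_noise": [0.5, 1.5, 0.7, 1.4, 0.6, 1.0],
}
# Costs in arbitrary time units, spanning two orders of magnitude.
costs = {"cheap_informative": 1.0, "costly_informative": 100.0, "cheap_noise": 1.0}

ranking_no_penalty = cost_penalized_ranking(features, labels, costs, lam=0.0)
ranking_penalized = cost_penalized_ranking(features, labels, costs, lam=1.0)
print(ranking_no_penalty)
print(ranking_penalized)
```

On this toy data, the costly feature ranks first when cost is ignored, and the cheap informative feature takes over once the penalty is applied. Choosing `lam` in practice would require balancing accuracy against compute budget, which this sketch does not attempt.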

Provider:
Taylor & Francis
Created:
2023-01-20