A Smoothed-Bayesian Approach to Frequency Recovery from Sketched Data

Source: Figshare. Updated: 2025-05-12. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/A_smoothed-Bayesian_approach_to_frequency_recovery_from_sketched_data_/29039308
Description:
We provide a novel statistical perspective on a classical problem at the intersection of computer science and information theory: recovering the empirical frequency of a symbol in a large discrete dataset using only a compressed representation, or sketch, obtained via random hashing. Departing from traditional algorithmic approaches, recent works have proposed Bayesian nonparametric (BNP) methods that can provide more informative frequency estimates by leveraging modeling assumptions about the distribution of the sketched data. In this article, we propose an alternative smoothed-Bayesian approach, inspired by existing BNP methods but designed to overcome their computational limitations when dealing with large-scale data from realistic distributions, including those with power-law tail behaviors. For sketches obtained with a single hash function, our approach is supported by precise theoretical guarantees, including unbiasedness and optimality under a Bayesian framework within an intuitive class of linear estimators. For sketches with multiple hash functions, we introduce an approach based on multi-view learning to construct computationally efficient frequency estimators. We validate our method on synthetic and real data, comparing its performance to that of existing alternatives. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
Created:
2025-05-12
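
The sketching setup mentioned in the description can be illustrated with a minimal Python sketch. It shows only the classical algorithmic baseline (a count-min style upper bound), not the smoothed-Bayesian estimator proposed in the article. The names `bucket`, `build_sketch`, and `upper_bound`, the choice of BLAKE2b as the hash, and the toy data are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def bucket(symbol: str, seed: int, width: int) -> int:
    """Map a symbol to one of `width` buckets using a seeded hash (illustrative choice)."""
    digest = hashlib.blake2b(
        symbol.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % width


def build_sketch(data, seeds, width):
    """One row of bucket counts per hash function; a single seed gives a single-hash sketch."""
    sketch = [[0] * width for _ in seeds]
    for symbol in data:
        for row, seed in enumerate(seeds):
            sketch[row][bucket(symbol, seed, width)] += 1
    return sketch


def upper_bound(sketch, symbol, seeds, width):
    """Classical estimate: the smallest bucket count the symbol hashes into.

    Collisions can only add to a bucket, so this never underestimates
    the true empirical frequency.
    """
    return min(
        sketch[row][bucket(symbol, seed, width)]
        for row, seed in enumerate(seeds)
    )


# Hypothetical toy data with a skewed frequency profile.
data = ["a"] * 50 + ["b"] * 30 + ["c"] * 5 + [f"rare{i}" for i in range(40)]
true_counts = Counter(data)

seeds = [1, 2, 3]   # three hash functions (assumed values)
width = 16          # buckets per hash function (assumed value)
sketch = build_sketch(data, seeds, width)

for symbol in ["a", "b", "c"]:
    print(symbol, true_counts[symbol], upper_bound(sketch, symbol, seeds, width))
```

The bucket counts bound the true frequency from above but say nothing about how far above; the description presents model-based estimators as a way to obtain more informative estimates from the same sketch.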