
Statistical inference for large-scale multi-source heterogeneous data

Source: 中国科学数据 (China Scientific Data) · Updated: 2025-11-27 · Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1360/SSM-2024-0085
Resource description:
In the era of digital information, the data people face may be not only large-scale but also heterogeneous. In this paper, we study statistical inference for the overall population mean function of large-scale multi-source heterogeneous datasets. Borrowing from hierarchical sampling methods and divide-and-conquer techniques, we propose a weighted local linear estimator for the overall population mean function of multi-source heterogeneous data. By studying the pointwise convergence and extreme value distribution properties of the estimator, we construct asymptotically accurate simultaneous confidence bands and pointwise confidence intervals for large-scale multi-source heterogeneous data. The proposed methods apply not only to heterogeneous data but also to homogeneous data analyzed with divide-and-conquer methods. Numerical simulation studies show that the proposed methods perform well on both large-scale multi-source heterogeneous data and homogeneous data. As an illustration, we apply the proposed methods to hypothesis testing problems on Beijing multi-site air-quality data and U.S. census data.
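The description above names a weighted local linear estimator built with divide-and-conquer, but it gives no formulas. The following is only a minimal sketch of that general idea, not the paper's estimator: each source is split into blocks, a local linear fit is computed on each block, block fits are averaged within a source, and sources are combined with weights. The function names (`local_linear`, `weighted_dc_estimate`), the Epanechnikov kernel, the bandwidth, the number of blocks, and the choice of sample-size proportions as source weights are all assumptions made for illustration.

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local linear estimate of E[y | x] on `grid` (Epanechnikov kernel, bandwidth h)."""
    est = np.empty(len(grid))
    for j, t in enumerate(grid):
        d = x - t
        u = d / h
        w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
        # Closed-form intercept of the kernel-weighted least-squares line.
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d**2).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        est[j] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)
    return est

def weighted_dc_estimate(sources, grid, h, n_blocks=4):
    """Combine per-source divide-and-conquer fits.

    Assumption: each source is weighted by its share of the total sample size.
    """
    total = sum(len(x) for x, _ in sources)
    out = np.zeros(len(grid))
    for x, y in sources:
        blocks = zip(np.array_split(x, n_blocks), np.array_split(y, n_blocks))
        source_fit = np.mean([local_linear(bx, by, grid, h) for bx, by in blocks], axis=0)
        out += (len(x) / total) * source_fit
    return out

# Synthetic demo: two sources with different mean functions (heterogeneous data).
rng = np.random.default_rng(0)
m1 = lambda x: np.sin(2 * np.pi * x)
m2 = lambda x: x**2
x1, x2 = rng.uniform(0, 1, 6000), rng.uniform(0, 1, 4000)
y1 = m1(x1) + rng.normal(0, 0.2, x1.size)
y2 = m2(x2) + rng.normal(0, 0.2, x2.size)

grid = np.linspace(0.1, 0.9, 17)
est = weighted_dc_estimate([(x1, y1), (x2, y2)], grid, h=0.1)
truth = 0.6 * m1(grid) + 0.4 * m2(grid)  # overall mean under the assumed weights
max_err = np.max(np.abs(est - truth))
```

In this sketch the target is the sample-size-weighted mixture of the two source mean functions, so `max_err` measures how closely the combined fit tracks that mixture on the interior grid. Confidence bands and intervals, which are the paper's main contribution, are not attempted here.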
Created:
2025-02-14