Privacy Policy Time Series Dataset
arXiv 2021-07-21; updated 2024-06-21
Download link:
https://privacypolicies.cs.princeton.edu/
Resource description:
The Privacy Policy Time Series Dataset is a large-scale dataset created by a research team at Princeton University. It contains over 1 million English-language privacy policy documents from more than 130,000 websites, spanning over two decades. The documents were collected from the Internet Archive's Wayback Machine using a custom crawler and then passed through a series of validation and quality-control steps to ensure data quality and accuracy. The dataset was built to overcome the limited time span of earlier privacy policy studies, giving researchers a platform for longitudinal analysis of how privacy policies and their transparency have changed over time. Analysis of the dataset also reveals the significant impact of the GDPR on privacy policies, offering a historical perspective for research on privacy regulation.
Provider:
Princeton University
Created:
2020-08-21



