Annotated Privacy Policies of 100 Online Platforms

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://data.mendeley.com/datasets/pcgvm6zh43

Description:

The dataset contains information derived from 98 annotated privacy policies of 100 online platforms.* The hypothesis behind the study was that privacy policies do not contain enough information for consumers to fully understand exactly what personal data the platforms collect and exactly how it is used. To test this hypothesis, two annotators, working independently, read the privacy policies in search of three types of occurrences: (1) general terms describing the categories of data collected ("GenData"); (2) general terms describing the purposes for which personal data is used ("GenUse"); (3) the no-distinction structure of a privacy policy, where the document first lists the categories of data collected and then enumerates the purposes of use, without explaining which personal data is used for which purpose. The hypothesis was confirmed. In the analyzed sample, all 98 privacy policies featured at least one instance of GenData, 97 out of 98 featured at least one instance of GenUse, and 89 out of 98 had a no-distinction structure.

The sample contains 98 privacy policies of 100* digital platforms operating in sixteen market sectors: Cloud storage, Communication, Dating, Finance, Food, Gaming, Health, Music, Shopping, Social, Sports, Transportation, Travel, Video, Work and Various. The selected companies' headquarters span four legal environments: the US, the EU, Poland specifically, and Other jurisdictions. The chosen platforms include both privately held and publicly listed companies, and offer both fee-based and free services.
The dataset consists of: (a) two spreadsheets, "PP_table Tagger1.xlsx" and "PP_table Tagger2.xlsx," each containing the evaluative variables ascribed and examples of the clauses on which the judgments were based; (b) two folders, "Tagger 1" and "Tagger 2," each containing 98 PDF files with the privacy policies analyzed, together with annotations made in the form of comments; (c) one text file, "Instruction," explaining the logic behind the tagging.

The reuse potential of the data is significant. It can be useful for empirical researchers interested in the dynamics of data collection by online platforms, and for normative scholars (such as lawyers or political philosophers) interested in critiquing the status quo and proposing ideas for reform. It can also be useful for non-academics, such as governments interested in assessing the efficacy of their regulations, or businesses interested in avoiding the common pitfalls of privacy policy drafting.

* Apple and iCloud, as well as Google and YouTube, had the same privacy policy on the day of raw data collection, i.e. March 13, 2022.

ACKNOWLEDGEMENT: The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021, project no. 2020/37/K/HS5/02769, titled “Private Law of Data: Concepts, Practices, Principles & Politics.”
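Because the dataset ships one table per annotator, a natural first step for reuse is to restate the reported counts as shares and to check how closely the two taggers agree. The following is a minimal sketch rather than the authors' method: the function names are hypothetical, the description does not say which agreement statistic (if any) was used, and Cohen's kappa is swapped in here as a common choice. The toy label lists are invented for illustration and are not taken from the spreadsheets.

```python
from collections import Counter


def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Counts as reported in the description (out of 98 policies).
reported = {"GenData": 98, "GenUse": 97, "NoDistinction": 89}
shares = {name: share(count, 98) for name, count in reported.items()}

# Hypothetical toy labels (1 = feature present), NOT real annotations.
toy_tagger1 = [1, 1, 0, 1, 0, 1, 1, 0]
toy_tagger2 = [1, 1, 0, 0, 0, 1, 1, 1]
toy_kappa = cohen_kappa(toy_tagger1, toy_tagger2)
```

To apply this to the real data, the label lists would be read from the corresponding columns of the two "PP_table" spreadsheets; the column names are not given in the description, so they would need to be checked against the files and the "Instruction" text.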

Created:

2023-09-25
