ToS;DR policies dataset (training)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15014822
Resource description:
Dataset Overview
This dataset is derived from Terms of Service; Didn't Read (ToS;DR), a project that analyzes and categorizes the terms of service of various online services. The data has been cleaned and organized into two CSV files, with a focus on reproducibility and usability. The privacy file is a subset of the full file, filtered to privacy-related terms.
File Descriptions
1. training_tosdr_all_data.csv
This file contains the complete collection of terms of service data after cleaning and preprocessing. Each row represents a statement (or "point") extracted from a service's terms of service.
Key Columns:
case_id: Unique identifier for the case.
case_title: Brief description of the case.
topic_id: Unique identifier for the topic.
topic_title: Broad category the case falls under (e.g., Transparency, Copyright License).
sentence: The extracted text from the terms of service.
seq_case_id: Sequential identifier for the case, used for mapping.
seq_topic_id: Sequential identifier for the topic, used for mapping.
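As a minimal sketch of how the sequential identifiers might be used for mapping, the Python snippet below reads rows with the standard csv module and builds lookups from case_id to seq_case_id and from topic_id to seq_topic_id. The sample rows and their values are invented for illustration, the helper names load_rows and build_id_map are hypothetical, and the one-to-one relationship between original and sequential IDs is an assumption that the description above does not confirm.

```python
import csv
import io

# Hypothetical sample rows using the documented column names. The values are
# invented for illustration and are not taken from the real file.
SAMPLE_CSV = """case_id,case_title,topic_id,topic_title,sentence,seq_case_id,seq_topic_id
101,Example case A,7,Transparency,"We may change these terms at any time.",0,0
101,Example case A,7,Transparency,"Terms can be updated without notice.",0,0
205,Example case B,9,Copyright License,"You grant us a worldwide license.",1,1
"""


def load_rows(handle):
    """Read every row of a ToS;DR training CSV into a list of dicts."""
    return list(csv.DictReader(handle))


def build_id_map(rows, original_col, sequential_col):
    """Map original IDs to sequential IDs, failing on inconsistent pairs.

    Assumption: each original ID corresponds to exactly one sequential ID.
    """
    mapping = {}
    for row in rows:
        original = row[original_col]
        sequential = int(row[sequential_col])
        # setdefault stores the first pairing and returns the stored value afterwards.
        if mapping.setdefault(original, sequential) != sequential:
            raise ValueError(
                f"{original_col}={original} maps to several {sequential_col} values"
            )
    return mapping


rows = load_rows(io.StringIO(SAMPLE_CSV))
case_map = build_id_map(rows, "case_id", "seq_case_id")
topic_map = build_id_map(rows, "topic_id", "seq_topic_id")
print(case_map)
print(topic_map)
```

To run this against the real file, replace the io.StringIO object with open("training_tosdr_all_data.csv", newline="", encoding="utf-8"); the UTF-8 encoding is also an assumption.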
2. training_tosdr_privacy_data.csv
This file is a subset of the full dataset, focusing exclusively on privacy-related terms. It includes cases related to tracking, data collection, account deletion policies, and other privacy-related topics.
Key Columns:
case_id: Unique identifier for the case.
case_title: Brief description of the case.
topic_id: Unique identifier for the topic.
topic_title: Broad category the case falls under (e.g., Privacy, Data Collection).
sentence: The extracted text from the terms of service.
seq_case_id: Sequential identifier for the case, used for mapping.
seq_topic_id: Sequential identifier for the topic, used for mapping.
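Because the privacy file is described as a subset of the full file, one possible sanity check is to confirm that every privacy row also appears in the full data. The sketch below does this on invented sample data, identifying a row by its (case_id, sentence) pair; that choice of key, the helper name row_keys, and all sample values are assumptions made for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-ins for the two CSV files; all values are invented.
ALL_CSV = """case_id,case_title,topic_id,topic_title,sentence,seq_case_id,seq_topic_id
101,Example case A,7,Transparency,"We may change these terms at any time.",0,0
310,Example case C,12,Privacy,"We collect your IP address.",1,1
311,Example case D,13,Data Collection,"We store your browsing history.",2,2
"""

PRIVACY_CSV = """case_id,case_title,topic_id,topic_title,sentence,seq_case_id,seq_topic_id
310,Example case C,12,Privacy,"We collect your IP address.",0,0
311,Example case D,13,Data Collection,"We store your browsing history.",1,1
"""


def row_keys(handle):
    """Return the set of (case_id, sentence) pairs found in a CSV.

    Assumption: this pair is enough to identify a row in both files.
    """
    return {(row["case_id"], row["sentence"]) for row in csv.DictReader(handle)}


all_keys = row_keys(io.StringIO(ALL_CSV))
privacy_keys = row_keys(io.StringIO(PRIVACY_CSV))

# Rows present in the privacy data but absent from the full data.
missing = privacy_keys - all_keys
print("privacy rows missing from full data:", len(missing))

# Number of privacy rows per topic_title.
topic_counts = Counter(
    row["topic_title"] for row in csv.DictReader(io.StringIO(PRIVACY_CSV))
)
print(dict(topic_counts))
```

The comparison deliberately ignores seq_case_id and seq_topic_id, since the description does not say whether the sequential identifiers are shared between the two files.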
Created:
2025-03-17



