ToS;DR policies dataset (training)

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/15014822


Description:

Dataset Overview

This dataset is derived from Terms of Service; Didn't Read (ToS;DR), a project that analyzes and categorizes terms of service from various online services. The dataset has been cleaned and organized into two CSV files, with a focus on reproducibility and usability. The privacy dataset is a subset of the full dataset, filtered specifically for privacy-related terms.

File Descriptions

1. training_tosdr_all_data.csv

This file contains the complete collection of terms-of-service data after cleaning and preprocessing. Each row represents a statement (or "point") extracted from a service's terms of service.

Key columns:
- case_id: Unique identifier for the case.
- case_title: Brief description of the case.
- topic_id: Unique identifier for the topic.
- topic_title: Broad category the case falls under (e.g., Transparency, Copyright License).
- sentence: The extracted text from the terms of service.
- seq_case_id: Sequential identifier for the case, used for mapping.
- seq_topic_id: Sequential identifier for the topic, used for mapping.

2. training_tosdr_privacy_data.csv

This file is a subset of the full dataset, focusing exclusively on privacy-related terms. It includes cases related to tracking, data collection, account deletion policies, and other privacy-related topics.

Key columns:
- case_id: Unique identifier for the case.
- case_title: Brief description of the case.
- topic_id: Unique identifier for the topic.
- topic_title: Broad category the case falls under (e.g., Privacy, Data Collection).
- sentence: The extracted text from the terms of service.
- seq_case_id: Sequential identifier for the case, used for mapping.
- seq_topic_id: Sequential identifier for the topic, used for mapping.
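The column layout described above lends itself to a simple loading step. The following is a minimal Python sketch using only the standard-library `csv` module. It assumes that each file has a header row with exactly the listed column names, that `seq_case_id` and `seq_topic_id` are stored as integers, and that each sequential ID corresponds to a single title. The function names (`load_rows`, `label_maps`, `to_examples`, `is_row_subset`) and the `SAMPLE` rows are hypothetical illustrations and are not part of the dataset.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Column names as listed in the dataset description; both files share them.
COLUMNS = ["case_id", "case_title", "topic_id", "topic_title",
           "sentence", "seq_case_id", "seq_topic_id"]

Row = Dict[str, str]


def load_rows(stream: TextIO) -> List[Row]:
    """Read a ToS;DR training CSV (with a header row) into a list of dicts."""
    reader = csv.DictReader(stream)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def _record(mapping: Dict[int, str], key: int, title: str) -> None:
    # Assumption: each sequential ID corresponds to exactly one title.
    if mapping.setdefault(key, title) != title:
        raise ValueError(f"ID {key} maps to both {mapping[key]!r} and {title!r}")


def label_maps(rows: List[Row]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Build seq_case_id -> case_title and seq_topic_id -> topic_title lookups."""
    cases: Dict[int, str] = {}
    topics: Dict[int, str] = {}
    for row in rows:
        _record(cases, int(row["seq_case_id"]), row["case_title"])
        _record(topics, int(row["seq_topic_id"]), row["topic_title"])
    return cases, topics


def to_examples(rows: List[Row]) -> List[Tuple[str, int]]:
    """Pair each sentence with its sequential case ID, e.g. as classifier labels."""
    return [(row["sentence"], int(row["seq_case_id"])) for row in rows]


def _row_key(row: Row) -> Tuple[str, str, str]:
    # Compare on original IDs and text, not on the sequential IDs.
    return (row["case_id"], row["topic_id"], row["sentence"])


def is_row_subset(subset: List[Row], full: List[Row]) -> bool:
    """True if every (case_id, topic_id, sentence) in `subset` also occurs in `full`."""
    full_keys = {_row_key(r) for r in full}
    return all(_row_key(r) in full_keys for r in subset)


# Hypothetical rows for illustration only; these are NOT taken from the dataset.
SAMPLE = """case_id,case_title,topic_id,topic_title,sentence,seq_case_id,seq_topic_id
101,Hypothetical case A,7,Transparency,"Example sentence one.",0,0
202,Hypothetical case B,9,Copyright License,"Example sentence two.",1,1
101,Hypothetical case A,7,Transparency,"Example sentence three.",0,0
"""

rows = load_rows(io.StringIO(SAMPLE))
cases, topics = label_maps(rows)
```

With a downloaded copy, you would pass something like `open("training_tosdr_all_data.csv", newline="", encoding="utf-8")` to `load_rows`; the UTF-8 encoding is an assumption. The subset check compares `case_id`, `topic_id`, and `sentence` rather than the sequential IDs because the description does not say whether the two files share the same sequential numbering.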

Created:

2025-03-17
