
ToS;DR policies dataset (training)

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/15014822
Resource description:
Dataset Overview
This dataset is derived from Terms of Service; Didn't Read (ToS;DR), a project that analyzes and categorizes the terms of service of various online services. The data has been cleaned and organized into two CSV files, with a focus on reproducibility and usability. The privacy dataset is a subset of the full dataset, filtered to privacy-related terms.

File Descriptions

1. training_tosdr_all_data.csv
This file contains the complete collection of terms of service data after cleaning and preprocessing. Each row represents a statement (or "point") extracted from a service's terms of service.
Key columns:
case_id: Unique identifier for the case.
case_title: Brief description of the case.
topic_id: Unique identifier for the topic.
topic_title: Broad category the case falls under (e.g., Transparency, Copyright License).
sentence: The extracted text from the terms of service.
seq_case_id: Sequential identifier for the case, used for mapping.
seq_topic_id: Sequential identifier for the topic, used for mapping.

2. training_tosdr_privacy_data.csv
This file is a subset of the full dataset, focusing exclusively on privacy-related terms. It includes cases related to tracking, data collection, account deletion policies, and other privacy-related topics.
Key columns:
case_id: Unique identifier for the case.
case_title: Brief description of the case.
topic_id: Unique identifier for the topic.
topic_title: Broad category the case falls under (e.g., Privacy, Data Collection).
sentence: The extracted text from the terms of service.
seq_case_id: Sequential identifier for the case, used for mapping.
seq_topic_id: Sequential identifier for the topic, used for mapping.
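The following is a minimal sketch of reading either CSV with Python's standard csv module, checking that the documented columns are present, and building lookup tables from the sequential IDs back to the case and topic titles. It assumes that seq_case_id and seq_topic_id hold integer values (for example, label indices for a classifier), which the description implies but does not state. The subset check compares (case_id, sentence) pairs rather than the seq_* columns, on the assumption that the privacy file may renumber its sequential IDs. The helper names (load_rows, label_maps, is_subset) and the sample rows are hypothetical and are not taken from the real files.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Column names as listed in the dataset description.
COLUMNS = ["case_id", "case_title", "topic_id", "topic_title",
           "sentence", "seq_case_id", "seq_topic_id"]


def load_rows(f) -> List[Dict[str, str]]:
    """Read a ToS;DR CSV from an open text file; fail if a documented column is missing."""
    reader = csv.DictReader(f)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def label_maps(rows: Iterable[Dict[str, str]]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Map seq_case_id -> case_title and seq_topic_id -> topic_title.

    Assumption: the seq_* columns contain integers.
    """
    cases: Dict[int, str] = {}
    topics: Dict[int, str] = {}
    for r in rows:
        cases[int(r["seq_case_id"])] = r["case_title"]
        topics[int(r["seq_topic_id"])] = r["topic_title"]
    return cases, topics


def is_subset(privacy_rows, all_rows) -> bool:
    """True if every (case_id, sentence) pair in the privacy file also occurs in the full file."""
    full = {(r["case_id"], r["sentence"]) for r in all_rows}
    return all((r["case_id"], r["sentence"]) in full for r in privacy_rows)


# Hypothetical sample rows, NOT taken from the real dataset.
ALL_CSV = """case_id,case_title,topic_id,topic_title,sentence,seq_case_id,seq_topic_id
101,Hypothetical tracking case,5,Privacy,"We may use cookies on partner sites.",0,0
202,Hypothetical license case,7,Copyright License,"You grant us a license to your content.",1,1
"""

PRIVACY_CSV = """case_id,case_title,topic_id,topic_title,sentence,seq_case_id,seq_topic_id
101,Hypothetical tracking case,5,Privacy,"We may use cookies on partner sites.",0,0
"""

if __name__ == "__main__":
    # With the real files, replace io.StringIO(...) with
    # open("training_tosdr_all_data.csv", newline="", encoding="utf-8").
    all_rows = load_rows(io.StringIO(ALL_CSV))
    privacy_rows = load_rows(io.StringIO(PRIVACY_CSV))
    cases, topics = label_maps(all_rows)
    print(cases)
    print(topics)
    print(is_subset(privacy_rows, all_rows))
```

Matching on (case_id, sentence) rather than on row position keeps the check valid even if the two files are sorted differently. If the real files turn out to share the same seq_* numbering, comparing those columns directly would also work.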
Created:
2025-03-17