Explainable Detecting Noncompliance in Privacy Agreement based on Large Language Model Fine-tune-dataset

Name: Explainable Detecting Noncompliance in Privacy Agreement based on Large Language Model Fine-tune-dataset
Creator: Science Data Bank
Published: 2025-12-02 08:50:22
License: No description available

Source: DataCite Commons. Updated 2025-12-02; indexed 2026-05-05

Download link:

https://www.scidb.cn/detail?dataSetId=4b1dc3bd5c95400c8dfd93454e56cf73

Description:

This dataset comprises the three sub-datasets used for model training in the article "Explanatory Detection of Privacy Protocol Violations Based on Large Language Model Fine tuning": a dataset for pre-classifying privacy agreement text, a named entity recognition (NER) corpus for identifying the key content of privacy agreements, and a corpus for detecting violations with a large language model. The pre-classification dataset contains 15,627 items, each pairing a classification label with the corresponding privacy agreement text. The NER corpus is the original annotated corpus and covers the privacy agreements of 40 apps, annotated with the BOEM tagging scheme. The violation detection corpus has three parts. The first is general regulatory knowledge: 147 basic items drawn from the Personal Information Security Specification. The second is compliance labeling: a training set of 1,488 items obtained by balancing the violation-labeled corpus of core privacy agreement content. The third is semantic interpretation of the regulations: 139 items produced by manually interpreting part of the content in the second part.
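
To illustrate how a BOEM-annotated NER corpus might be consumed, here is a minimal Python sketch that turns a tagged token sequence into entity spans. It rests on assumptions not confirmed by the description: that BOEM stands for Begin / Outside / End / Middle, that tags are written as `B-TYPE`, `M-TYPE`, `E-TYPE`, and `O`, and that the label `DATA` is a plausible entity type. The function name `decode_boem` and the sample sentence are hypothetical; the actual file format of the dataset may differ.

```python
from typing import List, Tuple


def decode_boem(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from B-/M-/E-/O tags.

    Assumed tag layout (not confirmed by the dataset description):
    "B-TYPE" opens an entity, "M-TYPE" continues it, "E-TYPE" closes it,
    and "O" marks tokens outside any entity.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current: List[str] = []   # tokens of the entity being built
    current_type = None       # type label of that entity

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # Start a new entity, discarding any unfinished one.
            current, current_type = [token], label
        elif prefix == "M" and current and label == current_type:
            current.append(token)
        elif prefix == "E" and current and label == current_type:
            current.append(token)
            entities.append((sep.join(current), current_type))
            current, current_type = [], None
        else:
            # "O" or a malformed sequence: drop any partial entity.
            current, current_type = [], None

    return entities


# Hypothetical example sentence and tags.
tokens = ["We", "collect", "precise", "location", "data"]
tags = ["O", "O", "B-DATA", "M-DATA", "E-DATA"]
print(decode_boem(tokens, tags))
```

Only spans closed by an `E-` tag are emitted, so an unterminated `B-` sequence is silently dropped. For character-level Chinese text, passing `sep=""` would join the characters without spaces.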

Provider:

Science Data Bank

Created:

2025-12-02
