Explainable Detecting Noncompliance in Privacy Agreement based on Large Language Model Fine-tune-dataset

Source: DataCite Commons. Updated: 2025-12-02. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=4b1dc3bd5c95400c8dfd93454e56cf73
Description:
This dataset comprises the three sub-datasets used for model training in the article "Explanatory Detection of Privacy Protocol Violations Based on Large Language Model Fine tuning": a dataset for pre-classifying privacy agreement text, a named entity recognition corpus for identifying key content in privacy agreements, and a corpus for violation detection with a large language model. The pre-classification dataset contains 15,627 items, each pairing a classification label with the corresponding privacy agreement text. The named entity recognition corpus is the original annotated corpus and covers the privacy agreements of 40 apps, annotated with BOEM tags. The violation detection corpus consists of three parts. The first part is general knowledge of the regulations: 147 basic items from the Personal Information Security Specification. The second part is compliance labeling: a training set of 1,488 entries obtained by balancing the core content of the privacy agreements with the violation-labeled corpora. The third part is semantic interpretation text for the regulations: 139 items obtained by manually interpreting some of the content in the second part.
Provider:
Science Data Bank
Created:
2025-12-02