Full Experimental Results for liBERTa: Browser-based Privacy Policy Classification

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/14911061

Description:

Introduction

This record provides a comprehensive data set and supporting materials for a browser extension that classifies privacy policies in real time. The contents include:

Privacy Policy Classification Performance: Detailed results from training and evaluating various Transformer-based models (e.g., TinyBERT, BERT, ELECTRA, DeBERTa, ALBERT) on the OPP-115 Corpus. The data set features training and validation loss histories, performance metrics (accuracy, precision, recall, and F1 scores), per-class results, and confusion matrices.
Web Browser Extension Performance: Benchmark data measuring key operational stages of the extension, such as URL retrieval, text segmentation, tokenization, and model inference. Performance is reported across multiple trials to assess real-world latency and efficiency.
Bayesian Search Results: A summary of the optimal hyperparameter configurations obtained through Bayesian optimization, detailing the settings that maximize macro-F1 scores for each model.
Data Splits: An explanation of the contiguous data splitting strategy used to partition the OPP-115 corpus into training, validation, and test sets, along with the mapping between global and local sample indices.

Privacy Policy Classification Performance

The classification results for each model are provided in the file cls_results.json. These results correspond to the best configuration identified through a Bayesian optimization search (see below), which was conducted to maximize the macro-F1 score on the validation set. The hyperparameter tuning process ensured that the reported models represent the optimal balance between performance and complexity. The JSON file contains a list of objects, each of which represents the outcome of training and evaluating a specific model configuration. The structure of each object is described below:

model: The name of the model (e.g., TinyBERT, BERT, ELECTRA).
size: The size of the model (e.g., small, base, large).
target: A list of classification categories the model was trained to predict. This field remains consistent across all objects and is provided for completeness.
class_weights: The weight assigned to each class during training to counter class imbalance. The $i$-th weight corresponds to the $i$-th class in the target list and is used in the loss computation. This field remains consistent across all objects, as all models were trained on the same data, and is provided for completeness.
train_losses: A list of the loss values computed on the training set after each epoch, capturing the model's learning progress.
val_losses: A list of the validation loss values computed after each training epoch.
val_metrics: The validation metrics recorded after each epoch. Metric names follow the convention <average>-<metric>, where <average> is one of micro, macro, or weighted, and <metric> is one of accuracy, precision, recall, or f1.
best_epoch: The epoch at which the model achieved the highest macro-F1 score on the validation set. The first epoch is indexed as 1. The final model corresponds to this epoch.
test_metrics: The model's final performance on the test set, evaluated at the best_epoch. It contains:
Aggregated metrics: As in val_metrics, the micro, macro, and weighted averages for accuracy, precision, recall, and F1-score.
Non-aggregated metrics: Per-class values for accuracy, precision, recall, and f1. The ordering of these metrics corresponds to the order in the target list, i.e., the $i$-th element in any of these lists corresponds to the $i$-th category.
confusion_matrix: The multi-label confusion matrix for the test set. It is represented as a list whose $i$-th element is a $2\times 2$ binary confusion matrix, corresponding to the $i$-th class in the target list, of the form:
$$ [[C_{0,0}, C_{0,1}], [C_{1,0}, C_{1,1}]] $$
where:
$C_{0,0}$ are the True Negatives.
$C_{0,1}$ are the False Positives.
$C_{1,0}$ are the False Negatives.
$C_{1,1}$ are the True Positives.

Web Browser Extension Performance

The web browser extension performance results for privacy policy classification are stored in ext_results.json. The structure of each object is described below:

model: The name of the Transformer model used for classification (e.g., TinyBERT, BERT, ELECTRA).
size: The size of the corresponding model (e.g., small, base, large).
urls: The list of tested domains. The list contains 5 lists, one per trial. Thus, $\text{url}_{i,j}$ corresponds to the $j$-th tested domain in the $i$-th trial, with $i=1..5$. This field remains consistent across all objects and is provided for completeness.
segments: The number of text segments extracted from the privacy policy page during each trial. $s_{i,j}$ corresponds to the number of segments extracted from the $j$-th domain in the $i$-th trial, with $i=1..5$. Each element directly corresponds to the same index in the urls field.
url_extraction: The time required (in milliseconds) to extract the privacy policy URL from the active browser tab. $t_{i,j}$ represents the extraction time for the $j$-th domain in the $i$-th trial. Each element directly corresponds to the same index in the urls field.
segmentation: Timing data for the text segmentation stage.
tokenization: The time required (in milliseconds) to tokenize the extracted privacy policy text for classification. $\tau_{i,j}$ represents the tokenization time for the $j$-th domain in the $i$-th trial. Each element directly corresponds to the same index in the urls field.
inference: The time required (in milliseconds) to perform classification on the tokenized privacy policy segments using the specified model. $\iota_{i,j}$ represents the inference time for the $j$-th domain in the $i$-th trial. Each element directly corresponds to the same index in the urls field.

📅 Test Dates
Start Date: February 20, 2025.
End Date: February 22, 2025.
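
The per-trial timing fields of ext_results.json (url_extraction, segmentation, tokenization, inference) lend themselves to a simple aggregation across trials. The following is a minimal sketch, not the authors' analysis code. It assumes each timing field is a list of trials, each trial being a list with one value per domain, as described for the urls field, and it assumes the segmentation field has the same shape and unit (milliseconds), which the record does not state. The function names `summarize_stage` and `total_latency` and the toy record are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List

# Timing fields named in the record description; the shape of
# "segmentation" is an assumption (its description is missing).
STAGES = ("url_extraction", "segmentation", "tokenization", "inference")


def summarize_stage(times: List[List[float]]) -> List[Dict[str, float]]:
    """Mean and sample standard deviation per domain across trials.

    times[i][j] is the measurement for the j-th domain in the i-th trial.
    """
    per_domain = zip(*times)  # transpose: one tuple of trial values per domain
    return [
        {"mean": mean(col), "stdev": stdev(col) if len(col) > 1 else 0.0}
        for col in per_domain
    ]


def total_latency(record: Dict[str, List[List[float]]]) -> List[List[float]]:
    """Sum the four stage timings for every (trial, domain) pair."""
    n_trials = len(record["url_extraction"])
    return [
        [
            sum(record[stage][i][j] for stage in STAGES)
            for j in range(len(record["url_extraction"][i]))
        ]
        for i in range(n_trials)
    ]


# Toy record with 2 trials and 2 domains (the real file reports 5 trials).
# All numbers are made up for illustration and are not taken from the data set.
toy_record = {
    "url_extraction": [[10.0, 20.0], [30.0, 40.0]],
    "segmentation": [[1.0, 2.0], [3.0, 4.0]],
    "tokenization": [[5.0, 5.0], [5.0, 5.0]],
    "inference": [[100.0, 200.0], [300.0, 400.0]],
}

inference_summary = summarize_stage(toy_record["inference"])
end_to_end = total_latency(toy_record)
```

Averaging per domain rather than per trial keeps pages of very different lengths from being mixed into a single number; the segments field could be used in the same way to normalize inference time per segment.
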
Testing Environment: Google Chrome 131.0.6778.69 on Windows 11, NVIDIA RTX 3060 Gaming Edition GPU (12 GB VRAM), AMD Ryzen 7 5800X CPU (8 physical cores, 16 threads) @ 3.8 GHz, 64 GB DDR4 RAM, 2 TB NVMe M.2 SSD (PCIe Gen 3 x4).
Test Conditions: All performance tests were conducted under identical network conditions and hardware configurations.
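
The per-class confusion matrices stored in cls_results.json are enough to recompute the per-class metrics and the macro-F1 score. The sketch below illustrates this under the layout given in the description, $[[TN, FP], [FN, TP]]$ for each class. It is a hedged example rather than the authors' evaluation code: the function names `per_class_metrics` and `macro_f1` are hypothetical, the category names and counts are invented, and returning 0.0 for an undefined ratio is a convention chosen here, not one stated in the record.

```python
from typing import Dict, List


def per_class_metrics(
    confusion_matrix: List[List[List[int]]], target: List[str]
) -> Dict[str, Dict[str, float]]:
    """Derive accuracy, precision, recall, and F1 for each class.

    confusion_matrix[i] is [[TN, FP], [FN, TP]] for the i-th class in target.
    Undefined ratios (zero denominator) are reported as 0.0 (an assumption).
    """
    results: Dict[str, Dict[str, float]] = {}
    for name, ((tn, fp), (fn, tp)) in zip(target, confusion_matrix):
        total = tn + fp + fn + tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        accuracy = (tp + tn) / total if total else 0.0
        results[name] = {
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


def macro_f1(metrics: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Invented example: two placeholder categories, counts not from the data set.
toy_target = ["category_a", "category_b"]
toy_matrices = [
    [[50, 10], [5, 35]],  # category_a: TN=50, FP=10, FN=5, TP=35
    [[80, 0], [20, 0]],   # category_b: never predicted, so precision is undefined
]

toy_metrics = per_class_metrics(toy_matrices, toy_target)
toy_macro_f1 = macro_f1(toy_metrics)
```

To apply this to the real file, one would pass a record's confusion matrix and target list to `per_class_metrics`; the exact nesting of the confusion matrix within each object should be checked against the file itself.
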

Created:

2025-03-07
