Cross-model evaluation of phishing detectors against LLM-generated emails: dataset, code and results

Name: Cross-model evaluation of phishing detectors against LLM-generated emails: dataset, code and results
Creator: Zenodo
Published: 2026-05-17 04:32:38
License: No description available

Zenodo | Updated 2026-05-17 | Indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.20250116

Resource description:

This dataset accompanies the manuscript "Cross-model evaluation of phishing detectors against LLM-generated emails" by Gutierrez, Villegas-Ch and Govea (2026), submitted to Frontiers. The repository contains the complete data, code and experimental results from the study:

- Code (Python 3.11): a 7-script pipeline for assembling the corpus, extracting 17 stylometric features, training and evaluating classifiers under intra-model, cross-model, threshold-recalibrated, cross-dataset, and aggregated-pool conditions, and generating all figures in the manuscript.
- Data: stylometric feature CSVs for a combined corpus of 9,986 phishing emails (5,000 human-written from CEAS-08, TREC-07, Nazario, Nigerian Fraud, lingspam and a fraud-labeled Enron subset; and 4,986 LLM-generated using GPT-4.1, DeepSeek 3.2 and LLaMA 3.3 70B).
- Results: per-task outputs including the 3x3 cross-model transferability matrix, its threshold-recalibrated counterpart, intra-model 5-fold cross-validation metrics, cross-dataset human verification, aggregated-pool results, and SHAP feature-importance values per LLM.

Key headline findings:

- intra-model F1 above 0.955 with XGBoost on all three LLMs;
- default-threshold cross-model transferability gap of 28.1 percentage points;
- gap reduced to 4.0 percentage points (86% reduction) by recalibrating the decision threshold on a 30% slice of the target LLM;
- aggregated-pool detector achieves F1 = 0.997 on each individual LLM.

Code is released under the MIT License; data under Creative Commons Attribution 4.0 International (CC BY 4.0). The dataset is intended exclusively for defensive security research and academic study. See README.md and LICENSE for details and the responsible-use statement.
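The threshold-recalibration step mentioned in the findings can be illustrated with a minimal sketch. This is not the repository's code: the function names (`f1_score`, `recalibrate_threshold`, `split_calibration`, `demo`), the F1-maximising grid search, the label convention and the synthetic scores are all assumptions made for illustration. The only detail taken from the description is that the decision threshold is re-chosen on a 30% slice of the target LLM's data and then applied to the remaining 70%.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recalibrate_threshold(scores, labels, grid=None):
    """Return the grid threshold that maximises F1 on the calibration slice.

    Assumption: F1 is the selection criterion; the study may use another one.
    """
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t


def split_calibration(scores, labels, frac=0.30, seed=0):
    """Shuffle indices and hold out `frac` of the target data for calibration."""
    idx = list(range(len(scores)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * frac)
    cal, rest = idx[:cut], idx[cut:]
    return ([scores[i] for i in cal], [labels[i] for i in cal],
            [scores[i] for i in rest], [labels[i] for i in rest])


def demo(seed=42):
    """Synthetic stand-in for a source-trained detector scored on a target LLM.

    Positive-class scores are shifted below 0.5 to mimic a transfer gap;
    these numbers are invented and are not results from the study.
    """
    rng = random.Random(seed)
    scores = [rng.uniform(0.30, 0.60) for _ in range(500)]   # positive class
    scores += [rng.uniform(0.00, 0.20) for _ in range(500)]  # negative class
    labels = [1] * 500 + [0] * 500

    cal_s, cal_y, test_s, test_y = split_calibration(scores, labels, frac=0.30)
    t_new = recalibrate_threshold(cal_s, cal_y)

    f1_default = f1_score(test_y, [1 if s >= 0.5 else 0 for s in test_s])
    f1_recal = f1_score(test_y, [1 if s >= t_new else 0 for s in test_s])
    return f1_default, f1_recal, t_new


if __name__ == "__main__":
    f1_default, f1_recal, t_new = demo()
    print(f"default F1={f1_default:.3f}  recalibrated F1={f1_recal:.3f}  "
          f"threshold={t_new:.2f}")
```

The sketch leaves the classifier untouched and moves only the decision threshold, which is why a small labeled slice of the target distribution is enough. The reported 86% reduction follows from the two gap figures: (28.1 - 4.0) / 28.1 ≈ 0.858.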

Provider:

Zenodo

Created:

2026-05-17
