Chinese cybersecurity event dataset

Name: Chinese cybersecurity event dataset
Creator: IEEE DataPort
Published: 2024-08-12 14:30:25
License: No description available

Source: DataCite Commons (updated 2024-08-12, indexed 2025-04-16)

Download link:

https://ieee-dataport.org/documents/chinese-cybersecurity-event-dataset

Description:

This paper introduces CSED, a new dataset for Chinese cybersecurity event detection (ED). The dataset draws on approximately 18,000 news articles related to cybersecurity. Following the classification definitions of cybersecurity event types in CAISE [38], we define two event types, Attack and Vulnerability, and subdivide them into nine sub-event types: Data Breach, Phishing, Ransom, DDoS Attack, Malware, Supply Chain, Vulnerability Impact, Vulnerability Discovery, and Vulnerability Patch. Sentences that do not contain any specific event are categorized as 'NA'. The key to annotating cybersecurity events is identifying trigger words; carefully selected trigger words can significantly improve the efficiency of subsequent event recognition. We establish rules for the annotation process: when a sentence contains multiple events of different types, only the most representative event is annotated. This avoids unnecessary redundancy and keeps the dataset refined. CSED includes 2,054 event instances, 2 event types, and 9 sub-types.
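The label scheme described above can be sketched as a small lookup table. This is an illustrative sketch only, not the dataset's actual file format: the names `SUBTYPES_BY_TYPE`, `NA_LABEL`, `all_labels`, and `parent_type` are hypothetical, and the assignment of each sub-type to Attack or Vulnerability is an assumption inferred from the sub-type names, since the description lists the nine sub-types without stating their parent types.

```python
# Hypothetical label schema for CSED, reconstructed from the description.
# ASSUMPTION: the grouping of sub-types under each event type is inferred
# from the sub-type names; the source does not state it explicitly.
SUBTYPES_BY_TYPE = {
    "Attack": [
        "Data Breach",
        "Phishing",
        "Ransom",
        "DDoS Attack",
        "Malware",
        "Supply Chain",
    ],
    "Vulnerability": [
        "Vulnerability Impact",
        "Vulnerability Discovery",
        "Vulnerability Patch",
    ],
}

# Label for sentences that contain no specific event.
NA_LABEL = "NA"


def all_labels():
    """Return every sentence-level label: the nine sub-types plus 'NA'."""
    labels = [s for subtypes in SUBTYPES_BY_TYPE.values() for s in subtypes]
    return labels + [NA_LABEL]


def parent_type(subtype):
    """Return the event type a sub-type belongs to, or None for 'NA'/unknown."""
    for event_type, subtypes in SUBTYPES_BY_TYPE.items():
        if subtype in subtypes:
            return event_type
    return None
```

With this layout, a classifier over sentences would predict one of ten labels (nine sub-types plus 'NA'), and the coarse event type can be recovered from the predicted sub-type via `parent_type`.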

Provider:

IEEE DataPort

Created:

2024-08-12
