A 4-Month Dataset of SSH Botnet Interactions and Command Payloads
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19815504
Resource description:
Overview
This dataset contains 145,425 security events collected by a custom multi-threaded SSH honeypot. The data reflects real-world automated and manual attack patterns against Linux-based systems, captured over an approximately 4-month observation window from July 27, 2025, to November 14, 2025.
Research Context
The collection was conducted as part of the research project 'High-Interaction SSH Threat Intelligence & Attack Modeling' at the National University 'Odesa Law Academy'.
Revision History (v2.1 Update)
Version 2.1 (April 2026): Final validated release.
Logging Level Inversion: The values stored in the level column were rewritten in the data itself. INFO now represents transport-layer noise (94.6% of events), while WARNING marks active application-layer interactions (5.4% of events).
Metadata Synchronization: All documentation and BibTeX records are updated to reflect the refined 4-month data window and final event counts.
Version 2.0: Conducted thorough data sanitization, excluding 74 internal administrative sessions (localhost) and debugging logs from the initial setup phase.
Version 1.0: Initial raw release.
Technical Specifications
Engine: Multi-threaded Python 3.10 application using the Paramiko library.
Core Logic: Handles SSHv2 transport and authentication layers by subclassing paramiko.ServerInterface.
Session Management: Incoming connections are encapsulated in individual threads, where each session is assigned a unique UUID for full "kill chain" reconstruction.
Payload Interception: Command requests are intercepted via the check_channel_exec_request method, allowing for the capture of raw payloads (including malware droppers and fileless /dev/tcp strings) without executing them on the host system.
Persistence: Data is written to a SQLite 3 database in real time using synchronous write-ahead logging (WAL).
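
The session and persistence design described above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' implementation. The table name events, the class EventLog, the function handle_session, and the event_type values "connect" and "exec" are all hypothetical. The Paramiko wiring is omitted: in a real honeypot, a check_channel_exec_request handler would be the caller that passes the raw command bytes to the log.

```python
import sqlite3
import threading
import uuid
from datetime import datetime, timezone

# Hypothetical schema built from the field list in this record.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    session_id TEXT NOT NULL,
    ip TEXT,
    port INTEGER,
    event_type TEXT,
    message TEXT,
    command TEXT,
    level TEXT
)
"""


class EventLog:
    """Thread-safe event sink backed by SQLite with WAL requested."""

    def __init__(self, path):
        # One shared connection; the lock serializes access across threads.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(SCHEMA)
        self._conn.commit()

    def record(self, session_id, ip, port, event_type, message,
               command=None, level="INFO"):
        ts = datetime.now(timezone.utc).isoformat()
        with self._lock:
            self._conn.execute(
                "INSERT INTO events (timestamp, session_id, ip, port, "
                "event_type, message, command, level) "
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (ts, session_id, ip, port, event_type, message, command, level),
            )
            self._conn.commit()  # commit per event so rows land immediately

    def rows_for(self, session_id):
        with self._lock:
            cur = self._conn.execute(
                "SELECT event_type, command, level FROM events "
                "WHERE session_id = ? ORDER BY id",
                (session_id,),
            )
            return cur.fetchall()


def handle_session(log, ip, port, commands):
    """Stand-in for one connection thread; returns the session UUID."""
    session_id = str(uuid.uuid4())
    log.record(session_id, ip, port, "connect", "connection opened")
    for raw in commands:
        # The payload is stored as text and never executed.
        log.record(session_id, ip, port, "exec", "exec request",
                   command=raw.decode("utf-8", "replace"), level="WARNING")
    return session_id
```

Each connection thread would call handle_session (or its real equivalent) once, so every row written during that connection carries the same session_id and can be replayed in order later.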
Key Research Findings (v2.1)
Attack Intensity: Analysis shows peak intensities exceeding 10,700 interactions per hour during automated surge events.
Payload Diversity: The dataset captures 28 unique interactive shell sessions, including sophisticated fileless exploitation via bash sockets.
Credential Intelligence: Records 2,109 unique credential pairs, providing insights into modern automated brute-force patterns.
High-Fidelity Noise Reduction: The pre-filtered level field allows researchers to immediately isolate the 5.4% of events marked WARNING (active application-layer interactions, including attack payloads) from background connection noise.
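
Figures such as the INFO/WARNING split and the peak hourly intensity can in principle be recomputed from the SQLite file. The sketch below assumes a table named events and text timestamps whose first 13 characters identify the hour (for example ISO 8601). Neither detail is confirmed by this record, so check the actual schema before relying on it. The function names are illustrative.

```python
import sqlite3


def level_shares(conn):
    """Return the fraction of events per level, e.g. {'INFO': 0.9, ...}."""
    rows = conn.execute(
        "SELECT level, COUNT(*) FROM events GROUP BY level"
    ).fetchall()
    total = sum(n for _, n in rows)
    return {level: n / total for level, n in rows} if total else {}


def peak_hour(conn):
    """Return (hour_prefix, event_count) for the busiest hour, or None."""
    # Assumption: timestamps are text like 'YYYY-MM-DDTHH:MM:SS', so the
    # first 13 characters ('YYYY-MM-DDTHH') bucket events by hour.
    return conn.execute(
        "SELECT substr(timestamp, 1, 13) AS hour, COUNT(*) AS n "
        "FROM events GROUP BY hour ORDER BY n DESC, hour ASC LIMIT 1"
    ).fetchone()
```

To use it against the published database, open the file with sqlite3.connect and pass the connection to either function. Restricting a query to level = 'WARNING' yields only the application-layer interactions.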
Data Structure
The dataset is provided in SQLite 3 (.db) and CSV formats. Fields: id, timestamp, session_id, ip, port, event_type, message, command, level.
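
Because every event carries a session_id, the CSV export can be regrouped into per-session timelines. The sketch below assumes the CSV has a header row using the field names listed above and that timestamps sort correctly as text (true for ISO 8601). Both are assumptions, and the helper names are hypothetical.

```python
import csv
from collections import defaultdict


def sessions_from_csv(lines):
    """Group CSV rows by session_id, ordered by timestamp within a session.

    `lines` is any iterable of text lines, such as an open file object.
    """
    sessions = defaultdict(list)
    for row in csv.DictReader(lines):
        sessions[row["session_id"]].append(row)
    for events in sessions.values():
        # Assumption: timestamps are ISO 8601 text, which sorts lexically.
        events.sort(key=lambda r: r["timestamp"])
    return dict(sessions)


def commands_in(events):
    """Return the non-empty command payloads of one session, in order."""
    return [e["command"] for e in events if e["command"]]
```

With a real file, open it with newline="" as the csv module recommends and pass the file object directly to sessions_from_csv.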
Authors & Affiliation
Viktor Boiko (ORCID: 0000-0001-5929-657X) — Scientific Supervisor & Lead Researcher.
Oleksandr Niiakyi (ORCID: 0009-0005-1025-1617) — Software Developer & Researcher.
Affiliation: Faculty of Cybersecurity and Information Technologies, National University "Odesa Law Academy".
Licensing
Creative Commons Attribution 4.0 International (CC BY 4.0).
Provider:
Zenodo
Created:
2026-04-27



