Cybersecurity, Cloud Computing, and IT Support Q&A Dataset
Source: IEEE. Indexed 2026-04-17.
Download link:
https://ieee-dataport.org/documents/cybersecurity-cloud-computing-and-it-support-qa-dataset
Description:
This paper presents a comprehensive English cybersecurity question-answer dataset designed to serve as a knowledge base for Retrieval-Augmented Generation (RAG) systems and cybersecurity support applications. The dataset comprises 2,650 unique question-answer pairs covering three critical domains: cybersecurity (40%), cloud computing (35%), and IT support (25%). This resource addresses the need for high-quality, domain-specific knowledge bases that can power AI-driven cybersecurity assistance systems.

The dataset encompasses diverse cybersecurity scenarios, including incident response procedures, security threat analysis, network security protocols, digital forensics, authentication systems, cloud security frameworks, compliance standards (GDPR, HIPAA, SOX, PCI-DSS), and technical support workflows. Each question-answer pair covers varying complexity levels, from basic troubleshooting queries to sophisticated cybersecurity incident management, incorporating domain-specific terminology such as CVE references, MITRE ATT&CK framework classifications, OWASP categories, and ENISA guidelines.

The knowledge base was systematically constructed through expert curation and comprehensive preprocessing pipelines that preserve semantic integrity while ensuring compatibility with modern AI systems. Questions range from simple factual inquiries to complex multi-faceted problem-solving scenarios, while answers provide actionable guidance with appropriate technical depth. The dataset includes technical acronyms, multi-step procedural descriptions, vulnerability classifications, and compliance terminology that reflect real-world cybersecurity professional needs.

Each entry follows a structured format with clearly defined question and answer components, making it suitable for training and evaluating retrieval-augmented generation systems, question-answering models, and knowledge-based chatbots focused on cybersecurity domains.
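As a rough illustration of how question-answer entries of this kind might back the retrieval step of a RAG system, the sketch below ranks entries by token overlap (Jaccard similarity) between a user query and each stored question. The field names `question` and `answer`, the sample entries, and the helper names `tokenize` and `retrieve` are all assumptions made for illustration; the listing does not specify the dataset's file format or schema, and a real system would more likely use embedding-based retrieval.

```python
import re

# Hypothetical entries: the field names and texts are illustrative
# placeholders, not records taken from the dataset.
QA_PAIRS = [
    {"question": "How do I reset a forgotten account password?",
     "answer": "Use the self-service reset portal, then verify with MFA."},
    {"question": "What is the first step in incident response for a phishing email?",
     "answer": "Report the message and isolate the affected mailbox."},
    {"question": "How should cloud storage buckets be protected from public access?",
     "answer": "Block public access by default and review access policies."},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, pairs, top_k=1):
    """Return the top_k entries whose question best overlaps the query."""
    query_tokens = tokenize(query)

    def jaccard(pair):
        question_tokens = tokenize(pair["question"])
        union = query_tokens | question_tokens
        return len(query_tokens & question_tokens) / len(union) if union else 0.0

    return sorted(pairs, key=jaccard, reverse=True)[:top_k]


if __name__ == "__main__":
    best = retrieve("phishing email incident response", QA_PAIRS)[0]
    print(best["answer"])
```

In a full RAG pipeline, the retrieved answer text would be passed to a language model as context rather than returned directly.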
The preprocessing includes text cleaning, encoding standardization, and semantic integrity preservation to ensure high-quality inputs for downstream AI applications.

This dataset enables researchers and practitioners to develop and evaluate AI systems capable of providing accurate, context-aware cybersecurity support. The resource supports research in information retrieval, domain-specific natural language processing, and cybersecurity automation applications. By providing a comprehensive English knowledge base spanning multiple cybersecurity domains, this dataset facilitates the development of intelligent cybersecurity assistance systems that can help organizations manage security challenges more effectively.

The dataset is released to promote research in cybersecurity AI systems and to support the development of more accessible and effective cybersecurity knowledge management tools. This resource represents one of the most comprehensive cybersecurity QA datasets available, contributing valuable domain-specific content to both the cybersecurity and natural language processing research communities.

Keywords: Cybersecurity Dataset, Question-Answering, Knowledge Base, Retrieval-Augmented Generation, Natural Language Processing, IT Support, Cloud Computing Security, Incident Response, Technical Documentation, AI Training Data
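The description names text cleaning and encoding standardization as preprocessing steps but gives no implementation details. The following is one plausible sketch of such a step, not the authors' pipeline: it applies Unicode NFKC normalization, drops invisible control and format characters, and collapses whitespace. The function name `clean_text` is hypothetical.

```python
import re
import unicodedata


def clean_text(raw):
    """Normalize one question or answer string (illustrative only)."""
    # Encoding standardization: fold compatibility characters
    # (non-breaking spaces, ligatures, full-width forms) to canonical forms.
    text = unicodedata.normalize("NFKC", raw)
    # Text cleaning: drop control and format characters (Unicode
    # categories starting with "C"), keeping ordinary whitespace.
    text = "".join(
        ch for ch in text
        if ch in "\n\t " or unicodedata.category(ch)[0] != "C"
    )
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_text("  What\u00a0is  a\tCVE?\u200b "))
```

NFKC normalization can rewrite some characters that carry meaning in technical text, so a pipeline concerned with semantic integrity would likely spot-check identifiers such as CVE numbers after cleaning.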
Provided by:
Iain Rice; Haitham Mahmoud; Ralitsa Stoycheva; Nouh Elmitwally; Artid Boonrerng; Faisal Saeed



