
Automatic root cause analysis of network failure on IP/MPLS network using machine learning and case-based reasoning

Source: DataCite Commons. Updated: 2025-02-04. Indexed: 2025-04-16.
Download link:
http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2024.102
Resource description:
The IP Multiprotocol Label Switching (IP/MPLS) network is a complex system comprising switches, routers, DWDM (Dense Wavelength Division Multiplexing) devices, Network Management System (NMS) servers, and various other components. Managing such large-scale networks requires multiple tools and advanced network management techniques. Because the IP/MPLS architecture is intricate and highly interconnected, identifying and resolving network issues is challenging, particularly for chain failures, in which a single fault cascades and affects multiple interconnected devices. To address these issues, network operators rely on the NMS and on event or alarm signals from network devices, and they frequently run manual operational commands for further diagnostics. Given this complexity, a centralized approach is crucial for efficient network management. This article proposes a Multi-Purpose System that leverages Machine Learning and Case-Based Reasoning to enhance network operations and troubleshoot IP/MPLS networks. The system comprises several components: a Message Broker for real-time streaming of different message types (e.g., SNMP Traps, Syslog), a Log Template Generation service for standardizing logs, an Event Identification Service for classifying network events, a Node Chain Lookup Service for identifying impacted devices, and a Node Test Service for running diagnostic commands. Additionally, the system includes a Case-Based Fault Identification Service, which acts as a knowledge repository of historical fault cases and expert knowledge, a Notification Service for sending alerts through modern communication channels, and a Dashboard that provides network operators with root cause analysis and troubleshooting guidance. The proposed system aims to improve network availability and operational efficiency, using scenarios from the Provincial Electricity Authority of Thailand (PEA). We assessed its performance using event messages from several NMS platforms. The results indicate that the system is effective in terms of accuracy and performance, making it a reliable option for automating network troubleshooting and management.
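
To make two of the components above more concrete, here is a minimal Python sketch of what log template generation and case-based fault retrieval could look like. It is an illustration under stated assumptions, not the system described in the abstract: the masking rules, the `Case` structure, the Jaccard similarity measure, and the event and root-cause labels are all hypothetical choices that do not come from the source.

```python
import re
from dataclasses import dataclass

# Hypothetical masking rules (assumption): replace variable fields with
# placeholders so that logs differing only in values share one template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),   # IPv4 address
    (re.compile(r"\b[A-Za-z]+\d+(?:/\d+)+\b"), "<IF>"),     # interface name
    (re.compile(r"\b\d+\b"), "<NUM>"),                      # remaining numbers
]


def to_template(message: str) -> str:
    """Return the message with variable fields replaced by placeholders."""
    for pattern, token in _MASKS:
        message = pattern.sub(token, message)
    return message


@dataclass(frozen=True)
class Case:
    """A stored fault case: the observed event labels and the known cause."""
    symptoms: frozenset
    root_cause: str


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two label sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve_case(observed: frozenset, case_base: list):
    """Return (most similar case, similarity), or None for an empty case base."""
    best = max(case_base, key=lambda c: jaccard(observed, c.symptoms), default=None)
    if best is None:
        return None
    return best, jaccard(observed, best.symptoms)


if __name__ == "__main__":
    raw = "Interface GigabitEthernet0/0/1 on 10.1.2.3 changed state to down after 3 retries"
    print(to_template(raw))

    # Illustrative case base; labels are invented for this example.
    cases = [
        Case(frozenset({"LINK_DOWN", "LDP_SESSION_DOWN"}), "fiber_cut"),
        Case(frozenset({"HIGH_CPU", "BGP_PEER_DOWN"}), "control_plane_overload"),
    ]
    observed = frozenset({"LINK_DOWN", "LDP_SESSION_DOWN", "BGP_PEER_DOWN"})
    print(retrieve_case(observed, cases))
```

In this sketch the retrieval step simply returns the nearest stored case; a fuller case-based reasoning cycle would also adapt the retrieved solution and retain newly confirmed cases, which is omitted here.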

Provider:
Thammasat University
Created:
2025-02-04