Fault Management¶
Fault Management is the process of collecting and processing incoming events, coming from various sources. NOC provides flexible event processing pipeline split to cleanly separated stages:
- Collection - Collecting events from external sources, like Syslog, SNMP Trap, active probes, metrics thresholds and injecting them into event processing pipeline.
- Classification - removing of all device-depended personality and replacing them by generalized Event Classes. NOC recognizes about 300 event classes out of the box.
- Correlation - Analysis of possible alarm opening and closing events, rule-based correlation, topology-based correlation, raising and clearing of alarms, calculation of service impact
- Escalation - Rule-based alarm processing, notification and escalation to external trouble ticket systems.
Each stage processing by different set of microservices, allowing to adjust amount of workers according your actual workload. Multi-stage processing allows to focus monitoring staff to fix only actual problems which causes service degradation.