Category: (2) TAM Application Type
Application Identifier: 7.10.2
Maturity Level: 4
Overview
Fault Correlation & Root Cause Analysis collects the various fault events in
the network, together with other relevant information such as network
topology, and correlates these events to reduce the volume of raw events to a
smaller, more manageable number. Root Cause Analysis (RCA) enables the end
user to quickly determine the root cause of a problem in the network. These
applications play a unique role in mediating network alarms with topology and
configuration data.
Functionality
Fault Correlation & Root Cause Analysis functionality includes the following:
- Alarm Correlation: the ability to collect all relevant fault events, along
  with other relevant information, and reduce them to a smaller, manageable
  number. This can include:
  - Alarm de-duplication – the first level of alarm reduction, based on
    pre-defined user criteria. Alarm de-duplication eliminates repeated
    events to reduce the amount of “noise” from the network. The application
    should provide end users with the capability to define rules for
    de-duplication.
  - Alarm auto-clearing – the ability of the application to correlate a
    previously raised alarm with a clear-alarm received from the source (NE,
    EMS, or NMS). The application should deliver “out-of-the-box”
    auto-clearing capabilities for each supported device type, EMS, and NMS,
    as well as capabilities for end users to define their own auto-clearing
    rules.
  - Alarm thresholding – the ability of the application to handle various
    thresholding scenarios, such as alarm flapping; to integrate with
    performance management systems to receive threshold crossing alarms; and
    to generate synthetic threshold alarms based on pre-defined user
    conditions. The application should give end users the ability to
    maintain “out-of-the-box” rules as well as develop their own rules for
    threshold management.
  - Correlating alarms with supporting data (topology, configuration),
    including:
    - intra- and inter-element;
    - inter-element (including up/down the various network layers);
    - service-based.
    To perform topology-based correlation, the application must be
    “topology aware”. Topology awareness can be achieved through
    autodiscovery or through integration with an inventory management
    application. Inter-element and service-based correlation can only be
    achieved if the inventory data is valid and available for integration
    with the correlation application.
  - Alarm enrichment (external database connectivity)
  - Ability to associate services with the physical aspects of the network
  - Filter, summarize, and reduce displayed alarms
  - Consolidation of alarms:
    - Consolidating alarms across technologies
    - Consolidating alarms across elements
  - Present to alarm console
  - Graphical display of fault / topology overlay
  - Provide alarms to other systems
  - Store alarms and their root causes for extended periods
- Root Cause Analysis (RCA) – the ability to pinpoint the root cause of a
  problem or, in some instances, its probable cause. The application should
  support:
  - Root cause isolation based on correlation analysis
  - Fault isolation
  - Network element / network layer attribution
  - Alarm consolidation / substitution, as well as suppression of
    sympathetic alarms
  - Problem identification / initiation (ticket creation). Once the root
    cause or probable cause is determined, the application should be able to
    integrate with a trouble management application for manual or automated
    ticket creation.
  - Resolution initiation (testing, solution identification/ownership,
    knowledge base index). The application should be able to integrate with
    various testing applications; this integration should be rules-based.
  - Knowledge of topology
  - Present to alarm console
  - Drill down from root cause into details
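The alarm de-duplication item above can be illustrated with a minimal Python sketch. It assumes a simple `Alarm` record and models a user-defined de-duplication rule as a function that returns a key. Repeats with the same key that arrive within a time window of the first occurrence are folded into that occurrence and counted. `Alarm`, `deduplicate`, the field names, and the 300-second default window are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass
class Alarm:
    source: str       # NE, EMS, or NMS that reported the alarm
    alarm_type: str   # illustrative values such as "LOS"
    resource: str     # port, card, or link the alarm refers to
    timestamp: float  # seconds


# Hypothetical user-defined rule: maps an alarm to the key that identifies
# "the same" event for de-duplication purposes.
DedupRule = Callable[[Alarm], Hashable]


def default_rule(alarm: Alarm) -> Tuple[str, str, str]:
    return (alarm.source, alarm.alarm_type, alarm.resource)


def deduplicate(alarms: List[Alarm], rule: DedupRule = default_rule,
                window: float = 300.0) -> List[Tuple[Alarm, int]]:
    """Fold repeats sharing a key within `window` seconds of the group's
    first alarm; return (first alarm, occurrence count) per group."""
    groups: List[List[Alarm]] = []
    open_group: Dict[Hashable, int] = {}  # key -> index of its latest group
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = rule(alarm)
        idx = open_group.get(key)
        if idx is not None and alarm.timestamp - groups[idx][0].timestamp <= window:
            groups[idx].append(alarm)  # repeat inside the window: suppress it
        else:
            open_group[key] = len(groups)  # start a new group for this key
            groups.append([alarm])
    return [(group[0], len(group)) for group in groups]
```

Passing a different `rule`, such as one that ignores the resource, is how an end user could widen or narrow what counts as a duplicate in this sketch.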
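The alarm auto-clearing item above can be sketched as a table of active alarms keyed by source, alarm type, and resource. A raise adds an entry, and a clear-alarm removes the entry it matches. In this sketch, `CLEAR_TYPE_MAP` stands in for “out-of-the-box” rules (for devices whose clear arrives under a different alarm type), and `add_rule` stands in for user-defined rules. All names and alarm-type values are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Event:
    source: str      # NE, EMS, or NMS
    alarm_type: str
    resource: str
    is_clear: bool   # True for a clear-alarm, False for a raise


# Hypothetical built-in rules: clear type -> alarm type it clears.
CLEAR_TYPE_MAP: Dict[str, str] = {"LINK_UP": "LINK_DOWN"}


class AlarmTable:
    def __init__(self, clear_map: Optional[Dict[str, str]] = None):
        # Copy so user-added rules never modify the shared defaults.
        self.clear_map = dict(CLEAR_TYPE_MAP if clear_map is None else clear_map)
        self.active: Dict[Tuple[str, str, str], Event] = {}

    def add_rule(self, clear_type: str, raised_type: str) -> None:
        """Register a user-defined auto-clearing rule."""
        self.clear_map[clear_type] = raised_type

    def process(self, ev: Event) -> Optional[Event]:
        """Store raises; for a clear, remove and return the alarm it clears."""
        if not ev.is_clear:
            self.active[(ev.source, ev.alarm_type, ev.resource)] = ev
            return None
        # Without a mapping rule, assume the clear reuses the raise's type.
        raised_type = self.clear_map.get(ev.alarm_type, ev.alarm_type)
        return self.active.pop((ev.source, raised_type, ev.resource), None)
```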
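The alarm thresholding item names alarm flapping as one scenario. A minimal sketch of handling it is a sliding-window counter. If a resource changes state at least `max_transitions` times within `window` seconds, a synthetic “flapping” condition is reported once. The condition lapses when the transitions in the window fall back below the threshold. The class name and default thresholds are hypothetical; as the text describes, real rules would be user-maintained.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Set


class FlapDetector:
    """Sliding-window flap detection (illustrative thresholds)."""

    def __init__(self, max_transitions: int = 4, window: float = 60.0):
        self.max_transitions = max_transitions
        self.window = window
        self.history: Dict[Hashable, Deque[float]] = defaultdict(deque)
        self.flapping: Set[Hashable] = set()

    def observe(self, resource: Hashable, timestamp: float) -> bool:
        """Record one raise/clear transition; return True only when a
        synthetic flapping alarm should be generated for `resource`."""
        times = self.history[resource]
        times.append(timestamp)
        # Drop transitions that have fallen out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        is_flapping = len(times) >= self.max_transitions
        started = is_flapping and resource not in self.flapping
        if is_flapping:
            self.flapping.add(resource)
        else:
            self.flapping.discard(resource)  # condition has lapsed
        return started
```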
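Topology-aware correlation, root cause isolation, and suppression of sympathetic alarms (all described above) can be sketched with a dependency map. The sketch assumes the topology is available from autodiscovery or an inventory application, and that each element lists the elements it depends on. An alarmed element with an alarmed ancestor is treated as sympathetic and suppressed. The remaining alarmed elements are root-cause candidates. This is one simple technique (dependency-based suppression), not a method the TAM prescribes, and the names and example topology are hypothetical. Dependencies in the map may cross network layers (for example, an IP link depending on an optical path), which is one way to model correlation up and down the layers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical topology: element -> elements it depends on
# (e.g. a port depends on its card, a card on its shelf).
Topology = Dict[str, List[str]]


def has_alarmed_ancestor(node: str, topology: Topology, alarmed: Set[str]) -> bool:
    """Walk the dependency graph upward from `node`, looking for an alarm."""
    stack = list(topology.get(node, []))
    seen: Set[str] = set()
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        if parent in alarmed:
            return True
        stack.extend(topology.get(parent, []))
    return False


def isolate_root_causes(alarmed: Set[str], topology: Topology) -> Tuple[Set[str], Set[str]]:
    """Split alarmed elements into root causes and sympathetic alarms.
    Assumes an acyclic dependency graph."""
    roots: Set[str] = set()
    sympathetic: Set[str] = set()
    for node in alarmed:
        if has_alarmed_ancestor(node, topology, alarmed):
            sympathetic.add(node)
        else:
            roots.add(node)
    return roots, sympathetic
```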
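The alarm enrichment item (external database connectivity) can be sketched with Python's built-in `sqlite3` module. An in-memory table stands in for the external inventory database. The table links physical resources to sites and services, which also touches on associating services with the physical aspects of the network. The schema, the rows, and the `make_inventory` and `enrich` helpers are hypothetical placeholders, not a real inventory model.

```python
import sqlite3
from typing import Any, Dict


def make_inventory() -> sqlite3.Connection:
    """Stand-in for an external inventory database (hypothetical schema/rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource (name TEXT PRIMARY KEY, site TEXT, service_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource VALUES (?, ?, ?)",
        [("port-3", "London-01", "SVC-1001"), ("port-9", "Paris-02", "SVC-2002")],
    )
    conn.commit()
    return conn


def enrich(alarm: Dict[str, Any], conn: sqlite3.Connection) -> Dict[str, Any]:
    """Return a copy of `alarm` with site and service fields added when the
    alarmed resource is found in the inventory; otherwise return it unchanged."""
    row = conn.execute(
        "SELECT site, service_id FROM resource WHERE name = ?",
        (alarm["resource"],),
    ).fetchone()
    enriched = dict(alarm)
    if row is not None:
        enriched.update(site=row[0], service_id=row[1])
    return enriched
```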
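The problem identification and resolution initiation items call for ticket creation and rules-based integration with testing. A minimal sketch: once a root cause is known, the code builds a ticket payload for a trouble management application. Every rule whose condition matches the root cause adds a test to request from a testing application. The `RootCause` and `Rule` types, the rule set, and test names such as `otdr_trace` are hypothetical placeholders. No particular trouble management or testing API is implied, so the sketch only builds the payload and does not send it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class RootCause:
    resource: str
    alarm_type: str
    layer: str  # illustrative values such as "optical" or "ip"


@dataclass
class Rule:
    name: str
    matches: Callable[[RootCause], bool]
    test_name: str  # hypothetical test identifier in an external testing app


# Hypothetical operator-maintained rule set.
RULES: List[Rule] = [
    Rule("optical-los", lambda rc: rc.layer == "optical" and rc.alarm_type == "LOS", "otdr_trace"),
    Rule("ip-link-down", lambda rc: rc.layer == "ip" and rc.alarm_type == "LINK_DOWN", "ping_sweep"),
]


def handle_root_cause(rc: RootCause, rules: List[Rule] = RULES) -> Dict[str, Any]:
    """Build a ticket payload and list the tests whose rules match `rc`."""
    # Payload a real integration would submit to trouble management (not sent here).
    ticket: Dict[str, Any] = {"summary": f"{rc.alarm_type} on {rc.resource}", "layer": rc.layer}
    ticket["tests_requested"] = [r.test_name for r in rules if r.matches(rc)]
    return ticket
```

Keeping the rules as data, separate from the dispatch function, lets operators add or change test selection without altering the integration code. That separation is the design choice behind the “rules-based” requirement in this sketch.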
Supported Business Services