ARCANNA Part 2 – Full-stack Automated Root Cause Analysis

In the previous article we saw how cumbersome and time-consuming troubleshooting can become when we have little control over the situation. When alerts start blaring, it’s difficult to decide which to address first, given the multitude of interconnected devices, services and applications.

However, new technologies such as Machine Learning have made such problems more approachable by opening up new possibilities and opportunities.

By combining the Elastic Stack with neural networks, we created an automated process for root cause determination, which we (lovingly) called ARCANNA (Automated Root Cause Analysis Neural Network Assisted). ARCANNA was created as an open source Elastic plugin that is easy to install and configure.

There are three actors which work together to enable ARCANNA to efficiently identify the root cause of issues: the Elastic Stack, a Neural Network and the user.

 

Elastic Stack

The role of the Elastic Stack is to collect data and logs from the entire IT infrastructure stack (physical, virtual, cloud and containers), enrich that data and store it so that the neural network can use it to determine the possible causes of issues. It also acts as the graphical interface for ARCANNA, since ARCANNA is a plugin that can be easily installed and configured.

This is also where the results of the analysis are sent and stored, so that they can be analysed further and combined with Elastic’s machine learning and automated alerting capabilities.

 

ARCANNA

ARCANNA is the key element of this solution and relies on two functions: clustering the events stored in the Elastic Stack, which narrows down the possible suspects, and a probable root cause determination algorithm, which reduces the number of devices, services or applications that need to be checked.
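
To make the two functions described above more concrete, here is a minimal sketch in Python. It is not ARCANNA's actual implementation, which relies on a neural network. It swaps in a much simpler technique, grouping events by time proximity and listing the sources that appear first in each group, purely to illustrate how clustering can shrink the list of suspects. The `Event` fields, the function names and the 30-second gap are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    # Hypothetical event shape, not ARCANNA's real schema.
    timestamp: float  # seconds since epoch
    source: str       # device, service or application name
    message: str


def cluster_by_time(events: List[Event], max_gap: float = 30.0) -> List[List[Event]]:
    """Group events that occur within max_gap seconds of the previous event."""
    clusters: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if clusters and event.timestamp - clusters[-1][-1].timestamp <= max_gap:
            clusters[-1].append(event)
        else:
            clusters.append([event])
    return clusters


def earliest_sources(cluster: List[Event], top_n: int = 2) -> List[str]:
    """Return up to top_n distinct sources, in order of first appearance."""
    seen: List[str] = []
    for event in cluster:
        if event.source not in seen:
            seen.append(event.source)
    return seen[:top_n]


# Made-up sample events for illustration only.
events = [
    Event(100.0, "core-switch-1", "link down"),
    Event(105.0, "db-server", "connection timeout"),
    Event(110.0, "web-app", "HTTP 500"),
    Event(900.0, "backup-job", "disk almost full"),
]

clusters = cluster_by_time(events)
suspects = [earliest_sources(c) for c in clusters]
```

In this toy data set the first three events fall into one cluster and the backup alert into another, so each cluster yields a short list of sources to check first instead of the full set of alerting systems.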

ARCANNA also provides the GUI from which root cause determination jobs can be configured and monitored, so users can manage them quickly and easily.

 

The User

The user’s role is essential in the process of root cause determination and this role is made easy through ARCANNA’s GUI and intuitive controls.

First of all, the user must create the Machine Learning job that starts the analysis. Once the analysis has started, the user must also provide feedback on the results. This ensures that the process is continuously calibrated for increased efficiency, and it gives the user control over the environment and results.

Finally, since the results from ARCANNA are stored in Elastic indices, the user can analyse them further by creating dashboards, watchers for automated alerting and machine learning jobs.

The end result of ARCANNA translates into fewer tickets, fewer false positives, less time spent in calls, more data to work with and a knowledge base that enables IT teams to optimize their infrastructure.

If you want to find out more, we look forward to seeing you at Elastic{on} 2019 where ARCANNA will be showcased and demoed.