The Success of AIOps Requires Synthetic Internet Telemetry Data

IT teams frequently discover that their AIOps platform was trained on a limited set of telemetry data. The data may, for instance, have been gathered from a DevOps platform that lacks complete visibility into the distributed computing environment in which the application runs. Without synthetic telemetry data collected from an Internet performance monitoring (IPM) platform, the machine learning algorithms at the core of any AIOps platform are highly unlikely to generate the most effective recommendations for optimising application experiences.

The probabilistic nature of AI presents a challenge: the quality of the data presented to the AI model determines the relevance of the recommendations it generates. Real user data, for instance, may be scarce or nonexistent. If telemetry data was never shared with the AI model in the first place, its recommendations are exceedingly unlikely to result in superior application experiences.

Utilising Synthetic Data to Enhance Visibility

IT teams must ensure that the data used to train the AI model accurately represents the production environments in which applications are deployed. In other words, regardless of how sophisticated an AI model is, garbage in still results in garbage out. Any IT team should know the provenance of the data behind the underlying AI model before adopting an AIOps platform. If the pool of AI training data is limited, the recommendations generated will be limited as well. IT teams will not place their trust in AIOps platforms that recommend specific actions based on partial or incomplete data. Nor should they.

Instead, teams will assume that each output must be verified before the next step in a process can be taken. Ultimately, when it comes to AI and IT, the only thing worse than being wrong is being wrong on a catastrophic scale. Of course, continuing to manage IT sequentially arguably defeats the purpose of investing in an AIOps platform designed to manage tasks in parallel.

Given how heavily modern applications depend on Internet services, any attempt to apply AI to IT management that does not incorporate synthetic Internet telemetry data will produce a suboptimal outcome. Including this type of telemetry gives DevOps teams the insights they need to attain and maintain key performance indicators (KPIs).

Numerous Artificial Intelligence Models

In all likelihood, no single AI model will rule them all. Networking, security, and IT service management (ITSM) platforms will often have already implemented AI to analyse the telemetry data they collect in real time. The output of those AI models will then be shared with AIOps platforms to automate, end to end, a series of tasks that previously required IT teams to orchestrate workflows across multiple islands of automation.

Consequently, DevOps teams will need to assess the effectiveness of the network of AI models that will emerge in the near future. There are numerous such tools, each of which is, or will be, intended to automate a specific task, such as analysing Internet traffic to identify the source of bottlenecks that may affect an application only intermittently. Equipped with those insights, the AIOps platform can generate consistently useful recommendations that DevOps teams can rely on. Teams can then allow the tools to apply a recommendation automatically, for example by rerouting Internet traffic to maintain service level objectives (SLOs).

Fulfilling the Potential of Artificial Intelligence

As AIOps platforms improve, the toil that DevOps teams frequently encounter can be significantly reduced. Teams can spend weeks trying to identify the underlying cause of an issue that, once identified, may take only a few minutes to resolve. The difficulty is that the source of the problem often lies outside the DevOps team’s immediate control, such as when latency introduced by an Internet service degrades application performance. Even so, the insights an AIOps platform surfaces should allow DevOps teams to submit support requests that identify the precise source of an Internet service issue more accurately, enabling their provider to address it more promptly. Just as crucially, the DevOps team can then move on to issues that are more directly under their control.

One of the primary sources of stress for any DevOps team is not knowing the true cause of an issue that, despite their best efforts, continues to generate a stream of alerts. AIOps promises to alleviate this stress by simplifying the process of first establishing causation and then automating remediation. That promise will never be fully realised, however, if the data used to train the AI model does not provide a perspective comprehensive enough to support a truly informed decision.
