Machine Learning in IT Operations: Observability and Governance
In the previous post, we saw how machine learning can improve IT operations by adding efficiency and saving costs. In this post, we look at the role machine learning plays in specific aspects of IT operations. Understanding that role helps IT organizations use machine learning more effectively and act on the predictions it produces.
We are still in the early days of using machine learning and artificial intelligence in IT operations. We are seeing a wave of products that apply machine learning to areas of IT operations such as Observability, Cost Management and Governance.
Observability is one area where machine learning is helping cloud operations. With monitoring and log data spread across various services and cloud providers (in the multi-cloud context), it is difficult to correlate data across these sources and gain actionable insights. Observability is a natural candidate for machine learning, which can help IT operations gain timely insights and act proactively. Traditionally, monitoring depended on understanding the failure modes in order to decide what needed to be monitored. Even in the SRE world, the decision on what to monitor was based on the common failure modes, combined with knowledge of the systems and needs of a particular organization. However, with multiple services from various cloud providers, it is not enough for IT operations to rely on the traditional failure modes alone. Machine learning steps in to help operators understand their systems better and identify even grey failures (subtle, partial degradations that are hard to detect) before they cause an outage.
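As a rough illustration of the kind of signal such a system looks for, here is a minimal sketch that flags metric samples deviating sharply from their recent history. It uses a simple trailing-window z-score as a stand-in for a trained model; the function name `find_anomalies`, the window size, the threshold and the sample latencies are all assumptions made for this example, not part of any particular product.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of samples that deviate from the trailing window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` samples.
    This is a statistical baseline, not a trained ML model.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical request latencies in milliseconds, one sample per minute.
latencies_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
print(find_anomalies(latencies_ms))  # → [7]
```

A production system would typically learn seasonality and correlate many metrics and logs at once, but the principle of comparing current behaviour against a learned baseline is the same.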
One of the biggest headaches for IT is controlling costs. Anyone who has managed costs, whether through a spreadsheet or through a service focused on cost management, knows how complicated it can become in a short while. The key is to tame the complexity and ensure proper forecasting. This is not easy even with a single cloud provider, and it gets more complex with multi-cloud. With machine learning, you can forecast not only more accurately but also at a more granular level. If the system determines that a cluster is unlikely to receive traffic during a weekend, it can recommend shutting down the cluster for that period. Applied to cost management, machine learning helps organizations use resources more efficiently and save costs.
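The weekend shutdown recommendation can be sketched in a few lines. The example below is a deliberately simple heuristic over historical traffic rather than a learned forecast; the function `recommend_weekend_shutdown`, its `max_weekend_requests` parameter and the sample data are hypothetical and chosen only to show the shape of the decision.

```python
from datetime import datetime, timedelta


def recommend_weekend_shutdown(samples, max_weekend_requests=0):
    """Decide whether a cluster looks idle on weekends.

    `samples` is an iterable of (datetime, request_count) pairs.
    Returns True only if there is weekend history and every weekend
    sample is at or below `max_weekend_requests`.
    """
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    weekend = [count for ts, count in samples if ts.weekday() >= 5]
    if not weekend:
        return False  # No weekend history, so make no recommendation.
    return max(weekend) <= max_weekend_requests


# Hypothetical history: two weeks of daily totals, busy on weekdays only.
start = datetime(2024, 1, 1)
history = []
for day in range(14):
    ts = start + timedelta(days=day)
    count = 0 if ts.weekday() >= 5 else 1000
    history.append((ts, count))

print(recommend_weekend_shutdown(history))  # → True
```

A real forecasting model would also account for trends, holidays and confidence intervals before recommending a shutdown, and the recommendation would normally be reviewed before it is acted on.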
Governance is a hard problem in a multi-cloud world. Machine learning can ease the burden of compliance. It can make recommendations on compliance violations and help organizations enforce governance consistently.
These are some of the early innovations with machine learning, and the CoreStack platform is tapping into machine learning to help IT operations with CloudOps, Cost Management and Governance.