Monitoring VMware with Artificial Intelligence

SIOS Briefing Note

There is no way we can optimize all the VMs in this world, and yet they need to be optimized. The sooner we all accept these two facts, the sooner we can move on to something better. SIOS believes it has the solution in SIOS iQ, its artificially intelligent monitoring of VMware.

What did the monitoring of computer systems look like before data centers were virtualized? At one point in computing history, there wasn’t much more than the temperature sensor on the Liebert cooling your data center. Of course even that failed if you simply cut power to the Liebert. We then started adding both monitoring and reporting functions to many items within the data center.

Data center analytics systems were based on thresholds. Examples include telling us when a filesystem was full, a CPU was overtasked, or a server was out of RAM and swapping everything to disk. More advanced monitoring might include monitoring process performance, such as how long a database query or web page load might take. If a database query that used to take one minute was suddenly taking three minutes, something must be wrong.

Notices and warnings from these systems were typically triggered by thresholds. You’d ask to be told if a filesystem was more than 80% full, or if a CPU was more than 80% utilized. You might ask to be told whenever a web page load took longer than 20 seconds. Even before VMs, the challenge with good monitoring was finding the balance between knowing when a threshold had been passed and sending so many notifications that everything just got ignored.
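As a rough illustration of the threshold style described above, here is a minimal Python sketch. The limits (80% full, 80% utilized, 20 seconds) come from the examples in this post; the metric names and the `check` function are hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # hypothetical metric name
    limit: float  # alert when a reading exceeds this value
    unit: str


# Limits taken from the examples in the text above.
THRESHOLDS = [
    Threshold("filesystem_used_pct", 80.0, "%"),
    Threshold("cpu_used_pct", 80.0, "%"),
    Threshold("page_load_seconds", 20.0, "s"),
]


def check(readings: dict[str, float]) -> list[str]:
    """Return one alert message per metric that exceeds its limit."""
    alerts = []
    for t in THRESHOLDS:
        value = readings.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(f"{t.metric} is {value}{t.unit} (limit {t.limit}{t.unit})")
    return alerts


# One server's readings: the disk and the page load are over their limits.
sample = {"filesystem_used_pct": 91.0, "cpu_used_pct": 42.0, "page_load_seconds": 25.0}
for message in check(sample):
    print(message)
```

Each check looks at one number on one system in isolation, which is exactly why tuning the limits is a trade-off between missed problems and ignored email.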

Reporting, on the other hand, was focused on explaining how things went after the fact. A perfect example would be backup success reporting. Such a system would tell you which backups succeeded and failed last night or last week. The better ones could give you some trends, like which backups failed more often than others.

But today IT personnel don’t manage hundreds of systems; they manage thousands of VMs. Trying to identify and set the right thresholds on that many VMs is beyond daunting. Since the typical tendency is to over-report, lest you miss something really important, thousands of VMs are going to create an awful lot of email. A single problem with a hypervisor server can cascade into hundreds or thousands of emails, none of which actually tells you what’s going on. This is why this post opened with the claim that optimizing every VM by hand has become impossible.
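To make the cascade concrete, here is a small Python sketch with a made-up inventory of 1,000 VMs spread across four hypervisor hosts. One degraded host trips the per-VM check on every VM it runs. The names (`vm_alerts`, `group_by_host`, `esx2`) are hypothetical, and the grouping step is only a naive illustration, not a description of how SIOS iQ works.

```python
from collections import defaultdict


def vm_alerts(vm_to_host: dict[str, str], slow_hosts: set[str]) -> list[tuple[str, str]]:
    """Emit one (vm, host) alert for every VM running on a degraded host."""
    return [(vm, host) for vm, host in vm_to_host.items() if host in slow_hosts]


def group_by_host(alerts: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collapse per-VM alerts into one entry per hypervisor host."""
    grouped = defaultdict(list)
    for vm, host in alerts:
        grouped[host].append(vm)
    return dict(grouped)


# Hypothetical inventory: 1,000 VMs assigned round-robin to 4 hosts.
inventory = {f"vm{i:04d}": f"esx{i % 4}" for i in range(1000)}

alerts = vm_alerts(inventory, slow_hosts={"esx2"})
print(len(alerts), "per-VM alerts")                 # a quarter of the VMs alert at once
print(len(group_by_host(alerts)), "host involved")  # but they share a single host
```

Even this trivial grouping only works because the sketch already knows which host each VM lives on and that the host is the culprit. In a real environment that relationship is what has to be discovered.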

Thresholds only provide a single data point about a single dimension in a single data center “silo”. They can’t handle the multidimensional, interrelated nature of virtual environments where a symptom that shows up on one VM may have its root cause on a different VM altogether.

It is toward that end that SIOS has released SIOS iQ, a monitoring system that uses machine learning and artificial intelligence to automatically identify patterns of behavior among interrelated objects as they are happening, and to correlate them back to the original cause of the problem, whether that cause is in the application, network, infrastructure, or storage. Since the product can see across the entire environment, it can “watch” all your VMs and the environment they operate in, and apply machine learning logic to automatically figure out what’s going on and what the root cause is. In addition to identifying critical issues while they’re happening, SIOS iQ is designed to look for trends over time, combining the logic of monitoring and reporting mentioned previously. It not only identifies the root causes of issues, but also recommends specific solutions and calculates the potential cost and performance benefits its recommended changes will deliver.

StorageSwiss Take

SIOS iQ is the first product we are aware of to apply machine learning and artificial intelligence to the problem of monitoring and trend analysis. This correlation and analysis of hundreds of thousands or millions of data points is bound to create much more valuable information than typical threshold methods. The story sounds good; we look forward to watching the execution of this vision.

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.
