Looking for the Perfect Dashboard: InfluxDB, Telegraf and Grafana – Part XV – IPMI Monitoring of our ESXi Hosts

Greetings friends, we have spoken on numerous occasions about the power of InfluxDB, Telegraf and Grafana. If you remember, not long ago I left you a post on how to monitor your vSphere for free and in less than 5 minutes.

That post, and its ready-to-use Dashboards, which have several thousand downloads, are very complete, but it is true that many of you have asked me whether we could go a step further and also monitor the physical side of the ESXi Hosts using IPMI.

By the end of this blog post you will have a Dashboard as simple and useful as this one, where we can see the temperatures of the different components (the sensors are different for each manufacturer, so I have left HPE and Supermicro ready for you):

You can see it live without installing anything here:

What is IPMI or Intelligent Platform Management Interface?

IPMI is implemented by a stand-alone chip, the baseboard management controller (BMC), found on certain motherboards, typically in server architectures. This chip is responsible for monitoring the basic components of the motherboard. For example, through IPMI we can read the temperature of different components such as the CPU, RAM modules and network chips. It also provides abstraction of the keyboard, mouse and monitor, so that through an applet or HTML5 plugin we can connect from a browser as if we had plugged in VGA, mouse and keyboard ourselves. In addition, and as if all the above were not enough, it also abstracts the DVD/CD drive, so we can mount our ISOs on servers that may be miles away from us, with the redirection handled transparently. Here is a diagram, thanks to Wikipedia:

https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface#/media/File:IPMI-Block-Diagram.png

Depending on the manufacturer, or in the jargon of IT administrators, this functionality is also known as iLO, RemoteKVM, remote hands, out-of-band management, etc.

Monitoring with InfluxDB, Telegraf and Grafana

Telegraf is a great agent for collecting information from the many applications it already supports natively as inputs. Specifically, Telegraf already comes with IPMI support through its ipmi_sensor input, so we go to the [[inputs.ipmi_sensor]] section of the Telegraf configuration and fill it in with our details. In my case that means my two servers, one HPE and one Supermicro.
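As a rough illustration of what that section can look like, here is a minimal Python sketch that prints an `[[inputs.ipmi_sensor]]` fragment for two servers. The `servers` entries follow the `user:password@lan(host)` form used by the plugin. The IP addresses, user name and passwords are placeholders, the helper names (`Bmc`, `server_entry`, `render_config`) are made up for this example, and the `interval` and `timeout` values are only a starting point, so adapt everything to your own environment.

```python
from dataclasses import dataclass


@dataclass
class Bmc:
    """One IPMI endpoint. All values used below are hypothetical placeholders."""
    host: str
    user: str
    password: str


def server_entry(bmc):
    # Connection string in the form user:password@lan(host)
    return f"{bmc.user}:{bmc.password}@lan({bmc.host})"


def render_config(bmcs, interval="30s", timeout="20s"):
    # Build the TOML fragment line by line; passwords containing quotes or
    # backslashes would need TOML escaping, which this sketch does not handle.
    servers = ", ".join(f'"{server_entry(b)}"' for b in bmcs)
    return "\n".join([
        "[[inputs.ipmi_sensor]]",
        f"  servers = [{servers}]",
        f'  interval = "{interval}"',
        f'  timeout = "{timeout}"',
    ])


# Hypothetical HPE and Supermicro management addresses (documentation IP range).
hosts = [
    Bmc("192.0.2.10", "monitor", "changeme-hpe"),
    Bmc("192.0.2.11", "monitor", "changeme-smc"),
]
print(render_config(hosts))
```

The printed fragment can be pasted into the Telegraf configuration. A dedicated, low-privilege IPMI user is a sensible choice here, since the password ends up in a plain-text file.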

In addition, we will have to install ipmitool on the Telegraf server if it is not already there, since Telegraf calls it to read the sensors.

And that would be it. Yes, I know, it is so simple that it is almost scary. Now we restart the telegraf service.

If we don’t see any errors when we run service telegraf status after a minute, we can continue with the import of the Grafana Dashboard.
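Since every vendor and model exposes different sensors, it can be worth checking what ipmitool actually returns for your hosts before importing anything. The sketch below parses text in the pipe-separated layout printed by `ipmitool sdr` (name, reading, status) and lists the temperature sensors. The sample readings are invented for illustration, and `parse_sdr` is a hypothetical helper, not part of Telegraf or ipmitool.

```python
def parse_sdr(text):
    """Parse pipe-separated 'name | reading | status' lines into dicts."""
    sensors = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, reading, status = parts
        # Split "45 degrees C" into the number and the unit.
        value_str, _, unit = reading.partition(" ")
        try:
            value = float(value_str)
        except ValueError:
            # Readings such as "no reading" or "disabled" carry no number.
            value, unit = None, ""
        sensors.append(
            {"name": name, "value": value, "unit": unit, "status": status}
        )
    return sensors


# Invented sample in the same layout; real sensor names vary per vendor.
SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
System Temp      | 31 degrees C      | ok
FAN1             | 3000 RPM          | ok
Vcpu             | no reading        | ns
"""

readings = parse_sdr(SAMPLE)
temps = [s for s in readings if s["unit"] == "degrees C"]
for s in temps:
    print(f'{s["name"]}: {s["value"]} {s["unit"]} ({s["status"]})')
```

If a sensor you expect does not show up in the real ipmitool output for a host, it will not reach InfluxDB either, and the matching Dashboard panel will stay empty.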

Grafana Dashboards

This is where I’ve worked really hard: I’ve created a Dashboard from scratch, selecting the best database queries, polishing the colors, and thinking about which graphs to use and how to display them. Everything is automated so that it fits your environment without any problems and without having to edit anything manually. Once the Dashboard is imported, you can move between it and the other vSphere Dashboards with the menu at the top right. Now it’s time to download it, or at least note its ID.

Import the Grafana Dashboard easily

So that you don’t have to waste hours configuring a new Dashboard and writing and debugging queries, I’ve already created a great Dashboard for you with everything you need to monitor your environment in a very simple way. It will look like the image I showed you above.

From our Grafana, we go to Create – Import.

Select the name you want and enter the ID 9985, which is the unique ID of the Dashboard, or its URL.

With the menu at the top right, you can switch between the Dashboards for Hosts, Datastores and VMs, and of course the main Overview one, or this latest one about temperature and more using IPMI.

If you want to see them working without installing anything, here is the link to my environment:

That’s all, folks. I hope you like it, and I’d like to leave the whole series here:

2 Thoughts

  1. Jorge,

    Thanks for another great tutorial. I have 3 HP ESXi hosts and I wanted to know if there was a way to extend this dashboard to cover them? Also, for the one HP server and my Supermicro server I am only getting inlet ambient temp and system temp. The rest of the metrics are not reporting. Any ideas why this would be?
    Thanks!

  2. Hello NJL, you will need to play with the query, as my assumption is that all the different vendors, and MODELS, expose different IPMI sensors, I am afraid 🙁 Let me know if you get stuck so I can help.
