Looking for the Perfect Dashboard: InfluxDB, Telegraf and Grafana – Part XII (Native Telegraf Plugin for vSphere)

Greetings friends, today I bring you another one of those hidden gems that you like so much. Besides being free and deployable in a few minutes, it has potential that many commercial tools would envy.

Today we are going to create four fresh Grafana Dashboards within minutes. By the end of this post, we will have some Dashboards (in plural, friends) similar to these:

  • vSphere Overview Dashboard
  • vSphere Hosts Overview Dashboard
  • vSphere Datastore Overview
  • vSphere VM Overview

Telegraf Plugin for VMware vSphere

My friend Craig told me that an official Telegraf plugin for vSphere had been released a few days ago, so the first thing I did was go to its GitHub repository and check it out.

The plugin is pure joy, not only because it speaks directly with the vCenter SDK, but also because we can monitor all the following parameters:

  • Cluster Stats:
    • Cluster services: CPU, memory, failover
    • CPU: total, usage
    • Memory: consumed, total, vmmemctl
    • VM operations: # changes, clone, create, deploy, destroy, power, reboot, reconfigure, register, reset, shutdown, standby, vmotion
  • Host Stats:
    • CPU: total, usage, cost, mhz
    • Datastore: iops, latency, read/write bytes, # reads/writes
    • Disk: commands, latency, kernel reads/writes, # reads/writes, queues
    • Memory: total, usage, active, latency, swap, shared, vmmemctl
    • Network: broadcast, bytes, dropped, errors, multicast, packets, usage
    • Power: energy, usage, capacity
    • Res CPU: active, max, running
    • Storage Adapter: commands, latency, # reads/writes
    • Storage Path: commands, latency, # reads/writes
    • System Resources: cpu active, cpu max, cpu running, cpu usage, mem allocated, mem consumed, mem shared, swap
    • System: uptime
    • Flash Module: active VMDKs
  • VM Stats:
    • CPU: demand, usage, readiness, cost, mhz
    • Datastore: latency, # reads/writes
    • Disk: commands, latency, # reads/writes, provisioned, usage
    • Memory: granted, usage, active, swap, vmmemctl
    • Network: broadcast, bytes, dropped, multicast, packets, usage
    • Power: energy, usage
    • Res CPU: active, max, running
    • System: operating system uptime, uptime
    • Virtual Disk: seeks, # reads/writes, latency, load
  • Datastore Stats:
    • Disk: Capacity, provisioned, used

Impressive, right? If you do not yet have Telegraf, InfluxDB and Grafana, follow these steps (and these for Grafana). Those of you who have already followed the whole series in Spanish only have to update the system to receive the vSphere plugin for Telegraf.

The telegraf package will appear with an update available, so we say yes when we are asked to update.

Once we have the package installed, we only need to edit the Telegraf configuration file, /etc/telegraf/telegraf.conf, and remove the leading # from the vSphere plugin section.

Of course, we will also have to uncomment all the parameters of the plugin.
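
As a rough sketch, the uncommented section can end up looking like the fragment below. The vCenter URL, username and password are placeholders that you must replace with your own values. The option names are the ones quoted from telegraf.conf by readers in the comments further down, and the rest of the stock section is left out for brevity.

```toml
# Sketch of the uncommented vSphere input in /etc/telegraf/telegraf.conf.
# The URL and credentials are placeholders, not values from this post.
[[inputs.vsphere]]
  ## List of vCenter URLs to be monitored. These three lines must be uncommented
  ## and edited for the plugin to work.
  vcenters = [ "https://vcenter.local/sdk" ]
  username = "user@corp.local"
  password = "secret"

  ## Datastores
  datastore_metric_include = [] ## if omitted or empty, all metrics are collected
```

Several vCenters can be listed in the same vcenters array, separated by commas, as long as they share the same credentials.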

Once done, if we are not using a certificate signed by a valid CA, or if the CA is not installed on the server running Grafana, InfluxDB and Telegraf, please uncomment the insecure_skip_verify option as well and set it to true.
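
The option in question is insecure_skip_verify, and a reader confirms in the comments that it has to be true rather than false. A minimal sketch of the line, which goes inside the same [[inputs.vsphere]] section (the comment text is my own wording, not copied from the stock file):

```toml
  # Goes inside the existing [[inputs.vsphere]] section.
  # Skips certificate chain and host verification; intended for self-signed
  # certificates or a CA that this server does not trust.
  insecure_skip_verify = true
```

Skipping verification is a convenience for labs; trusting the vCenter certificate instead is the safer choice.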

Another option is to download the SSL certificate from our vCenter to our Telegraf server, so that it is trusted.

Finally, let’s restart the telegraf service.

Verifying that we are ingesting information with Chronograf

At this point, if we have followed all the steps correctly, Telegraf should already be sending the information it collects to InfluxDB. If we run a search using the wonderful Chronograf, we can verify that data is arriving.

All the measurements of this new vSphere plugin for Telegraf are stored under names starting with vsphere_, so they are really easy to find.

Grafana Dashboards

This is where I have worked really hard: I created the Dashboards from scratch, selecting the best queries against the database, polishing the colors, and thinking about which graph to use and how to show it. In addition, everything is automated so that it fits your environment without any problem and without you having to edit anything manually. You can find the Dashboards here. Once the four are imported, you can move between them with the menu at the top right. Now it’s time to download them, or at least to note their IDs.

How to easily import the Grafana Dashboards

So that you don’t have to waste hours configuring a new Dashboard and writing and debugging queries, I’ve already created four wonderful Dashboards with everything you need to monitor your environment in a very simple way. They will look like the Dashboards I showed you above.

From our Grafana, we select Create – Import.

Select the name you want and enter, one at a time, the IDs 8159, 8162, 8165 and 8168, which are the unique IDs of the Dashboards, or the URLs:

  • https://grafana.com/dashboards/8159
  • https://grafana.com/dashboards/8162
  • https://grafana.com/dashboards/8165
  • https://grafana.com/dashboards/8168

With the menu at the top right, you can switch between the Hosts, Datastores and VMs Dashboards and, of course, the main Overview one.

Some of the improvements that these Dashboards include are the variable selections at the top left: depending on what you select, you will see only the Cluster, ESXi host or VM you are interested in. Please leave your feedback in the comments.

If you want to see them working without installing anything, here is the link to my environment:

That’s all, folks. If you want to follow the full blog series about Grafana, InfluxDB and Telegraf, please click on the following links:

69 Thoughts

  1. Hi Jorge,

    I am looking to set up InfluxDB, Telegraf and Grafana – Part XII (Native Telegraf Plugin for vSphere) in our environment.
    Could you please provide the full installation and configuration document for the Windows platform?

  2. Very cool, I set this up this morning on a large instance and your dashboards are beautiful!
    I can’t seem to get datastore ‘used’ metrics though, perhaps our vSphere version 5.5 is too old ?

  3. Hello Tom,
    On which dashboard exactly? I have uploaded a new version to the grafana.com site, so please download it. Let me know exactly, or share some screenshots please 🙂

    Thank you for the feedback!

  4. Hi Guys,

    I get this error in the telegraf logs:

    [input.vsphere]: Error in discovery for ServerFaultCode: Request version 'urn:vim25/6.7' and namespace 'urn:vim25' are not supported

    I’m unable to connect to my vCenter. Any ideas?

    thanks in advance..

  5. Can you please try an apt-get upgrade or yum upgrade? It looks like you might have an old openssl on the Telegraf side. Also, would you mind letting me know your vSphere version?

  6. This is great work. I got it installed with no issues. I am trying to update the dashboards to allow another search field, datacenter, but I am having no luck finding that key value. Any ideas?

  7. Hello James, which Dashboard and which panel are you trying to update? Is that DC inside the same VC?

  8. Hi Jorge,

    thanks for the blog article.
    You mean “insecure_skip_verify = true” instead of “insecure_skip_verify = false”, right ?

  9. All 4 dashboards. And yes, the DC is in the same vCenter. We have multiple vCenters with multiple DCs, so having this searching and filtering would be a great added value.

  10. Definitely, let me dig into it and I will let you know when the version on grafana.com is updated.

    Thank you!

  11. Hi Wesley, as mentioned by you on Slack, uncomment the datastore section, like this:
    datastore_metric_include = []

    Best regards

  12. Hey Jorge,

    First, thank you for your awesome hard work!

    I am getting errors in telegraf from the vsphere plugin.

    [input.vsphere]: Error in discovery for : Post https:/// http: no Host in request URL

    Would you happen to know what the error means? I have not found anything.

  13. HI Again,

    Please can I get some advice? I have managed to get everything working (very awesome), but now I am only getting certain datastores back.

    It is only pulling -7 through, but I have 16 datastores.

    This is what is in my config:
    ## Datastores
    datastore_metric_include = [] ## if omitted or empty, all metrics are collected
    # datastore_metric_exclude = [] ## Nothing excluded by default
    # datastore_instances = false ## false by default for Datastores only

    any advice would be appreciated ..


  14. Hello David,
    Are the ones missing NFS? Can you please try to increase the timeout, also the max_query_objects and max_query_metrics, and on Grafana try to show a wider range, like the last 3 hours or so. Let us know

  15. Hi Jorge,

    Thanks for the reply. I have done as you asked and have also removed some metrics, but it is actually getting worse: fewer metrics are being pulled in. And yes, it was the NFS datastores that were not being pulled in. This is what I have changed in my config.

    With the changes below, all the datastores show up now, but with no metrics:

    ## Default data collection interval for all inputs
    interval = "60s" # changed from 10

    ## This controls the size of writes that Telegraf sends to output plugins.
    metric_batch_size = 10000 # changed from 1000

    # ## number of go routines to use for collection and discovery of objects and metrics
    collect_concurrency = 5
    discover_concurrency = 3

    # ## set to 64 for vCenter 5.5 and 6.0 (default: 256)
    max_query_objects = 1000 # changed from 256

    # ## set to 64 for vCenter 5.5 and 6.0 (default: 256)
    max_query_metrics = 1000 # changed from 256

    any help would be much appreciated


  16. Hi Jorge,

    Great work on this! Thank you! I was able to get it up and running quickly thanks to your documentation.

    The only issue that I have is that NONE of my Datastores are showing. They are all iSCSI, and here are my current settings per your documentation:

    ## Datastores
    datastore_metric_include = [] ## if omitted or empty, all metrics are collected
    # datastore_metric_exclude = [] ## Nothing excluded by default
    # datastore_instances = true ## false by default for Datastores only

    If you can give me some assistance I would appreciate it.


  17. Hi Edward, can you please change the timeout to something higher, and maybe also the interval:
    ## Default data collection interval for all inputs
    interval = "60s"

    That will do the trick too.

  18. Jorge,

    I’ve changed the timeout to “100s” and have updated the interval to “60s”, restarted the necessary services to reflect the changes and still NO info for all of my Datastores.

    Any other recommendation that you think I should change or look into?

    Just wondering, did your Dashboard work right off the bat, or did you have to tweak it and make some changes to get your Datastore readings? If so, please let me know what other settings you might have updated to get the Datastores to show.


  19. Hi Edward,
    It works out of the box for me. Here is my config, just the datastore section and the tweaks:
    # Configuration for telegraf agent
    ## Default data collection interval for all inputs
    interval = "60s"
    ## Rounds collection interval to 'interval'
    ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
    round_interval = true

    ## Datastores
    datastore_metric_include = [] ## if omitted or empty, all metrics are collected
    # datastore_metric_exclude = [] ## Nothing excluded by default
    # datastore_instances = false ## false by default for Datastores only

    # ## timeout applies to any of the api request made to vcenter
    timeout = "180s"

    Then at the top of Grafana I select a range like 1, 3 or 6 hours, and it all works. Can you check in Chronograf whether you are indeed sending any data at all? Also check with tail -f /var/log/telegraf/telegraf.log that no errors appear.

    thank you!

  20. Hi Jorge,

    I’ve made all the changes you’ve recommended and unfortunately Datastore is still not showing.

    The only error that I see is this:

    Oct 05 09:14:31 vm-stats telegraf[1110]: 2018-10-05T13:14:31Z W! [outputs.influxdb] when writing to [http://localhost:8086]: database "telegraf" creation failed: Post http://localhost:8086/query: dial tcp connect: connect

    The rest of the Dashboard is working perfectly other than the Datastore status/section.

    If you can think of anything else for me to look into that would be much appreciated.


  21. Great work!

    Question: how can we add more than one Vcenter?

    Can you explain what the syntax is please, I cannot find that anywhere, I have 2 vcenters.

    Something like:
    vcenters = [ "https://vcenter1.local/sdk" ] [ "https://vcenter2.local/sdk" ]
    Or maybe like this?
    vcenters = [ "https://vcenter1.local/sdk" "https://vcenter2.local/sdk" ]

    How is it done???

    Thanks in advance!

  22. Hello,
    I do not have two vCenters to try, but it should be as it always is in Telegraf:
    vcenters = [ "https://vcenter1.local/sdk", "https://vcenter2.local/sdk" ]

    Can you please try it?

  23. Hello Edward, what is that vm-stats? Is it maybe another plugin you had? I recommend copying telegraf.conf to telegraf.conf.old, then copying telegraf.conf.dpkg-dist to telegraf.conf and editing the InfluxDB basics if needed. Then, under telegraf.d, create a new vsphere.conf containing just the new config from this blog, to see if that works.

  24. Jorge,

    “vm-stats” is the hostname.

    I’ll copy the telegraf.conf and give that a shot. I’ll let you know how it goes.

  25. Hello,

    Thanks for the awesome guide!

    Is there any way to get the used percentage of a virtual machine’s CPU?

  26. “Hello,
    I do not have two vCenters to try, but it should be as it always is in Telegraf:
    vcenters = [ "https://vcenter1.local/sdk", "https://vcenter2.local/sdk" ]

    Can you please try it?”

    Hey Jorge,

    Thanks, I have configured two Vcenters and this works just fine, thank you.

    “ # # Read metrics from VMware vCenter
    # ## List of vCenter URLs to be monitored. These three lines must be uncommented
    # ## and edited for the plugin to work.
    vcenters = [ "", "" ]
    username = "User@Domain"
    password = "P@$$w0rd"
    # ## VMs
    # ## Typical VM metrics (if omitted or empty, all metrics are collected)”

  27. Hi,
    How is it possible to exclude datastore metrics?
    I want to exclude all local datastores, which are all named "hypervisorname-local".
    I tried datastore_metric_exclude = ["*-local"], but I still collect metrics for these datastores.

  28. Hello Florian,
    In Grafana, the Datastore variables already exclude the Veeam ones. Take a look: at the moment it is a regex which says /^(?!VeeamBackup_)/. Add your own there, so at least Grafana doesn’t show them.

    I will investigate how to not ingest the data from Telegraf.

  29. Thanks a lot Jorge for your excellent work! I have couple of queries:

    1. Cluster variable Filter is not working for me. Doesn’t matter which cluster I choose, it shows all the hypervisors.
    2. It is taking ages to load the graphs for Hosts view as I have 100s of hosts.

    Any help with the same will be appreciated man 🙂

  30. Yeah, I am testing right now. I have a vCenter with 300 hosts and Grafana keeps crashing because of Java. I was looking at trying to convert it to Elasticsearch because you are able to cluster for free.

  31. Hi Jorge, please can I ask how to connect to two vCenters with different usernames and passwords?

    thanks in advance..

  32. Did anyone who had Datastores not being picked up ever fix this issue? I’ve tried all the things mentioned in the comments but they still do not display.

  33. Absolutely top notch work! Thanks so much for sharing. Those dashboards must have been a huge amount of effort! Thanks to the community for the VMware plugin also! This stuff is amazing and helps so many people.

  34. Thanks Darragh! I am trying my best and I hope they help. I have seen the vSphere Dashboard in so many different places, and that always humbles me.
    Have a great day

  35. Hello,

    I have a problem. I created an alarm on Grafana but it gave this error:
    “Template variables are not supported in alert queries.”

    Do you have a solution?

    Thank you!

  36. Hey Jorge, thanks for this! Any chance you would be interested in helping make a top XX vm dashboard? Sorta like a NOC view of vm’s.

  37. Hi Rob,
    Thanks for the comment, sounds very interesting, I will work on it and should be ready soon.


  38. Hello Jorge,
    Many thanks for the articles. The dashboards are working great.
    I can see the data for all the dashboards, but the DATASTORE dashboard does not pull up any information for some reason. Have you seen this before?
    I can see the datastore data on the Overview dashboard but not on the actual Datastore one. Also, the datastore data does not seem to refresh: I have deleted a couple of old datastores, but they still show up.


  39. Hello Pavan,
    I think it is a known issue, so I will check when I get back home and try to make it work properly. I have not yet found why this doesn’t work in some cases and yet works in mine, for example.

  40. I had the same issue with the blank Datastores dashboard, the fix for me was to install the Grafana Pie Chart Panel as it’s not installed by default.

  41. Thank You very much for an excellent blog post, and Your work on the Grafana Dashboards. The problem with the blank Datastores dashboard was solved for me by installing the Pie Chart Panel as noted above.

  42. Hey, great post, looks really neat and useful.
    Just one question, what specs would you recommend for a VM running the Ubuntu Server? I’m wondering how much RAM it will need, and if 2 vCPUs would be sufficient?
    Thanks in advance!

  43. Hello,
    Not sure how many VMs, hosts and datastores you monitor, but 2 vCPU, 6GB RAM and 100GB disk is more than enough to start with.

  44. Hi Jorge,

    Thank you so much for this fantastic page. Very helpful and very simple. As a novice, I was able to set up the basic dashboard in a few hours.

    One thing I noticed on the Hosts Dashboard is that the memory statistics section is actually using the percentage counters in a GB-based graph and counter. I changed the format type to percentage to correct it.

  45. Hi Jorge,

    I updated telegraf to 1.10.0 and noticed that after a while my datastore metrics would disappear. I don’t think it’s a bug in the vsphere plugin but possibly a change to how the metrics are collected. I can’t figure out what to change, if anything.

    Have you seen anything similar?

  46. Hi, love the boards!!!! great work. i have the Datastore problem but i also have an issue on the VM’s board. It finds all my vm’s and lists them, but when i expand each one it comes back stating no data points available. wondering if i am just missing an easy setting. any help would be awesome! thanks in advance!

  47. Hello Jesse,
    For the Datastore dashboard issue, the solution was to install the Pie Chart Panel plugin in Grafana. Regarding the VM issue, do you have data already? I mean, on the main dashboard for example, do you see anything under the VM section?

  48. Hi Jorge,

    Great work on this! Thank you so much for your effort.
    We managed to get everything working and it’s really awesome.

    However, I have a question to ask.

    In the vCenter messages (logs), we can see a Task message:
    “Task Name: Remote View Manager, Status: The request refers to an unexpected or unknown type, Initiator: (our local username)”, and it appears every few seconds/minutes.

    Is this because of the pulling of data?

  49. Thank you for your reply. i was able to install the pie chart plugin after reading through comments on here before i posted, still no luck unfortunately for the data store stuff. As for the VM dashboard saying no data points, i AM seeing vm info on the overview dashboard, so that confuses me.

  50. Hi Jesse,
    Thanks for coming back. Hmm, have you made sure to select different time ranges at the top right, like today so far, or this week, etc.? Nothing at all?

  51. Changing the time fixed the VM issue and its now showing data. thank you! still no luck on the datastore side.

  52. Yes! Can you try to change on the top right for this month, and maybe narrow on the top left the search a bit?

  53. Excellent project. Congratulations. I just identified a problem in the log:

    [inputs.vsphere] Metric name cpu.readiness.average is unknown. Will not be collected

    I don’t know why…

  54. Hi Jorge

    I have this error on my vCenter. Any idea how to solve it?

    Task Name: Remote View Manager, Status: The request refers to an unexpected or unknown type

  55. Hello, where is this error exactly? Telegraf log? Which vCenter version do you have, and have you tried with a different user to authenticate?
