The Blog of Jorge de la Cruz

Everything about VMware, Veeam, InfluxData, Grafana, Zimbra, etc.

Looking for the Perfect Dashboard: InfluxDB, Telegraf and Grafana – Part XXXV (GPU Monitoring)

24th May 2021 - Written in: opensource

Greetings friends! Since 2016 I have been showing what Telegraf, InfluxDB, and Grafana can do, and I never get tired of it: almost every week there is a new technology or piece of hardware that I want to monitor, so that I have more detailed control of my whole environment.

Just a few days ago I bought an NVIDIA RTX 3090, one of the best graphics cards available today. Of course, my first thought was that I needed to monitor it: this card cost more than any of my servers, all of which I already monitor without problems.

Dashboard for NVIDIA GPUs (extendable to AMD with a little work)

By the end of this entry we will have a Dashboard similar to this one:

Dashboard – Summary

This Dashboard, very useful and powerful, shows us the following details:

  • Filter by Host and by GPU type – Useful if we have several PCs or VDI desktops; it shows the GPU model as well.
  • Temperatures – The GPU temperature and, very importantly, the VRAM temperature (only on the 3070, 3080 and 3090), better known as GPU Memory Junction.
    • Besides the most recent value, there is a graph with the history.
  • Fans – The percentage at which our fans are spinning. If your card has more fans, you will have to modify the query.
  • Speeds – The current clock speeds of the GPU, memory, etc.
  • Consumption – The utilization, in percent, of our GPU and its memory.
  • Power Consumption – Power draw in Watts, which is useful for calculating how much you are spending.
    • Besides the most recent value, there is a graph with the history.
  • Mining – A simple heuristic: if memory utilization stays at a constant 100%, mining is active. No other application keeps the memory at 100% for such a long period of time.
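
The Watts figure from the Power Consumption panel can be turned into a running cost with simple arithmetic. The following is only a minimal sketch, assuming a constant draw and a flat tariff; the function name and the price of 0.20 per kWh are illustrative values, not something the Dashboard provides:

```python
def energy_cost(watts: float, hours: float, price_per_kwh: float) -> float:
    """Return the cost of drawing `watts` continuously for `hours`."""
    kwh = watts * hours / 1000.0  # Watts x hours -> kilowatt-hours
    return kwh * price_per_kwh


# Example: a card drawing 261 W around the clock for 30 days,
# at a hypothetical tariff of 0.20 per kWh.
monthly = energy_cost(watts=261, hours=24 * 30, price_per_kwh=0.20)
print(f"{monthly:.2f}")
```

With these example numbers the card uses 187.92 kWh in the month. Replace the tariff with the one on your own electricity bill.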

Topology with all logical components

This entry is different from the previous ones: it includes more components, some of them paid, such as HWiNFO64 Pro. I leave the complete image here:

We will need Telegraf for Windows (I assume you have already followed the steps to send information to InfluxDB). We will also need the NVIDIA drivers installed, which include a small command-line tool called nvidia-smi. Finally, we need HWiNFO64 Pro, which allows us to use shared memory, and Remote Sensor Monitor, which presents the data in JSON format on a local web server.

Small system requirements before continuing

First we need to verify that nvidia-smi is installed. From a CMD prompt, we go to the directory C:\Program Files\NVIDIA Corporation\NVSMI and launch the executable to see the result:

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Sun May 23 12:30:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 466.27       Driver Version: 466.27       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:02:00.0  On |                  N/A |
| 70%   45C    P2   261W / 280W |   8576MiB / 24576MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
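
The same readings can be requested in a machine-friendly form through the `--query-gpu` and `--format=csv` options of nvidia-smi. Below is a minimal sketch of that idea, not how Telegraf does it internally; the helper names `parse_gpu_csv` and `query_gpus` and the selection of fields are my own choices, and the path assumes the directory shown in the prompt above:

```python
import csv
import io
import subprocess

# Path taken from the CMD prompt above; adjust it if your driver installs elsewhere.
NVIDIA_SMI = r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
FIELDS = ["name", "temperature.gpu", "fan.speed", "power.draw",
          "utilization.gpu", "utilization.memory"]


def parse_gpu_csv(text: str) -> list[dict]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip empty lines
        row = {"name": record[0].strip()}
        for key, raw in zip(FIELDS[1:], record[1:]):
            try:
                row[key] = float(raw)
            except ValueError:
                row[key] = None  # non-numeric placeholder such as "[N/A]"
        rows.append(row)
    return rows


def query_gpus() -> list[dict]:
    """Run nvidia-smi once and return the parsed readings."""
    cmd = [NVIDIA_SMI, f"--query-gpu={','.join(FIELDS)}",
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)
```

Calling `query_gpus()` on the machine that has the card returns one dictionary per GPU, which is handy for a quick check before Telegraf is involved.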

All right, let’s go to the next step. We install the latest version of HWiNFO64; we need to buy a license and put it in the folder where HWiNFO64 is installed. If we run the application in sensors-only mode, we will see very accurate and interesting data about our GPU:

Finally, we have to enable connections to the shared memory, which is as simple as clicking on the icon and activating it. Now minimize the application and download the add-on called Remote Sensor Monitor from its official website.

From a CMD, we launch the Remote Sensor Monitor with the following parameters:

"Remote Sensor Monitor.exe" --gpuz=0 --aida64=0 --ohm=0

This will show us the following message, which means everything is ready:

Starting Remote Sensor Monitor on port 55555...

HWiNFO process found! Enabling HWiNFO...

Please wait for the web server to start.......Web server running

Press [ENTER] to quit

Which is fantastic news. Now, from our browser, we go to http://localhost:55555/ and we will see the sensor data in JSON format:
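
If you prefer to check the endpoint from a script instead of the browser, something like the following could work. It is only a sketch: it assumes the page returns a JSON array of objects carrying the `SensorName`, `SensorUnit` and `SensorValue` keys (the same keys the Telegraf configuration in this entry relies on), and the function names are hypothetical:

```python
import json
from urllib.request import urlopen

URL = "http://localhost:55555/"  # port reported by Remote Sensor Monitor


def select_sensors(payload: str, keyword: str) -> dict[str, str]:
    """Map 'SensorName [SensorUnit]' to SensorValue for matching sensors."""
    readings = {}
    for item in json.loads(payload):
        name = item.get("SensorName", "")
        if keyword.lower() in name.lower():
            label = f"{name} [{item.get('SensorUnit', '')}]"
            readings[label] = item.get("SensorValue", "")
    return readings


def fetch_sensors(keyword: str = "GPU", url: str = URL) -> dict[str, str]:
    """Download the sensor list and keep entries whose name contains keyword."""
    with urlopen(url, timeout=5) as response:
        return select_sensors(response.read().decode("utf-8"), keyword)
```

Running `fetch_sensors("GPU")` while Remote Sensor Monitor is up should list only the GPU-related sensors, which makes it easy to spot the exact sensor names you will later filter on in Grafana.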

Configuration of the telegraf.conf file

We have almost everything ready. The last step is to enable two input plugins in the telegraf.conf configuration file, which is as simple as adding the following at the end of the file:

# Pulls statistics from NVIDIA GPUs attached to the host
[[inputs.nvidia_smi]]
  ## Optional: path to nvidia-smi binary, defaults to $PATH via exec.LookPath
  bin_path = "C:\\Program Files\\NVIDIA Corporation\\NVSMI\\nvidia-smi.exe"

  ## Optional: timeout for GPU polling
  # timeout = "5s"

# Pulls the HWiNFO sensor data published by Remote Sensor Monitor
[[inputs.http]]
  # URL of the Remote Sensor Monitor web server (data in JSON format)
  urls = ["http://localhost:55555/"]
  method = "GET"

  # Overwrite measurement name from default `http` to `hwinfo_stats`
  name_override = "hwinfo_stats"

  # Exclude the url item from tags
  tagexclude = ["url"]

  # Data from HTTP in JSON format
  data_format = "json"

  # Set HWiNFO metadata as tags
  tag_keys = ["SensorName", "SensorUnit"]

  # JSON values to set as string fields
  json_string_fields = ["SensorValue"]

  # TLS option; it has no effect on this plain HTTP URL
  insecure_skip_verify = true
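
To make the effect of `name_override`, `tag_keys` and `json_string_fields` more tangible, here is a simplified model in Python of how one JSON object from Remote Sensor Monitor would end up as one metric. It is a sketch based on my understanding of the documented behavior of the Telegraf JSON parser (numbers become fields, strings are kept only when listed, tag keys become tags), not Telegraf's actual code, and `to_metric` is a hypothetical helper:

```python
def to_metric(item: dict,
              name: str = "hwinfo_stats",
              tag_keys: tuple = ("SensorName", "SensorUnit"),
              string_fields: tuple = ("SensorValue",)) -> dict:
    """Simplified model of how one JSON object becomes one metric."""
    tags, fields = {}, {}
    for key, value in item.items():
        if key in tag_keys:
            tags[key] = str(value)            # listed keys become tags
        elif isinstance(value, bool):
            continue                          # ignored in this simplified model
        elif isinstance(value, (int, float)):
            fields[key] = float(value)        # numbers are kept as fields
        elif isinstance(value, str) and key in string_fields:
            fields[key] = value               # strings only when listed
    return {"measurement": name, "tags": tags, "fields": fields}
```

Note that, if the web server emits `SensorValue` as a JSON string (as the `json_string_fields` setting suggests), the value is stored as a string field, which is worth keeping in mind when writing your own queries.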

That is all. We restart (or start) the Telegraf service and go to Grafana.
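
Before opening Grafana, it can be worth confirming that both measurements are arriving. The sketch below assumes an InfluxDB 1.x instance reachable at http://localhost:8086 and a database called `telegraf`; both are assumptions about your setup, so adjust them. It uses the `/query` endpoint of the InfluxDB 1.x HTTP API, and the function names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, database: str, influxql: str) -> str:
    """Compose a request URL for the InfluxDB 1.x /query endpoint."""
    params = urlencode({"db": database, "q": influxql})
    return f"{base_url.rstrip('/')}/query?{params}"


def last_point(measurement: str,
               base_url: str = "http://localhost:8086",  # assumed address
               database: str = "telegraf") -> dict:      # assumed database
    """Return the parsed JSON reply for the newest row of a measurement."""
    query = f'SELECT * FROM "{measurement}" ORDER BY time DESC LIMIT 1'
    with urlopen(build_query_url(base_url, database, query), timeout=5) as reply:
        return json.load(reply)
```

Calling `last_point("nvidia_smi")` and `last_point("hwinfo_stats")` should each return a reply containing a recent row once Telegraf has completed a collection interval.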

Grafana Dashboards

I created a Dashboard from scratch, selecting the best queries to the database, polishing the colors, and thinking about which graphs to use and how to display them. Everything is automated to fit our environment without any problems and without having to edit anything manually. The Dashboard can be found here; once imported, you can use the drop-down menus at the top to select between different PCs, cards, etc.:

  • https://grafana.com/grafana/dashboards/14478

Importing the Grafana Dashboard the easy way

So that you don’t have to waste hours configuring a new Dashboard, and ingesting and debugging what you want, I have already created a Dashboard with everything you need to monitor your environment in a very simple way; it will look like the image I showed you above. Select the name you want and enter the ID 14478, which is the unique ID of the Dashboard, or the URL:

  • https://grafana.com/grafana/dashboards/14478

With the drop-down menus we can choose between GPUs, PCs, etc.

Please leave your comments here, or on GitHub. Thanks a lot for reading!

I hope you like it. Here is the complete series, so you can start playing with the plugins that I have been telling you about all these years:

  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part I (Installing InfluxDB, Telegraf, and Grafana on Ubuntu 20.04 LTS)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part II (Installing the Telegraf Agent on Remote Linux Nodes)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part III Integration with PRTG
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part IV (Installing the Telegraf Agent on Remote Windows Nodes)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part V (Enabling Specific Inputs: Network, MySQL/MariaDB, Nginx)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part VI (Monitoring Veeam)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part VII (Monitoring vSphere)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part VIII (Monitoring Veeam with Enterprise Manager)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part IX (Monitoring Zimbra Collaboration)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part X (Grafana Plugins)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XI – (Monitoring URLs and IPs with Telegraf and Ping)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XII (Native Telegraf Plugin for vSphere)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XIII (Veeam Backup for Microsoft Office 365 v4)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XIV – Veeam Availability Console
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XV (IPMI Monitoring of our ESXi Hosts)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XVI (Performance and Advanced Security of Veeam Backup for Microsoft Office 365)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XVII (Showing Dashboards on Two Monitors Using Raspberry Pi 4)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XVIII – Monitoring Temperature and Status of Raspberry Pi 4
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XIX (Monitoring Veeam with Enterprise Manager) Shell Script
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXIV (Monitoring Veeam Backup for Microsoft Azure)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXV (Monitoring Power Consumption)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXVI (Monitoring Veeam Backup for Nutanix)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXVII (Monitoring ReFS and XFS (block-cloning and reflink))
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXVIII (Monitoring HPE StoreOnce)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXIX (Monitoring Pi-hole)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXIX (Monitoring Veeam Backup for AWS)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXI (Monitoring Unifi Protect)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXII (Monitoring Veeam ONE – experimental)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXIII (Monitoring NetApp ONTAP)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXIV (Monitoring Runecast)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXV (GPU Monitoring)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXVI (Monitoring Goldshell Miners – JSONv2)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXVII (Monitoring Veeam Backup for Google Cloud Platform)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXVIII (Monitoring Temperature and Humidity with Xiaomi Mijia)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XL (Veeam Backup for Microsoft 365 – Restore Audit)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLI (Veeam Backup for Salesforce)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLII (Veeam ONE v12 Audit Events)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLIII (Monitoring QNAP using SNMP v3)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLIV (Monitoring Veeam Backup & Replication API)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLV (Monitoring Synology using SNMP v3)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLVI (Monitoring NVIDIA Jetson Nano)
  • Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XLVII (Monitoring Open WebUI)

Filed Under: opensource Tagged With: grafana gpu, grafana monitor temp gpu, grafana nvidia

Comments

  1. Sooyoung says

    6th July 2022 at 1:56 pm

    Have you also tried with AMD GPUs like Radeon Instinct MI25?

  2. jorgeuk says

    8th July 2022 at 10:21 am

    Hello, I have not tried any other ones, no.

Disclaimer

All opinions expressed on this site are my own and do not represent the opinions of any company I have worked with, am working with, or will be working with.

Copyright © 2025 · The Blog of Jorge de la Cruz