Looking for the Perfect Dashboard: InfluxDB, Telegraf, and Grafana – Part XXXIII (Monitoring NetApp ONTAP)

Greetings friends, today I bring you a new post about Grafana. This one is special for me: I have been using NetApp for a long time, but I had never stopped to think about how to monitor this fantastic hardware in detail.

Testing the latest release, I realized that all the APIs the system uses for any action are exposed in a very simple way, so I decided it was time to give ONTAP its own entry in this series, Looking for the Perfect Dashboard.

Dashboard for NetApp ONTAP

When we finish this entry, we will have a Dashboard similar to this one, which will allow you to visualize:

Dashboard – Summary.

This is the first dashboard of the series (I intend to create a few more with other details, such as SVM), and it contains:

  • ONTAP Cluster Overview – Cluster Name, Version and Management IP.
  • ONTAP Cluster metrics – Three graphs that are similar to the ones ONTAP displays, with Latency, IOPS and Throughput.
  • ONTAP Cluster Aggregate Storage – Table with aggregate space detail.
  • ONTAP Cluster SVM – A very complete table showing all our Storage VMs.
  • ONTAP Volumes – A very complete table showing all our Volumes.
  • ONTAP Cluster LUN – A very complete table showing all our LUNs.
  • ONTAP Shares – A very complete table showing all our Shares.

Topology with all the logical components

This entry is similar to the previous ones: we will use a shell script that collects NetApp ONTAP metrics through the RESTful API and stores them in InfluxDB. The design would look something like this:

As we can see, the shell script downloads the metrics from NetApp ONTAP using the RESTful API and sends all the data to InfluxDB, from where we can view them comfortably with Grafana.
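The flow described above can be sketched in a few lines of bash. This is only a minimal illustration under some assumptions, not the actual netapp_ontap.sh: the function names, the svm=svm1 tag and the state=1 field are hypothetical, while the variables are the ones from the script's configuration parameters. It assumes the ONTAP REST API lives under https://SERVER/api/ and that InfluxDB 1.x accepts line protocol on its /write endpoint.

```shell
#!/usr/bin/env bash
# Sketch of the collect-and-write flow. Function names are hypothetical.

# Build one InfluxDB line-protocol point: measurement,tag_set field_set timestamp
build_point() {
    local measurement="$1" tags="$2" fields="$3" timestamp="$4"
    printf '%s,%s %s %s\n' "$measurement" "$tags" "$fields" "$timestamp"
}

# Fetch JSON from the ONTAP REST API (needs network access to the cluster).
# -k skips TLS verification; an assumption for self-signed certificates,
# drop it if the cluster presents a trusted certificate.
fetch_ontap() {
    local path="$1"
    curl -s -k -X GET \
        --header "Accept: application/json" \
        --header "Authorization: Basic $netappAuth" \
        "https://$netappRestServer/api/$path"
}

# Send one or more points to the InfluxDB 1.x /write endpoint.
# -i prints the response headers, so a successful write shows "204 No Content".
write_point() {
    local point="$1"
    curl -i -XPOST \
        "$netappInfluxDBURL:$netappInfluxDBPort/write?precision=s&db=$netappInfluxDB" \
        -u "$netappInfluxDBUser:$netappInfluxDBPassword" \
        --data-binary "$point"
}

# Offline example: only builds the point, no network call is made.
build_point "netapp_SVM_overview" "svm=svm1" "state=1" "1617582468"
# prints: netapp_SVM_overview,svm=svm1 state=1 1617582468
```

In a real run, the output of something like fetch_ontap "svm/svms" would be parsed into tags and fields before being handed to build_point and write_point.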

Download, and configure the netapp_ontap.sh script.

We have almost everything ready. The last step is the script that makes all this work; we will download the latest version from the GitHub repository:

This shell script can be downloaded and run from the Telegraf server, the InfluxDB server, or any other Linux machine. We will have to edit the configuration parameters:

netappInfluxDBURL="http://YOURINFLUXSERVERIP" #Your InfluxDB Server, http://FQDN or https://FQDN if using SSL
netappInfluxDBPort="8086" #Default Port
netappInfluxDB="telegraf" #Default Database
netappInfluxDBUser="USER" #User for Database
netappInfluxDBPassword='PASSWORD' #Password for Database

#Endpoint URL for login action
netappUsername="YOURONTAPUSER" #Your username with privileges to login into the ONTAP
netappPassword='YOURONTAPPASSWORD' #Password for that user, referenced by netappAuth below
netappAuth=$(echo -ne "$netappUsername:$netappPassword" | base64) #Base64 of user:password for HTTP Basic authentication
netappRestServer="YOURONTAPSERVER" #Your ONTAP server, hostname or IP
netappMetrics="20" #Metrics come in 15-second intervals, so 20 equals the metrics of the last 5 minutes. If you run the script every 5 minutes, leave it as is; if not, change it accordingly.
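Two of these parameters are easier to understand with a small example. The sketch below shows what the netappAuth line produces and how the netappMetrics value relates to the run interval (15-second samples, so 5 minutes is 20 samples). The helper names and the admin/secret credentials are hypothetical and only serve as illustration.

```shell
#!/usr/bin/env bash
# Illustrative helpers; these function names are not part of netapp_ontap.sh.

# Encode user:password the way HTTP Basic authentication expects.
basic_auth() {
    printf '%s:%s' "$1" "$2" | base64
}

# Number of 15-second samples that cover a run interval given in minutes.
samples_for_interval() {
    local minutes="$1"
    echo $(( minutes * 60 / 15 ))
}

basic_auth "admin" "secret"   # prints YWRtaW46c2VjcmV0
samples_for_interval 5        # prints 20
samples_for_interval 10       # prints 40
```

So if the script were scheduled every 10 minutes instead of every 5, netappMetrics would need to be 40 to cover the whole interval.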

Once the changes are done, make the script executable with chmod:

chmod +x netapp_ontap.sh

We run it, and the output of the command should look something like the following, with no errors:

Writing netapp_SVM_overview to InfluxDB
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: bf99d74a-95a5-11eb-a1c2-0050569017a8
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.8.4
X-Request-Id: bf99d74a-95a5-11eb-a1c2-0050569017a8
Date: Mon, 05 Apr 2021 00:27:48 GMT

Writing netapp_LUN_overview to InfluxDB
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: bfe3d80e-95a5-11eb-a1c3-0050569017a8
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.8.4
X-Request-Id: bfe3d80e-95a5-11eb-a1c3-0050569017a8
Date: Mon, 05 Apr 2021 00:27:48 GMT

If so, add this script to your crontab, for example to run every 5 minutes. I don’t think our Dashboards need to be updated more often than that, but feel free to lower the interval if yours do:

*/5 * * * * /home/oper/netapp_ontap.sh >> /var/log/netapp.log 2>&1
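Since the cron entry appends the script's output to /var/log/netapp.log, that log can be checked for writes that InfluxDB did not accept. The following is a minimal sketch, assuming the log contains the HTTP response headers shown earlier; count_failed_writes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count HTTP responses in the log whose status code is not 204.
# A successful InfluxDB write answers "HTTP/1.1 204 No Content".
count_failed_writes() {
    local logfile="$1"
    awk '/^HTTP\// && $2 != "204" { n++ } END { print n + 0 }' "$logfile"
}

# Only inspect the log if it exists and is readable.
if [ -r /var/log/netapp.log ]; then
    count_failed_writes /var/log/netapp.log
fi
```

A result of 0 means every logged write came back with 204; anything higher is worth a look at the log itself.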

We can check that we have data in Grafana Explore:

We are ready to go to the next step.

Grafana Dashboards

I created a Dashboard from scratch, selecting the best queries to the database, polishing the colors, and thinking about which graphs to use and how to display them. Everything is automated to fit your environment without any problems and without having to edit anything manually. The Dashboard can be found here; once imported, you can use the top drop-down menus to select between Cluster, SVM, etc.:

Importing the Grafana Dashboard the easy way

So that you don’t have to waste hours configuring a new Dashboard, and ingesting and debugging what you want, I have already created a Dashboard with everything you need to monitor your environment in a very simple way; it will look like the image I showed you above. Select the name you want and enter the ID 14179, which is the unique ID of the Dashboard, or the URL:

With the menus above, we can move between Cluster, SVM, etc.:

Do you want something more extensive, with more Dashboards, etc?

Since I posted the image on Twitter, even at this early stage, some people have told me to take a look at https://nabox.org/, a very interesting open-source project that includes a lot of dashboards, an appliance, etc.

Positive points:

  • It has a huge number of dashboards, certainly more polished than this one
  • It comes in an appliance and seems simple to make work

Negative points:

  • It comes in an appliance, which entails deploying something in addition to the Grafana system we already have, and makes it difficult to upgrade internal packages without knowing whether we are going to break something.
  • In my series, we make use of InfluxDB, Telegraf, and Grafana. But nabox uses Graphite, which is one more technology to learn and maintain.
  • I think it requires us to additionally install NetApp Harvest and the NMSDK in our environment.

I think it is good to have alternatives, and mine is a simple bash shell script that calls the API directly on the NetApp nodes, without installing anything else. nabox seems to be much more complex: maybe fine if we have nothing installed, but if we already have our own system, it is complicated to tie everything together.

Please leave your comments here, or on GitHub, thanks a lot for reading!

That’s all folks. If you want to follow the full Blog series about Grafana, InfluxDB, and Telegraf, please click on the following links:

Author: jorgeuk

Father, writing in https://www.jorgedelacruz.es and https://jorgedelacruz.uk Blogger, Systems Engineer @veeam - vExpert 2014/2020 & NTC 2018/19
