If you’re monitoring a Microsoft Cluster, you’ll almost certainly want to monitor the disk space of an individual cluster service. This is a problem for the Icinga2 Agent, because a single agent can’t be used with more than one IP address. As a result, you can’t simultaneously monitor the resources of the “physical” host and of a “virtual” service host that’s running on that same host. The crux of the problem is that the host name is checked against the name encoded in the certificate, and you cannot use multi-host certificates or certificates with wildcards.
We’ve come across this problem and are trying to find a solution. In the meantime, I wrote a plugin named check_influx_diskspace_cluster.pl that you can download here. It tries to solve the issue by using the performance data from a fixed check on every cluster node, and writing it out to InfluxDB.
You have to add a disk check to every physical node of the cluster, which will check all disks on that node. It works best to set very high warning/critical values, so that you won’t get alerts for this service. The service will take the performance data for all disks it finds and write it out to InfluxDB.
Suppose for example that you need to monitor disk R: for your cluster service. You have to tell the plugin the name of the metric you want to search for (R: in our case), along with the names of the cluster hosts on which to search for that disk. It then queries the last entry in InfluxDB for this metric on each of those hosts. Since the resource will only be mounted on one of the nodes, you get back both the node it is actually running on and its free disk space, which the plugin then checks against the warning/critical values you passed as parameters.
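The selection step described above can be sketched in a few lines of Python. This is only an illustration and not the plugin's actual Perl code: the function name evaluate_cluster_disk, the shape of the per-host "last entry" (free bytes and total bytes, or None when the host has no value for the disk), and the strict "less than" comparison against the thresholds are all assumptions made for the example. The InfluxDB query itself is left out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical shape of the last InfluxDB entry per host:
# (free_bytes, total_bytes), or None if the host has no value for the disk.
LastEntry = Optional[Tuple[float, float]]


def evaluate_cluster_disk(
    metric: str,
    last_entries: Dict[str, LastEntry],
    warn_pct: float,
    crit_pct: float,
) -> Tuple[str, str]:
    """Return (state, text) for the first host that reports the disk."""
    for host, entry in last_entries.items():
        if entry is None:
            continue  # disk is not mounted on this node
        free_bytes, total_bytes = entry
        pct_free = 100.0 * free_bytes / total_bytes
        # Thresholds are "% free", so a lower value is worse (assumed strict <).
        if pct_free < crit_pct:
            state = "CRITICAL"
        elif pct_free < warn_pct:
            state = "WARNING"
        else:
            state = "OK"
        free_mb = free_bytes / 1024 ** 2
        return state, f"{metric}({host}) {free_mb:.2f} MB ({pct_free:.0f}%)"
    return "UNKNOWN", f"{metric} not found on any host"


if __name__ == "__main__":
    # Values taken from the sample run shown later in this post.
    entries = {"server1": None, "server2": (16203644928, 41892708352)}
    state, text = evaluate_cluster_disk("R:", entries, warn_pct=10, crit_pct=5)
    print(f"{state} - DISK free space: {text}")
```

Returning on the first host that has a value mirrors the "returns the first found instance" wording in the plugin's help text.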
# /neteye/shared/monitoring/plugins/check_influx_diskspace_cluster.pl --help
This nagios plugin is free software, and comes with ABSOLUTELY NO WARRANTY.
It may be used, redistributed and/or modified under the terms of the GNU
General Public Licence (see http://www.fsf.org/licensing/licenses/gpl.txt).
Gets last value for given disk metric for more servers and returns the first found instance. All values are retreived from the influxdb, status, max, warning, critical
Usage: check_influx_diskspace_cluster.pl [-H <influxdb hostname/IP>] [-p <influxdb port>] -S <regex-hostname>
[-M <measurement-name>] -m <disk-metric-name> [-w <warning>] [-c <critical>]
[ -V ] [ -h ]
Print usage information
Print detailed help screen
Print version information
Read options from an ini file. See http://nagiosplugins.org/extra-opts
for usage and examples.
influxdb hostname (Default: influxdb.neteyelocal)
influxdb tcp port (Default: 8086)
influxdb measurements to use (Default: disk-windows)
Cluster Servers to get the disk-values from. This is a coma separated list of server-names as found in the monitoring.
disk-metric string to search for
Give DEBUG output
warning value for % free disk (if not defined get it from DB)
critcal value for % free disk (if not defined get it from DB)
Seconds before plugin times out (default: 30)
Show details for command-line debugging (can repeat up to 3 times)
Give verbose output
Copyright 2019 WuerthPhoenix
check_influx_diskspace_cluster.pl -m R: -S server -w 10 -c 5
OK - DISK free space: R:(server2) 15453.00 MB (39%) | R:=16203644928;4189270835.2;2094635417.6;0;41892708352
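To see how the numbers in that output line relate to each other, here is a small Python sketch that splits the performance data into its parts. It assumes the standard Nagios perfdata layout label=value;warn;crit;min;max with no unit suffix on the value. The helper name parse_perfdata is made up for this example and is not part of the plugin.

```python
def parse_perfdata(item: str) -> dict:
    """Split one perfdata item of the form label=value;warn;crit;min;max."""
    label, sep, rest = item.rpartition("=")
    if not sep:
        raise ValueError(f"not a perfdata item: {item!r}")
    names = ("value", "warn", "crit", "min", "max")
    parsed = {"label": label.strip("'")}
    for name, raw in zip(names, rest.split(";")):
        if raw != "":  # empty fields are allowed and simply skipped
            parsed[name] = float(raw)
    return parsed


if __name__ == "__main__":
    p = parse_perfdata("R:=16203644928;4189270835.2;2094635417.6;0;41892708352")
    print("free MB :", p["value"] / 1024 ** 2)
    print("free %  :", round(100 * p["value"] / p["max"]))
    print("warn %  :", round(100 * p["warn"] / p["max"], 6))
    print("crit %  :", round(100 * p["crit"] / p["max"], 6))
```

With the values above, the reported value is the free space in bytes (15453 MB, about 39% of the disk), and the warning and critical fields are the -w 10 and -c 5 percentages converted to bytes of the total disk size.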