16. 01. 2009 Patrick Zambelli Unified Monitoring

HP BladeSystem hardware monitoring

A blade is a specialized type of server hardware used in the server infrastructures of medium-sized and large companies. HP server blades are delivered as thin, modular servers with memory and one to four processors. They are generally intended for a single, dedicated application (such as web, database, email, or line-of-business applications) and can easily be inserted into a space-saving rack that houses similar servers.

As with the monitoring of services, an overview of the hardware health status is important. Dedicated checks make it possible to detect problems and alert the administrator.

SNMP can be used to query the required information from the built-in management console of the HP BladeSystem. The check presented here is implemented as a dedicated Perl script that queries the given OID(s) and returns one of the states OK, Warning, Critical or Unknown.
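Nagios evaluates a plugin's exit code rather than its text: 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The following minimal Perl sketch shows one possible way to translate the HP condition values used in this article (2 = OK, 3 = Warning, 4 = Critical, anything else = Unknown) into those states. The sub name hp_to_nagios is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Standard Nagios plugin exit codes
my %NAGIOS_EXIT = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical helper: maps an HP condition value to a Nagios state name.
# Assumption: 2 = OK, 3 = Warning, 4 = Critical, anything else = Unknown.
sub hp_to_nagios {
    my ($hp_status) = @_;
    return 'UNKNOWN' unless defined $hp_status && $hp_status =~ /^\d+\z/;
    return 'OK'       if $hp_status == 2;
    return 'WARNING'  if $hp_status == 3;
    return 'CRITICAL' if $hp_status == 4;
    return 'UNKNOWN';
}

# Usage: translate a value and show the exit code the plugin would use
my $state = hp_to_nagios(3);
print "$state, exit code $NAGIOS_EXIT{$state}\n";    # WARNING, exit code 1
```

At the end of a plugin, `exit $NAGIOS_EXIT{$state};` would then hand the state over to Nagios.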

Fan conditions

Continuous cooling of the hardware components is provided by a number of active cooling units. The BladeSystem management unit exposes the state of each individual cooling unit through a dedicated OID. Since this would otherwise require a separate check for each unit, a loop iterates through all available results and returns an overall status.

In this case, the OID for the fan category is .1.3.6.1.4.1.232.22.2.3.1.3.1.11, where fan 1 answers under .1.3.6.1.4.1.232.22.2.3.1.3.1.11.1, fan 2 under .1.3.6.1.4.1.232.22.2.3.1.3.1.11.2, and so on.

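my $oid = ".1.3.6.1.4.1.232.22.2.3.1.3.1.11.";
my $overall_status = 0;    # 0 = no fan has answered yet
my $data_text = "";

for (my $i = 1; $i <= 15; $i++) {
    my $data_fan = SNMP_getvalue($snmp_session, $oid.$i);
    if (int($data_fan)) {
        $data_text .= $data_fan."; ";

        # keep the worst result: any value other than 2 (OK) overrides OK,
        # and among non-OK values the higher (worse) one wins
        if ($overall_status == 0
            or ($data_fan != 2 and ($overall_status == 2 or $data_fan > $overall_status))) {
            $overall_status = $data_fan;
        }
    }
}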

The variable $data_fan holds the status of the current fan unit: 2 stands for OK, 3 for Warning and 4 for Critical. In each iteration the script checks whether the current value is worse than the overall result collected so far, where any value other than 2 counts as worse than OK. In this way the overall result ends up with the worst status of all checked fans.

To make the result easier to interpret, $data_text collects the status values of all individual fans.
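The aggregation rule described above can be isolated in a small Perl function that is easy to test on its own. The rule is that the worst value of all units wins and any non-OK value overrides OK. This is a minimal sketch. The sub names worst_status and severity are hypothetical. The severity ranking is also an assumption: it treats 1 or any unexpected value as worse than OK but less severe than Warning.

```perl
use strict;
use warnings;

# Assumed severity ranking of HP condition values:
# 2 (OK) is best and 4 (Critical) is worst; 1 or any other value
# ranks above OK but below Warning (3).
my %SEVERITY = (2 => 0, 3 => 2, 4 => 3);

sub severity {
    my ($value) = @_;
    return exists $SEVERITY{$value} ? $SEVERITY{$value} : 1;
}

# Hypothetical helper: returns the worst of the given status values,
# or undef if the list is empty (no unit answered).
sub worst_status {
    my @values = @_;
    my $worst;
    for my $v (@values) {
        $worst = $v if !defined $worst || severity($v) > severity($worst);
    }
    return $worst;
}

print worst_status(2, 2, 4, 3, 2), "\n";    # 4
```

Using a rank table instead of comparing the raw numbers keeps the "non-OK overrides OK" rule explicit, even for values such as 1 that are numerically lower than 2.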

With the check implemented in the file check_snmp_HP_BladeSystem.pl, a single Nagios check can be defined as follows:

Call of check: /usr/lib/nagios/plugins/check_snmp_HP_Bladesystem.pl -H <hostname> -C public -w 3 -c 4 -t .1.3.6.1.4.1.232.22.2.3.1.3.1.11

Result: OK Fan-Conditions (2) Fan return codes: 2; 2; 2; 2; 2; 2; 2; 2; 2; 2;

Power supply

This check can be implemented following the same principle as the fan status check: a single check determines the status of all individual power supplies.

Call of check: /usr/lib/nagios/plugins/check_snmp_HP_Bladesystem.pl -H <hostname> -C public -w 3 -c 4 -t .1.3.6.1.4.1.232.22.2.5.1.1.1.17

Result: OK Power-Supply (2) Power Supply return codes: 2; 2; 2; 2; 2; 2;

In this case, six power supply modules have been detected.
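The fan loop probes a fixed range of indices (1 to 15). Both the fan and the power supply checks could instead walk their whole status column with Net::SNMP's get_table method. That method takes a -baseoid argument and returns a hash reference that maps each full OID to its value, or undef on error. The sketch below shows one possible way to do this for any base OID. The sub name read_status_column and the StubSession package are hypothetical. The stub only stands in for a real SNMP session so that the example runs without an enclosure.

```perl
use strict;
use warnings;

# Hypothetical helper: walks one status column (such as the fan or the power
# supply condition OID) and returns its values ordered by index.
# $session is expected to behave like a Net::SNMP session object.
sub read_status_column {
    my ($session, $base_oid) = @_;
    (my $base = $base_oid) =~ s/^\.|\.$//g;    # strip leading/trailing dot

    my $table = $session->get_table(-baseoid => $base);
    # on failure, let the caller report the error as UNKNOWN
    die "ERROR: " . $session->error() . "\n" unless defined $table;

    # keys have the form "<base>.<index>"; collect them by numeric index
    my %by_index;
    for my $oid (keys %$table) {
        (my $key = $oid) =~ s/^\.//;
        my ($index) = $key =~ /^\Q$base\E\.(\d+)$/ or next;
        $by_index{$index} = $table->{$oid};
    }
    return map { $by_index{$_} } sort { $a <=> $b } keys %by_index;
}

# Usage with a stub object standing in for a real Net::SNMP session
package StubSession;
sub new       { my ($class, %table) = @_; return bless { table => \%table }, $class }
sub get_table { my ($self) = @_; return $self->{table} }
sub error     { return '' }

package main;

my $stub = StubSession->new(
    '1.3.6.1.4.1.232.22.2.5.1.1.1.17.2'  => 3,
    '1.3.6.1.4.1.232.22.2.5.1.1.1.17.1'  => 2,
    '1.3.6.1.4.1.232.22.2.5.1.1.1.17.10' => 2,
);
print join('; ', read_status_column($stub, '.1.3.6.1.4.1.232.22.2.5.1.1.1.17.')), "\n";
# prints: 2; 3; 2
```

Sorting the indices numerically keeps unit 10 after unit 2, so the "return codes" list stays in enclosure order however many units are installed.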

System state

A check of a more general nature verifies the overall system state reported by the management console of the HP BladeSystem. It can be implemented with a single OID and a single evaluation of the result:

my $oid = ".1.3.6.1.4.1.232.22.2.3.1.1.1.16.1";
my $data = SNMP_getvalue($snmp_session, $oid);
my $data_text = 'Unknown';
$data_text = 'Normal system state.' if $data eq 2;
$data_text = 'System degraded' if $data eq 3;
$data_text = 'Undefined System Error' if $data eq 1;
$data_text = 'Critical System failure' if $data eq 4;

Possible OK result: OK System-State (2) Normal system state.
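The chain of conditional assignments in the system state check can also be written as a lookup table. This keeps all texts in one place and makes it easier to add further states. A minimal sketch; the sub name system_state_text is hypothetical.

```perl
use strict;
use warnings;

# Texts for the overall system state values, as used in the article;
# any other value (or a missing answer) falls back to "Unknown".
my %SYSTEM_STATE_TEXT = (
    1 => 'Undefined System Error',
    2 => 'Normal system state.',
    3 => 'System degraded',
    4 => 'Critical System failure',
);

sub system_state_text {
    my ($data) = @_;
    return 'Unknown' unless defined $data && exists $SYSTEM_STATE_TEXT{$data};
    return $SYSTEM_STATE_TEXT{$data};
}

print system_state_text(2), "\n";    # Normal system state.
```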

Of course, further status values are available for monitoring; the corresponding OIDs can be found in the documentation supplied with the hardware.

Additional information on the script

The variable $snmp_session holds the SNMP session. Using it requires loading the Net::SNMP module:
use Net::SNMP;

The session is created as follows:
my ($snmp_session, $snmp_error) = Net::SNMP->session(
    -version   => 'snmpv2c',
    -hostname  => $opt_host,
    -community => $opt_community,
    -port      => $opt_port,
);
if (!defined $snmp_session) {
    print "ERROR: $snmp_error\n";
    exit 3;    # report as UNKNOWN to Nagios
}

The helper function SNMP_getvalue($snmp_session, $oid) queries the given OID and, if successful, returns the hardware status value as an integer. If the query fails, it prints the SNMP error and terminates the plugin.

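sub SNMP_getvalue {
    my ($snmp_session, $oid) = @_;

    my $res = $snmp_session->get_request(-varbindlist => [$oid]);

    if (!defined($res)) {
        print "ERROR: " . $snmp_session->error() . "\n";
        # a plain "exit" returns 0, which Nagios would read as OK;
        # exit with 3 so the failure is reported as UNKNOWN
        exit 3;
    }
    return $res->{$oid};
}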
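With SNMPv2c, querying an index that does not exist (for example fan 12 in an enclosure with only ten fans) does not make get_request fail. As far as we know, Net::SNMP instead returns an exception value such as the string noSuchInstance for that OID. The fan loop filters such values with int(). This works, but it emits "isn't numeric" warnings when the script runs under use warnings. The stricter check sketched below accepts only positive integers; the sub name is_status_value is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for a positive integer, so exception strings
# such as "noSuchInstance", empty strings or undef are skipped quietly.
sub is_status_value {
    my ($value) = @_;
    return defined $value && $value =~ /^[1-9]\d*\z/;
}

for my $value (2, 'noSuchInstance', undef, 4) {
    print is_status_value($value) ? "status $value\n" : "skipped\n";
}
```

In the fan loop, `if (is_status_value($data_fan))` could then replace `if (int($data_fan))` with the same effect and no warnings.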

This Nagios plugin is used within the Nagios-based solution NetEye.

Download

Download check_snmp_hp_bladesystem

LINKS

Additional information and documentation can be found here:

NagiosExchange:Compaq-HP Proliant Server and Blade Checks (SNMP)

Patrick Zambelli

Project Manager at Würth Phoenix
After graduating in Applied Computer Science at the Free University of Bolzano, I decided to start my professional career outside the province. With a bit of good timing and good luck I joined the booming IT department of Geox in the shoe district of Montebelluna, where I learned how a big IT infrastructure has to grow and adapt to quickly changing requirements. During this time I also had the opportunity to travel the world while setting up the company's various production and retail sites. After joining Würth Phoenix, I started developing our monitoring solution NetEye. Today, as Consultant and Project Manager, I continuously work on implementing our solutions to meet the expectations of our enterprise customers.

3 Replies to “HP BladeSystem hardware monitoring”

  1. Hello, just browsing for information for my HP website. Lots of information out there. Wasn’t exactly what I was looking for, but great site. Cya later.

  2. Ben says:

    Hi, nice script, I am definitely willing to use this.
    However, upon execution I get “check_snmp_HP_Bladesystem.pl: Permission denied” because I password-protected my OA with LDAP… Is it possible to pass switches with username/password to the check?

  3. Ben says:

    Sorry, my mistake. Of course, after downloading the file I need to adjust the file permissions!
