After spending some time tinkering with the Nagios plugin check_cluster2, which is written in C, and getting very frustrated with its limitations, I decided to rewrite the plugin from scratch. Yes, I know there is another Perl version of the plugin called check_cluster3.pl, but it doesn't do what I need either.

Here is how the script has to be used:

=begin text

usage: ./check_cluster.pl options:

--warn= number of warnings in statusdata needed to reach WARNING state

--crit= number of criticals in statusdata needed to reach CRITICAL state

--data= on-demand macro containing the numeric state of a host/service, e.g.: "$SERVICESTATEID:proxy01:PING$". Nagios resolves it to the number of the current state when the check is called; if the service is down, it becomes 2. --data can be supplied multiple times.

--info= info about host/service states, displayed in the log message if a non-OK state is reached, e.g.: "$SERVICESTATEID:proxy01:PING$:proxy01:PING". The first part is the same on-demand macro as above, followed by the host and service name to be displayed. Format: SERVICESTATE:HOST:SERVICE (separator is colon). --info can be supplied multiple times.

[--active=] link URI for active Nagios CGI links in the info display. Caution: the status log is limited in length, so the links may not fit into the CGI output field and may break the HTML. You can use the macros %H (host) and %S (service) in the link URI, e.g.: "showdetails.php?host=%H&service=%S"

[--log=] string to be used for status log output; it is prepended to the status line.

[--help] display the usage message

example usage:

% check_cluster.pl -w 0 -c 1 -d 1 -d 0 -i 2:proxy1:PING -i 0:proxy2:PING

The check will return WARNING state, because the first data option (-d) indicates state 1 (=WARNING) and no data option indicates a CRITICAL state. The command output will look like this:

Cluster-Check: 1/2 OK, 1 WARN (w:0,c:1)
CRIT:proxy1:PING

The nagios check_command config for the check for this example would be:

check_cluster.pl -w 0 -c 1 \
  -d $SERVICESTATEID:proxy1:PING$ \
  -d $SERVICESTATEID:proxy2:PING$ \
  -i $SERVICESTATEID:proxy1:PING$:proxy1:PING \
  -i $SERVICESTATEID:proxy2:PING$:proxy2:PING

./check_cluster.pl version 1.00

COPYRIGHT (c) 2006 T.L. ALL RIGHTS RESERVED. Published under the terms of the GPL.

=end text
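
To show where that command line would live, here is a rough sketch of matching Nagios object definitions. The command name check_proxy_cluster, the service description, the generic-service template and the choice of proxy1 as the carrying host are all assumptions made for illustration; only the plugin options come from the usage text above. The sketch uses $SERVICESTATEID$, since that is the Nagios macro that resolves to the numeric state the script compares against.

```
# hypothetical names: check_proxy_cluster, "Proxy Cluster", generic-service
define command {
    command_name    check_proxy_cluster
    command_line    $USER1$/check_cluster.pl -w 0 -c 1 -d $SERVICESTATEID:proxy1:PING$ -d $SERVICESTATEID:proxy2:PING$ -i $SERVICESTATEID:proxy1:PING$:proxy1:PING -i $SERVICESTATEID:proxy2:PING$:proxy2:PING
}

define service {
    use                     generic-service
    host_name               proxy1
    service_description     Proxy Cluster
    check_command           check_proxy_cluster
}
```

$USER1$ is the usual resource macro for the plugin directory; adjust it if the script is installed somewhere else.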


And finally here comes the script:

=begin text

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my $VERSION = "1.00";

# Nagios plugin exit codes
use constant OK       => 0;
use constant WARNING  => 1;
use constant CRITICAL => 2;
use constant UNKNOWN  => 3;

# short state labels used in the log output
my %state = (0 => "OK", 1 => "WARN", 2 => "CRIT", 3 => "ERR");

my ($warnings, $criticals, $log, $help);
my (@statusdata, @infodata);
my $active = "";

# forward declarations, so the subs can be called without parentheses
sub finish;
sub usage;
sub shortusage;

my $result = GetOptions(
    "warn=i"   => \$warnings,
    "crit=i"   => \$criticals,
    "data=s"   => \@statusdata,
    "info=s"   => \@infodata,
    "log=s"    => \$log,
    "active=s" => \$active,
    "help"     => \$help,
);

if ($help) {
    usage;
}

# defined(@array) is no longer valid Perl; an empty array is simply false
if (!defined($warnings) || !defined($criticals) || !@statusdata || !$result) {
    shortusage;
}

if (!$log) {
    $log = "Cluster-Check:";
}

my $gotwarnings  = 0;
my $gotcriticals = 0;
my $gotunknowns  = 0;
my $gotok        = 0;
my $logmessage   = "";
my @info         = ();

foreach my $entry (@infodata) {
    my ($status, $host, $service) = split /:/, $entry, 3;
    if (!defined($status) || !($host && $service)) {
        finish UNKNOWN, "supplied infodata \"$entry\" does not match the spec: SERVICESTATE:HOST:SERVICE!";
    }

    # only non-OK states are reported
    next unless $status;

    if ($active) {
        # expand the macros on a copy, so the template stays intact for the next entry
        (my $uri = $active) =~ s/\%H/$host/g;
        $uri =~ s/\%S/$service/g;
        # as published, the entry text is the same with and without --active
        push @info, "$state{$status}:$host:$service";
    }
    else {
        push @info, "$state{$status}:$host:$service";
    }
}

$logmessage = join ",", @info;

# count how many data entries are in each state
foreach my $state (@statusdata) {
    if ($state == WARNING) {
        $gotwarnings++;
    }
    elsif ($state == CRITICAL) {
        $gotcriticals++;
    }
    elsif ($state == UNKNOWN) {
        $gotunknowns++;
    }
    else {
        $gotok++;
    }
}

my @logstate;

if ($gotok) {
    push @logstate, "$gotok/" . scalar @statusdata . " OK";
}
if ($gotwarnings) {
    push @logstate, "$gotwarnings WARN";
}
if ($gotcriticals) {
    push @logstate, "$gotcriticals CRIT";
}
if ($gotunknowns) {
    push @logstate, "$gotunknowns ERR";
}

my $logstate = join ", ", @logstate;
$logstate .= " (w:$warnings,c:$criticals)";

# the most severe state that reaches its threshold wins
if ($gotcriticals && ($gotcriticals >= $criticals)) {
    finish CRITICAL, "$log $logstate\n$logmessage";
}

if ($gotwarnings && ($gotwarnings >= $warnings)) {
    finish WARNING, "$log $logstate\n$logmessage";
}

if ($gotunknowns) {
    finish UNKNOWN, "$log $logstate\n$logmessage";
}

# nothing happened
finish OK, "$log $logstate";

# print the status message and exit with the given Nagios state
sub finish {
    my ($state, $message) = @_;
    print "$message\n";
    exit $state;
}

sub shortusage {
    print STDERR qq(usage: $0 [-wcdia]
$0 --help for more information
);
    exit UNKNOWN;
}

sub usage {
    print STDERR qq(usage: $0 options:

--warn= number of warnings in statusdata needed to reach WARNING state

--crit= number of criticals in statusdata needed to reach CRITICAL state

--data= on-demand macro containing the numeric state of a host/service, e.g.: "\$SERVICESTATEID:proxy01:PING\$". Nagios resolves it to the number of the current state when the check is called; if the service is down, it becomes 2. --data can be supplied multiple times.

--info= info about host/service states, displayed in the log message if a non-OK state is reached, e.g.: "\$SERVICESTATEID:proxy01:PING\$:proxy01:PING". The first part is the same on-demand macro as above, followed by the host and service name to be displayed. Format: SERVICESTATE:HOST:SERVICE (separator is colon). --info can be supplied multiple times.

[--active=] link URI for active Nagios CGI links in the info display. Caution: the status log is limited in length, so the links may not fit into the CGI output field and may break the HTML. You can use the macros \%H (host) and \%S (service) in the link URI, e.g.: "showdetails.php?host=\%H&service=\%S"

[--log=] string to be used for status log output; it is prepended to the status line.

[--help] display the usage message

example usage:

% check_cluster.pl -w 0 -c 1 -d 1 -d 0 -i 2:proxy1:PING -i 0:proxy2:PING

The check will return WARNING state, because the first data option (-d) indicates state 1 (=WARNING) and no data option indicates a CRITICAL state. The command output will look like this:

Cluster-Check: 1/2 OK, 1 WARN (w:0,c:1)
CRIT:proxy1:PING

The nagios check_command config for the check for this example would be:

check_cluster.pl -w 0 -c 1 \\
  -d \$SERVICESTATEID:proxy1:PING\$ \\
  -d \$SERVICESTATEID:proxy2:PING\$ \\
  -i \$SERVICESTATEID:proxy1:PING\$:proxy1:PING \\
  -i \$SERVICESTATEID:proxy2:PING\$:proxy2:PING

$0 version $VERSION

COPYRIGHT (c) 2006 T.L. ALL RIGHTS RESERVED. Published under the terms of the GPL.
);
    exit UNKNOWN;
}

=end text
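
If you want to play with the thresholds without a running Nagios, the decision logic can be pulled out into a small function. This is only a sketch: cluster_state is a hypothetical helper that is not part of the plugin. It is meant to mirror the order in which the script checks the counters (CRITICAL first, then WARNING, then UNKNOWN) and assumes the states are passed in as plain numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of check_cluster.pl: returns the exit code
# the plugin would be expected to choose for a list of numeric states.
# Arguments: warning threshold, critical threshold, then the states.
sub cluster_state {
    my ($warn, $crit, @states) = @_;
    my %count = (0 => 0, 1 => 0, 2 => 0, 3 => 0);

    for my $s (@states) {
        # anything that is not 1, 2 or 3 is counted as OK, as in the plugin
        my $key = ($s == 1 || $s == 2 || $s == 3) ? $s : 0;
        $count{$key}++;
    }

    # a state only counts if it occurred at all AND reached its threshold
    return 2 if $count{2} && $count{2} >= $crit;
    return 1 if $count{1} && $count{1} >= $warn;
    return 3 if $count{3};
    return 0;
}

print cluster_state(0, 1, 1, 0), "\n";    # one WARNING, threshold 0: prints 1
print cluster_state(0, 2, 2, 0), "\n";    # one CRITICAL, threshold 2: prints 0
print cluster_state(0, 1, 2, 1), "\n";    # CRITICAL beats WARNING: prints 2
```

Note the second call: a single CRITICAL member below the --crit threshold leaves the cluster OK, which is the whole point of a cluster check.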