After tinkering for a while with the Nagios plugin check_cluster2 (written in C) and growing very frustrated with its limitations, I decided to rewrite the plugin from scratch. Yes, I know there is another Perl version of the plugin, check_cluster3.pl, but it doesn't do what I need either.
Here is how the script is used:
usage: ./check_cluster.pl <options>
options:
--warn=<warning> number of warnings in statusdata to reach WARNING state
--crit=<criticals> number of criticals in statusdata to reach CRITICAL state
--data=<string> on-demand macro containing the state of a host/service, e.g.:
"$SERVICESTATE:proxy01:PING$" - when calling the plugin, Nagios
resolves this into the numeric state of the service in
question; if it is CRITICAL, it becomes 2.
--data can be supplied multiple times.
--info=<string> info about host/service states to be shown in the log
message when a non-OK state is reached, e.g.:
"$SERVICESTATE:proxy01:PING$:proxy01:PING" the first part
is the same on-demand macro as above, but it is followed
by the host and service name to be displayed.
format: SERVICESTATE:HOST:SERVICE (separator is colon).
--info can be supplied multiple times.
[--active=<uri>] use active Nagios CGI links when displaying info. Caution:
the status log is limited in length, so the links may not fit
into the CGI output field and may break the HTML.
You can use the macros %H (host) and %S (service) in the
link URI, e.g.:
"showdetails.php?host=%H&service=%S"
[--log=<string>] string to be prepended to the status log output.
[--help] display the usage message
example usage:
% check_cluster.pl -w 0 -c 1 -d 2 -d 0 -i 2:proxy1:PING -i 0:proxy2:PING
The check will return CRITICAL state, because the first data option (-d)
indicates state 2 (=CRITICAL), which reaches the critical threshold of 1.
The command output will look like this:
Cluster-Check: 1/2 OK, 1 CRIT (w:0,c:1)<br/>CRIT:proxy1:PING
The Nagios check_command configuration for this example would be:
check_cluster.pl -w 0 -c 1 \
-d $SERVICESTATE:proxy1:PING$ \
-d $SERVICESTATE:proxy2:PING$ \
-i $SERVICESTATE:proxy1:PING$:proxy1:PING \
-i $SERVICESTATE:proxy2:PING$:proxy2:PING
./check_cluster.pl version 1.00
COPYRIGHT (c) 2006 T.L. <tom@daemon.de>
ALL RIGHTS RESERVED. Published under the terms of the GPL.
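The example above boils down to a simple rule: count how many --data values are 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), then test CRITICAL first, WARNING second and UNKNOWN last. Below is a minimal sketch of that decision as a standalone Perl function. The name cluster_state is hypothetical and not part of the script; like the script, it treats any value other than 1, 2 or 3 as OK.

```perl
use strict;
use warnings;

use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };

# Hypothetical helper (not part of check_cluster.pl): mirrors the script's
# decision order and returns the Nagios state code for a list of member states.
sub cluster_state {
    my ($warn, $crit, @states) = @_;
    my ($nwarn, $ncrit, $nunknown) = (0, 0, 0);
    for my $s (@states) {
        if    ($s == WARNING)  { $nwarn++ }
        elsif ($s == CRITICAL) { $ncrit++ }
        elsif ($s == UNKNOWN)  { $nunknown++ }
        # anything else is counted as OK, as in the script
    }
    # the most severe state is checked first
    return CRITICAL if $ncrit && $ncrit >= $crit;
    return WARNING  if $nwarn && $nwarn >= $warn;
    return UNKNOWN  if $nunknown;
    return OK;
}

# the worked example: -w 0 -c 1 -d 2 -d 0
print cluster_state(0, 1, 2, 0), "\n";    # prints 2 (CRITICAL)
```

Because each count must also be non-zero, a threshold of 0 behaves like 1: the state is only raised once at least one member is actually in that state.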
And finally, here is the script:
#!/usr/bin/perl
my $VERSION = "1.00";
use Getopt::Long;
use warnings;
use strict;
use constant OK => 0;
use constant WARNING => 1;
use constant CRITICAL => 2;
use constant UNKNOWN => 3;
use vars qw(%state $warnings $criticals @statusdata @infodata $log $help $active);
my %state = (0 => "OK", 1 => "WARN", 2 => "CRIT", 3 => "ERR");
my $active = "";
sub finish;
sub usage;
sub shortusage;
# GetOptions() needs references so it can store the parsed option values
my $result = GetOptions(
"warn=i" => \$warnings,
"crit=i" => \$criticals,
"data=s" => \@statusdata,
"info=s" => \@infodata,
"log=s" => \$log,
"active=s" => \$active,
"help" => \$help,
);
if ($help) {
usage;
}
# defined(@array) is an error in modern Perl; test the array in boolean context
if (!defined($warnings) || !defined($criticals) || !@statusdata || !$result) {
shortusage;
}
if (!$log) {
$log = "Cluster-Check:";
}
my $gotwarnings = 0;
my $gotcriticals = 0;
my $gotunknowns = 0;
my $gotok = 0;
my $logmessage = "";
my @info = ();
foreach my $info (@infodata) {
my ($status, $host, $service) = split /:/, $info, 3;
if (!defined($status) || !($host && $service)) {
finish UNKNOWN, "supplied infodata \"$info\" does not match the spec: SERVICESTATE:HOST:SERVICE!";
}
else {
if ($status) {
if ($active) {
# expand %H/%S on a copy, so the template stays intact for the next entry
(my $link = $active) =~ s/\%H/$host/g;
$link =~ s/\%S/$service/g;
push @info, "<a href=\"$link\">$state{$status}:$host:$service</a>";
}
else {
push @info, "$state{$status}:$host:$service";
}
}
}
}
$logmessage = join ",", @info;
foreach my $state (@statusdata) {
if ($state == WARNING) {
$gotwarnings++;
}
elsif ($state == CRITICAL) {
$gotcriticals++;
}
elsif ($state == UNKNOWN) {
$gotunknowns++;
}
else {
$gotok++;
}
}
my @logstate;
if ($gotok) {
push @logstate, "$gotok/" . scalar @statusdata . " OK";
}
if ($gotwarnings) {
push @logstate, "$gotwarnings WARN";
}
if ($gotcriticals) {
push @logstate, "$gotcriticals CRIT";
}
if ($gotunknowns) {
push @logstate, "$gotunknowns ERR";
}
my $logstate = join ", ", @logstate;
$logstate .= " (w:$warnings,c:$criticals)";
if ($gotcriticals && ($gotcriticals >= $criticals)) {
finish CRITICAL, "$log $logstate<br/>$logmessage";
}
if ($gotwarnings && ($gotwarnings >= $warnings)) {
finish WARNING, "$log $logstate<br/>$logmessage";
}
if ($gotunknowns) {
finish UNKNOWN, "$log $logstate<br/>$logmessage";
}
# nothing happened
finish OK, "$log $logstate";
sub finish {
my ($state, $message) = @_;
print "$message\n";
exit $state;
}
sub shortusage {
print STDERR qq(usage: $0 [-wcdia]
$0 --help for more information
);
exit UNKNOWN;
}
sub usage {
print STDERR qq(usage: $0 <options>
options:
--warn=<warning> number of warnings in statusdata to reach WARNING state
--crit=<criticals> number of criticals in statusdata to reach CRITICAL state
--data=<string> on-demand macro containing the state of a host/service, e.g.:
"\$SERVICESTATE:proxy01:PING\$" - when calling the plugin, Nagios
resolves this into the numeric state of the service in
question; if it is CRITICAL, it becomes 2.
--data can be supplied multiple times.
--info=<string> info about host/service states to be shown in the log
message when a non-OK state is reached, e.g.:
"\$SERVICESTATE:proxy01:PING\$:proxy01:PING" the first part
is the same on-demand macro as above, but it is followed
by the host and service name to be displayed.
format: SERVICESTATE:HOST:SERVICE (separator is colon).
--info can be supplied multiple times.
[--active=<uri>] use active Nagios CGI links when displaying info. Caution:
the status log is limited in length, so the links may not fit
into the CGI output field and may break the HTML.
You can use the macros \%H (host) and \%S (service) in the
link URI, e.g.:
"showdetails.php?host=\%H&service=\%S"
[--log=<string>] string to be prepended to the status log output.
[--help] display this usage message
example usage:
% check_cluster.pl -w 0 -c 1 -d 2 -d 0 -i 2:proxy1:PING -i 0:proxy2:PING
The check will return CRITICAL state, because the first data option (-d)
indicates state 2 (=CRITICAL), which reaches the critical threshold of 1.
The command output will look like this:
Cluster-Check: 1/2 OK, 1 CRIT (w:0,c:1)<br/>CRIT:proxy1:PING
The Nagios check_command configuration for this example would be:
check_cluster.pl -w 0 -c 1 \\
-d \$SERVICESTATE:proxy1:PING\$ \\
-d \$SERVICESTATE:proxy2:PING\$ \\
-i \$SERVICESTATE:proxy1:PING\$:proxy1:PING \\
-i \$SERVICESTATE:proxy2:PING\$:proxy2:PING
$0 version $VERSION
COPYRIGHT (c) 2006 T.L. <tom\@daemon.de>
ALL RIGHTS RESERVED. Published under the terms of the GPL.
);
exit UNKNOWN;
}
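The --active option treats its value as a URI template in which %H and %S are replaced for each --info entry. Since the same template is reused for every entry, the substitution has to work on a copy rather than on the option value itself; otherwise every link after the first would point at the first host. Here is that step on its own as a small sketch. The function name expand_link and the host names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the %H (host) and %S (service) macros of an
# --active URI template without modifying the template itself.
sub expand_link {
    my ($template, $host, $service) = @_;
    (my $link = $template) =~ s/%H/$host/g;
    $link =~ s/%S/$service/g;
    return $link;
}

my $template = "showdetails.php?host=%H&service=%S";
for my $host (qw(proxy1 proxy2)) {
    print expand_link($template, $host, "PING"), "\n";
}
# prints:
# showdetails.php?host=proxy1&service=PING
# showdetails.php?host=proxy2&service=PING
```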