After spending some time tinkering with the Nagios plugin check_cluster2, which is written in C, and getting very frustrated with its limitations, I decided to rewrite the plugin from scratch. Yes, I know there is another Perl version of the plugin called check_cluster3.pl, but it doesn't do what I need either.

Here is how to use the script:

usage: ./check_cluster.pl <options>
options:
 --warn=<warning>   number of warnings in statusdata to reach WARNING state

 --crit=<criticals> number of criticals in statusdata to reach CRITICAL state

 --data=<string>    on-demand macro containing the state of a host/service, e.g.:
                    "$SERVICESTATEID:proxy01:PING$" - Nagios resolves this at
                    call time into the numeric state of the service in
                    question; if the service is down, the value is 2.
                    --data can be supplied multiple times.

 --info=<string>    info about host/service states, shown in the log message
                    when a non-OK state is reached, e.g.:
                    "$SERVICESTATEID:proxy01:PING$:proxy01:PING" - the first
                    part is the same on-demand macro as used above, followed
                    by the host and service name to be displayed.
                    format:   SERVICESTATE:HOST:SERVICE (separator is colon).
                    --info can be supplied multiple times.

 [--active=<uri>]   use active Nagios CGI links when displaying info. caution:
                    the status log is limited in length, so the links may not
                    fit into the CGI output field and may break the HTML.
                    You can use the macros %H (host) and %S (service) in the
                    link URI, e.g.:
                    "showdetails.php?host=%H&service=%S"

 [--log=<string>]   string to be used for status log output, it will be prepended.

 [--help]           display the usage message

example usage:

% check_cluster.pl -w 0 -c 1 -d 2 -d 0 -i 2:proxy1:PING -i 0:proxy2:PING

The check will return CRITICAL state, because the first data option (-d)
indicates state 2 (=CRITICAL). The command output will look like this:

Cluster-Check: 1/2 OK, 1 CRIT (w:0,c:1)<br/>CRIT:proxy1:PING

The nagios check_command config for the check for this example would be:

check_cluster.pl -w 0 -c 1 \
                 -d $SERVICESTATEID:proxy1:PING$ \
                 -d $SERVICESTATEID:proxy2:PING$ \
                 -i $SERVICESTATEID:proxy1:PING$:proxy1:PING \
                 -i $SERVICESTATEID:proxy2:PING$:proxy2:PING

./check_cluster.pl version 1.00

COPYRIGHT (c) 2006 T.L. <tom@daemon.de>
ALL RIGHTS RESERVED. Published under the terms of the GPL. 
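
The threshold rules described above boil down to counting the member states and comparing the counts against --warn and --crit. Here is a minimal sketch of that decision, assuming a hypothetical helper named cluster_state (the name and the argument order are made up for illustration and are not part of the plugin):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns the Nagios state
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) for a list of member states.
sub cluster_state {
    my ($warn, $crit, @states) = @_;
    my %count = (0 => 0, 1 => 0, 2 => 0, 3 => 0);
    for my $s (@states) {
        # anything that is not 1, 2 or 3 is counted as OK
        $count{ ($s == 1 || $s == 2 || $s == 3) ? $s : 0 }++;
    }
    # criticals are checked first, then warnings, then unknowns
    return 2 if $count{2} && $count{2} >= $crit;
    return 1 if $count{1} && $count{1} >= $warn;
    return 3 if $count{3};
    return 0;
}

print cluster_state(0, 1, 2, 0), "\n";   # one of two members critical, --crit=1
print cluster_state(0, 2, 2, 0), "\n";   # same data, but --crit=2
```

The first call prints 2 (CRITICAL), the second prints 0 (OK), because a single critical member stays below a --crit threshold of 2.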

And finally, here is the script:

#!/usr/bin/perl

my $VERSION = "1.00";

use Getopt::Long;
use warnings;
use strict;

use constant OK       => 0;
use constant WARNING  => 1;
use constant CRITICAL => 2;
use constant UNKNOWN  => 3;

# option values, filled in by GetOptions below
my ($warnings, $criticals, @statusdata, @infodata, $log, $help);
# numeric Nagios state => short label used in the output
my %state  = (0 => "OK", 1 => "WARN", 2 => "CRIT", 3 => "ERR");
my $active = "";

sub finish;
sub usage;
sub shortusage;

my $result = GetOptions(
			"warn=i"   => \$warnings,
			"crit=i"   => \$criticals,
			"data=s"   => \@statusdata,
			"info=s"   => \@infodata,
			"log=s"    => \$log,
			"active=s" => \$active,
			"help"     => \$help,
			);

if ($help) {
  usage;
}

if (!defined($warnings) || !defined($criticals) || !@statusdata || !$result) {
  shortusage;
}

if (!$log) {
  $log = "Cluster-Check:";
}

my $gotwarnings  = 0;
my $gotcriticals = 0;
my $gotunknowns  = 0;
my $gotok        = 0;
my $logmessage   = "";
my @info = ();

foreach my $entry (@infodata) {
  # limit of 3 keeps any colons inside the service name intact
  my($status, $host, $service) = split /:/, $entry, 3;
  if (! defined($status) || !($host && $service)) {
    finish UNKNOWN, "supplied infodata \"$entry\" does not match the spec: SERVICESTATE:HOST:SERVICE!";
  }
  else {
    if ($status) {
      if ($active) {
	# expand the macros on a copy, so the template stays intact for the next entry
	my $link = $active;
	$link =~ s/\%H/$host/g;
	$link =~ s/\%S/$service/g;
	push @info, "<a href=\"$link\">$state{$status}:$host:$service</a>";
      }
      else {
	push @info, "$state{$status}:$host:$service";
      }
    }
  }
}

$logmessage = join ",", @info;

foreach my $state (@statusdata) {
  if ($state == WARNING) {
    $gotwarnings++;
  }
  elsif ($state == CRITICAL) {
    $gotcriticals++;
  }
  elsif ($state == UNKNOWN) {
    $gotunknowns++;
  }
  else {
    $gotok++;
  }
}

my @logstate;

if ($gotok) {
  push @logstate, "$gotok/" . scalar @statusdata . " OK";
}
if ($gotwarnings) {
  push @logstate, "$gotwarnings WARN";
}
if ($gotcriticals) {
  push @logstate, "$gotcriticals CRIT";
}
if ($gotunknowns) {
  push @logstate, "$gotunknowns ERR";
}

my $logstate = join ", ", @logstate;
$logstate .= " (w:$warnings,c:$criticals)";

if ($gotcriticals && ($gotcriticals >= $criticals)) {
  finish CRITICAL, "$log $logstate<br/>$logmessage";
}

if ($gotwarnings && ($gotwarnings >= $warnings)) {
  finish WARNING, "$log $logstate<br/>$logmessage";
}

if ($gotunknowns) {
  finish UNKNOWN, "$log $logstate<br/>$logmessage";
}

# nothing happened
finish OK, "$log $logstate";

sub finish {
  my ($state, $message) = @_;
  print "$message\n";
  exit $state;
}

sub shortusage {
  print STDERR qq(usage: $0 [-wcdia]
$0 --help for more information
);
  exit UNKNOWN;
}

sub usage {
  print STDERR qq(usage: $0 <options>
options:
 --warn=<warning>   number of warnings in statusdata to reach WARNING state

 --crit=<criticals> number of criticals in statusdata to reach CRITICAL state

 --data=<string>    on-demand macro containing the state of a host/service, e.g.:
                    "\$SERVICESTATEID:proxy01:PING\$" - Nagios resolves this at
                    call time into the numeric state of the service in
                    question; if the service is down, the value is 2.
                    --data can be supplied multiple times.

 --info=<string>    info about host/service states, shown in the log message
                    when a non-OK state is reached, e.g.:
                    "\$SERVICESTATEID:proxy01:PING\$:proxy01:PING" - the first
                    part is the same on-demand macro as used above, followed
                    by the host and service name to be displayed.
                    format:   SERVICESTATE:HOST:SERVICE (separator is colon).
                    --info can be supplied multiple times.

 [--active=<uri>]   use active Nagios CGI links when displaying info. caution:
                    the status log is limited in length, so the links may not
                    fit into the CGI output field and may break the HTML.
                    You can use the macros \%H (host) and \%S (service) in the
                    link URI, e.g.:
                    "showdetails.php?host=\%H&service=\%S"

 [--log=<string>]   string to be used for status log output, it will be prepended.

 [--help]           display the usage message

example usage:

% check_cluster.pl -w 0 -c 1 -d 2 -d 0 -i 2:proxy1:PING -i 0:proxy2:PING

The check will return CRITICAL state, because the first data option (-d)
indicates state 2 (=CRITICAL). The command output will look like this:

Cluster-Check: 1/2 OK, 1 CRIT (w:0,c:1)<br/>CRIT:proxy1:PING

The nagios check_command config for the check for this example would be:

check_cluster.pl -w 0 -c 1 \\
                 -d \$SERVICESTATEID:proxy1:PING\$ \\
                 -d \$SERVICESTATEID:proxy2:PING\$ \\
                 -i \$SERVICESTATEID:proxy1:PING\$:proxy1:PING \\
                 -i \$SERVICESTATEID:proxy2:PING\$:proxy2:PING

$0 version $VERSION

COPYRIGHT (c) 2006 T.L. <tom\@daemon.de>
ALL RIGHTS RESERVED. Published under the terms of the GPL.
);
  exit UNKNOWN;
}
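
One detail worth illustrating is how a link template passed via --active gets its %H and %S macros filled in for each --info entry. Below is a minimal sketch, assuming a hypothetical helper named expand_link (the name is made up for illustration) and the example URI from the usage text. The important bit is that the substitution works on a copy, so the template is still intact for the next entry:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub expand_link {
    my ($template, $host, $service) = @_;
    my $link = $template;          # work on a copy of the template
    $link =~ s/%H/$host/g;
    $link =~ s/%S/$service/g;
    return $link;
}

my $template = "showdetails.php?host=%H&service=%S";
for my $entry ("2:proxy1:PING", "1:proxy2:PING") {
    # limit of 3 keeps any colons inside the service name intact
    my ($status, $host, $service) = split /:/, $entry, 3;
    print expand_link($template, $host, $service), "\n";
}
```

This prints one link per entry, showdetails.php?host=proxy1&service=PING followed by showdetails.php?host=proxy2&service=PING.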