Is the Nagios NTP Plugin Flawed?

We recently started using Nagios to monitor a number of computer systems, as well as some key services running on them. One such service is NTP, the Network Time Protocol, which keeps system clocks synchronised (supported in Nagios via the check_ntp plugin).

Earlier today, Nagios (2.10) began to report that a number of our (Solaris) systems had drifted too far out of sync with their associated time servers. However, when we checked the time and the NTP service on each system, we could not find anything wrong. To cut a long story short, it turns out that the system on which Nagios was running was not itself synchronised to any time server, so it was the Nagios host’s clock that had actually drifted out of sync.

In my opinion this is rather curious behaviour on the part of Nagios, and I wonder whether it is actually a bug in the check_ntp plugin. Why should it matter what the time is on the Nagios host, so long as the systems being monitored haven’t drifted off their respective mark? Also, what if we were monitoring systems in different time zones?
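Part of the answer lies in how an NTP client estimates clock offset. The sketch below uses the standard on-wire formula from the NTP specification, offset = ((t2 - t1) + (t3 - t4)) / 2, where t1 and t4 are read from the local clock and t2 and t3 from the server's clock. It assumes, rather than confirms from the plugin source, that check_ntp measures the remote server against the clock of the machine it runs on in this way; if so, any error in the Nagios host's clock shows up directly in the reported offset. NTP timestamps are UTC-based, so time zones do not enter into it. The `simulate_query` helper, the state mapping and the 60/120 second thresholds are illustrative only, not check_ntp's actual code or defaults; the exit codes 0 to 3 are the standard Nagios plugin convention.

```c
/* Standard Nagios plugin exit codes. */
enum plugin_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/* The four timestamps of one NTP exchange, in seconds. */
typedef struct {
    double t1; /* client clock: request sent     */
    double t2; /* server clock: request received */
    double t3; /* server clock: reply sent       */
    double t4; /* client clock: reply received   */
} ntp_sample;

/* NTP offset estimate: positive means the server is ahead of the client. */
double ntp_offset(const ntp_sample *s)
{
    return ((s->t2 - s->t1) + (s->t3 - s->t4)) / 2.0;
}

/* Hypothetical mapping of |offset| to a plugin state (thresholds in seconds). */
enum plugin_state offset_state(double offset, double warn, double crit)
{
    double mag = offset < 0.0 ? -offset : offset;
    if (mag > crit)
        return STATE_CRITICAL;
    if (mag > warn)
        return STATE_WARNING;
    return STATE_OK;
}

/* Hypothetical helper: a perfectly synchronised server queried by a client
 * whose clock is `client_error` seconds fast, with a symmetric one-way
 * network delay of `delay` seconds. */
ntp_sample simulate_query(double true_time, double client_error, double delay)
{
    ntp_sample s;
    s.t1 = true_time + client_error;
    s.t2 = true_time + delay;
    s.t3 = s.t2; /* assume the server replies instantly */
    s.t4 = true_time + 2.0 * delay + client_error;
    return s;
}
```

With a correct client clock the offset works out to zero; with the client 150 seconds fast, a perfectly good server is reported as 150 seconds behind, which would be CRITICAL under the example thresholds.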

As far as the solution to the original problem goes, this was quite straightforward in the end: all we had to do was configure our Nagios system to synchronise with the same time server as the systems it was monitoring. However, I still don’t believe we should have to do this.

Installing Nagios on Solaris

If you are not familiar with it, Nagios is an open source system and network monitoring application that keeps a watchful eye on hosts and services that you specify, alerting you when things go bad and when they get better. It was designed to run under Linux but, according to the creators, should also run on other Unix variants. With this in mind, I decided to try it out on Solaris.

Fortunately, the Nagios developers provide a very comprehensive User Guide and, trust me, you’re going to need it. Early in the guide there is a section entitled Advice for Beginners that is pretty blunt about how tricky Nagios is to set up, and they are not wrong. Having said that, they also say that once you get it running you will never want to be without it, and I can definitely subscribe to this notion too.

Anyway, here are my notes from installing Nagios on Solaris:

My Setup

  • Solaris 10 (u3) for x86
  • Sun Studio 12
  • Nagios 2.10
  • Nagios Plugins 1.4.11

Building Nagios

I downloaded the Nagios source tarball and unpacked it as a non-root user. Then, following the User Guide, I ran the configure script with no arguments (i.e. accepting all the default settings), followed by make all, and this seemed to work fine.

Building Nagios Plugins

Nagios does most of its work through external scripts and applications, which it calls plugins. The Nagios website provides a collection of popular plugins for you to build. Once again, I built them with configure followed by make all. However, this did not go entirely smoothly:

  1. A number of the plugins failed to build, citing an “undefined symbol: floor” error. This was resolved by adding -lm (the math library) to the LIBS definition at line 328 of nagios-plugins-1.4.11/plugins/Makefile. It could probably also have been fixed by adding $(MATHLIBS) to the link line of each affected plugin, but that would have been more work.
  2. The check_dhcp plugin failed to compile, citing several unknown data types (namely u_int8_t and u_int32_t). This was resolved by adding -D__solaris__ to the CPPFLAGS definition at line 161 of nagios-plugins-1.4.11/plugins-root/Makefile.
  3. nagios-plugins-1.4.11/plugins-root/Makefile was also missing the same -lm link flag as the plugins Makefile, so I added it there too (line 221).

Once all of the above changes were made, all of the plugins seemed to build correctly.

Problems Found During Nagios Configuration

1. check_ping plugin did not work

No matter how I configured the check_ping plugin (in the services section of localhost.cfg), it always reported:

CRITICAL - You need more args!!!
Could not open pipe:

A number of websites suggested that this was an IPv6 issue and that I should have passed --with-ipv6=no to the configure script when building the plugins. However, that was not the solution in my case. It turns out that the definition of PING_COMMAND in nagios-plugins-1.4.11/config.h was empty, so the check_ping plugin was making no attempt whatsoever to ping the requested host. I suspect the reason is that I built the software as a non-root user whose PATH, on Solaris, does not include the ping command (ping lives in /usr/sbin on Solaris). As a result, the configure script was unable to produce a valid definition for PING_COMMAND.
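The suspicion above can be illustrated with a small PATH lookup written in C. This is a sketch of the general idea only: a configure script does its program search in generated shell code, not with this function, and `find_in_path` is a hypothetical name. It walks a colon-separated PATH and returns the first executable match.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search a colon-separated PATH string for an executable called `name`.
 * On success the full path is written to `out` and 0 is returned;
 * -1 means no directory in `path` contains an executable `name`.
 * Empty PATH entries are skipped for simplicity. */
int find_in_path(const char *name, const char *path, char *out, size_t len)
{
    const char *dir = path;

    while (dir != NULL && *dir != '\0') {
        const char *end = strchr(dir, ':');
        size_t dlen = (end != NULL) ? (size_t)(end - dir) : strlen(dir);

        if (dlen > 0) {
            /* Build "<dir>/<name>" and check that it is executable. */
            int n = snprintf(out, len, "%.*s/%s", (int)dlen, dir, name);
            if (n > 0 && (size_t)n < len && access(out, X_OK) == 0)
                return 0;
        }
        dir = (end != NULL) ? end + 1 : NULL;
    }
    return -1;
}
```

On a Solaris box whose PATH does not include /usr/sbin, a lookup for "ping" along these lines would be expected to fail, which is consistent with configure leaving PING_COMMAND empty.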

The solution was to edit nagios-plugins-1.4.11/config.h and add the following definition for PING_COMMAND (at line 796):

#define PING_COMMAND "/usr/sbin/ping -s %s 64 %u"

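The above command is specific to Solaris: the -s option makes the Solaris version of ping behave like the Linux ping command, reporting each packet and summary statistics rather than just “host is alive”, while 64 is the packet data size and %u the number of packets to send. After this edit, I had to force a rebuild of the check_ping plugin (touch plugins/check_ping.c; make).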
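The definition reads as a printf-style template: %s is filled with the host and %u with the packet count. The sketch below assumes that argument order (it matches the placeholders in the define); `build_ping_command` is a hypothetical helper, not code taken from check_ping.c. It also shows why an empty PING_COMMAND leaves the plugin with nothing to run.

```c
#include <stdio.h>
#include <string.h>

/* Solaris template from config.h: host, then data size 64, then count. */
#define PING_COMMAND "/usr/sbin/ping -s %s 64 %u"

/* Expand a ping command template for one host into `buf`.
 * Returns 0 on success, or -1 if the template is empty (the broken
 * configure result described above) or the command does not fit. */
int build_ping_command(char *buf, size_t len, const char *tmpl,
                       const char *host, unsigned packets)
{
    if (tmpl == NULL || strlen(tmpl) == 0)
        return -1; /* nothing to execute */

    int n = snprintf(buf, len, tmpl, host, packets);
    if (n < 0 || (size_t)n >= len)
        return -1; /* formatting error or truncated command */
    return 0;
}
```

For example, expanding PING_COMMAND for host 192.168.1.10 with 5 packets gives "/usr/sbin/ping -s 192.168.1.10 64 5".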

2. statusmap.cgi did not build

I only noticed this when I tried to view the Status Map section of Nagios. In short, it had not built because my Solaris system was missing the GD graphics library. The solution was to download and install a version of GD (and each of the packages it depends on); I got mine from sunfreeware.com. After that, statusmap.cgi built correctly and, once I had copied it to the libexec directory where Nagios was installed, it worked.

3. VRML Browser Plugin required

When I tried to view the 3-D Status Map in Nagios, my browser kept opening a “Save As” dialog box. It turns out I needed to install a VRML plugin in my browser. I chose one called Cortona from Parallel Graphics, which seems to work fine in Firefox, although so far the 3-D Status Map view is more impressive than it is useful (for me, anyway).

Conclusion

Nagios did indeed take a long time to install and configure. However, I can confirm that it was worth the effort and I am very pleased with it so far.