Nagaina - structured systems monitoring platform


Nagios is extraordinarily useful, but the internals are an ad hoc horror. In a world where automatic generation of DNS records and other per-host and per-service information is the norm, building up piles of Perl-ish descriptor tables is Just Wrong.

Also, Nagios doesn't appear to handle hierarchical cause recognition. If the first-hop router is down, that doesn't mean the remote service is down - just that you can't report on it. This is a distinct state, and it calls for explicit dependency declarations between checks. Likewise, if you can't ping a server, you don't need to check or report that the services on it are down.
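
As a rough illustration of that idea (a sketch only, not Nagaina's actual code or API), a check can be skipped and marked with a third state whenever something it depends on is not up. The names State, evaluate, depends_on and probes are all made up for this example, the probes are stand-in lambdas rather than real network checks, and the dependency graph is assumed to have no cycles.

```python
from enum import Enum


class State(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"  # a dependency is not up, so this check was skipped


def evaluate(name, depends_on, probes, cache):
    """Return the State of `name`, running its probe only if every
    dependency is UP. Results are memoised in `cache`.

    Assumes the dependency graph is acyclic (no cycle detection here).
    """
    if name in cache:
        return cache[name]
    for dep in depends_on.get(name, ()):
        if evaluate(dep, depends_on, probes, cache) is not State.UP:
            # Can't report on this check at all; that is not the same as DOWN.
            cache[name] = State.UNKNOWN
            return cache[name]
    cache[name] = State.UP if probes[name]() else State.DOWN
    return cache[name]


# Hypothetical topology: first-hop router -> server (ping) -> http service.
depends_on = {"server": ["router"], "http": ["server"]}
probes = {
    "router": lambda: False,  # pretend the first-hop router is down
    "server": lambda: True,
    "http": lambda: True,
}

cache = {}
states = {name: evaluate(name, depends_on, probes, cache) for name in probes}
for name, state in states.items():
    print(name, state.value)
```

With the router probe failing, only the router is reported as down; the server and its http service come out as unknown, and their probes are never run.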

The name Nagaina comes from Kipling's Rikki-Tikki-Tavi - a vicious snake, at least somewhat appropriate for a Python project.


Nagaina is currently very small, and changing rapidly. It has been in production use (micro-scale production, but it's actually caught real problems, so I think that counts) since 0.01.

To streamline releasing, this description is not updated for each rev; instead, there is simply an RSS feed with enclosures of the releases generated by the Makefile.