knoerre(1) manual page

Name

knoerre - fast check tool and http server for nagios remote checks

knoerre is a tool for checking a wide variety of server parameters. Its primary purpose is to serve check values to a (remote) requesting instance such as nagios over simplified HTTP.
It was developed as a replacement for the net-snmp package, which is oversized, sometimes very buggy, sometimes difficult to configure and often slow.
knoerre uses (or rather should use) tcpserver from DJB’s ucspi-tcp software suite. Only the brave will dare to use (x)inetd instead.
Using DJB’s daemontools and ucspi-tcp (for tcpserver) is strongly recommended.

knoerre can be set up easily with knoerre-conf(1).
Access can be restricted by IP address with knoerre-update-tcprules(1).

A key is a specific request to knoerre, e.g. "load1". All keys can be used locally or by HTTP request, e.g. knoerre load1, knoerre diskusage/home or GET /load1 HTTP/1.1. A key given on the command line takes precedence over reading an HTTP request from stdin (as supplied by tcpserver). An HTTP request is internally limited to 512 bytes.
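
As a rough illustration of the HTTP form of a key, the following Python sketch builds the request for a key and refuses anything over the 512-byte limit. The function name build_request and the CRLF line endings are assumptions of this sketch and are not taken from knoerre itself.

```python
# Sketch: turn a knoerre key into a minimal HTTP request.
MAX_REQUEST_BYTES = 512  # request size limit stated in this manual


def build_request(key: str) -> bytes:
    """Return a minimal HTTP request for a key such as 'load1'."""
    # Assumption: the request line is terminated with CRLF and an empty line.
    request = f"GET /{key.lstrip('/')} HTTP/1.1\r\n\r\n".encode("ascii")
    if len(request) > MAX_REQUEST_BYTES:
        raise ValueError(
            f"request is {len(request)} bytes, limit is {MAX_REQUEST_BYTES}"
        )
    return request


if __name__ == "__main__":
    print(build_request("load1"))
    print(build_request("diskusage/home"))
```

The resulting bytes could be written to a TCP socket connected to the tcpserver port, or piped into a local knoerre process on stdin.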

Because keys can be given on the command line, knoerre can also be used with other kinds of nagios remote checks: called via ssh, NRPE or the slow snmpd. Nevertheless the usage of tcpserver is strongly recommended. Using tcpserver and a request like load1 you’ll receive a response approx. 25% faster than with a local "/bin/cat /proc/loadavg". A local "knoerre load1" is 4 times faster than "/bin/cat".
Here’s a short speed comparison for 5000 remote requests of "load1":

net-snmp default, default nagios check_snmp: 8 mins 50 secs
NRPE: 43 secs
tcpserver/knoerre: 3 secs
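
To put the totals above into per-request terms, the arithmetic can be sketched as follows. Only the three totals and the request count come from this manual; the variable names are made up for the sketch.

```python
# Per-request cost derived from the totals above (5000 requests each).
REQUESTS = 5000
totals_seconds = {
    "net-snmp / check_snmp": 8 * 60 + 50,  # 8 mins 50 secs
    "NRPE": 43,
    "tcpserver/knoerre": 3,
}

# Average time per request in milliseconds.
per_request_ms = {
    name: total / REQUESTS * 1000 for name, total in totals_seconds.items()
}

# How many times longer each method took than tcpserver/knoerre.
baseline = totals_seconds["tcpserver/knoerre"]
slowdown = {name: total / baseline for name, total in totals_seconds.items()}

for name in totals_seconds:
    print(
        f"{name}: {per_request_ms[name]:.1f} ms per request, "
        f"{slowdown[name]:.1f}x the tcpserver/knoerre total"
    )
```

This works out to roughly 106 ms per request for net-snmp, 8.6 ms for NRPE and 0.6 ms for tcpserver/knoerre.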

Process control

With the recommended usage of daemontools and ucspi-tcp you don’t have to care about starting, stopping or restarting knoerre. Because knoerre is started on demand by tcpserver(1), there is no continuously running knoerre process as with other daemons. The controlling tcpserver process can be managed with svc(8).

Built-In checks

Some basic checks are built into knoerre. These built-in checks don’t need to call an external program.

load1: Return the load-per-1min.
Format: load1
Just return the load-per-1min in the last line and all of /proc/loadavg in the line above.
load5: Return the load-per-5min.
Format: load5
Just return the load-per-5min in the last line and all of /proc/loadavg in the line above.
load15: Return the load-per-15min.
Format: load15
Just return the load-per-15min in the last line and all of /proc/loadavg in the line above.
loaduser: Return the highest process count of a single account.
Format: loaduser/XXX/YYY
where XXX and YYY are the min/max uid of the processes to be checked.
Return the highest number of running processes of a single account. For every uid in the given range all processes are counted. Up to 3 top users and their process counts are printed; the value in the last line is the maximum process count.
cpuXY: Show CPU usage in percent values.
Format: cpuXY/SECONDS
where X is one of (u|n|s|i|w|I), Y is one of (t|c) and the optional SECONDS is the measuring interval.
The CPU usage times can be shown as ’t’otal since kernel start or as ’c’urrent values over a measuring interval (default: 10 seconds). The CPU times are ’u’ser, ’n’ice, ’s’ystem, ’i’dle or I/O ’w’ait. The ’I’ values are "inverted" against 100 percent, e.g. print 99 for idle of 1%.
dirlevels: Return the maximum recursion level
Format: dirlevels/absolute/path/to/dir
Step recursively into dir, count recursion level and print the max count. One "@" can be used as wildcard like asterisk.
direntries: Return the number of entries recursively in a directory.
Format: direntries/absolute/path/to/dir
It counts entries in a dir - not inodes. This check is equal to "direntries" in recursive mode. See direntries(1) .
maxdirentries: Return the maximum number of entries recursively in directories.
Format: maxdirentries/X/absolute/path/to/dir
where digit X is the recursive search depth.
This check is equal to "direntries" in max mode. See direntries(1) .
maxfilesizes: Return biggest file size recursively.
Format: maxfilesizes/X/absolute/path/to/dir
where digit X is the recursive search depth.
Find the 5 biggest files and print paths and sizes in MB. The return value is the size of the biggest file in MB.
filesizes: Return (max) filesize(s) in KB.
Format: filesizes/absolute/path/to/file
Get the filesize in KB of a single file or the maximum filesize of a group of files. You can use one dot or ’@’ as one wildcard like asterisk in a shell. See Examples.
filesizesbysuffix: Return (max) filesize(s) by given filename suffix.
Format: filesizesbysuffix/XXXXX/Y/absolute/path/to/file
where XXXXX is a filename suffix such as .gif and digit Y is the recursive search depth.
Get the filesize in KB of a single file or the maximum filesize of a group of files by a given filename suffix and a maximum depth to search in. You can use one dot or ’@’ as one wildcard like asterisk in a shell. See Examples.
filetimestamp: Return age of file in minutes.
Format: filetimestamp/X/absolute/path/to/file
with X one of [acmoACMO] using access, change or modification time or the oldest of these.
An upper-case letter means that no error is returned, just 0, if the file does not exist. If the file is a small regular file, its content is also printed before the last line.
tslogentries: Count last lines in a logfile with the same beginning of line.
Format: tslogentries/XY/absolute/path/to/file
where digit X is the field count and the optional Y is a separator char.
If you have logfiles with a timestamp at the beginning of every log line, you can count e.g. how many mails were sent or files were transferred today. The first argument must be a digit giving the field count, optionally followed by a char taken as field separator; together they create the matching pattern. The pattern is built from the last line of the file using the field count and the separator. If no separator char is specified, ’ ’ (space) is used as default. The second argument is the path. You can use one dot or ’@’ as one wildcard like asterisk in a shell. See Examples.
kernellog: Count "bad lines" in kernellog.
Format: kernellog/XX/absolute/path/to/kernellog
where XX is a two-digit number.
As with tslogentries, the first parameter specifies the number of chars from the beginning of a log line which must be equal to the beginning of the last line of kernellog. If you use e.g. kernellog/07/var/log/kernel on Aug 29, then all lines starting with "Aug 29 " are scanned but not lines with "Aug 28".
"Bad entries" are hardcoded in source and are strings like "access beyond end of device", "ector repair", "kernel BUG" and more.
Up to 10 "bad lines" of kernellog are returned in lines above the count return value for nagios.
mysqlerr: Count errors in mysqld errlog
Format: mysqlerr/absolute/path/to/mysqld.err
As with kernellog, you must specify the absolute path to the MySQL daemon’s error logfile. Only lines with a timestamp of the current day are examined. Every "Note" counts, every "Warning" counts ten times and every "ERROR" has a weight of 100.
mountopts: Check mountpoint and options
Format: mountopts/XXXXX/absolute/path/to/mountpoint
where XXXXX is an option string which should match the beginning of the mount options.
The actual mount options and mountpoint are read from /proc/mounts. If the given option string matches the beginning of the actual mount options (compared over the length of the given string), 0 is returned, otherwise 1. If an error occurs, e.g. a nonexistent mountpoint or a timeout, 9999 or a bigger value is returned.
diskusage: Return used disk space percentage.
Format: diskusage/absolute/path/to/fs
Return the percentage of used space on the filesystem given after "diskusage/". NOTE: Because just one simple stat() call is used, you can also use this check to test for the existence of files, e.g. "/var/lib/mysql/mysql.sock". See nagios-check-diskfree(1).
diskinodes: Return used disk inodes percentage.
Format: diskinodes/absolute/path/to/fs
Like diskusage but for inodes and not diskspace.
nfs: Check availability of a nfs-mounted fs.
Format: nfs/absolute/path/to/file
Check the availability of an NFS-mounted fs. It does this by "cat"ting the content of the file given after "nfs/", which should contain "1". If this file does not exist, or NFS is not available and an alarm timeout occurred, a value bigger than 1 is returned. See nagios-check-nfs(1). Deprecated. Use ’cat’.
cat: Cat content of a file.
Format: cat/absolute/path/to/file
"Cat" the content of the file given after "cat/". The first line contains the filename and also the date of the file (if no error occurred). The last line of the file should contain an integer value to be checked by nagios. You can also use this check to test whether an NFS-mounted FS is actually working by "cat"ting a file which should contain just "1" in a line. If an error or timeout happens then 9999 or a bigger value is returned.
cmp: Compare a string to the content of a file.
Format: cmp/string/absolute/path/to/file
Compare a string to the content of a file. If the string is equal to the content (LF is ignored) then 0 is returned otherwise 1. If an error or timeout happens then 9999 or a bigger value is returned.
process: Count instances of a process.
Format: process/XXXXX
Format: processd/XXXXX
Format: process/OpenVZ-CTID_YYYY/XXXXX
Format: processd/OpenVZ-CTID_YYYY/XXXXX *** CURRENTLY NOT IMPLEMENTED ***
where XXXXX is the name of a process as in /proc/.../stat and YYYY is the CTID to match on an OpenVZ host.
If the key is "processd" then count only "real" daemons running as session leader with PPID 1. See nagios-check-process(1) .
cmdline: Return the number of instances of a process by cmdline match.
Format: cmdline/XXXX
where XXXX is a string which should be part of the cmdline.
Like process, but uses /proc/.../cmdline to also detect script processes such as python loadlogger.py, whose process name is only "python".
mailqsize: Return postfix mailqueue size.
Format: mailqsize
Return the size of the mailqueue on a postfix server. See postfix-mailqsize(1) .
rsbackup: Return the minutes since the last backup.
Format: rsbackup
The last backup time is taken from "/var/log/backup.timestamp" and the difference to the current time is returned. See nagios-check-backup(1) .
longprocs: Return minutes of the longest running user process.
Format: longprocs
Check for long running processes. This check returns the time in minutes of the longest running user process. Its goal is to detect suspicious processes like PHP-shells of hacked user accounts. The values for min/max uid and optional exclude process names must be specified in /etc/knoerrerc. See nagios-check-longuserprocesses(1) .
longprocp: Return minutes of the longest running user process.
Format: longprocp/XXX/YYY[/A[/B[/C]]]
where XXX and YYY are the min/max uid of the processes to be checked and the optional A, B, ... are names of processes to be excluded from check (up to 15).
Check for long running processes. This check returns the time in minutes of the longest running user process. Its goal is to detect suspicious processes like PHP-shells of hacked user accounts. The only difference to longprocs is that min/max uid and process excludes are given by HTTP request and are not configured in /etc/knoerrerc. It’s useful in cases when you want to build a monolithic version of knoerre which does not read knoerrerc.
wc-l: Count lines of a file.
Format: wc-l/absolute/path/to/file
Just like the shell cmd "wc -l" it counts the lines of a file. You can use it e.g. to check whether apache is running out of semaphores with wc-l/proc/sysvipc/sem.
timediff: System clock difference between local and remote.
Format: timediff/XXXXX
where XXXXX must be the unix timestamp from the requesting server in seconds since epoch.
The difference between remote and local system time is returned as a (positive) value in seconds.
A sample check in a shell:
lynx -dump http://172.16.1.1:8888/timediff/$(date +%s)
proccount: Number of all processes
Format: proccount
The number of all processes is counted by stepping through /proc.
swap: Used swap space in MB
Format: swap
Used swap space in MB is calculated with values of /proc/meminfo. MemTotal and SwapTotal in MB are printed in line before last. If you don’t need this data you should use swaps because /proc/swaps holds just swap information.
swaps: Used swap(s) space in MB
Format: swaps
This is an alternate version to swap. The amount of used swap space is calculated by adding the "Used" fields in /proc/swaps. The number of active swaps is printed in line before last.
sockets: Count sockets / sockets per port
Format: sockets/XXXXX/YYYYY/ZZZZ
where XXXXX is the type of the socket, YYYYY is local or remote and ZZZZ is the port as 4-digit hexstring.
Currently only socket type tcp and counting of local ports are implemented. Socket data is read from /proc/net/XXXXX. If you want to know e.g. the number of sockets of a locally running apache, use the key sockets/tcp/local/0050.
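
The 4-digit hex port in the sockets key can be derived from a decimal port number. The Python sketch below builds such a key and counts matching local ports in text shaped like /proc/net/tcp. The function names, the sample rows and the upper-case hex digits are assumptions of this sketch; whether knoerre counts every row or filters by socket state is not stated in this manual.

```python
# Sketch: build a sockets key and count local ports in /proc/net/tcp-style text.


def sockets_key(port: int, sock_type: str = "tcp", side: str = "local") -> str:
    """Port 80 becomes 'sockets/tcp/local/0050'."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    return f"{sock_type and 'sockets'}/{sock_type}/{side}/{port:04X}"


def count_local_port(proc_net_text: str, port_hex: str) -> int:
    """Count rows whose local address ends in the given hex port."""
    count = 0
    for line in proc_net_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        # fields[1] is the local address in the form HEXIP:HEXPORT
        if len(fields) > 1 and fields[1].rsplit(":", 1)[-1].upper() == port_hex.upper():
            count += 1
    return count


# Made-up sample rows in the layout of /proc/net/tcp.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 0100007F:0050 0100007F:C350 01 00000000:00000000 00:00000000 00000000    33        0 1002
   2: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1003
"""

if __name__ == "__main__":
    print(sockets_key(80))
    print(count_local_port(SAMPLE, "0050"))
```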

knoerrerc

The (optional) resource config file is "/etc/knoerrerc". You can just specify some basic settings like external commands or parameters for "longprocs".
To specify an external program to be called by knoerre, use "CMD programurl command arg1 arg2 .. arg15", e.g.

CMD loadavg cat /proc/loadavg

NOTE1: The number of args is limited to 15.
NOTE2: knoerre doesn’t use insecure and oversized popen(). You don’t get a shell to execute the external program.
NOTE3: You can’t specify a path to your external program. For security reasons knoerre uses an internal path list to search for the program.
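
The rules in these notes (whitespace always separates, at most 15 args, no path on the command) can be illustrated with a small Python sketch that splits a CMD line into a key and an argument vector. This is not knoerre’s parser; the function name parse_cmd_line and the choice to reject a command containing "/" are assumptions made for illustration.

```python
# Sketch: split a knoerrerc CMD line the way the notes describe it.
MAX_ARGS = 15  # limit from NOTE1


def parse_cmd_line(line: str):
    """Return (programurl, argv) for a CMD line, or None for other lines."""
    fields = line.split()  # no quoting: whitespace always separates
    if len(fields) < 3 or fields[0] != "CMD":
        return None
    programurl, command, *args = fields[1:]
    if "/" in command:
        # Assumption: a command with a path is simply rejected here.
        raise ValueError("command must be a bare program name")
    if len(args) > MAX_ARGS:
        raise ValueError(f"more than {MAX_ARGS} args")
    return programurl, [command, *args]


if __name__ == "__main__":
    print(parse_cmd_line("CMD loadavg cat /proc/loadavg"))
```

An argument vector of this shape can be handed to an exec-style call directly, which is why no shell is involved.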

Parameters for the longprocs function can be specified like this:

LONGPROC_UID_MIN 630
LONGPROC_UID_MAX 65533
LONGPROC_EXCLUDES vsftpd bash sftp-server
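
Since longprocp takes the same values in the request that longprocs reads from /etc/knoerrerc, the mapping between the two can be sketched in Python as follows. The function names and the dictionary layout are hypothetical; only the setting names and the key format come from this manual.

```python
# Sketch: read the LONGPROC settings and build the equivalent longprocp key.
RC_SAMPLE = """\
LONGPROC_UID_MIN 630
LONGPROC_UID_MAX 65533
LONGPROC_EXCLUDES vsftpd bash sftp-server
"""


def parse_longproc(text: str) -> dict:
    """Collect min/max uid and excluded process names from rc-file text."""
    settings = {"uid_min": None, "uid_max": None, "excludes": []}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "LONGPROC_UID_MIN":
            settings["uid_min"] = int(fields[1])
        elif fields[0] == "LONGPROC_UID_MAX":
            settings["uid_max"] = int(fields[1])
        elif fields[0] == "LONGPROC_EXCLUDES":
            settings["excludes"] = fields[1:]
    return settings


def longprocp_key(settings: dict) -> str:
    """Build 'longprocp/XXX/YYY[/A[/B...]]' with up to 15 excludes."""
    parts = ["longprocp", str(settings["uid_min"]), str(settings["uid_max"])]
    return "/".join(parts + settings["excludes"][:15])


if __name__ == "__main__":
    print(longprocp_key(parse_longproc(RC_SAMPLE)))
```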

Files

knoerre uses one configuration file and one access restrictions file for its tcpserver daemon:

/etc/knoerrerc: rc-file for non-monolithic knoerre

/etc/knoerre.tcprules.cdb: tcprules for use with tcpserver

Examples

Here’s a simple example of a client and server communication:

server$ tcpserver -v -RHl localhost 0 8888 knoerre
client$ lynx -dump -mime-header http://server:8888/load1
HTTP/1.0 200 OK
Server: knoerre/0.8.5m
Content-Type: text/plain

1.51
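
On the requesting side, the check value is the last line of the response body, and any lines above it are informational. A minimal Python sketch of that convention follows. The name parse_response and the sample /proc/loadavg numbers are made up for illustration.

```python
# Sketch: split a knoerre HTTP answer into status, info lines and check value.


def parse_response(raw: str):
    """Return (status_code, info_lines, value_line)."""
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    status = int(head.splitlines()[0].split()[1])  # "HTTP/1.0 200 OK" -> 200
    lines = [line for line in body.splitlines() if line.strip()]
    if not lines:
        raise ValueError("response has no body")
    # The last non-empty line carries the value; the rest is informational.
    return status, lines[:-1], lines[-1]


# Headers as in the example above; the body numbers are invented.
SAMPLE = (
    "HTTP/1.0 200 OK\r\n"
    "Server: knoerre/0.8.5m\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "1.51 1.20 0.98 2/345 12345\n"
    "1.51\n"
)

status, info, value = parse_response(SAMPLE)
print(status, info, float(value))
```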

You can also use something like

echo "GET /loadavg HTTP/1.1" | knoerre

knoerre loadavg

This example shows the usage of a @ as wildcard:

$ knoerre filesizes/home/www/@/log/access_log
/home/www/user_hans/log/access_log
52222

A very "complex" example with three arguments (suffix, depth and path) and wildcard usage is this:

$ knoerre filesizesbysuffix/.gif/2/home/@/html/typo3temp
/home/www/user_hans/html/typo3temp/pics/30363cbb32.gif
201

Which user sent the most emails today?

$ knoerre tslogentries/1/home/www/@/log/mail.log
/home/www/user_hans/log/mail.log
858
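
How tslogentries derives its pattern from the last line can be sketched in Python as follows. This is an illustration under assumptions, not knoerre’s implementation: the sketch assumes that the pattern is the first X fields of the last line including the trailing separator, and that every line starting with that pattern is counted. The log lines are invented.

```python
# Sketch: count log lines that begin like the last line (first N fields).


def ts_prefix(last_line: str, field_count: int, separator: str = " ") -> str:
    """Build the matching prefix from the first field_count fields."""
    fields = last_line.split(separator)
    # Assumption: the trailing separator is part of the pattern.
    return separator.join(fields[:field_count]) + separator


def count_entries(lines, field_count: int, separator: str = " ") -> int:
    """Count the lines sharing the last line's prefix."""
    lines = [line for line in lines if line]
    if not lines:
        return 0
    prefix = ts_prefix(lines[-1], field_count, separator)
    return sum(1 for line in lines if line.startswith(prefix))


LOG = [
    "2024-08-28 23:59:01 sent id=1",
    "2024-08-29 00:10:00 sent id=2",
    "2024-08-29 08:15:42 sent id=3",
    "2024-08-29 09:00:00 sent id=4",
]

if __name__ == "__main__":
    print(count_entries(LOG, 1))  # field count 1, default separator
```

With a field count of 1 only the date field is compared, so the three lines from the most recent day are counted.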

Which user runs the most processes?

$ knoerre loaduser/1/60000
hans=32 jack=3 john=1
32
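
The loaduser output format (up to three top accounts, then the maximum in the last line) can be reproduced with a short Python sketch. The input here is an invented list of (uid, username) pairs; knoerre itself counts processes from /proc, and the inclusive uid range is an assumption of this sketch.

```python
# Sketch: reproduce the loaduser output format from a list of processes.
from collections import Counter


def loaduser(processes, uid_min: int, uid_max: int) -> str:
    """processes is an iterable of (uid, username), one pair per process."""
    counts = Counter(
        user for uid, user in processes if uid_min <= uid <= uid_max
    )
    top = counts.most_common(3)  # up to 3 accounts, highest count first
    summary = " ".join(f"{user}={n}" for user, n in top)
    maximum = top[0][1] if top else 0
    return f"{summary}\n{maximum}"


# Invented process table: root (uid 0) falls outside the range 1..60000.
PROCS = (
    [(1000, "hans")] * 32
    + [(1001, "jack")] * 3
    + [(1002, "john")]
    + [(0, "root")] * 50
)

if __name__ == "__main__":
    print(loaduser(PROCS, 1, 60000))
```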

Is /home rw-mounted and nosuid?

$ grep home /proc/mounts
/dev/sda7 /home ext3 rw,nosuid,nodev,data=ordered 0 0
$ knoerre mountopts/rw,nosuid/home
/home==rw,nosuid?
/dev/sda7 /home ext3 rw,nosuid,nodev,data=ordered 0 0
0
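
The comparison behind mountopts is a prefix match of the requested option string against the options field in /proc/mounts. A Python sketch of that logic is shown below. The function names are hypothetical, and returning exactly 9999 for a missing mountpoint is an assumption (the manual only says "9999 or a bigger value").

```python
# Sketch: prefix-match mount options as described for the mountopts check.
ERROR_VALUE = 9999


def split_mountopts_key(key: str):
    """'mountopts/rw,nosuid/home' -> ('rw,nosuid', '/home')."""
    name, wanted, path = key.split("/", 2)
    if name != "mountopts":
        raise ValueError("not a mountopts key")
    return wanted, "/" + path


def mountopts_check(proc_mounts_text: str, wanted: str, mountpoint: str) -> int:
    """Return 0 on match, 1 on mismatch, ERROR_VALUE if not mounted."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()  # device, mountpoint, fstype, options, ...
        if len(fields) >= 4 and fields[1] == mountpoint:
            return 0 if fields[3].startswith(wanted) else 1
    return ERROR_VALUE


PROC_MOUNTS = """\
proc /proc proc rw 0 0
/dev/sda7 /home ext3 rw,nosuid,nodev,data=ordered 0 0
"""

if __name__ == "__main__":
    wanted, mountpoint = split_mountopts_key("mountopts/rw,nosuid/home")
    print(mountopts_check(PROC_MOUNTS, wanted, mountpoint))
```

Because only the beginning of the options is compared, "rw,nosuid" matches the line above while "nosuid" alone would not.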

Security

knoerre does not support dropping privileges itself. When it is used as a remote check tool with tcpserver, you can drop privileges with tcpserver. knoerre does not actually need to run as root, but different checks, dirs and files may require different rights. Don’t use setuid bits; uid/euid checks are not made.

Too long keys are truncated or answered with http-redirection. HTTP requests are limited to 512 bytes.

Keys containing ".." are answered with http-redirection.

All stat-calls are lstat()-calls.

No writes are made to filesystem(s), all open()-calls are read-only. Data is only written to stdout/stderr.

No external libs are used. Only the standard C lib is used. No stdio functions are used. "External" input data is used with bounds checks. Arrays are "oversized" to avoid off-by-one errors.

An internal timeout prevents "dead" knoerre processes with blocking read() and waiting for data which will never come.

The number of syscalls and the number of different syscalls are low. The source code and the executable file are small.

Using external commands with "CMD" in /etc/knoerrerc can be a security risk because the external program is forked/exec’ed by knoerre.

knoerre doesn’t use insecure and oversized popen() to execute external commands. You don’t get a shell to execute an external program. You can’t put strings in quotes. Space does always separate. You can’t specify a path to your external program. knoerre uses an internal path list to search for the program.

It’s strongly recommended that you only allow TCP access from your nagios server: one entry "knoerre: ALL" in /etc/hosts.deny and one entry with the nagios server’s IP address in /etc/hosts.allow. After changing them you must run knoerre-update-tcprules(1) to update tcpserver’s cdb file. Always keep in mind that host-based authentication is not real authentication.

To encrypt network traffic please use e.g. IPsec or a VPN.

Caveats

Due to "leaf optimization" in direntries recursive mode it can produce wrong results on non-unix-like filesystems.

The maximum internal absolute pathname length is 16384 chars.

Author

Frank Bergmann, http://www.tuxad.com
