knoerre - fast check tool and http server for nagios remote checks
knoerre [ key ]
knoerre is a tool for checking a wide variety of parameters of a server.
Its primary purpose is to serve check values to a (remote) requesting
instance like nagios by using simplified HTTP.
It was developed as a replacement for the oversized, sometimes very buggy,
sometimes difficult to configure and often slow net-snmp package.
knoerre is meant to run under tcpserver from DJB’s software suite ucspi-tcp.
Only the brave among you will have the heart to do the daring deed of
using (x)inetd.
The usage of DJB’s daemontools and ucspi-tcp (for tcpserver) is strongly
recommended.
knoerre can be easily set up with knoerre-conf(1).
Access restrictions by IP address can be set up with
knoerre-update-tcprules(1).
A key is a specific request to knoerre, e.g. "load1". All keys can be used
locally or by HTTP request, e.g. knoerre load1, knoerre diskusage/home
or GET /load1 HTTP/1.1. A key given on the command line takes precedence over
reading an HTTP request from stdin (by tcpserver). An HTTP request is
internally limited to 512 bytes.
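The request format described above can be sketched from the client side in
Python. This is only an illustration: the function names are hypothetical,
and the CRLF line ending with a trailing empty line is an assumption taken
from ordinary HTTP, not from knoerre's source.

```python
MAX_REQUEST_BYTES = 512  # request size limit stated in the text


def build_request(key):
    """Build a minimal GET request for a knoerre key such as 'load1'."""
    # Assumption: plain HTTP line endings; knoerre may accept less.
    return f"GET /{key} HTTP/1.1\r\n\r\n".encode("ascii")


def request_fits(key):
    """Return True if the request for this key stays within 512 bytes."""
    return len(build_request(key)) <= MAX_REQUEST_BYTES
```

A client could call request_fits() before sending, so that an overlong key
is rejected locally instead of being truncated or redirected by the server.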
Because keys can be given on the command line, knoerre can also be used for
other kinds of nagios remote checks: called by ssh, by NRPE or by the slow
snmpd. Nevertheless the usage of tcpserver is strongly recommended. Using
tcpserver and a request like load1 you’ll receive a response approx. 25%
faster than a local "/bin/cat /proc/loadavg". A local "knoerre load1" is
4 times faster than "/bin/cat".
Here’s a short speed comparison, 5000 times remote request "load1":
net-snmp default, default nagios check_snmp: 8 mins 50 secs
NRPE: 43 secs
tcpserver/knoerre: 3 secs
With the recommended usage of daemontools and ucspi-tcp you don’t have to
care about starting, stopping or restarting knoerre. It is started on demand
by tcpserver(1), so unlike other daemons there is no continuously running
knoerre process. The controlling tcpserver process can be managed with
svc(8).
Some basic checks are built into knoerre. These built-in
checks don’t need to call an external program.
- load1
- Return the load-per-1min.
Format: load1
Just return the load-per-1min in the last line and all of /proc/loadavg in
the line above.
- load5
- Return the load-per-5min.
Format: load5
Just return the load-per-5min in the last line and all of /proc/loadavg in
the line above.
- load15
- Return the load-per-15min.
Format: load15
Just return the load-per-15min in the last line and all of /proc/loadavg
in the line above.
- loaduser
- Return the highest process count of a single account.
Format: loaduser/XXX/YYY
where XXX and YYY are the min/max uid of the processes to be checked.
For every uid in the given range all processes are counted. Up to 3 top
users and their process counts are printed, and the value in the last line
is the maximum process count.
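The counting described for loaduser can be sketched as follows. This is a
minimal illustration, not knoerre's implementation: the function name and
the input format (one (account, uid) pair per process) are assumptions.

```python
from collections import Counter


def loaduser(owners, uid_min, uid_max):
    """Return (summary line, highest count) for processes in a uid range.

    owners: iterable of (account_name, uid), one entry per process.
    """
    counts = Counter(
        name for name, uid in owners if uid_min <= uid <= uid_max
    )
    top = counts.most_common(3)  # up to 3 top users
    summary = " ".join(f"{name}={count}" for name, count in top)
    highest = top[0][1] if top else 0
    return summary, highest
```

With 32 processes of hans, 3 of jack and 1 of john inside the range, the
summary line has the same shape as the loaduser example output in this page.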
- cpuXY
- Show CPU usage in percent values.
Format: cpuXY/SECONDS
where X is one of (u|n|s|i|w|I), Y is one of (t|c) and the optional SECONDS
is the measuring interval.
CPU usage can be shown as ’t’otal since kernel start or as ’c’urrent values
over a measuring interval (default: 10 seconds). The CPU times are
’u’ser, ’n’ice, ’s’ystem, ’i’dle or I/O ’w’ait. The ’I’ values are "inverted"
against 100 percent, e.g. print 99 for idle of 1%.
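The percentage calculation behind a key like cpuic can be sketched from two
samples of the aggregate "cpu" line in /proc/stat. The function names are
hypothetical, and limiting the sum to the first five counters is an
assumption; knoerre's own arithmetic may differ.

```python
# First five counters of the aggregate "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait")


def parse_cpu_line(line):
    """Turn 'cpu 100 0 50 800 50 ...' into a dict of counters."""
    parts = line.split()
    return dict(zip(FIELDS, (int(p) for p in parts[1:6])))


def cpu_percent(before, after, field, inverted=False):
    """Share of one counter in the interval between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    percent = 100.0 * deltas[field] / total if total else 0.0
    # The inverted value is the distance to 100 percent.
    return 100.0 - percent if inverted else percent
```

For the total since kernel start, an all-zero sample can be passed as
"before".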
- dirlevels
- Return the maximum recursion level
Format: dirlevels/absolute/path/to/dir
Step recursively into dir, count recursion level and print the max count.
One "@" can be used as wildcard like asterisk.
- direntries
- Return the number of entries recursively in a directory.
Format: direntries/absolute/path/to/dir
It counts entries in a dir - not inodes. This check is equal to "direntries"
in recursive mode. See direntries(1).
- maxdirentries
- Return the maximum number of entries recursively in directories.
Format: maxdirentries/X/absolute/path/to/dir
where the digit X is the recursive search depth.
This check is equal to "direntries" in max mode. See direntries(1).
- maxfilesizes
- Return biggest file size recursively.
Format: maxfilesizes/X/absolute/path/to/dir
where the digit X is the recursive search depth.
Find the 5 biggest files and print paths and sizes in MB. The return value
is the size of the biggest file in MB.
- filesizes
- Return (max) filesize(s) in KB.
Format: filesizes/absolute/path/to/file
Get the filesize in KB of a single file or the maximum filesize of a group
of files. You can use one dot or ’@’ as a single wildcard, like an asterisk
in a shell. See Examples.
- filesizesbysuffix
- Return (max) filesize(s) by given filename suffix.
Format: filesizesbysuffix/XXXXX/Y/absolute/path/to/file
where XXXXX is a filename suffix, e.g. .gif, and the digit Y is the recursive
search depth.
Get the filesize in KB of a single file or the maximum filesize of a group
of files by a given filename suffix and a maximum depth to search in. You
can use one dot or ’@’ as a single wildcard, like an asterisk in a shell.
See Examples.
- filetimestamp
- Return age of file in minutes.
Format: filetimestamp/X/absolute/path/to/file
with X one of [acmoACMO], using access, change or modification time or the
oldest of these.
Upper case means: return no error but just 0 if the file does not exist. If
the file is a small regular file, its content is also printed before the
last line.
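The selection of the timestamp and the upper-case behaviour can be sketched
like this. The function name is hypothetical and the rounding down to whole
minutes is an assumption; os.lstat() is used because this page states that
all stat calls are lstat() calls.

```python
import os
import time

_ATTRIBUTE = {"a": "st_atime", "c": "st_ctime", "m": "st_mtime"}


def file_age_minutes(mode, path, now=None):
    """Age of a file in minutes for mode a, c, m or o (or upper case)."""
    now = time.time() if now is None else now
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        if mode.isupper():
            return 0  # upper case: a missing file is not an error
        raise
    kind = mode.lower()
    if kind == "o":
        # Oldest of the three timestamps gives the largest age.
        stamp = min(st.st_atime, st.st_ctime, st.st_mtime)
    else:
        stamp = getattr(st, _ATTRIBUTE[kind])
    return int((now - stamp) // 60)
```

A call like file_age_minutes("M", path) mirrors a key such as
filetimestamp/M/absolute/path/to/file.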
- tslogentries
- Count last lines in a logfile with the same beginning of line.
Format: tslogentries/XY/absolute/path/to/file
where the digit X is the field count and the optional Y is a separator
char.
If you have logfiles with a timestamp at the beginning of every log line
then you can count e.g. how many mails were sent or files were transferred
today. The first argument must be a digit used as field count, optionally
followed by a char used as field separator, to create a matching pattern.
The pattern is created from the last line, the field count and the
separator. If no separator char is specified then ’ ’ (space) is used as
default. The second argument is the path. You can use one dot or ’@’ as a
single wildcard, like an asterisk in a shell. See Examples.
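The pattern building described for tslogentries can be sketched as below.
The function name is hypothetical, and counting every line that starts with
the pattern (rather than only a trailing run of lines) is an assumption.

```python
def count_last_entries(lines, field_count, separator=" "):
    """Count lines that begin like the last line of the log.

    The pattern is the first field_count fields of the last line,
    joined by the separator and followed by one more separator.
    """
    if not lines:
        return 0
    fields = lines[-1].split(separator)
    pattern = separator.join(fields[:field_count]) + separator
    return sum(1 for line in lines if line.startswith(pattern))
```

With a date as the first field, a field count of 1 counts today's entries
as long as the last line was written today.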
- kernellog
- Count "bad lines" in kernellog.
Format: kernellog/XX/absolute/path/to/kernellog
where XX is a two-digit number.
As with tslogentries, the first parameter specifies the number of chars at
the beginning of a log line which must be equal to the beginning of the
last line of kernellog. If you use e.g. kernellog/07/var/log/kernel on Aug
29, then all lines starting with "Aug 29 " are scanned but not lines with
"Aug 28".
"Bad entries" are hardcoded in the source and are strings like "access beyond
end of device", "ector repair", "kernel BUG" and more.
Up to 10 "bad lines" of kernellog are returned in the lines above the count
return value for nagios.
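The prefix match and the search for "bad entries" can be sketched as
follows. The function name is hypothetical, the list holds only the three
strings quoted above (the real list is longer), and returning the first 10
bad lines rather than the last 10 is an assumption.

```python
# Only the three strings quoted in this page; knoerre's list is longer.
BAD_STRINGS = ("access beyond end of device", "ector repair", "kernel BUG")


def kernellog(lines, prefix_length):
    """Return (up to 10 bad lines, total count of bad lines)."""
    if not lines:
        return [], 0
    prefix = lines[-1][:prefix_length]
    bad = [
        line
        for line in lines
        if line.startswith(prefix) and any(s in line for s in BAD_STRINGS)
    ]
    return bad[:10], len(bad)
```

With a prefix length of 7, a log whose last line starts with "Aug 29 " is
only scanned for lines of that day.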
- mysqlerr
- Count errors in mysqld errlog
Format: mysqlerr/absolute/path/to/mysqld.err
As with kernellog you must specify the absolute path to the MySQL daemon
error logfile. Only lines with a timestamp of the current day are examined.
Every "Note" counts once, every "Warning" counts ten times and every "ERROR"
has a weight of 100.
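The weighting can be sketched like this. The function name is hypothetical;
matching on the bracketed markers [Note], [Warning] and [ERROR] and passing
the current day as a line prefix are assumptions about the log format.

```python
WEIGHTS = {"[Note]": 1, "[Warning]": 10, "[ERROR]": 100}


def mysqlerr_score(lines, day_prefix):
    """Sum the weights of all log lines that start with day_prefix."""
    score = 0
    for line in lines:
        if not line.startswith(day_prefix):
            continue  # not a line of the current day
        for marker, weight in WEIGHTS.items():
            if marker in line:
                score += weight
                break
    return score
```

One note, one warning and one error on the same day give a value of 111,
so a nagios threshold of 100 fires on the first error of the day.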
- mountopts
- Check mountpoint and options
Format: mountopts/XXXXX/absolute/path/to/mountpoint
where XXXXX is an option string which should match the beginning of the
mount options.
The actual mount options and mountpoint are read from /proc/mounts. If the
given option string matches the beginning of the actual mount options over
its full length then 0 is returned, otherwise 1. If an error occurs, e.g.
a non-existing mountpoint or a timeout, then 9999 or a bigger value is
returned.
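The prefix match against /proc/mounts can be sketched as follows. The
function name is hypothetical, and using the last entry when a mountpoint
is listed more than once is an assumption.

```python
ERROR_VALUE = 9999  # returned when the mountpoint is not found


def mountopts(mounts_text, wanted_options, mountpoint):
    """Return 0 if the mount options start with wanted_options, else 1."""
    result = ERROR_VALUE
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, ...
        if len(fields) >= 4 and fields[1] == mountpoint:
            result = 0 if fields[3].startswith(wanted_options) else 1
    return result
```

Because the match is a prefix match, the order of the options matters:
"rw,nosuid" matches "rw,nosuid,nodev" but "nosuid,rw" does not.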
- diskusage
- Return used disk space percentage.
Format: diskusage/absolute/path/to/fs
Return the percentage of used space on the filesystem given after
"diskusage/".
NOTE: Because just one simple stat() call is used, you can use this check
also for testing the existence of files like e.g. "/var/lib/mysql/mysql.sock".
See nagios-check-diskfree(1).
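A used-space percentage can be computed from a single filesystem query, as
in this sketch. The function name is hypothetical, and both the use of
os.statvfs() and the df-style formula are assumptions; knoerre's exact
calculation is not documented here.

```python
import os


def disk_usage_percent(path):
    """Used space of the filesystem holding path, in whole percent."""
    vfs = os.statvfs(path)
    used = vfs.f_blocks - vfs.f_bfree
    # Blocks reserved for root are not counted as usable (like df).
    usable = used + vfs.f_bavail
    return 100 * used // usable if usable else 0
```

Because the call fails for a missing path, the same query doubles as an
existence test.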
- diskinodes
- Return used disk inodes percentage.
Format: diskinodes/absolute/path/to/fs
Like diskusage but for inodes and not diskspace.
- nfs
- Check availability of an NFS-mounted fs.
Format: nfs/absolute/path/to/file
Check the availability of an NFS-mounted fs. It does this by "cat"ting the
content of the file given after "nfs/", which should contain "1". If this
file does not exist, or NFS is not available and an alarm timeout occurred,
then a value bigger than 1 is returned. See nagios-check-nfs(1). Deprecated.
Use ’cat’.
- cat
- Cat content of a file.
Format: cat/absolute/path/to/file
"Cat" the content of a given file after "cat/". The first line contains
the filename and also the date of the file (if no error occurred). The last
line of the file should contain an integer value to check by nagios. You
can also use this check to test if an NFS-mounted FS is actually working
by "cat"ting a file which should contain just "1" in a line. If an error
or timeout happens then 9999 or a bigger value is returned.
- cmp
- Compare a string to the content of a file.
Format: cmp/string/absolute/path/to/file
Compare a string to the content of a file. If the string is equal to the
content (LF is ignored) then 0 is returned otherwise 1. If an error or timeout
happens then 9999 or a bigger value is returned.
- process
- Count instances of a process.
Format: process/XXXXX
Format: processd/XXXXX
Format: process/OpenVZ-CTID_YYYY/XXXXX
Format: processd/OpenVZ-CTID_YYYY/XXXXX *** CURRENTLY NOT IMPLEMENTED ***
where XXXXX is the name of a process as in /proc/.../stat and YYYY is the
CTID to match on an OpenVZ host.
If the key is "processd" then only "real" daemons running as session
leader with PPID 1 are counted. See nagios-check-process(1).
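The difference between process and processd can be sketched by parsing
lines of /proc/.../stat. The function names are hypothetical, the OpenVZ
variant is left out, and treating "session leader" as "session id equals
pid" is an assumption about how knoerre tests it.

```python
def parse_stat(line):
    """Extract pid, name, ppid and session id from a /proc/PID/stat line."""
    left = line.index("(")
    right = line.rindex(")")  # the name itself may contain ')' or spaces
    rest = line[right + 1:].split()  # state, ppid, pgrp, session, ...
    return {
        "pid": int(line[:left]),
        "name": line[left + 1:right],
        "ppid": int(rest[1]),
        "session": int(rest[3]),
    }


def count_processes(stat_lines, name, daemons_only=False):
    """Count processes by name; optionally only session leaders with PPID 1."""
    count = 0
    for line in stat_lines:
        info = parse_stat(line)
        if info["name"] != name:
            continue
        is_daemon = info["ppid"] == 1 and info["session"] == info["pid"]
        if not daemons_only or is_daemon:
            count += 1
    return count
```

With daemons_only set, child processes forked by a daemon for single
connections are not counted.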
- cmdline
- Return the number of instances of a process by cmdline match.
Format: cmdline/XXXX
where XXXX is a string which should be part of the cmdline.
Like process but uses /proc/.../cmdline to also detect script processes,
e.g. python loadlogger.py, whose process name is only "python".
- mailqsize
- Return postfix mailqueue size.
Format: mailqsize
Return the size of the mailqueue on a postfix server. See
postfix-mailqsize(1).
- rsbackup
- Return the minutes since the last backup.
Format: rsbackup
The last backup time is taken from "/var/log/backup.timestamp" and the difference
to the current time is returned. See nagios-check-backup(1).
- longprocs
- Return minutes of the longest running user process.
Format: longprocs
Check for long running processes. This check returns the time in minutes
of the longest running user process. Its goal is to detect suspicious processes
like PHP-shells of hacked user accounts. The values for min/max uid and optional
exclude process names must be specified in /etc/knoerrerc. See
nagios-check-longuserprocesses(1).
- longprocp
- Return minutes of the longest running user process.
Format: longprocp/XXX/YYY[/A[/B[/C]]]
where XXX and YYY are the min/max uid of the processes to be checked and
the optional A, B, ... are names of processes to be excluded from check (up
to 15).
Check for long running processes. This check returns the time in minutes
of the longest running user process. Its goal is to detect suspicious processes
like PHP-shells of hacked user accounts. The only difference from longprocs
is that min/max uid and process excludes are given in the HTTP request and
are not configured in /etc/knoerrerc. This is useful when you want to build
a monolithic version of knoerre which does not read knoerrerc.
- wc-l
- Count lines of a file.
Format: wc-l/absolute/path/to/file
Just like the shell command "wc -l" it counts the lines of a file. You can
use it e.g. for checking whether apache is running out of semaphores with
wc-l/proc/sysvipc/sem.
- timediff
- System clock difference between local and remote.
Format: timediff/XXXXX
where XXXXX must be the unix timestamp from the requesting server in seconds
since epoch.
The difference between remote and local system time is returned as a (positive)
value in seconds.
A sample check in a shell:
lynx -dump http://172.16.1.1:8888/timediff/$(date +%s)
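The same check can be prepared in Python. This is only a sketch with
hypothetical function names; the second function shows the arithmetic the
text describes, not knoerre's code.

```python
import time


def timediff_key(now=None):
    """Build the key with the requesting server's unix timestamp."""
    now = time.time() if now is None else now
    return f"timediff/{int(now)}"


def clock_difference(local_epoch, remote_epoch):
    """Positive difference between two clocks in seconds."""
    return abs(int(local_epoch) - int(remote_epoch))
```

Note that the network round trip adds to the measured difference, so the
nagios thresholds should leave a few seconds of headroom.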
- proccount
- Number of all processes
Format: proccount
The number of all processes is counted by stepping through /proc.
- swap
- Used swap space in MB
Format: swap
Used swap space in MB is calculated with values of /proc/meminfo. MemTotal
and SwapTotal in MB are printed in line before last. If you don’t need this
data you should use swaps because /proc/swaps holds just swap information.
- swaps
- Used swap(s) space in MB
Format: swaps
This is an alternate version to swap. The amount of used swap space is calculated
by adding the "Used" fields in /proc/swaps. The number of active swaps is
printed in line before last.
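Adding the "Used" fields of /proc/swaps can be sketched as follows. The
function name is hypothetical, and the conversion from KB to MB by integer
division by 1024 is an assumption.

```python
def swaps(proc_swaps_text):
    """Return (number of active swaps, used swap space in MB)."""
    active = 0
    used_kb = 0
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        # Columns: Filename, Type, Size, Used, Priority (sizes in KB).
        active += 1
        used_kb += int(fields[3])
    return active, used_kb // 1024
```

The first value corresponds to the line before the last one in the swaps
output, the second to the check value itself.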
- sockets
- Count sockets / sockets per port
Format: sockets/XXXXX/YYYYY/ZZZZ
where XXXXX is the type of the socket, YYYYY is local or remote and ZZZZ
is the port as 4-digit hexstring.
Currently only socket-type tcp and counting local ports is implemented. Socket
data is read from /proc/net/XXXXX. If you want to know e.g. the number of sockets
of a locally running apache then you should use the key sockets/tcp/local/0050.
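Building the 4-digit hexstring and counting matching lines of /proc/net/tcp
can be sketched like this. The function names are hypothetical, and
counting sockets in every TCP state is an assumption.

```python
def port_key(port):
    """Build the sockets key for a decimal port, e.g. 80 -> .../0050."""
    return f"sockets/tcp/local/{port:04X}"


def count_local_port(proc_net_tcp_text, hex_port):
    """Count sockets whose local port equals the 4-digit hexstring."""
    wanted = hex_port.upper()
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip header line
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[1] is the local address, written as HEXIP:HEXPORT
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port.upper() == wanted:
            count += 1
    return count
```

The helper port_key(80) yields the apache key given above, so nagios
configuration can keep using decimal port numbers.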
The (optional) resource config file is "/etc/knoerrerc". You
can just specify some basic settings like external commands or parameters
for "longprocs".
To specify an external program which is called by knoerre, use "CMD
programurl command arg1 arg2 .. arg15", e.g.
CMD loadavg cat /proc/loadavg
NOTE1: The number of args is limited to 15.
NOTE2: knoerre doesn’t use insecure and oversized popen(). You don’t get a
shell to execute the external program.
NOTE3: You can’t specify a path to your external program. For security reasons
knoerre uses an internal path list to search for the program.
Parameters for the longprocs function can be specified like this:
LONGPROC_UID_MIN 630
LONGPROC_UID_MAX 65533
LONGPROC_EXCLUDES vsftpd bash sftp-server
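How such a file could be read is sketched below. The function name is
hypothetical, and cutting off arguments beyond the 15th (instead of
rejecting the line) is an assumption. Splitting on whitespace only reflects
the rule that strings cannot be quoted.

```python
MAX_ARGS = 15  # argument limit stated in NOTE1


def parse_knoerrerc(text):
    """Return (commands, settings) parsed from knoerrerc-style text."""
    commands = {}
    settings = {}
    for raw_line in text.splitlines():
        words = raw_line.split()  # no quoting: whitespace always separates
        if not words:
            continue
        if words[0] == "CMD" and len(words) >= 3:
            url, program = words[1], words[2]
            # Assumption: surplus arguments are dropped.
            commands[url] = [program] + words[3:3 + MAX_ARGS]
        elif words[0].startswith("LONGPROC_"):
            settings[words[0]] = words[1:]
    return commands, settings
```

The command is kept as an argument list, which matches the statement that
no shell is involved when the external program is executed.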
knoerre uses one configuration file and one access restrictions file for
its tcpserver daemon:
- /etc/knoerrerc
- rc-file for non-monolithic knoerre
- /etc/knoerre.tcprules.cdb
- tcprules for use with tcpserver
tcpserver(1), knoerre-conf(1), knoerre-update-tcprules(1), svc(8),
check_remote_by_http(1), check_remote_by_http_time(1)
http://cr.yp.to/ucspi-tcp.html
http://cr.yp.to/daemontools.html
Here’s a simple example of a client and server communication:
server$ tcpserver -v -RHl localhost 0 8888 knoerre
client$ lynx -dump -mime-header http://server:8888/load1
HTTP/1.0 200 OK
Server: knoerre/0.8.5m
Content-Type: text/plain
1.51
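A client that evaluates such a response only needs the last line. The
sketch below uses hypothetical function names; the numeric exit codes
follow the usual nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL),
and the threshold handling is an assumption, not part of knoerre.

```python
def check_value(response_text):
    """Return the last non-empty line of a knoerre response."""
    lines = [line.strip() for line in response_text.splitlines()]
    lines = [line for line in lines if line]
    return lines[-1] if lines else ""


def nagios_state(value, warn, crit):
    """Map a numeric check value to a nagios plugin exit code."""
    if value >= crit:
        return 2  # CRITICAL
    if value >= warn:
        return 1  # WARNING
    return 0      # OK
```

Since errors are reported as 9999 or a bigger value, any reasonable
critical threshold also catches failed checks.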
You can also use something like
echo "GET /loadavg HTTP/1.1" | knoerre
or
knoerre loadavg
This example shows the usage of a @ as wildcard:
$ knoerre filesizes/home/www/@/log/access_log
/home/www/user_hans/log/access_log
52222
A very "complex" example with three arguments (suffix, depth and path)
and wildcard usage is this:
$ knoerre filesizesbysuffix/.gif/2/home/@/html/typo3temp
/home/www/user_hans/html/typo3temp/pics/30363cbb32.gif
201
Which user sent the most emails today?
$ knoerre tslogentries/1/home/www/@/log/mail.log
/home/www/user_hans/log/mail.log
858
Which user runs the most processes?
$ knoerre loaduser/1/60000
hans=32 jack=3 john=1
32
Is /home rw-mounted and nosuid?
$ grep home /proc/mounts
/dev/sda7 /home ext3 rw,nosuid,nodev,data=ordered 0 0
$ knoerre/knoerre mountopts/rw,nosuid/home
/home==rw,nosuid?
/dev/sda7 /home ext3 rw,nosuid,nodev,data=ordered 0 0
0
knoerre does not support dropping of rights. When it is used as a remote
check tool with tcpserver, you can drop rights with tcpserver. knoerre does
not actually need to be run as root, but for different checks and different
dirs and files you may need different rights. Don’t use setuid bits:
uid/euid checks are not made.
Keys that are too long are truncated or answered with an HTTP redirection.
HTTP requests are limited to 512 bytes.
Keys containing ".." are answered with an HTTP redirection.
All stat-calls are lstat()-calls.
No writes are made to filesystem(s), all open()-calls are read-only. Data
is only written to stdout/stderr.
No external libs are used. Only standard C-lib is used. No stdio-functions
are used. "External" input data is used with bound checks. Arrays are
"oversized" to avoid off-by-one errors.
An internal timeout prevents "dead" knoerre processes with blocking read()
and waiting for data which will never come.
The amount of syscalls and the amount of different syscalls is low. The
source code and also the executable file is small.
Using external commands with "CMD" in /etc/knoerrerc can be a security risk
because the external program is forked/exec’ed by knoerre.
knoerre doesn’t use insecure and oversized popen() to execute external
commands. You don’t get a shell to execute an external program. You can’t
put strings in quotes. A space always separates arguments. You can’t specify
a path to your external program. knoerre uses an internal path list to
search for the program.
It’s strongly recommended that you only allow access for your nagios server
by tcp. Put one entry "knoerre: ALL" in /etc/hosts.deny and one entry with
the nagios server IP address in /etc/hosts.allow. After changing them you
must run knoerre-update-tcprules(1) to update tcpserver’s cdb file. Always
keep in mind that host based authentication is not actually an
authentication.
To encrypt network traffic please use e.g. ipsec or vpn.
Due to "leaf optimization" in direntries recursive mode it can produce wrong
results on non-unix-like filesystems.
The maximum internal absolute pathname length is 16384 chars.
Frank Bergmann, http://www.tuxad.com