SysUsage is a system monitoring and alarm reporting tool. It can generate historical graph views of CPU, memory, IO, network and disk usage, and very much more.
GPL-3.0 License
NAME
SysUsage v5.7 - System Monitoring Tool
DESCRIPTION
SysUsage is a tool used to continuously monitor a system and generate daily/weekly/monthly/yearly graphical reports using rrdtool and sar.
FEATURES
SysUsage generates graphical reports on all system activity information. Its periodic reports let you keep track of the machine's activity over its lifetime and are a great help for performance analysis and resource management.
SysUsage can be run periodically, from a 10-second cycle in daemon mode
to 1 minute or more using crond.
SysUsage can be run from a central server that executes the sysusage
Perl script remotely over ssh, so that the collected data is stored in
this central place. rrdtool and the related Perl modules then need to be
installed in only one place, and sysusagegraph or sysusagejqgraph needs
to be executed in only one place.
CPUs
- CPU usage distribution (user, nice, system).
- Global CPU usage (total CPU used, iowait).
- Virtualized CPU usage (steal, guest).
Memory
- Memory usage (with and without cache).
- Swap usage (with and without cache).
- Amount of memory needed for the current workload.
- POSIX shared memory.
- Hugepages utilisation.
- Active versus inactive memory.
- Dirty memory that needs to be written to disk.
I/O
- Context switches per second.
- Interrupts per second.
- Page swapping.
- Page I/O stats.
- I/O request stats.
- I/O block stats.
Network
- TCP connections per second.
- TCP segments per second.
- Number of sockets in use (total, TCP and UDP).
- Number of sockets in TIME_WAIT state.
- Active network interface usage.
- Bad packets, drops and collisions on active network interfaces.
Devices
- CPU time for I/O on device.
- Sectors read/written on device.
- Disk throughput on device.
- I/O workload on device.
- Times for I/O requests issued to device.
- Hard drive temperature if your hardware supports it (with hddtemp).
- Motherboard/CPU/remote temperature reported by sensors or sar.
- Fan RPM reported by sensors.
Files
- Number of open files.
- Number of files in a queue directory.
- Disk space used on mounted partitions.
Process
- Load average.
- Processes created per second.
- Number of running processes (e.g. sendmail, httpd, oracle).
- Number of running threads (e.g. mysqld, amarok).
- Number of tasks blocked waiting for I/O.
Notification
You can have mail or Nagios notifications when monitored values fall outside the max/min threshold values, for all types of monitoring.
Plugins
With SysUsage you can create your own monitoring plugins. Any script or program can be embedded in SysUsage provided that it returns up to 3 numeric values. The graph title and labels are defined in the configuration file.
Remote call
SysUsage can be installed and run on a central server that stores statistics data by periodically calling sysusage on remote hosts using SSH. This central place is also in charge of rendering HTML pages and graphics for all hosts. This simplifies the SysUsage installation on remote hosts, which only require sysstat and rsysusage.
REQUIREMENT
rrdtool
You need to install rrdtool. Most distributions provide a dedicated package for it. On CentOS/RedHat distributions, use the following command:
yum install rrdtool rrdtool-perl
On Debian/Ubuntu distributions, use the following command:
apt-get install rrdtool librrds-perl
The sources can be found here:
http://people.ee.ethz.ch/~oetiker/
If you compile from sources and want to use the RRDs perl module
embedded with it, you must use the following command to compile:
make site-perl-install
This installation is optional if sysusage is installed on a remote host.
sysstat
You also need sar to collect statistics. sar is part of the sysstat package. On RPM-based distributions:
yum install sysstat
and on Debian-based distributions:
apt-get install sysstat
The sources can be found here:
http://freshmeat.net/projects/sysstat/
If you plan to use threshold notification you must have Net::SMTP
installed.
yum install perl-Net-SMTP-SSL
or
apt-get install libnet-smtp-ssl-perl
Sources can be found on CPAN (https://metacpan.org/pod/Net::SMTP)
Perl modules
SysUsage can be run in a central place to collect remote sysusage statistics using ssh. The remote calls are processed simultaneously using fork with the Proc::Queue Perl module.
If you plan to use sysusagegraph instead of sysusagejqgraph you will
also need the GD and GD::Graph3D Perl modules. Note that the use of GD
and GD::Graph is deprecated and sysusagegraph will be removed in the
next major release (6.0).
All these modules are available from CPAN (https://metacpan.org/) and
should at least be installed on the central server. On remote hosts this
is optional and depends on whether you want to run SysUsage on each
server or over ssh from a central place.
Nagios nsca client (optional)
If you want to send messages to Nagios you need to install nsca-2.7.2.tar.gz or a more recent version. You can get it here:
http://sourceforge.net/projects/nagios/files/
hddtemp and sensors (optional)
If you want to monitor your hard drive temperature you must install a small utility called hddtemp. You can download it from http://download.savannah.gnu.org/releases/hddtemp/. Run it to see whether your hard drive has a temperature sensor.
You can also use sensors to monitor your CPU temperature and fan speed.
If your hardware supports it, run sensors-detect and load the required
kernel modules at boot time.
INSTALLATION
Quick install
Simply run the following commands:
perl Makefile.PL
make && make install
By default it will copy the Perl programs into /usr/local/sysusage/bin
and the HTML output will be written to /var/www/htdocs/sysusage/. The
configuration file is /usr/local/sysusage/etc/sysusage.cfg and all
rrdtool database files will be saved under
/usr/local/sysusage/rrdfiles.
If you plan to run sysusage on different servers from a central place
you may just want to install the rsysusage Perl script on the remote
hosts. In that case proceed as follows:
perl Makefile.PL REMOTE=1
make && make install
It will copy only the rsysusage script into /usr/local/sysusage/bin and
the configuration file to /usr/local/sysusage/etc/sysusage.cfg. The RRD
data directory will be created under /usr/local/sysusage/rrdfiles, but
only to hold the *.cnt files that track the number of alert attempts
when a threshold is exceeded.
Custom install
You can override all install paths with the following Makefile.PL arguments. Here are the default values:
BINDIR=/usr/local/sysusage/bin
CONFDIR=/usr/local/sysusage/etc
PIDDIR=/usr/local/sysusage/etc
BASEDIR=/usr/local/sysusage/rrdfiles
PLUGINDIR=/usr/local/sysusage/plugins
HTMLDIR=/var/www/htdocs/sysusage
MANDIR=/usr/local/sysusage/doc
DOCDIR=/usr/local/sysusage/doc
REMOTE=
For example, on a RedHat system you may prefer to install SysUsage like this:
perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
BASEDIR=/var/lib/sysusage HTMLDIR=/var/www/html/sysusage \
MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage
If you are installing sysusage on a host that will be called over ssh
from a central place, you may want to install just what is necessary and
nothing more:
perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage \
REMOTE=1
This installs only the rsysusage Perl script, the configuration file and
the documentation, so you don't need to install extra Perl modules or
other graphics-related components.
Package/binary install
In the packaging/ directory you will find all the scripts needed to build RPM, SlackBuild and Debian packages. See the README in that directory to learn how to build these packages.
USAGE
SysUsage consists of two main Perl scripts, sysusage and sysusagegraph. Once you have correctly installed and configured SysUsage, the best way to execute them is by setting up a cron job. If you prefer JavaScript graphics instead of GD::Graph images, use sysusagejqgraph, which is based on the jqplot JavaScript library. This is the recommended script, as the use of GD::Graph through sysusagegraph is deprecated.
sysusage
The sysusage script is responsible for collecting system information at a given interval and storing it in rrdtool database files.
As it is very fast you can set the running interval to 1 minute. This
is the default polling interval used in the configuration and graph
reports. If you change this interval you must also change it in the
configuration file, otherwise your graphs will be wrong. See the
INTERVAL configuration directive.
Here is how I use it with a default installation:
*/1 * * * * /usr/local/sysusage/bin/sysusage > /dev/null 2>&1
rsysusage
This script does the same thing as the sysusage Perl script, but instead of storing the collected data in files it dumps it to standard output. It is used instead of the sysusage Perl script through an ssh call from a central server, where the local sysusage stores the statistics retrieved from multiple servers.
/usr/local/sysusage/bin/rsysusage -r remote_hostname
Where 'remote_hostname' is the hostname given in the [REMOTE ...]
configuration section.
sysusagegraph (deprecated) / sysusagejqgraph
The Perl script sysusagegraph is used to draw PNG graphs and write HTML files. As it knows the polling interval given in the configuration file, it can be run at any time. I used to run it every five minutes, but you can run it every hour or less often; the result is the same.
*/5 * * * * /usr/local/sysusage/bin/sysusagegraph > /dev/null 2>&1
Since release v4.0 of SysUsage there is a jQuery plotting replacement
for rrdGraph that only writes HTML files containing all the JavaScript
code that lets the client browser draw the graphs. To enable this
feature you just have to use sysusagejqgraph instead.
*/5 * * * * /usr/local/sysusage/bin/sysusagejqgraph > /dev/null 2>&1
Some additional resources (JavaScript libraries and CSS files) need to
be installed. The SysUsage installer will do the job for you. This
removes the requirement for the GD, GD::Graph and GD::Graph3D Perl
modules.
sysusage.cfg
If you have changed the default installation path (/usr/local/sysusage) you may need to give these scripts the path to the configuration file as a command line argument using the -c option. To see which arguments can be passed, use the -h or --help option.
Note that since version 3.0 the default configuration path in these
scripts is set during installation, so you may no longer need to edit
these scripts or give the path of the configuration file as a command
line argument.
See the CONFIGURATION chapter for more information on how to configure
your system monitoring.
Daemon mode
crond is good for scheduling, but not at intervals shorter than one minute. If you want to monitor your system at an interval under one minute you may want to run sysusage in daemon mode. To do that, just set INTERVAL to the desired number of seconds in the configuration file and set the DAEMON directive to 1.
Debug mode
Sometimes things don't appear as you expected. The best way to see what is going wrong is to run sysusage in debug mode. This mode lets you see all the values extracted from sar and other tools. Use the --debug option for that; this mode prevents sysusage from storing data in the rrdfiles. Command:
/usr/local/sysusage/bin/sysusage --debug
Please run this command and check the result before sending a bug report.
Output
Once sysusage and sysusagegraph have been running for a few cycles, open your favorite browser and take a look at the output directory. By default:
http://my.server.dom/sysusage/
If you have a special URI and/or port, remember to modify the URL
configuration directive; without that the web interface will not work.
CONFIGURATION
During installation a default configuration file sysusage.cfg is generated. The default settings are good enough to report essential information about your system, but if you want to monitor some processes, queue directories or devices you must edit this file by hand.
Here is the format of the configuration file and all its directives.
There are three main sections: the first one sets the general parameters
of the application, the second sets the parameters related to SMTP or
Nagios notification when a threshold is exceeded, and the last
configures all the types of system information you may want to monitor.
Full sample of configuration file:
[GENERAL]
DEBUG = 0
DATA_DIR = /usr/local/sysusage/rrdfiles
PID_DIR = /usr/local/sysusage/etc
DEST_DIR = /var/www/htdocs/sysusage
SAR_BIN = /usr/bin/sar
UPTIME = /usr/bin/uptime
HOSTNAME = /bin/hostname
INTERVAL = 60
SKIP = 12:00/14:00 20:00/06:00
HDDTEMP_BIN = /usr/local/sbin/hddtemp
SENSORS_BIN = /usr/bin/sensors
DAEMON = 0
GRAPH_WIDTH = 550
GRAPH_HEIGHT= 200
FLAMING = 0
HIRES = 0
LINE_SIZE = 2
PROC_QSIZE = 4
RESRC_URL =
SSH_BIN = /usr/bin/ssh
SSH_OPTION = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
SSH_USER =
SSH_IDENTITY=
[ALARM]
WARN_MODE = 0
ALARM_PROG = /usr/local/sysusage/bin/sysusagewarn
SMTP = localhost
FROM = root@localhost
TO = root@localhost
NAGIOS = /usr/local/nagios/bin/submit_check_result
UPPER_LEVEL = 1
LOWER_LEVEL = 2
URL =
[MONITOR]
load:threshold_max_value
blocked:threshold_max_value
cpu:threshold_max_value
cswch:threshold_max_value
intr:threshold_max_value
mem:threshold_max_value
dirty:threshold_max_value
swap:threshold_max_value
work:threshold_max_value
share:threshold_max_value
sock:threshold_max_value
socktw:threshold_max_value
io:threshold_max_value
file:threshold_max_value
page:threshold_max_value
pcrea:threshold_max_value
pswap:threshold_max_value
net:threshold_max_value
tcp:threshold_max_value
err:threshold_max_value
disk:threshold_max_value
proc:proc_name:threshold_max_value:threshold_min_value
tproc:proc_name:threshold_max_value:threshold_min_value
queue:path_queue_dir:threshold_max_value
hddtemp:device:threshold_max_value
dev:device(alias):threshold_max_value
dev:device(alias):rpm_speed:raid_type:nb_disk
work:threshold_max_value
sensors:pattern:threshold_max_value
temp:device:threshold_max_value
fan:device:threshold_max_value
huge:threshold_max_value
[PLUGIN testplug]
title:Sysage Test plugin
menu:Database
enable:no
program:/usr/local/sysusage/plugins/plugin-sample.pl
minThreshold:0
maxThreshold:10
verticalLabel:Number of seconds
label1:Total seconds
label2:
label3:
legend1:seconds
legend2:
legend3:
remote:yes
[REMOTE hostname1]
enable:no
ssh_user:monitor
ssh_identity:/home/monitor/.ssh/id_rsa
#ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
#ssh_command:
remote_sysusage:/usr/local/sysusage/bin/rsysusage
#[GROUP Web Servers]
#hostname1
#hostname2
Section GENERAL
DEBUG = 0|1
This option is used to set debug mode. If set to 1, sysusage and sysusagegraph just show what they do but don't create or send anything.
DATA_DIR = /path/to/rrdfiles
This option is used to set the output directory for all RRDTOOL
databases.
PID_DIR = /path/to/piddir
sysusage and sysusagegraph use a file to store the pid of the
running process to prevent simultaneous runs.
DEST_DIR = /path/to/html_output
Set the path to the directory where all HTML and graph files should
be created.
SAR_BIN = /path/to/sar_binary
sysusage uses sar, part of the sysstat distribution, to grab system
information, so it needs to know where it is.
UPTIME = /path/to/uptime_binary
sysusagegraph reports the current uptime of the system using the
uptime command. This directive sets the path to the uptime binary.
HOSTNAME = /path/to/hostname_binary
All scripts of the SysUsage distribution need to know the name of the
host. They use the hostname command for that.
INTERVAL = pull_interval_in_second
All RRDTOOL inputs use the given interval, in seconds, to store
monitored values. Graph construction also uses this interval to
render things properly. By default SysUsage uses an interval of 60
seconds to get better statistics. You can change this but it is not
recommended. If you change it, adjust your crontab to the same value.
This value must be between 10 and 300 seconds. If you want an interval
under one minute you must run sysusage in daemon mode. See DAEMON
below.
SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
You can define here some time ranges during which monitoring will not
be done. The value is a list of begin_time/end_time pairs separated by
spaces or tabs. For example, if you don't want to monitor the host
during the night, you can write: 20:00/06:00
HDDTEMP_BIN = /path/to/hddtemp_binary
You can monitor your hard drive temperature if you have installed the
hddtemp utility. SysUsage needs to know the path to the hddtemp binary.
SENSORS_BIN = /path/to/sensors_binary
You can monitor your device temperature if you have installed the
lm_sensors utility. SysUsage needs to know the path to the sensors
binary.
DAEMON = 0 | 1
You can monitor your system below the crond limit of 1 minute by
running sysusage in daemon mode with an INTERVAL between 10 and 60
seconds.
GRAPH_WIDTH and GRAPH_HEIGHT
These are useful if you want to resize the graph dimensions. Default is
a width of 550 pixels and a height of 200.
FLAMING
This is for fun: if you want a random flaming effect on graphs with
only one dataset, set this directive to 1. Disabled by default. Not
used with the jQuery graph renderer.
HIRES
Allows the addition of an hourly graph for finer granularity of the
data. This is disabled by default. Set it to any integer between 1 and
23 hours inclusive to show data from the past N hours to now. Not used
with the jQuery graph renderer, as the JavaScript library lets you zoom
to the resolution you want.
LINE_SIZE
By default the graph line size is 1. If you want graphs with a thicker
line, set it to 2. This is an rrd graph limitation (1 or 2). Not used
with the jQuery graph renderer.
PROC_QSIZE
Number of simultaneous remote sysusage call processes that should be
run. Default is 4 but it can be up to 15 or more depending on the
hardware configuration. One per core is the lowest value you should
consider.
RESRC_URL
By default, images, JavaScript and CSS resources are looked up in the
DEST_DIR directory, so that in the HTML view they all stay in the
current main directory. You may want to place those resources in
another directory or another location. Using this directive you can
set any FQDN, absolute or relative URL for these resources.
SSH_IDENTITY
Used to set the default identity file used to connect to all remote
hosts without a password. If undefined, sysusage will use the ssh
system default value. You may want to use the default value unless you
know exactly what you are doing.
SSH_OPTION
Used to set the default ssh options, which correspond to passwordless
authentication:
-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
with a five-second connection timeout. You may want to increase this
timeout on very slow network links.
Do not change this value unless you know exactly what you are doing.
SSH_BIN
Path to the ssh command is set here at install time.
SSH_USER
Used to define the default ssh user that will be used to connect to
all remote hosts.
Section ALARM
WARN_MODE = 0|1
Used to disable/enable alert messages when a threshold is exceeded.
ALARM_PROG = /path/to/sysusagewarn
Used to set the path to the external program responsible for sending
alarm messages. You can change it to your own; just take a look at the
sysusagewarn usage to see which command line options are used by
sysusage.
SMTP = smtp.server.net
Name or IP address of the SMTP server to contact. Default is none =>
no SMTP message is sent.
FROM = sender@localhost
Sender email address to use in the SMTP message.
TO = destination@localhost
Destination email address where the alarm message will be sent.
NAGIOS = /usr/local/nagios/bin/submit_check_result
Path to the external nsca program used to send check messages to
Nagios. Setting this activates Nagios check reports. See the end of
this file for how to configure Nagios.
UPPER_LEVEL = 1
Nagios check level to send when a high threshold limit is reached.
Default is 1 => WARNING.
LOWER_LEVEL = 2
Nagios check level to send when a low threshold limit is reached.
Default is 2 => CRITICAL.
URL = Url of Sysusage report
Used to override the default URL of the SysUsage report,
http://host.dom/sysusage/, especially if you have a special port or a
different path. Example:
http://hostname.domain:9080/Reports/Sysusage/
SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
You can define here some time ranges during which alarm notices will
not be sent. The value is a list of begin_time/end_time pairs separated
by spaces or tabs. For example, if you don't want to receive notices
during the night, you can write: 20:00/06:00
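As an illustration of the SKIP syntax, here is a minimal Python sketch of
how such a list of ranges could be evaluated, including a range that
crosses midnight such as 20:00/06:00. The function names are
hypothetical, and treating the end time as exclusive is an assumption;
this is not SysUsage's own code.

```python
from datetime import time


def parse_skip(value):
    """Parse 'HH:MM/HH:MM HH:MM/HH:MM ...' into a list of (begin, end) times."""
    ranges = []
    for item in value.split():          # items are separated by spaces or tabs
        begin, end = item.split("/")
        bh, bm = (int(x) for x in begin.split(":"))
        eh, em = (int(x) for x in end.split(":"))
        ranges.append((time(bh, bm), time(eh, em)))
    return ranges


def is_skipped(now, ranges):
    """Return True if 'now' falls inside one of the skip ranges.

    Assumption: the begin time is inclusive and the end time is exclusive.
    """
    for begin, end in ranges:
        if begin <= end:
            # Ordinary range within a single day, e.g. 12:00/14:00
            if begin <= now < end:
                return True
        else:
            # Range that wraps around midnight, e.g. 20:00/06:00
            if now >= begin or now < end:
                return True
    return False
```

For example, with the value "12:00/14:00 20:00/06:00", a check at 23:00
or at 05:59 falls inside a skipped range, while a check at 15:00 does
not.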
Section MONITOR
This section has two different formats. The first one is used to specify most of the monitoring targets:
type:threshold_max
or
type:threshold_max(attempt)
type
Type of system information you may want to monitor. It can take
around 30 different values:
load => monitor load average
blocked=> monitor task blocked waiting for I/O
cpu => monitor each cpu(s) user/nice/system usage
=> monitor each cpu(s) total/iowait usage
=> monitor each cpu(s) steal/guest usage
cpuall => monitor global cpu(s) statistics
cswch => monitor context switches usage
intr => monitor number of interrupt per second
mem => monitor memory usage
dirty => monitor active/inactive/dirty memory
share => monitor POSIX shared memory usage (/dev/shm)
swap => monitor swap usage
work => monitor amount of memory needed for current workload
sock => monitor number of open socket
socktw => monitor number of socket in TIME_WAIT state
io => monitor I/O request and block usage
page => monitor I/O page usage
pswap => monitor I/O page swap usage
pcrea => monitor number of process created per second
proc => monitor number of running process
tproc => monitor number of running thread
file => monitor number of open file
queue => monitor number of files in queue
net => monitor I/O network bytes on all network interfaces
err => monitor bad packet, drop and collision on interfaces
tcp => monitor number of tcp connection and segment
disk => monitor disk space usage
dev => monitor percentage of CPU time per device
=> monitor average request queue length
=> monitor I/O sectors read and write to device
=> monitor time spent in queue (await)
=> monitor time spent in servicing (svctm)
sensors=> monitor fan and device temperature using sensors command
hddtemp=> monitor disk drive temperature
temp => monitor device temperature using sar
fan => monitor fan rotation using sar
huge => monitor size of hugepages utilisation
Note: the 'cpu' monitoring target reports all statistics per CPU. This
can represent a lot of information if you have several CPUs. To limit
statistics to the total CPU only, replace the default 'cpu' target
with 'cpuall' in your configuration file.
threshold_max
This is the maximum threshold value. Any value equal to or greater
than this one will generate an SMTP and/or Nagios alert if you have
enabled it.
attempt
You can delay the call to the alarm program when a threshold is
exceeded by specifying the number of consecutive exceeded attempts
required before the command is called. Just specify the number of
attempts in parentheses right after the min and/or max threshold
value. This setting is optional for both threshold values and the
default is to send the alarm immediately.
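To make the first format concrete, the following Python sketch parses a
line such as cpu:90 or load:12(3) into its type, maximum threshold and
attempt count. The function names are hypothetical, and representing
"send the alarm immediately" as an attempt count of 1 is an assumption
made for this illustration, not a description of SysUsage internals.

```python
import re

# A threshold is a number optionally followed by an attempt count in
# parentheses, e.g. "90" or "12(3)".
_THRESHOLD = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)(?:\((?P<attempt>\d+)\))?$")


def parse_threshold(field):
    """Return (value, attempt) for a field like '12' or '12(3)'."""
    match = _THRESHOLD.match(field.strip())
    if match is None:
        raise ValueError("invalid threshold: %r" % field)
    # Assumption: no attempt count means the alarm is sent on the first exceed.
    attempt = int(match.group("attempt")) if match.group("attempt") else 1
    return float(match.group("value")), attempt


def parse_simple_monitor(line):
    """Parse 'type:threshold_max' or 'type:threshold_max(attempt)'."""
    kind, _, threshold = line.partition(":")
    value, attempt = parse_threshold(threshold)
    return kind.strip(), value, attempt
```

With this sketch, parse_simple_monitor("load:12(3)") yields the type
load, a maximum threshold of 12 and an attempt count of 3.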
Special cases
There is a special case for 'disk' usage monitoring that allows some
mount points to be excluded. This is useful if you have hard links or
some special devices you don't need to monitor. The exclusion is a
semicolon (;) separated list of mount points to exclude from
monitoring.
disk:ThresholdMax:exclusion
Ex: disk:90:/home/mondo_image;/home/smb_mountpoint
You can use regexp in your excluded path.
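The effect of the exclusion list can be sketched in Python as follows.
The function names are hypothetical, and the choice of an unanchored
regexp search against each mount point is an assumption; check the
behaviour of your SysUsage version before relying on it.

```python
import re


def parse_disk_directive(line):
    """Parse 'disk:ThresholdMax:exclusion' into (threshold, patterns)."""
    parts = line.split(":", 2)
    if len(parts) < 2 or parts[0] != "disk":
        raise ValueError("not a disk directive: %r" % line)
    threshold = float(parts[1])
    patterns = []
    if len(parts) == 3:
        # The exclusion is a semicolon separated list of regexps.
        patterns = [re.compile(p) for p in parts[2].split(";") if p]
    return threshold, patterns


def monitored_mounts(mounts, patterns):
    """Keep only the mount points that match none of the exclusion patterns."""
    # Assumption: a pattern excludes a mount point if it matches anywhere in it.
    return [m for m in mounts if not any(p.search(m) for p in patterns)]
```

Applied to the example above, the mount points /home/mondo_image and
/home/smb_mountpoint are dropped while / and /home are still monitored.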
The other directive with a special syntax is 'dev'. It is constructed
as follows:
dev:device(alias):rpm_speed:raid_type:nb_disk
where device is sda, sdb or any device name (without the /dev/), the
alias between parenthesis is the name that must be displayed in the
user interface instead of the device name. For example:
dev:sdc(ASM disk1):
dev:sdb(/data):
If you plan to use the I/O workload report, SysUsage needs to know the
speed of the disk (RPM), the RAID type (0,1,5,10) and the number of
disks in the RAID array to calculate the IOPS. For example, for 7200
RPM disks with 2 disks in RAID 1, write:
dev:sdc(ASM disk1):7200:1:2
I/O workload is the relation between the TPS (transfers per second)
and the IOPS (I/O operations per second) of a device. If the tps
returned by sysstat reaches the maximum theoretical IOPS, your storage
subsystem is saturated. Here is the equation used to calculate the
maximum theoretical IOPS:
d = number of disks
dIOPS = IOPS per disk
%r = % of read workload
%w = % of write workload
F = raid factor
IOPS = (d *dIOPS) / (%r + (F * %w))
This gives the theoretical maximum IOPS for a RAID set (excluding
caching, of course): the product of the number of disks and the IOPS
per disk, divided by the sum of the %read workload and the product of
the RAID factor and the %write workload. %read and %write are
calculated from the following equations:
%r = rd_sec / (rd_sec + wr_sec);
%w = wr_sec / (rd_sec + wr_sec);
This IOPS monitoring is built following the excellent article by
Nick Anderson, Analyzing I/O performance in Linux.
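The equations above can be checked with a small Python sketch. The
function names are hypothetical, and the per-disk IOPS and RAID factor
used in the example (75 IOPS per disk and a factor of 2 for RAID 1) are
illustrative assumptions, not values taken from SysUsage.

```python
def read_write_ratio(rd_sec, wr_sec):
    """Return (%r, %w) as fractions, from sectors read and written."""
    total = rd_sec + wr_sec
    if total == 0:
        return 0.0, 0.0  # idle device: no read or write workload
    return rd_sec / total, wr_sec / total


def max_theoretical_iops(disks, iops_per_disk, raid_factor, read_ratio, write_ratio):
    """IOPS = (d * dIOPS) / (%r + (F * %w))"""
    denominator = read_ratio + raid_factor * write_ratio
    if denominator == 0:
        raise ValueError("no read or write workload to evaluate")
    return (disks * iops_per_disk) / denominator


# Example: 2 disks, an assumed 75 IOPS per disk, an assumed RAID factor of 2,
# and a workload of 300 sectors read for 100 sectors written.
r, w = read_write_ratio(300, 100)            # r = 0.75, w = 0.25
print(max_theoretical_iops(2, 75, 2, r, w))  # (2 * 75) / (0.75 + 2 * 0.25) = 120.0
```

Comparing the tps reported by sar against this value gives the I/O
workload described above.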
The second format is used to monitor running process, hard drive
temperature or queue directory. It has the following format:
type:target:threshold_max_value:threshold_min_value
or
type:target:threshold_max_value(attempt):threshold_min_value(attempt)
type
Type of system information you may want to monitor. It can take
these different values:
load, cpu, cswch, intr, mem, swap, work, share, sock, socktw, io, file,
page, pcrea, pswap, net, tcp, err, disk, proc, tproc, queue, hddtemp,
dev, work, sensors, temp, fan, huge, blocked, dirty
target
If type is 'proc' or 'tproc', target represents the name of the
process to monitor. You can use a regexp as target to match exactly
the required process. The number of running processes is obtained
with the following system command line:
ps -e -o command | grep -E "target" | grep -v grep | wc -l
so you can replace the word target with the regexp to match and see
whether it returns the right number of processes.
The number of running threads is obtained with the following system
command line:
ps -eL -o command | grep -E "target" | grep -v grep | wc -l
If type is 'queue', target represents the full path of the directory
to monitor. SysUsage will try to find and count any regular file in
the target directory and will not descend into subdirectories.
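The counting rule for 'queue' can be sketched in Python as below. The
function name is hypothetical and the decision not to count symbolic
links as regular files is an assumption made for this illustration.

```python
import os


def count_queue_files(path):
    """Count regular files directly inside 'path', ignoring subdirectories."""
    with os.scandir(path) as entries:
        # Assumption: symbolic links are not counted as regular files.
        return sum(1 for entry in entries if entry.is_file(follow_symlinks=False))
```

Calling count_queue_files on a mail queue directory would return the
number of files waiting at the top level of that directory only.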
If type is 'hddtemp', target represents the hard drive device to
monitor, e.g. /dev/sda. You can try it with the following command
line:
hddtemp -n /dev/sda
This should return the current temperature detected on the hard drive.
If type is 'dev', target represents the device name to monitor, e.g.
sda. Do not add the /dev/ prefix, as this will not work. You may want
to change the device name shown in the graphic menu; this is possible
by adding the device alias enclosed in parentheses.
For example, let's say you are monitoring some EMCpower SAN devices.
Using sar, the reported devices are dev120-48 and dev120-64. First
find which partitions are mapped to these devices (by reading
/proc/partitions). In this example these devices are mounted as
/cache1 and /cache2, so we want to see these mount points instead of
the device numbers in the graphical menu:
dev:dev120-48(/cache1):90
dev:dev120-64(/cache2):97
Adding these lines to your sysusage.cfg file will do the job. The
threshold_max value is the maximum percentage of CPU used for this
device before an alarm is sent.
If type is 'sensors', target represents the pattern to match to obtain
temperature or fan speed information in the sensors program output.
See the SENSORS chapter for more information.
If type is 'temp' or 'fan', target represents the device number
reported by sar to obtain temperature or fan speed information. To
find out which device number must be used, see the result of the
command: sar -m ALL 1 1
threshold_max
This is the maximum threshold value. Any value equal to or greater
than it will generate an SMTP and/or Nagios alert if you have enabled
it.
threshold_min
This is the minimum threshold value. Any value equal to or lower than
this one will generate an SMTP and/or Nagios alert if you have enabled
it. The min threshold should probably only be used with the 'proc'
and 'tproc' monitoring types. If you set it to 0 you will be warned
if any of the monitored processes is down.
attempt
You can delay the call to the alarm program when a threshold is
exceeded by specifying the number of consecutive exceeded attempts
required before the command is called. Just specify the number of
attempts in parentheses right after the min and/or max threshold
value. This setting is optional for both threshold values and the
default is to send the alarm immediately.
For example a load average monitoring defined like this
load:12(3)
will send an alarm when the system load average exceeds 12 on three
consecutive attempts at the defined interval. If the interval is 60
seconds, the alarm will be sent up to 180 seconds after the threshold
is first exceeded.
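The consecutive attempt logic can be illustrated with the Python sketch
below. The class name is hypothetical, and resetting the counter as soon
as one value falls back under the threshold is an assumption about what
"consecutive" means here, not a description of how SysUsage stores its
counters.

```python
class AttemptCounter:
    """Track consecutive threshold exceeds for one monitored value."""

    def __init__(self, threshold_max, attempt=1):
        self.threshold_max = threshold_max
        self.attempt = attempt
        self.count = 0

    def update(self, value):
        """Record a new sample; return True when the alarm should be sent."""
        if value >= self.threshold_max:
            # Values equal to or greater than the threshold count as an exceed.
            self.count += 1
        else:
            # Assumption: a single value under the threshold resets the count.
            self.count = 0
        return self.count >= self.attempt
```

For load:12(3), feeding the samples 13, 14, 11, 13, 14, 15 only triggers
the alarm on the last one, because the value 11 interrupts the first
run of exceeds.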
Section PLUGIN
This part enables the use of custom plugins. You can call any program or script provided that it returns up to 3 numbers separated by a space character. See the plugins/ directory for sample scripts.
This section must include a name composed of alphanumeric characters
that will be used to create the target file, for example:
[PLUGIN testplug1] or [PLUGIN testplug2]
The section allows the following configuration directives. They are
composed of a directive name followed by ':' or '=' and a value.
enable
Used to temporarily disable the plugin monitoring. Default is 'yes'
(enabled). To disable it, write enable:no
program
Is used to set the path to the program or script to execute as
plugin. This program must print to STDOUT 1 to 3 numbers separated
by a space character as result following the number of reports you
want. So each plugin can have 1, 2 or 3 graphed data.
title
Is used to set the title of the report page and the index link.
Default is set to "Sysusage plugin".
menu
Is used to store the plugin under a submenu of the plugins menu.
Default is to store plugin under the "Others" submenu.
maxthreshold
This is the maximum threshold value. Any value equal to or greater
than this one will generate an SMTP and/or Nagios alert if you have
enabled it.
minthreshold
This is the minimum threshold value. Any value equal to or lower than
this one will generate an SMTP and/or Nagios alert if you have enabled
it.
verticallabel
This is used to set the vertical label of the graph.
label1, label2, label3
Are used to show a legend for each graphed data, label1 is for the
first returned value, label2 for the second and label3 for the last.
If you just have one value returned just omit the other labels.
legend1, legend2, legend3
These are used to set the units for the Current, Avg and Max values.
remote
This directive must be set to 'no' to prevent execution of the plugin
program by an ssh call to sysusage in a remote context. This
directive is enabled by default ('yes').
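As an illustration of the output contract described in this section, here
is a minimal plugin sketch in Python that prints three numbers separated
by a space. It is not one of the sample scripts shipped in plugins/; the
choice of the system load averages as data and the two-decimal format are
assumptions made for the example.

```python
#!/usr/bin/env python3
import os


def format_values(values):
    """Format 1 to 3 numbers as a single space separated line."""
    if not 1 <= len(values) <= 3:
        raise ValueError("a plugin must return between 1 and 3 values")
    return " ".join("%.2f" % v for v in values)


def main():
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    print(format_values(os.getloadavg()))


if __name__ == "__main__":
    main()
```

Such a script would be referenced by the program directive of a
[PLUGIN name] section, with label1 to label3 describing the three
printed values.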
Section REMOTE
This part allows sysusage to be run on remote hosts from a central server. It uses ssh to execute sysusage on the destination host with the -r option, which forces sysusage not to write anything to local data files but to print all results to stdout. As sysusage is run by a cron job or in daemon mode, it cannot authenticate interactively to the remote host, so you must give an ssh user and an identity file with the corresponding configuration options.
This section must include the name or the ip address of the remote host
that will be used to create the target data directory, for example:
[REMOTE hostname] or [REMOTE host.domain.dom] or [REMOTE 192.168.1.14]
The section allows the following configuration directives. They are
composed of a directive name followed by ':' or '=' and a value.
Once you have installed sysusage on all remote hosts and exchanged the
SSH keys between the central host and all remote hosts, most of the
time you just have to set the ssh_user directive to get it working.
Use the remote_sysusage directive if the sysusage perl script is not
installed in the same place as on the central server.
Section GROUP This section allows you to group remote host reports under a common group name in the index page. Remote hosts will be ordered by their parent group. The name of the group can be any string, and the values in the section must be a list of remote servers defined in the REMOTE sections.
For example if you are monitoring a cluster of web and database servers
you can use the following declaration:
[GROUP Web Servers]
webhost1
webhost2
webhost3
[GROUP Database Servers]
dbhost1
dbhost2
Of course the webhostN and dbhostN hosts must each be declared in a
REMOTE section.
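To make the layout of the GROUP sections concrete, here is a small sketch that reads such a declaration into a mapping of group name to host list. It is an illustration only: parse_groups is a hypothetical helper, not part of sysusage, and it assumes one host name per line as in the example.

```python
import re

EXAMPLE = """\
[GROUP Web Servers]
webhost1
webhost2
webhost3
[GROUP Database Servers]
dbhost1
dbhost2
"""


def parse_groups(config_text):
    """Map each GROUP section name to the list of hosts it contains."""
    groups = {}
    current = None  # name of the GROUP section being read, if any
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.match(r"^\[(\w+)\s*(.*?)\]$", line)
        if header:
            # Only GROUP sections are collected; any other section ends one.
            if header.group(1).upper() == "GROUP":
                current = header.group(2)
                groups.setdefault(current, [])
            else:
                current = None
            continue
        if current is not None:
            groups[current].append(line)
    return groups


if __name__ == "__main__":
    for name, hosts in parse_groups(EXAMPLE).items():
        print(name, hosts)
```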
enable
Used to enable or disable monitoring of the remote host. The default
is 'yes' (enabled). Set 'enable=no' to disable it.
ssh_user
Used to define the ssh user allowed to connect to the remote host. By
default the value of the SSH_USER configuration option in the
GENERAL section will be used.
ssh_identity
Used to set the identity file used to connect to the remote host
without a password. By default the value of the SSH_IDENTITY
configuration option in the GENERAL section will be used. Usually
this is the private key that you have generated using ssh-keygen,
most often the file $HOME/.ssh/id_rsa. You may want to keep the
default value unless you know exactly what you are doing.
ssh_options
Used to override the default ssh options, which are:
-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
The default options are set in the SSH_OPTIONS configuration option
in the GENERAL section. You may want to keep the default value
unless you know exactly what you are doing.
ssh_command
You can override the complete ssh command using this directive. It
replaces the ssh command, the ssh options, the ssh user and the host
part. The sysusage remote command will not be replaced. You may want
to keep the default value unless you know exactly what you are
doing.
remote_sysusage
Use it to set the path to the rsysusage command that must be used on
the remote host. SysUsage will automatically add the -r option to
trigger the remote execution mode.
THRESHOLD NOTIFICATION SMTP alert SysUsage uses an external perl script to send SMTP alerts and/or Nagios checks when a max or min threshold is reached. This program is named sysusagewarn. All options in the [ALARM] section of the configuration file are used by sysusage to call this program. If they are set correctly you don't have to take care of the parameters given to this program. If you want to use this program outside sysusage, here are the command line options it understands:
Usage: sysusagewarn -t subject -c current_value -v threshold_value
[-s smtp_srv] [-f from] [-d to] [-b hostname_prog]
-t subject : Subject of the alarm
-c value : Current value monitored by sysusage
-v value : Threshold value used.
-s host : SMTP server name or ip where to send email.
-f from : Sender email address of the alarm message.
-d to : Destination address of the alarm message.
-b path : Path to program hostname. Default is /bin/hostname
-n path : Path to Nagios program submit_check_result. Default none.
-l value : Alarm level (0=OK,1=WARNING,2=CRITICAL). Default: 1.
-r service : Nagios service name to used. Must be any sysusage type of
monitoring defined in the configuration file.
-u url : Url to HTML sysusage output to include in email.
Default: http://hostname.domain/sysusage/
-h : Output this message and exit
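If you call sysusagewarn from your own script, the options listed above can be assembled as in this sketch. The helper sysusagewarn_argv and its parameter names are hypothetical; only the flags -t, -c, -v, -s, -f, -d and -l are taken from the usage text, and the program is assumed to be found in your PATH.

```python
def sysusagewarn_argv(subject, current, threshold,
                      smtp=None, sender=None, to=None, level=None,
                      program="sysusagewarn"):
    """Build the argument list for a sysusagewarn call (sketch).

    Only the three mandatory options are always present; optional ones are
    added when a value is given.
    """
    argv = [program, "-t", subject, "-c", str(current), "-v", str(threshold)]
    optional = (("-s", smtp), ("-f", sender), ("-d", to), ("-l", level))
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    # Level 2 is CRITICAL according to the usage text.
    print(sysusagewarn_argv("CPU Temp", 78.5, 75, smtp="localhost", level=2))
```

The resulting list can be handed to subprocess.run() once sysusagewarn is installed on the machine.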
NAGIOS alert SysUsage sends check messages to Nagios through an external command (submit_check_result). So you need to create the host and associate with it all the sysusage services that you want to monitor with Nagios. The service names correspond to the type of monitoring. For example, if you have enabled alarms on memory usage, the service sent is 'mem'. There are also special cases for monitoring types with multiple instances, such as network monitoring. You need to create one service per instance. For example, type 'net' will have 'net_eth0' and 'net_lo', and more if you have more network interfaces. To see whether your sysusage alarm messages are understood by Nagios, take a look at the nagios.log file (by default /usr/local/nagios/var/nagios.log).
To automatically clear an alarm reported to Nagios, SysUsage sends an
OK request each time it runs if everything is correct for the
monitored type.
SENSORS Monitoring of sensors output is based on regexps. To make this clear, here is an example:
Sensors output on my server:
adt7463-i2c-0-2d
Adapter: SMBus I801 adapter at 1480
V1.5: +3.23 V (min = +0.00 V, max = +3.32 V)
VCore: +1.24 V (min = +1.10 V, max = +1.49 V)
V3.3: +3.33 V (min = +2.80 V, max = +3.78 V)
V5: +4.99 V (min = +4.25 V, max = +5.75 V)
V12: +0.11 V (min = +0.00 V, max = +15.94 V)
CPU_Fan: 0 RPM (min = 0 RPM)
fan2: 10671 RPM (min = 8095 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
CPU Temp: +69.5 C (low = +2.0 C, high = +91.0 C)
Board Temp: +32.5 C (low = +2.0 C, high = +83.0 C)
Remote Temp: +31.2 C (low = +2.0 C, high = +58.0 C)
cpu0_vid: +1.338 V
adt7463-i2c-0-2e
Adapter: SMBus I801 adapter at 1480
V1.5: +3.21 V (min = +0.00 V, max = +3.32 V)
VCore: +1.28 V (min = +1.10 V, max = +1.49 V)
V3.3: +3.32 V (min = +2.80 V, max = +3.78 V)
V5: +4.95 V (min = +0.00 V, max = +6.64 V)
V12: +0.11 V (min = +0.00 V, max = +15.94 V)
CPU_Fan: 10843 RPM (min = 8095 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 9642 RPM (min = 8095 RPM)
fan4: 0 RPM (min = 0 RPM)
CPU Temp: +57.2 C (low = +2.0 C, high = +91.0 C)
Board Temp: +35.2 C (low = +2.0 C, high = +91.0 C)
Remote Temp: +35.8 C (low = +2.0 C, high = +58.0 C)
cpu0_vid: +1.338 V
Depending on which sensors kernel modules are loaded, you could have
more or less output than that. To monitor all the temperature sensors
on my server I need to add the following lines to sysusage.cfg:
sensors:CPU Temp:75
sensors:Board Temp:45
sensors:Remote Temp:45
This will create 3 graphs: one based on lines matching 'CPU Temp',
another on lines matching 'Board Temp' and the last on lines matching
'Remote Temp'. As I have 2 CPUs, each graph will have 2 values.
You cannot report more than 3 values per graph; this is hard coded into
sysusage. So if you have more CPUs you will not see more than 3 values.
Here an alarm will be sent when the temperature exceeds the given
values (75, 45, 45).
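The regexp matching described above can be illustrated with the following sketch. The exact pattern used inside sysusage is not shown in this document, so the expression below is an assumption; sensor_values is a hypothetical helper and the sample is a shortened copy of the sensors output shown earlier.

```python
import re

SAMPLE = """\
adt7463-i2c-0-2d
CPU_Fan: 0 RPM (min = 0 RPM)
CPU Temp: +69.5 C (low = +2.0 C, high = +91.0 C)
Board Temp: +32.5 C (low = +2.0 C, high = +83.0 C)
adt7463-i2c-0-2e
CPU_Fan: 10843 RPM (min = 8095 RPM)
CPU Temp: +57.2 C (low = +2.0 C, high = +91.0 C)
Board Temp: +35.2 C (low = +2.0 C, high = +91.0 C)
"""


def sensor_values(output, target, limit=3):
    """Collect the first number of every line whose label matches target.

    target is treated as a regexp anchored at the start of the line.
    At most `limit` values are kept, mirroring the 3 values per graph limit.
    """
    pattern = re.compile(
        r"^(?:" + target + r")[^:]*:\s*([+-]?\d+(?:\.\d+)?)"
    )
    values = []
    for line in output.splitlines():
        match = pattern.match(line.strip())
        if match:
            values.append(float(match.group(1)))
    return values[:limit]


if __name__ == "__main__":
    # One value per chip: two CPUs give two values on the same graph.
    print(sensor_values(SAMPLE, "CPU Temp"))
    print(sensor_values(SAMPLE, "Board Temp"))
```

Because only the first number after the label is captured, the low/high limits printed by sensors in parentheses are ignored.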
To monitor fan speed, I just add lines like this in the configuration
file:
sensors:fan2:11000:8095
sensors:fan3:11000:8095
This will create 2 graphs, for fan 2 and fan 3, with an alarm sent when
the speed exceeds 11000 RPM or is lower than 8095 RPM.
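The sensors lines shown so far all follow the shape type:target:max, optionally followed by :min. A minimal sketch of splitting such a line is given below. It only covers the forms that appear in these examples, and parse_monitor_line is a hypothetical helper, not the parser used by sysusage.

```python
def parse_monitor_line(line):
    """Split 'type:target:max[:min]' into a dictionary (sketch)."""
    parts = line.strip().split(":")
    if len(parts) < 3:
        raise ValueError("expected at least type:target:max, got %r" % line)
    entry = {
        "type": parts[0],
        "target": parts[1],
        "maxthreshold": float(parts[2]),
        "minthreshold": None,  # stays None when no minimum is given
    }
    if len(parts) > 3 and parts[3] != "":
        entry["minthreshold"] = float(parts[3])
    return entry


if __name__ == "__main__":
    print(parse_monitor_line("sensors:fan2:11000:8095"))
    print(parse_monitor_line("sensors:CPU Temp:75"))
```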
On my personal computer (/etc/sysconfig/lm_sensors => modprobe coretemp)
sensors output is:
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +53.0 C (high = +78.0 C, crit = +100.0 C)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +50.0 C (high = +78.0 C, crit = +100.0 C)
To monitor CPU temperature, I just add this line to my sysusage.cfg:
sensors:Core:70
This will generate a graph with 2 graphed values, for Core 0 and Core 1.
Now that sysstat sar natively reports device temperature and fan speed,
you don't need sensors anymore. Type 'temp' can be used instead, and type
'fan' for the fan speed. The target of these types is the device number.
See sar -m TEMP or sar -m FAN to find which device number to monitor.
BUGS / FEATURE REQUEST Please report any bugs, remarks and feature requests using the GitHub interface at https://github.com/darold/sysusage/ or send a mail to the author.
LICENSE Copyright (C) 2003-2018 Gilles Darold
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3 of the License, or any later
version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
AUTHOR Gilles Darold <gilles |At| darold |DoT| net>
ACKNOWLEDGMENT I want to thank all the people who helped to build this tool, with very special thanks to Marat Dyatko for the web design contribution.