
symon - System Monitor

Overview

I wanted a simple system monitoring service for some boxen in a colocated datacenter. Some of the qualities I was looking for were:

  • a single place to gather stats for a small LAN
  • display graphs showing measures over time
  • a small footprint
  • easy to set up

The types of stats I wanted, in no particular order, were:

  • firewall stats
  • voltage
  • temperature
  • fan speeds
  • cpu utilization
  • interface traffic
  • memory usage
  • etc.

I know the operating system can report these data on request: top, sysctl, pfctl, systat, netstat, etc. all show good stuff. But I wanted an easy way to capture it, and I really did not want to parse those outputs.

Writing my own would be a waste of time.

There are some snappy desktop utilities, like Conky, GKrellM, or SuperKaramba. While these are flashy, they do not fit well into a remote network of servers.

Having a look around, I found some pretty cool packages. DarkStat is a fun packet sniffer that I like, and I actually use it sometimes on my home network. But it does not have everything I need. I looked at the venerable Nagios. Wow... that is a lot of work, and really overkill for what I needed. I also spent some time looking at Zenoss and Cacti. These are nice packages but, again, overkill for my needs. If I were monitoring a large installation, I would consider them.

Part I - collecting data

symon - the system

The package that really fit my needs best was symon. It was as if this guy, Willem Dijkstra in the Netherlands, was reading my mind when he wrote his package.

From the website:

symon is a system monitor for FreeBSD, NetBSD, OpenBSD and Linux. It can be used to obtain accurate and up to date information on the performance of a number of systems.

  • symon - lightweight system monitor. Can be run with privileges equivalent to nobody on the monitored host. Offers no functionality but monitoring and forwarding of measured data.
  • symux - persists data. Incoming symon streams are stored on disk in rrd files. symux offers systems statistics as they come in to 3rd party clients.

To restate above:

The package is split into distinct parts: the client (symon) just pulls system stats and forwards them to the server (symux), which stores the data.

Simple, lightweight, easy... just what I wanted.

Data flow

Each host to be monitored runs the symon client. It collects all the stats and sends them over the network (port 2100) to the central log host. (The log host can also run a symon client.) The central log host listens on port 2100 for streams from specific clients.

Flow diagram

+-------------+
|    host 1   |
|   (symon)   |
+-------------+
      |             +------------------+
      |             | central log host |
   [switch] --------|     (symon)      | ------> writes out to rrd files
      |             |    and symux     |
      |             +------------------+
+-------------+
|    host 2   |
|   (symon)   |
+-------------+

Each client can collect and transmit different measures. For example, on the firewall you may want to collect the packet filter stats, and on the web server you may want to track the cpu percentage of the httpd process, but possibly not vice versa.

Important

You may need to open port 2100 behind the firewall (in the private LAN or DMZ) to allow traffic between the hosts. It is not advisable to open that port on public IPs unless the traffic is kept private via IPsec or a similar VPN.

symon - the client

/etc/symon.conf

monitor { cpu(0), mem, pf, mbuf, proc(httpd),
            if(em0), if(em1), io(sd0),
            sensor(0), sensor(1), sensor(2)
} stream to loghost.mydomain.com 2100

On each client you will need to decide what you want to monitor. Above, I have chosen a subset of what is available. Each of the indicators above is a collection of measures. For example, when you ask symon to collect "mem", you are actually asking it to collect five measures (real_active, real_total, free, swap_used, swap_total). You may not think you need all five. If you are only concerned with one of them (say, swap_used), just graph that one and ignore the rest of the data. Regardless, all five data elements will still be collected. Down the road you may become concerned with real_active; since it has been collected all along, you get its full history as soon as you add that graph.

"monitor" is the command word and the collection sets are in curly brackets. Each is explained in the man pages, but here are some highlights.

  • cpu - these are numbered; this sample machine has only one
  • mem - all memory stats
  • pf - all packet filter stats
  • mbuf - all mbuf stats (more than a dozen of them)
  • proc - monitor the process named in the parens, in this case httpd
  • if - network stats for the interface named in the parens; here there are two, em0 and em1
  • io - I/O stats for the drive named in the parens, in this case the first SCSI drive, sd0
  • sensor - any sensors that may be available in the system

The last bit of symon.conf is another command, "stream to", followed by an IP address or FQDN and a port number. 2100 is the default on my installation.
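If you manage several boxen, generating each symon.conf from a short list of streams may be less error-prone than editing every file by hand. This is only a sketch. `make_symon_conf` is a hypothetical helper and is not part of symon. It prints the same `monitor { ... } stream to HOST PORT` statement shown above, but on a single line. The line breaks in the example above are only there for readability.

```shell
#!/bin/sh
# make_symon_conf LOGHOST PORT STREAM...
# Hypothetical helper (not part of symon): print a symon.conf "monitor"
# statement that sends the given streams to LOGHOST on PORT.
make_symon_conf() {
        loghost=$1
        port=$2
        shift 2
        # Join the remaining arguments with ", " to build the stream list.
        list=""
        for s in "$@"; do
                if [ -z "$list" ]; then
                        list=$s
                else
                        list="$list, $s"
                fi
        done
        printf 'monitor { %s } stream to %s %s\n' "$list" "$loghost" "$port"
}

# Usage (quote the parentheses so the shell leaves them alone):
#   make_symon_conf loghost.mydomain.com 2100 'cpu(0)' mem pf 'if(em0)' > /etc/symon.conf
```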

Important

Sensors

The sensors available will be specific to the hardware and to the operating system's ability to read them. As an example, a recent-model Dell server has many, many sensors and incorporates IPMI (Intelligent Platform Management Interface). The OpenBSD operating system is able to gather a LOT of stats from it.

$ sysctl hw.sensors

hw.sensors.ipmi0.temp0=44.00 degC (Temp), OK
hw.sensors.ipmi0.temp1=50.00 degC (Temp), OK
hw.sensors.ipmi0.temp2=25.00 degC (Ambient Temp), OK
hw.sensors.ipmi0.temp3=35.00 degC (Planar Temp), OK
hw.sensors.ipmi0.temp4=28.00 degC (Riser Temp), OK
hw.sensors.ipmi0.temp5=40.00 degC (Temp), UNKNOWN
hw.sensors.ipmi0.temp6=40.00 degC (Temp), UNKNOWN
hw.sensors.ipmi0.temp7=50.00 degC (Temp), OK
hw.sensors.ipmi0.temp8=50.00 degC (Temp), OK
hw.sensors.ipmi0.temp9=23.00 degC (Ambient Temp), OK
hw.sensors.ipmi0.temp10=40.00 degC (Planar Temp), OK
hw.sensors.ipmi0.temp11=40.00 degC (Riser Temp), OK
hw.sensors.ipmi0.temp12=40.00 degC (Temp), UNKNOWN
hw.sensors.ipmi0.temp13=40.00 degC (Temp), UNKNOWN
hw.sensors.ipmi0.fan0=1800 RPM (FAN 1 RPM), OK
hw.sensors.ipmi0.fan1=4950 RPM (FAN 2 RPM), OK
hw.sensors.ipmi0.fan2=4950 RPM (FAN 3 RPM), OK
hw.sensors.ipmi0.fan3=4875 RPM (FAN 4 RPM), OK
hw.sensors.ipmi0.fan4=4950 RPM (FAN 5 RPM), OK
hw.sensors.ipmi0.fan5=4800 RPM (FAN 6 RPM), OK
hw.sensors.ipmi0.fan6=1800 RPM (FAN 1 RPM), CRITICAL
hw.sensors.ipmi0.fan7=1800 RPM (FAN 2 RPM), CRITICAL
hw.sensors.ipmi0.fan8=1800 RPM (FAN 3 RPM), CRITICAL
hw.sensors.ipmi0.fan9=1800 RPM (FAN 4 RPM), CRITICAL
hw.sensors.ipmi0.fan10=1800 RPM (FAN 5 RPM), OK
hw.sensors.ipmi0.fan11=1800 RPM (FAN 6 RPM), OK
hw.sensors.ipmi0.fan12=1800 RPM (FAN 7 RPM), OK
hw.sensors.ipmi0.fan13=1800 RPM (FAN 8 RPM), OK
hw.sensors.ipmi0.volt0=3.16 VDC (CMOS Battery), OK
hw.sensors.ipmi0.volt1=3.10 VDC (CMOS Battery), OK
hw.sensors.ipmi0.indicator0=On (Status ), OK
hw.sensors.ipmi0.indicator1=On (Status ), OK
hw.sensors.ipmi0.indicator2=Off (Intrusion), OK

So, when addressing an IPMI sensor in symon.conf, you need to write it as "sensor(ipmi0.temp0),sensor(ipmi0.temp1),sensor(ipmi0.temp2)", not as sensor(0), sensor(1), and so on.
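You can derive these names from the sysctl output instead of typing each one by hand. This is a sketch that assumes the `hw.sensors.<device>.<sensor>=value` layout shown above. `sensors_to_symon` is a hypothetical name. If you want only some of the sensors, filter the list with grep before piping it in.

```shell
#!/bin/sh
# sensors_to_symon: read `sysctl hw.sensors` output on stdin and print a
# comma-separated list of symon sensor() entries, e.g.
#   hw.sensors.ipmi0.temp0=44.00 degC (Temp), OK  ->  sensor(ipmi0.temp0)
sensors_to_symon() {
        # Keep the text between "hw.sensors." and "=", wrap it in sensor(...),
        # then join all resulting lines with commas.
        sed -n 's/^hw\.sensors\.\([^=]*\)=.*/sensor(\1)/p' | paste -s -d , -
}

# Usage on the monitored host, temperatures only:
#   sysctl hw.sensors | grep temp | sensors_to_symon
```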

symux - the server

/etc/symux.conf

source 192.168.10.125
{
        accept { cpu(0), mem, pf, mbuf, proc(httpd),
                if(em0), if(em1), io(sd0),
                sensor(0), sensor(1), sensor(2)
                }

        write cpu(0)        in "/var/data/symon/192.168.10.125/cpu0.rrd"
        write mem           in "/var/data/symon/192.168.10.125/mem.rrd"
        write mbuf          in "/var/data/symon/192.168.10.125/mbuf.rrd"
        write proc(httpd)   in "/var/data/symon/192.168.10.125/proc_httpd.rrd"
        write if(em0)       in "/var/data/symon/192.168.10.125/if_em0.rrd"
        write if(em1)       in "/var/data/symon/192.168.10.125/if_em1.rrd"
        write io(sd0)       in "/var/data/symon/192.168.10.125/io_sd0.rrd"
        write sensor(0)     in "/var/data/symon/192.168.10.125/sensor0.rrd"
        write sensor(1)     in "/var/data/symon/192.168.10.125/sensor1.rrd"
        write sensor(2)     in "/var/data/symon/192.168.10.125/sensor2.rrd"
}

The server (symux) listens for, and is instructed to accept, exactly the collections sent from a specific machine. This allows you to set up different measures for different machines.

symux will accept the data stream and write out what is collected in rrd (Round Robin Database) format.

I have shown the write directive, which lets me state explicitly which file the data is written to. This becomes important later, when I create the graphs. The alternative is to delete all those write statements and name a directory instead; symux will then use default file names:

datadir "/var/www/data/symon/192.168.10.125"
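If you keep the explicit write statements, a small generator helps keep the file names consistent from host to host. This is a sketch. `symux_writes` is a hypothetical helper, and its naming rule is inferred from the symux.conf above. In that rule, cpu(0) and sensor(1) lose their parentheses (cpu0.rrd, sensor1.rrd), other parameterized streams get an underscore (if_em0.rrd, proc_httpd.rrd), and plain streams keep their name (mem.rrd). Only numbered sensors appear in the example above, so check the result for IPMI-style sensor names.

```shell
#!/bin/sh
# symux_writes DATADIR HOST STREAM...
# Hypothetical helper: print one symux "write" line per stream, following
# the file naming used in the symux.conf example.
symux_writes() {
        datadir=$1
        host=$2
        shift 2
        for s in "$@"; do
                case $s in
                cpu\(*\)|sensor\(*\))
                        # cpu(0) -> cpu0.rrd: just drop the parentheses
                        file=$(printf '%s' "$s" | tr -d '()').rrd ;;
                *\(*\))
                        # if(em0) -> if_em0.rrd, proc(httpd) -> proc_httpd.rrd
                        file=$(printf '%s' "$s" | sed 's/(/_/; s/)//').rrd ;;
                *)
                        # mem -> mem.rrd
                        file=$s.rrd ;;
                esac
                printf '\twrite %s in "%s/%s/%s"\n' "$s" "$datadir" "$host" "$file"
        done
}

# Usage: paste the output into the matching "source" block of symux.conf
#   symux_writes /var/data/symon 192.168.10.125 'cpu(0)' mem mbuf 'if(em0)' 'io(sd0)'
```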

Before starting

rrd files need to be created before symux can write to them. A script is included with the software:

/usr/local/share/symon/c_smrrds.sh

Create rrd files for symux.

Usage: c_smrrds.sh [oneday] [interval <seconds>] [all]    <rrd files>

Where:
oneday       = modify rrds to only contain one day of information
seconds      = modify rrds for non standard monitoring interval
all          = run symux -l to determine current configured rrd files
<rrd files>  = files ending in rrd that follow symux naming

The 'all' flag was pretty useful.
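The per-host directories named in the write statements also have to exist. I would not count on c_smrrds.sh creating missing directories, so here is a hedged sketch that creates them explicitly before running the script. `make_host_dirs` is a hypothetical helper. It reads the .symondb host list described later in this article (one IP per line, `#` for comments).

```shell
#!/bin/sh
# make_host_dirs BASEDIR HOSTFILE
# Hypothetical helper: create BASEDIR/<host> for every non-comment line
# in HOSTFILE (the .symondb list of hosts).
make_host_dirs() {
        basedir=$1
        hostfile=$2
        for h in $(grep -v '^#' "$hostfile"); do
                mkdir -p "$basedir/$h"
        done
}

# Typical use on the log host, with the paths used in this article:
#   make_host_dirs /var/data/symon /var/data/symon/.symondb
#   sh /usr/local/share/symon/c_smrrds.sh all
```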

Ready to go

  • configure each client
  • configure the server
  • make sure the chosen port is open in firewall
  • create the rrd files
  • add program to server startup and shutdown scripts
  • run
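For the startup step, one approach on OpenBSD is a guarded entry in /etc/rc.local. The log host starts symux before symon so that the listener is up before the first stream arrives. This is a sketch. `start_daemon` is a hypothetical helper, and the /usr/local/libexec paths are an assumption about where the package put the binaries, so check your own install.

```shell
#!/bin/sh
# start_daemon PATH [ARGS...]
# Hypothetical helper: run a daemon at boot only if its binary is
# installed, printing its name rc.local style (" symux", " symon").
start_daemon() {
        if [ -x "$1" ]; then
                printf ' %s' "${1##*/}"
                "$@"
        fi
}

# In /etc/rc.local on the log host (paths are an assumption):
#   start_daemon /usr/local/libexec/symux
#   start_daemon /usr/local/libexec/symon
```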

You now have each client collecting gobs of fascinating data and forwarding it to a central log server to be stored.

And that is it: the symon side is finished.

Part II - rrd --> web pages

Setting up the collection and storage of system information is pretty easy with symon. Creating pretty graphs is a whole other story.

The symon project does offer its own graphing tool, syweb, but I do not want to use PHP on my server.

syweb - draws rrdtool pictures of the stored data. syweb is a php script that can deal with chrooted apaches. It can show all systems that are monitored in one go, or be configured to only show a set of graphs.

I will just use static html and shell scripts to construct the graphs.

Directory structure

The first thing I did was get my directory structure in place.

/var/data/symon/            <-- this is where symux dumps the rrd data
    |-- 192.168.10.125
    |    |-- cpu0.rrd
    |    |-- if_em0.rrd
    |    |-- if_em1.rrd
    |    |-- io_sd0.rrd
    |    |-- mbuf.rrd
    |    |-- mem.rrd
    |    |-- proc_httpd.rrd
    |    |-- sensor0.rrd
    |    |-- sensor1.rrd
    |    `-- sensor2.rrd
    |
    |-- 192.168.10.126
    `-- 192.168.10.127

/var/www/private/reporting
    |-- symon               <-- this is where the final graphs go, for private viewing
    |    |-- 192.168.10.125
    |    |-- 192.168.10.126
    |    `-- 192.168.10.127
    `-- webstats            <-- just to show other familiar reports
         |-- 192.168.10.125
         |-- 192.168.10.126
         `-- 192.168.10.127

RRDtool

If you have ever dealt with rrd files and tools, then you know it is a magical science. There is a de facto way to work with them, RRDtool:

"RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data."

It is a good idea to read their documentation. I would also recommend reviewing the c_smrrds.sh script that comes with symon, to understand how the database files are created and seeded.

A while back I was debating how to actually draw my graphs from the data collected by symon. I reviewed a lot of very capable software packages.

From the RRDtool website, projects using RRDtool:

"A whole ecosystem of tools have sprung up around rrdtool. From tiny add-ons to big applications or even replacements for rrdtool itself."

I checked out a lot of them, and many seemed like great tools. But my needs were pretty simple; maybe I did not need an ecosystem. What I needed:

  • create one small graph for each measure - for summary page
  • create one large graph for each measure - for detailed view
  • ability to customize graph types (line, area, stack, etc)
  • ability to customize colors
  • ability to cron the whole thing, say, every 5 minutes

And some things I probably did not need, or would not seriously use:

  • did not need to drill down by network subnet, machine role, etc
  • did not need hourly, daily, weekly, monthly, annual views of the same data
  • did not need interactive graph building (but a neat idea)

symon-graph.sh

Fortunately, I had a trusted and admired colleague who had just tackled the same issue. Many, many, many thanks to Okan for getting me started. I believe he has moved on to a better solution, but I still use a variant of the beautiful hack he showed me. As noted above when talking about the symux write directive, this script depends on using standardized rrd file names.

There is one script, "symon-graph.sh", that does it all.

  1. At the top are some default variables to be used for all images.

  2. Then comes a collection of functions, one for each possible rrd file that may be encountered (explained below).

  3. At the bottom is the action code: 'for each host listed in a text file...' [1]
    • check to make sure the target directory exists
    • remove any existing graphs from the target directory
    • if a specific rrd file exists (e.g. /var/data/symon/192.168.10.125/cpu0.rrd), call the correct function to create its graphs

So, if the directory does not contain the cpu0.rrd file (because you have chosen not to collect that data), nothing is done and the next code block is evaluated.

One of the defaults set in the top section is --start -345600. That means "go back 4 days' worth of seconds" (60 x 60 x 24 x 4 = 345600). This is plenty of history for me to see current values in context.
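If you would rather think in days than in seconds, the arithmetic can live in the script itself. `start_offset` is a hypothetical helper. It reproduces the same -345600 for 4 days. RRDtool's AT-style time specification should also accept relative offsets such as -4d, but check the rrdfetch documentation before you rely on that.

```shell
#!/bin/sh
# start_offset DAYS: print the rrdtool --start value that reaches back
# DAYS days from now (60 s x 60 min x 24 h = 86400 s per day).
start_offset() {
        printf '%s\n' "-$(( $1 * 60 * 60 * 24 ))"
}

# Same default as symon-graph.sh, computed instead of hard-coded.
rrd_common="--alt-autoscale --imgformat PNG --start $(start_offset 4)"
```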

Here is a heavily snipped version. For brevity, I have kept one function (graphcpu) and one call to it.

#!/bin/sh

# ----------------- set variables and defaults -------------------------
rrdtool="/usr/local/bin/rrdtool graph"   # used unquoted: command + subcommand
rrd_dir=/var/data/symon                  # where symux writes the rrd files

# final graphs, one directory per host (see the directory structure above)
png_base=/var/www/private/reporting/symon
rrd_common="--alt-autoscale --imgformat PNG --start -345600"   # 4 days back
width_l=1000
width_s=350
height_l=500
height_s=125

# ----------------- functions that create graphs -----------------------
# graphcpu RRD SIZE   (SIZE is "large" or "small")
graphcpu() {
        rrd=$1
        size=$2
        # printf, because echo does not expand \t portably
        printf '\t cpu (%s): user, nice, system, interrupt, idle\n' "$size"

        image=$png_dir/cpu_${size}.png
        if [ "$size" = "large" ]; then
                size="--width $width_l --height $height_l"
        elif [ "$size" = "small" ]; then
                size="--width $width_s --height $height_s"
        fi

        # user is drawn as an area; the other values are stacked on top.
        # (Newer RRDtool releases deprecate STACK: in favor of AREA:...:STACK.)
        $rrdtool $rrd_common $size $image \
                --title "CPU on $host" -v "Percent" \
                DEF:A=$rrd:user:AVERAGE \
                DEF:B=$rrd:nice:AVERAGE \
                DEF:C=$rrd:system:AVERAGE \
                DEF:D=$rrd:interrupt:AVERAGE \
                DEF:E=$rrd:idle:AVERAGE \
                AREA:A#00FF00:"user" \
                STACK:B#00FFFF:"nice" \
                STACK:C#DDA0DD:"system" \
                STACK:D#9932CC:"interrupt" \
                STACK:E#F5FFFA:"idle" \
                >/dev/null
        return
}

# .... [snip other graph functions] ...

# ----------------- cycle thru directories looking for rrds ------------
for host in $(grep -v '^#' $rrd_dir/.symondb); do
        echo "building for $host ..."
        png_dir=$png_base/$host
        test -d $png_dir || mkdir -p $png_dir
        rm -f $png_dir/*.png

        if [ -f $rrd_dir/$host/cpu0.rrd ]; then
                graphcpu $rrd_dir/$host/cpu0.rrd large
                graphcpu $rrd_dir/$host/cpu0.rrd small
        fi

        # .... [snip other if blocks] ...

done
exit 0

[1] This little file is used in the 'for host in ...' loop above, and for other things not presented here. Listing the hosts explicitly makes adding or removing hosts much easier than having the script scour the file system; I tell it where to look.
$ cat .symondb
192.168.10.125
192.168.10.126
192.168.10.127

creating the graph

Writing the graphcpu() function above requires insight into both RRDtool and symon. To find out exactly what symon collects about the cpu, see the man page:

Data formats: cpu

Time spent in ( user, nice, system, interrupt, idle ). Total time is 100, data is offered with precision 2.

Note that the values add up to 100. This means a stacked graph can be used: it will have a uniform height while visually showing the changes in each component. You make a stack by drawing the first value as an 'area' and 'stack'ing the other values on top of it. Of course, you can do a line graph, or whatever you want.
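You may want to check the "adds up to 100" property against your own data before you commit to a stacked graph. `rrdtool fetch` prints the stored values, and awk can sum each row. This is a sketch. `cpu_row_sums` is a hypothetical helper. It assumes the usual fetch layout: a header line of data-source names, then rows of the form `timestamp: value value ...`. Rows with missing data (nan) are skipped.

```shell
#!/bin/sh
# cpu_row_sums: read `rrdtool fetch <cpu rrd> AVERAGE` output on stdin and
# print "timestamp: sum" for every complete row; for a symon cpu rrd the
# user+nice+system+interrupt+idle sum should come out at about 100.
cpu_row_sums() {
        awk '
        # data rows start with a field ending in ":" (the timestamp)
        $1 ~ /:$/ {
                sum = 0
                for (i = 2; i <= NF; i++) {
                        if ($i ~ /[Nn][Aa][Nn]/) next   # skip incomplete rows
                        sum += $i
                }
                printf "%s %.2f\n", $1, sum
        }'
}

# Usage on the log host, last hour of samples:
#   rrdtool fetch /var/data/symon/192.168.10.125/cpu0.rrd AVERAGE -s -3600 | cpu_row_sums
```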

Now that we know what data elements are inside the rrd file, we can set about creating the small and large graphs.

The function is called in the 'for host in ...' loop and passed two arguments: the data file to use, and the size of graph to make. I only request two sizes. First, a description is printed (mostly for debugging); then the image-size options are constructed. These are used in the rrdtool call just after.

The rrdtool statement is complex and reading the documentation is required. But, here is a quick overview, using the graphcpu() function.

$rrdtool $rrd_common $size $image \
        --title "CPU on $host" -v "Percent" \
        DEF:A=$rrd:user:AVERAGE \
        DEF:B=$rrd:nice:AVERAGE \
        DEF:C=$rrd:system:AVERAGE \
        DEF:D=$rrd:interrupt:AVERAGE \
        DEF:E=$rrd:idle:AVERAGE \
        AREA:A#00FF00:"user" \
        STACK:B#00FFFF:"nice" \
        STACK:C#DDA0DD:"system" \
        STACK:D#9932CC:"interrupt" \
        STACK:E#F5FFFA:"idle" \
        >/dev/null

  • rrdtool="/usr/local/bin/rrdtool graph": we are calling the graph function of rrdtool.
  • rrd_common="--alt-autoscale --imgformat PNG --start -345600": we want to autoscale the graph's Y axis, output in PNG format, and go back 4 days.
  • the $size options and $image file name were determined above from the arguments passed in
  • we give the graph a --title on top and a vertical label (-v, short for --vertical-label) on the left side
  • we DEFine the data elements to look for inside the rrd file and assign them names. Single capital letters (A, B, C, D...) are really easy to use. Note that it is in these assignments that we use the symon data element names from the man page. Then we say we want the AVERAGE consolidation function.
  • next we create the AREA represented by the data 'A', give it a color, and give it a title for the legend at the bottom
  • then we STACK the other values on top of the area, using the same syntax
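The same pattern covers the case mentioned earlier, where you care about only one measure in a collection. Below is a hypothetical `graphswap` function. It graphs only swap_used from a symon mem.rrd, drawn as a line (LINE2) instead of a stacked area. It is a sketch and not part of the author's script. The defaults from symon-graph.sh are repeated so the block stands alone, and the image file name is an assumption.

```shell
#!/bin/sh
# Defaults as in symon-graph.sh
rrdtool="/usr/local/bin/rrdtool graph"
rrd_common="--alt-autoscale --imgformat PNG --start -345600"
width_l=1000
width_s=350
height_l=500
height_s=125

# graphswap RRD SIZE: hypothetical function; expects $png_dir and $host
# to be set by the 'for host in ...' loop, like graphcpu.
graphswap() {
        rrd=$1
        image=$png_dir/swap_$2.png
        if [ "$2" = "large" ]; then
                dims="--width $width_l --height $height_l"
        else
                dims="--width $width_s --height $height_s"
        fi

        # One DEF, one 2-pixel line; the other mem measures are ignored.
        $rrdtool $image $rrd_common $dims \
                --title "Swap used on $host" -v "swap_used" \
                DEF:S=$rrd:swap_used:AVERAGE \
                LINE2:S#FF0000:"swap used" \
                >/dev/null
}
```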

The 'for host in..' loop calls the graphcpu() function once for the small and once for the large image. This is my preference. The main reporting web page will show all the small images. When you click on a small image, the link takes you to the large image that is the exact same thing, but blown up so you can see more detail.

That was just an overview of one function. The same needs to be repeated for each collection symon stores, that is, for each rrd file: graphmem(), graphsensor(), graphproc_httpd(), etc.

Tie it all together

So far:

  1. symon collects system data on each machine and sends it to the central log host
  2. the log host writes all streams from the network to rrd files in specific places
  3. a cron job on the log host calls a script that runs over those rrd files and creates graphs
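Step 3 is an ordinary cron job. A crontab entry like the following would rebuild the graphs every 5 minutes. The script location is an assumption, so use wherever you keep symon-graph.sh.

```
# crontab on the log host (edit with: crontab -e)
# rebuild all symon graphs every 5 minutes; script path is an assumption
*/5 * * * * /usr/local/bin/symon-graph.sh >/dev/null 2>&1
```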

The last thing needed is a landing page and an index page for each host.

/var/www/private/reporting/index.html
    +---------------------+
    | Domain Reporting    | ../reporting/symon/{host}/index.html
    |                     |     +-----------------+
    | symon:              |     | symon: host 125 |
    |   192.168.10.125 => | --->|   (sm graphs)   |  => lg graph
    |   192.168.10.126 => | _   +-----------------+
    |   192.168.10.127 => |  |
    |                     |  |  +-----------------+
    | webstats:           |  |  | symon: host 126 |
    |   192.168.10.125 => |  `->|   (sm graphs)   |  => lg graph
    |   192.168.10.126 => |     +-----------------+
    |   192.168.10.127 => |
    +---------------------+

The landing page is pretty simple: just links to the info pages, by host. Here is a simplified sample of a host page. Note that because the symon-graph.sh script drops the images into the host's own directory, the images are referenced locally.

/var/www/private/reporting/symon/192.168.10.125/index.html

<!DOCTYPE blah, blah>
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>symon 192.168.10.125</title>
</head>

<body>
<h1>Host: 192.168.10.125</h1>

<h2>cpu and mem</h2>
<a href="cpu_large.png"><img src="cpu_small.png"></a>
<a href="mem_large.png"><img src="mem_small.png"></a>

<h2>if</h2>
<a href="if_em0_large.png"><img src="if_em0_small.png"></a>
<a href="if_em1_large.png"><img src="if_em1_small.png"></a>

<h2>io</h2>
<a href="io_a_sd0_large.png"><img src="io_a_sd0_small.png"></a>
<a href="io_b_sd0_large.png"><img src="io_b_sd0_small.png"></a>

<h2>mbuf</h2>
<a href="mbuf_alloc_large.png"><img src="mbuf_alloc_small.png"></a>
<a href="mbuf_pages_large.png"><img src="mbuf_pages_small.png"></a>


<h2>proc</h2>
<a href="proc_httpd_large.png"><img src="proc_httpd_small.png"></a>

<h2>sensors</h2>
<a href="sensor0_large.png"><img src="sensor0_small.png"></a>
<a href="sensor1_large.png"><img src="sensor1_small.png"></a>
<a href="sensor2_large.png"><img src="sensor2_small.png"></a>

</body>
</html>
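If maintaining these host pages by hand gets tedious, you could generate them from whatever graphs exist in each host's directory, for example at the end of each pass through the 'for host in ...' loop. This is a sketch. `make_host_index` is a hypothetical helper that pairs every `*_small.png` with its `*_large.png`. It loses the section headings of the hand-written page above.

```shell
#!/bin/sh
# make_host_index DIR HOST
# Hypothetical helper: write DIR/index.html with one link per graph,
# each small image linking to the matching large one.
make_host_index() {
        dir=$1
        host=$2
        {
                printf '<!DOCTYPE html>\n<html>\n<head>\n<title>symon %s</title>\n</head>\n<body>\n' "$host"
                printf '<h1>Host: %s</h1>\n' "$host"
                for small in "$dir"/*_small.png; do
                        [ -f "$small" ] || continue     # no graphs yet
                        name=${small##*/}               # e.g. cpu_small.png
                        large=${name%_small.png}_large.png
                        printf '<a href="%s"><img src="%s"></a>\n' "$large" "$name"
                done
                printf '</body>\n</html>\n'
        } > "$dir/index.html"
}

# Usage inside the loop of symon-graph.sh:
#   make_host_index "$png_dir" "$host"
```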

Proof of Concept

This may seem like a lot of work for silly images, but once it is all set up, you are done. And it will be updated every 5 minutes via cron job... forever. To prove its usefulness, here is an image of the cpu on a box that had a runaway process.

[Image: cpu_small.png, the CPU graph of the box with the runaway process]

Late on a Friday night, something went nutso. After coffee on Saturday morning, while doing some routine checks on my network, I found this. (I had a pretty good idea what it was.) I logged onto the box, found the problem, and fixed it. Since then, the graph shows what a 'normal' load should look like... yes, a little underutilized, despite symon, symux, and the creation of all those graphs every 5 minutes.



Copyright © 20070731 genoverly
(db datestamp: 20070731)

Copyright © 2003-2015 genoverly