This file should be a detailed description of the config files. Unfortunately it's not finished yet ... Please have a look at the sample configuration files rules.pl, uxmon-rules.pl and bb-display.cfg
This file contains the configuration for the Big Sister Status Collector 'bbd'. It is named bb-display.cfg because it originally described only which html pages bbd was supposed to maintain.

The file contains a set of one-line statements beginning with '%'. A few statements (e.g. %Groups) expect multi-line argument data. In this case the lines between the statement and the next line starting with '%' are treated as arguments. Lines starting with '#' are silently ignored.

Statements:

%Option [+|-]option1 ...
    Arguments:
        optionN: option name
    Set Status Collector options. An option prefixed with '-' is switched off, an option prefixed with '+' is switched on. If no prefix is given '+' is assumed.
    Known options:
        ImmediateHTML (default: off)
            write HTML status files immediately on status receipt, whether or not the status text has changed
        KeepGroups (default: off)
            read saved grouping information on startup and whenever the configuration changes - do not lose dynamic group information
        NewLog (default: on)
            log all incoming agent messages to www/logs/history/status.log ("new log mechanism")
        BBLog (default: on)
            log incoming status messages to www/logs/*.* files ("BB compatible logging")

%Autoconn host1 ... hostn
    Arguments:
        hostx: host name
    NOTE: the host name must be resolvable!
    Tell bbd to automatically set the status of host.conn to 'green' each time a connection comes in from this host. The status is set to 'red' (not to 'purple') if no report arrives for more than 15 minutes.
%Autojoin what group
    Arguments:
        what: either 'new', 'all' or 'all_hosts'
        group: a group name
    %Autojoin new GROUP
        tells bbd to automatically put any newly appearing host into the group 'GROUP' ("newly appearing" means that bbd is receiving status messages for a host not yet known)
    %Autojoin all_hosts GROUP
        tells bbd to automatically put every known host into group 'GROUP' (where "host" means every object bbd is receiving status messages for)
    %Autojoin all GROUP
        tells bbd to automatically put every known object - hosts and groups - into group 'GROUP'

%Pager cmd
    Arguments:
        cmd: the name of a program that should be invoked when bbd gets a page request ...
    The %Pager command is provided for compatibility with Big Brother. Big Brother clients usually send pages directly via their Display Server. In a Big Brother environment you will usually use %Pager like:
        %Pager /usr/where-bb-is/bin/bb-pager.sh

%Groups
    Arguments: none
    The %Groups statement is followed by a number of lines with the syntax:
        name(Display name) GROUP1 ... GROUPN
    The meaning is: 'name' will be printed as 'Display name' when appearing on a html page and belongs to the groups GROUP1 ... GROUPN.
    The group definitions are recursive. This means that e.g.:
        host1(Computer 1) GROUP1 GROUP2
        GROUP1(Group of computers) GROUP3
    is valid and means that the reported status values for host1 will influence not only GROUP1 and GROUP2 but also GROUP3.

%Page name title template
    Arguments:
        name: the file name of the created page. ".html" is automatically appended if the name does not end in ".htm" or ".html". The file is created in the directory 'www'
        title: the title of the created page
        template: the template file used (default is 'template.proto'). The file is expected to be in the 'www' directory.
    Note: Since release 0.29 'template' is obsolete and should not be used any more. Rather use the 'skin' mechanism as described below.
    The %Page statement is usually followed by one or more statements describing the contents of the page (e.g. %table).
    The template file is a html file which contains variable references. When creating a page 'bbd' reads in the template file, replaces the variables by their values and writes the result to the right file. Valid variable names are:
        @TITLE@ - the page title
        @BGROUND@ - the URL to the background graphics file
        @EXPIRES@ - the time "now+5 minutes" when the page contents will expire
        @TIME@ - the current time in human readable format
        @TEXT@ - the page text generated by 'bbd'

%skin skin1 ... skinN
    Arguments: list of skin names
    Use the skinset skin1 thru skinN to describe the look of created pages. See the www/skins directory for valid skin names and the www/skins/*/README files for what the respective skins are meant to introduce. Most of the skins are incomplete, that is, they only add certain details to another skin. Therefore you can list more than one skin in one %skin statement. The skins are treated in order - later listed skins take precedence over earlier listed ones. The 'default' skin is always the first skin included.
    E.g.:
        %skin default white_bg static_lamps
    will force Big Sister to create pages with the default look but with a white background rather than the variable one and static instead of blinking lights. Since the default skin is always implicitly included it's exactly the same as:
        %skin white_bg static_lamps

%Logskin skin1 ... skinN
    Arguments: like %skin
    %Logskin is similar to %skin, but it defines the look of log pages (the pages where the current status of a monitor is shown)

%title title
    Arguments: title
    This statement should only be used within %Page statements. It tells 'bbd' which title should be used for the tables generated by following %table statements. If 'title' is set to 'auto' the display names as defined in the %Groups section are used.
%refto url (see also below)
    Arguments: url
    This statement should only be used within %Page statements. When creating tables 'bbd' will usually create hypertext links for hosts or groups appearing in the table (leftmost column). This is done by appending a "#" character plus the name of the host or group to the url given.
    Special pseudo-urls:
        none - omit creating hypertext links
        self - the host/group will be found on the same page
        clear - clear table with individual urls (see below)

%refto name url
    Arguments:
        name - name of group or host displayed
        url - the URL which should appear in this group's hrefs
    (see also above)
    This statement allows setting individual hypertext links for specific groups or hosts. If a group or host called 'name' appears in a table the "%refto url" is overridden by the url given in the respective "%refto name url" command. You can set up a table of name/url pairs by using multiple %refto statements. The table is emptied with the statement "%refto clear".

%itemref url
    Arguments: url
    This statement should only be used within %Page statements. When creating tables 'bbd' will usually create hypertext links for each status lamp in the table. This is done by appending a '/' character, the name of the host/group, a '.' character plus the name of the status item to the url given. If a file with the suffix '.html' exists, then this one is used...
    Special pseudo-urls:
        none - omit creating hypertext links
    Reasonable %itemrefs are e.g.:
        %itemref logs
            point to the status message collected by the Status Collector
        %itemref html
            point to the HTML version of the status message

%table GROUP1 ... GROUPN
    Arguments: list of group names
    This statement should only be used within %Page statements. It tells 'bbd' to create a table for each of the arguments containing all the hosts/groups contained in the respective group.
    Since 0.22beta each group may be prefixed by one or more "+" characters. In this case the table will not contain the groups/hosts in the respective group but all the groups/hosts found when descending the group tree.

%image cfg-file
    Arguments: configuration file
    Inserts an image/HTML image map into the page. Please see below for a description of the config file (default: adm/display_map.cfg)
    NOTE: when using %image you must have installed the GD perl module.

%ref name
    Arguments: name
    This statement should only be used within %Page statements. It tells 'bbd' to create a HTML label at the current position in the html code. 'bbd' automatically generates labels for any table appearing within the page, so this statement is not commonly used.

%Rsync host[:port] prefix GROUP1 ... GROUPN
    Arguments:
        host - remote status collector's host name
        port - remote status collector's port (defaults to 1984)
        prefix - the prefix prepended to each host name reported to the remote status collector
        GROUP1 ... GROUPN - a list of groups that must be reported to the remote status collector
    This statement tells 'bbd' to regularly (once per 5-minute cycle) build a list of statuses known to it and report them to a remote status collector. Each host name is prepended by a prefix (prefix "none" means no prefix). Currently only host statuses can be synced. No group information is exchanged.

%Frameset output-file initial-page title
    Arguments:
        output-file   the name of the generated html file
        initial-page  the name of the html file initially displayed
        title         the title of the page
    This creates a frameset framing the Big Sister status pages.
    NOTE: The frameset is only generated when bb-display.cfg is re-read (e.g. after a configuration change or on startup).
    NOTE: for %Frameset to work correctly a skinset must be used that includes frame definitions (e.g. "frames")

%include file1 ... filen
    Arguments:
        file  the name of a file to include (if no absolute path is given the path is supposed to be relative to the Big Sister root directory)
    This statement only works when used within %Page statements. It tells bbd to include a file whenever the respective page is rebuilt. The file is included at the 'current' position, so e.g.
        %Page ...
        %table TEST
        %include file
    will include the file 'file' after the table 'TEST'
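The statement format described above (one-line '%' statements, optional multi-line argument data, '#' comments) can be illustrated with a short sketch. This is not bbd's actual parser, which is written in Perl; the function names parse_statements and parse_options and the sample text are hypothetical, and skipping blank lines is an assumption the description does not confirm.

```python
def parse_statements(text):
    """Split bb-display.cfg-style text into (keyword, argument string, data lines)."""
    statements = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # '#' lines are ignored; skipping blank lines is an assumption
        if line.startswith("%"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue  # a lone '%' carries no keyword
            rest = parts[1] if len(parts) > 1 else ""
            statements.append((parts[0], rest, []))
        elif statements:
            # multi-line argument data belongs to the most recent statement
            statements[-1][2].append(line)
    return statements


def parse_options(argument_string):
    """Turn '+A -B C' into {'A': True, 'B': False, 'C': True}."""
    options = {}
    for word in argument_string.split():
        if word[0] in "+-":
            options[word[1:]] = word[0] == "+"
        else:
            options[word] = True  # no prefix means '+'
    return options


# Hypothetical sample, not taken from the distribution.
SAMPLE = """\
# sample configuration
%Option +KeepGroups -BBLog NewLog
%Groups
host1(Computer 1) GROUP1 GROUP2
GROUP1(Group of computers) GROUP3
%Page index Overview
%table GROUP3
"""

statements = parse_statements(SAMPLE)
for keyword, rest, data in statements:
    print(keyword, repr(rest), data)
print(parse_options(statements[0][1]))
```

The argument string is kept unsplit on purpose: how a statement such as %Page separates a multi-word title from the template name is not specified above, so the sketch leaves that decision to the caller.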
Note: Release 0.36 introduced a new security model. Though the old adm/hosts.allow file and format is still supported for compatibility reasons the adm/permissions file should now be used. The permission file tells bbd which clients are allowed to connect and which operations they may perform. The file is read line by line. Each line contains both a pattern and a list of operations accepted or rejected for the matching clients. If a client matches multiple patterns the associated access lists are treated in a cumulative way and applied in the order they appear in the file. The format of each line is pattern => access list accepted patterns are: host name client name matches 'name' host ip client IP address matches 'ip' anonymous no user is logged on on this connection user name user 'name' is logged on member group the user logged on is member of group 'group' Each of 'name', 'ip', 'group' is a perl regular expression. Releases up to and including 0.36 do not support user authentication so each client will be treated as 'anonymous' by them. The access list is a list of keywords being associated with an operation or a group of client operations. Each keyword is preceded by either a "+" or a "-" character allowing or rejecting corresponding request. Accepted keywords are: all all operations authenticate client is permitted to send user authentication status client is authorized to send status messages page client is authorized to send page commands grouping client is permitted to send group join/leave and name commands archiving log file archiving operations alarm_acking alarm acknowledging operations Empty lines and lines starting with '#' are ignored. 
Example:

    # default - reject everything
    host .* => -all
    #
    # hosts in mydomain.com may only send status
    # messages
    host .*\.mydomain\.com => +status
    #
    # archiver may only do archiving
    host archiver => -all +archiving
    #
    # localhost may do everything
    host localhost => +all
    #
    # unauthenticated clients must never page
    anonymous => -page
    #
    # user 'root' may do anything no matter from
    # which client he connects
    user root => +all
    #
    # selected hosts may send grouping information
    host group1 => +grouping
    host group2 => +grouping
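The cumulative, in-order evaluation of access lists can be sketched as follows. This is an illustration only, not bbd's implementation. Several points are assumptions not confirmed by the text: patterns are matched against the whole client name or IP (re.fullmatch), a request is denied when no rule mentions it, and Python regular expressions stand in for Perl ones. The function names are hypothetical.

```python
import re


def parse_permissions(text):
    """Return a list of (pattern words, access keywords) pairs in file order."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and '#' lines are ignored
        pattern, _, access = line.partition("=>")
        rules.append((pattern.split(), access.split()))
    return rules


def matches(pattern, name, ip, user, groups):
    kind = pattern[0]
    regex = pattern[1] if len(pattern) > 1 else ""
    if kind == "anonymous":
        return user is None
    if kind == "host":
        # assumption: the regex must match the whole client name or IP
        return any(re.fullmatch(regex, value) for value in (name, ip))
    if kind == "user":
        return user is not None and re.fullmatch(regex, user) is not None
    if kind == "member":
        return any(re.fullmatch(regex, group) for group in groups)
    return False


def allowed(rules, operation, name, ip, user=None, groups=()):
    verdict = False  # assumption: deny when no rule applies
    for pattern, access in rules:
        if not matches(pattern, name, ip, user, groups):
            continue
        for item in access:
            sign, keyword = item[0], item[1:]
            if keyword in ("all", operation):
                verdict = sign == "+"  # later entries override earlier ones
    return verdict


# The rules from the example above, without the comments.
EXAMPLE = r"""
host .* => -all
host .*\.mydomain\.com => +status
host archiver => -all +archiving
host localhost => +all
anonymous => -page
user root => +all
"""

rules = parse_permissions(EXAMPLE)
print(allowed(rules, "status", "www.mydomain.com", "10.0.0.5"))
print(allowed(rules, "page", "localhost", "127.0.0.1"))
print(allowed(rules, "page", "foo.example.org", "192.0.2.7", user="root"))
```

Under these assumptions an unauthenticated localhost client may send status messages but not pages, because the later "anonymous => -page" line overrides the earlier "+all".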
The Big Sister status collector allows clients to join/leave groups on client request. The file adm/grouping is used by uxmon (provided you are using the standard uxmon-rules.pl file) and contains a list of hosts and the groups they should join.

The file is read line by line. Each line should start with a host name, followed by the list of groups that uxmon should ask the status collector to put this host into, e.g.:

    server1 EAST UNIX ALL
    server2 WEST NETWARE ALL
    hub1 WEST HUB ALL

NOTE: lines starting with '#' are treated as comments

NOTE: Using the 'grouping' feature means that uxmon will send the grouping information to the status collector each time status is sent (usually every 5 minutes).

NOTE: If uxmon is terminated normally (e.g. when an agent machine is halted intentionally) it sends a group leave command to the status collector and the status collector will forget about these hosts (and remove the status files from www/logs). If uxmon crashes for some reason the groups won't be left and the status collector will still have the hosts in its lists.
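As a small illustration of the adm/grouping format, the following sketch reads such a file into a mapping from host name to group list. The function name parse_grouping is hypothetical, and merging repeated host lines is an assumption; the text above does not say how uxmon treats a host listed twice.

```python
def parse_grouping(text):
    """Map each host name to the list of groups it should join."""
    joins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # '#' lines are comments; skipping blank lines is an assumption
        host, *groups = line.split()
        # assumption: repeated host lines accumulate rather than replace
        joins.setdefault(host, []).extend(groups)
    return joins


SAMPLE = """\
# host   groups
server1 EAST UNIX ALL
server2 WEST NETWARE ALL
hub1 WEST HUB ALL
"""

grouping = parse_grouping(SAMPLE)
print(grouping)
```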
If the file adm/uxmon-asroot exists bb_start will start up uxmon with root privileges.
When using the provided uxmon-rules.pl file this file tells uxmon what checks it should run on what hosts and where to send status information to. It is read line by line. If a '#' character is found all the characters behind # are treated as a comment. Lines ending in '\' are treated as being multi-line entries (like e.g. in shell). Each line is of one of the following formats: hostname check1 var1=content1 check2 check3 ... checkn or hostname(alias) check1 var1=content1 check2 check3 ... checkn (quotes " and ' allowed and interpreted) Hostname is the name of the host uxmon should check. The name returned by /bin/hostname or 'localhost' are recognized as the local system. Alias is the name uxmon should use when reporting status to the status collector. This allows e.g. the following check: server1-interface1(server1) ping server1-interface2(server1) ping http (where server1 is meant to be a multihomed host with two network interfaces which should both be checked but should both be reported as being the same machine) Var=content sets a variable named 'Var' with the value 'content'. Some of the checks allow for passing arguments this way. The variable space is cleared after each line ... Common variables are: frequency=xx tell uxmon how frequently to run a check. This is not really a frequency but rather an interval - the value is a number specifying the time in minutes between 2 checks, e.g. foobar frequency=10 ping will ping foobar every 10 minutes Checks are names of checks uxmon should run for the respective host. When interpreting checks uxmon-rules.pl does look for files carrying the name of the check in first adm/Config then in uxmon/Config. If found it does interpret it as a perl script setting up some check, runs it and passes (optional) arguments (var=...) to it. Currently implemented checks are: bbdisplay not really a check ... tells uxmon to send status reports to this host and use Big Brother compatibility mode. Multiple hosts may be bbdisplays. 
Note that some of the functionality is lost when using 'bbdisplay': no dynamic grouping (see file adm/grouping), no multiple status reports per tcp connection.
    Usage:
        mydisplay port=1984 timeout=8 fqdn=no bbdisplay
    (port, timeout and fqdn are optional and default to 1984, 8 and no)
    'fqdn' tells uxmon to either report host names with stripped domain (fqdn=no) to the status collector or with "." in hostnames replaced by "_" or "," (fqdn=yes). So e.g.:
        mydisplay fqdn=yes bsdisplay
        foo.bar.com ping
    will report foo_bar_com.conn to the status collector while
        mydisplay fqdn=no bsdisplay
        foo.bar.com ping
    will only report foo.conn
    'bbdisplay' uses a ',' to replace dots in FQDNs while bsdisplay uses '_'.

bsdisplay
    same as before but use the Big Sister protocol. This should be preferred to bbdisplay.

cpuload
    Usage:
        localhost cpu_yellow=10 cpu_red=20 cpuload
    check the CPU load as reported by the 'uptime' command. cpu_yellow defaults to 10, cpu_red to 20.

statusfile
    Usage:
        localhost file=adm/mystatus statusfile
    read status information from a file and report it to the Status Collector. This monitor is meant for interfacing with simple external monitors - they can write their status to a file rather than caring about TCP connections to Status Collectors, e.g.:
        echo "status myhost.mytest green wow - everything is ok" > adm/mystatus
        echo "status myhost.other yellow something went wrong" >> adm/mystatus
    would do ...

bbscript
    Usage:
        localhost env="LIMIT1=5;LIMIT2=10" file=adm/bb-oracle.sh bbscript
    Use a BB style monitor script. "file=" must point to the script to execute every 5 minutes, "env=..." lists optional environment variables to be set in the script's environment. The common variables (such as BBHOME and the like) are automatically set.

http
    check http response. When used without arguments it will connect to port 80 and try to get the file "/".
    Other URLs/ports may be passed in either of the following two ways:
        server url=http://host:port/somewhere/somefile.html http
    or simply:
        server http://host:port/somewhere/somefile.html

tcp
    check if the host responds to a tcp connection request. Some well known services (such as smtp, pop3, nntp, ica) will be recognized and not only connected to but also checked against some expect/send pairs (e.g. when checking SMTP uxmon will expect an answer starting with '22').
    Usage:
        service=pop3,smtp,printer tcp
    where service is a variable set to a comma separated list of services 'tcp' should check. Some well known services have their own aliases, so they can be listed directly without "service=... tcp", e.g.
        server1 pop3 smtp printer
    is ok.

ping
    does a ping. Note that ICMP pings are only possible when uxmon is running with root privileges. By default 'ping' will therefore use the UDP protocol. Ping supports icmp/udp/tcp though. Most IP stacks implement both ICMP and UDP echo services ... To run uxmon with root privileges create the file adm/uxmon-asroot and bb_start will start up uxmon as root.
    Usage:
        server1 ping
    or
        server1 proto=udp ping proto=tcp ping proto=icmp ping
        server2 proto=fping ping
    special protocols: if proto=external then an operating system command will be executed instead of using the built-in ping methods. Use proto=external like:
        server1 proto=external pingcmd="ping -c1" ping
    proto=karl is another special implementation using the operating system ping. It is limited to Solaris though. If you have the fping command on your system you probably want to use proto=fping

rpc
    does an 'rpc ping', that is, it sends a 'NULL' remote procedure call to the respective program and checks for a correct answer.
Usage: server1 rpc=mount,nlm rpc rpc=nfs version=3 rpc Note: Some of the checks have their own aliases, so you can also write: server1 mount nfs nlm yp yppasswd Note: currently known programs are: mount, nfs, nlm, yp, yppasswd (list can be extended in Monitor::rpc_ping) Note: rpc does need a working portmapper on the remote system procs does check for running processes. On Win32 Systems this monitor checks for running services. Usage: localhost procs=nfsd(1-16),sendmail,lpd(1-40) procs ("there must be 1 to 16 nfsd processes, at minimum one sendmail process and 1 to 40 lpd processes") localhost pscomm="ps cax" procs=nfsd(1-16),sendmail,lpd(1-40) procs (same as above but use the command "ps cax" for finding the running processes) NOTE: On Win32 systems the "pscomm" argument is ignored. NOTE: On Win32 systems it is possible to monitor remote systems. diskfree does check file systems for free space Usage: localhost type=ufs fs=/(1000-5000),/var(10000-20000) diskfree ("status red if / is below 1MByte free, yellow if below 5MBytes,..., the file system type is ufs") Note: the default type is "ufs" Note: the default fs is "all file systems of type" and limits are (5000-10000) diskload check the average disk load (4 minutes) as reported by 'sar'. Usage: localhost yellow=3 red=8 diskload ("report status yellow when load >3%, report status red when load >8%") load check for CPU idle time, I/O-wait, freeswap as reported by sar (4 minutes period) Usage: localhost idle=10 wio=50 freeswap_red=100000 freeswap_yellow=200000 load ("report 'yellow' when %idle is below 10%, or %wio is greater than 50% or freeswap is below 200000 blocks, report red when freeswap is below 100000 blocks") Note: defaults: idle=15, wio=50, freeswap_red=20000, freeswap_yellow=60000 dumpdates check for last backup if using dump/ufsdump. 
    Usage:
        localhost fs=.7(6-10),.0(30-40),/dev/rdsk/c0t1d0s0.7(1-2) dumpdates
    yellow: last level 7 backup older than 6 days, last level 0 backup older than 10 days, last level 7 backup of /dev/rdsk/c0t1d0s0 older than 1 day
    red: last level 7 backup older than 10 days, last level 0 backup older than 40 days, last level 7 backup of /dev/rdsk/c0t1d0s0 older than 2 days

syslog
    check system log files
    Usage:
        localhost syslog cfg=/etc/bs_syslog.cfg syslog
    Note: syslog needs its own configuration file. By default this is etc/syslog. See below for a description of the file format
    Note: On startup syslog will re-read the last 15 minutes of the log file but at most 30kBytes

eventlog
    check event log on Win32 systems
    Usage:
        server1 eventlog
    Note: eventlog needs its own configuration file. By default this is etc/eventlog. This config file has the same format as the one for the syslog monitor. See below for a description of the syslog config file.

snmp_trap
    check the log file being generated by bstrapd (var/snmp_traplog). This is only useful if bstrapd is running (usually bstrapd is started up if its configuration file adm/bstrapd.cfg exists).
    Usage:
        server1 snmp_trap
    Note: snmp_trap has its own configuration file called etc/snmp_trap. See below for a description (same format as syslog config file).

snmp
    remotely monitor hosts running SNMP agents
    Usage:
        server1 novell
        hub2 hub
        server3 type=ping,net,storage,nwusers,cpu,novell snmp
        router1 community=secret type=net,ping snmp
    The default community (when not passed with "community=...") is 'public'. "type" may be a list of checks out of the following:
        ping
            report in host.conn if the snmp poll was successful
        net
            check any network interface for InputErrors and OutputErrors and report a failure if the device reports more than 2/s (yellow) or 6/s (red)
        storage
            (currently only together with "novell" and "nt") use the 'hrStorage' oid for checking disk and memory usage
        cpu
            uses 'hrProcessorLoad' to monitor CPU load.
Does report "yellow" if load is >80% nwusers (will only work with Netware servers) Does check 'nwMaxLogins' against 'nwLoginCount', report 'yellow' if only 10 users are left, 'red' if less than 2 are left hub / nt / novell / caty / cds / notes: these do not include any check but will tell 'snmp' which types of machine are checked. Some snmp checks are depending on the type of host ... hub: network hub nt: Windows NT novell: Novell Netware caty: Cisco Catalyst cds: Axis CD-Server Note: snmp does read the file etc/mibs.txt. By default this contains only the Internet MIB-II (ping/net). You need to add more mibs from the contrib/mibs directory, e.g. cat host.txt mib-2.txt nwhostx.txt nwserver.txt > mibs.txt OV monitor HP Openview trapd.log Usage: localhost cfg=/etc/bs_OV.cfg syslog Note: OV does need its own configuration file. By default this is etc/OV. See below for a description of the file format Note: On startup OV will re-read the last 24 hours of the log file but at most 1MByte metastat monitor Solstice Disksuite metadevices Usage: localhost stat=/usr/opt/SUNWmd/sbin/metastat metastat default value of 'stat' is /usr/opt/SUNWmd/sbin/metastat, so 'stat=' can be omitted in most cases. FQDN / noFQDN (obsolete, use "fqdn=" with bsdisplay/bbdisplay instead) Usage: localhost FQDN rather an option than a monitor. Tells uxmon to either report host names with stripped domain (noFQDN) to the status collector or with "." in hostnames replaced by "_" (FQDN). So e.g.: localhost FQDN foo.bar.com ping will report foo_bar_com.conn to status collector while localhost noFQDN foo.bar.com ping will only report foo.conn ntp Usage: foo ntp Check if the machine is running an ntp server. This check uses the ntpdate command - therefore only works on systems with ntpdate installed. 
mrtg Usage: foo.bar.com prefix=10.1.1.253/10.1.1.253.1 column=mrtg \ maxlev=10485760 bits=1 mrtg NOT REALLY DOCUMENTED YET atmport etherport Usage: fooswitch switch=192.168.0.4 port=3A2 vpi=0 atmport fooswitch switch=192.168.0.4 port=3A2 etherport check operating status of specific ports. NOT REALLY DOCUMENTED YET software Usage: foo type=asn9000 expected=ForeThought.*6\.0\.1 software foo2 expected=Linux.*2\.0\.36 software Get the firmware release via SNMP and check it against a configured "expected" version. Report a yellow status if the version does not match. The "expected=" argument is a regular expression tested against the value of system.sysDescr.0. Some special "type=" values exist: type=asx FORE ASX ATM Switches checks ASXSoftware.0 instead of sysDescr type=asn9000 FORE ASN9000 PowerHub checks ASNSoftware.0 instead of sysDescr type=es2810 FORE ES2810 Ethernet Switch
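The line format of this file (hostname or hostname(alias), var=content settings, check names, '#' comments, '\' continuations, quotes) can be sketched roughly as below. This is not the logic of uxmon-rules.pl, which is a Perl script. The function names are hypothetical, and two behaviours are assumptions: a check sees the variables set earlier on the same line, and a '#' always starts a comment, even inside quotes.

```python
import re
import shlex


def join_continuations(text):
    """Strip comments and merge lines ending in a backslash into logical lines."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # assumption: '#' is never quoted
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        pending += line
        if pending.strip():
            logical.append(pending.strip())
        pending = ""
    if pending.strip():
        logical.append(pending.strip())  # dangling continuation at end of file
    return logical


def parse_line(line):
    """Return (hostname, alias, [(check, variables seen so far), ...])."""
    words = shlex.split(line)  # interprets the quotes " and '
    head = re.fullmatch(r"([^()]+)(?:\(([^()]*)\))?", words[0])
    host = head.group(1) if head else words[0]
    alias = (head.group(2) if head else None) or host
    variables, checks = {}, []
    for word in words[1:]:
        if "=" in word:
            name, value = word.split("=", 1)
            variables[name] = value
        else:
            # assumption: a check gets a snapshot of the variables set before it
            checks.append((word, dict(variables)))
    return host, alias, checks


# Hypothetical sample built from usage lines quoted in this description.
SAMPLE = r"""
server1-interface2(server1) ping http   # multihomed host
server1 rpc=mount,nlm rpc rpc=nfs version=3 rpc
foo.bar.com column=mrtg \
    maxlev=10485760 bits=1 mrtg
"""

entries = [parse_line(line) for line in join_continuations(SAMPLE)]
for entry in entries:
    print(entry)
```

The variables dictionary is created afresh for every logical line, which mirrors the statement that the variable space is cleared after each line.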
Note: There are quite many things you can do with the log file monitor. Unfortunately this fact rendered its config files rather complex. It's always a good idea to have a look at some samples - e.g. the etc/OV file. Various monitors perform about the same function: They watch log files. These monitors include syslog, eventlog, logfile, OV. Each of them has its own config file usually called etc/syslog, etc/eventlog and so on. The format of all of these files is identical. There are only very few differences in the semantics. Each file consists of one section per system log file that should be watched. Empty lines and lines starting with '#' are ignored. Each section starts with a line containing the file name of a log file that should be watched followed by a ':', e.g. /var/adm/messages: If you would like to use the same config file on multiple systems and the file is called different on some of them you might find the following be a solution. You can list multiple filenames comma separated on the same line. The monitor will then use the first existing file out of this list: /var/adm/messages,/var/log/messages: For the eventlog monitor the semantics is different. The 'file' here must be the name of a log, e.g. "System" or "Security". The lines following the file name are of the format (fields separated by one or more tabs): pattern status minutes text topic where 'pattern' is a regular expression (perl style!) that should be matched against each line appearing in the log file, status is the status that should be reported if a matching line is found (e.g. "yellow" or "red"), minutes is the time in minutes the status should be reported (remember that "line appearing in log file" is an event passing by very fast - so we extend the time a little :-)), text is the text that should be appearing in the status message and topic is service type part of the status name (defaults to "msgs", thus 'topic' is optional). 
As in perl, patterns may include sections in parentheses "(...)". These can be referenced in text with $1 through $9. E.g. the following rule:

    /var/log/syslog:
    to=([^,]+)    yellow    10    someone sent mail to $1    funny

will report a status of

    machine.funny: yellow someone sent mail to blabla

for 10 minutes if someone sends an email message to user 'blabla'.

Apart from a color 'status' may be the word 'clear'. In this case the log file monitor will not log the corresponding message but will rather find any tracked message matching the text and remove it. E.g.:

    host (.*) is down    red      20    host $1 down
    host (.*) is up      clear    0     host $1 down

will make the monitor report "host ... down" for 20 minutes after seeing "host ... is down" in the watched log file. If within these 20 minutes the message "host ... is up" is detected, the message is cleared from memory immediately, without waiting for the whole 20 minutes.

In 'clear' patterns the message text is treated as a regular expression if it is preceded by a '+' sign, e.g.

    host (.*) was disposed    clear    0    +host.*$1

will clear any message containing host.*... if the message "host ... was disposed" is detected.

If pattern is 'default' the line is treated as the default status that should be reported if no other event is pending. E.g.:

    default    green    hostname    everything looks fine

will report "green everything looks fine" to the Display Server for the host 'hostname' if there's nothing else to report. Usually you will use "*" for hostname - semantics: set the default for any host known to the log file monitor.

This can be used in a less obvious way too.
Consider the following example:

    /var/log/dhcp:
    default           yellow    0     don't know what's going on          dhcp
    DHCPOFFER         green     30    we are still giving out addresses   dhcp
    DHCPACK           green     30    we are still ack'ing leases         dhcp
    no free leases    red       10    oops - we are out of leases         dhcp

This will report 'yellow don't know what's going on' if no log entries are written, "green we are still giving out addresses" if an address was given out during the last 30 minutes ('DHCPOFFER'), "green we are still ack'ing leases" if a lease was acknowledged during the last 30 minutes and "red oops - we are out of leases" if "no free leases" was logged during the last 10 minutes.

Since you have not got much influence on what is written to log files, host names appearing there may sometimes be different from the names you want them to appear under in Big Sister. For this reason the pattern

    node logname=truename

has been introduced. Some monitors will automatically add 'node' entries during runtime. E.g. the OV monitor will add one each time it detects a "System name changed" message.
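The rule semantics described for the log file monitors (tab separated fields, $1 through $9 references, timed statuses and 'clear' rules) can be illustrated with the sketch below. It is not the monitors' actual code. The names parse_rule, expand and feed are hypothetical, Python regular expressions stand in for Perl ones, and the sketch assumes that every matching rule is applied and that 'clear' compares message text only. 'default' and 'node' lines are not handled.

```python
import re


def parse_rule(line):
    """Parse 'pattern<TAB>status<TAB>minutes<TAB>text[<TAB>topic]'."""
    fields = re.split(r"\t+", line.strip())
    pattern, status, minutes, text = fields[:4]
    topic = fields[4] if len(fields) > 4 else "msgs"  # documented default topic
    return re.compile(pattern), status, int(minutes), text, topic


def expand(text, match):
    """Replace $1..$9 in text with the groups captured by match."""
    def group_value(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$([1-9])", group_value, text)


def feed(active, rules, log_line, now):
    """Update the tracked messages in 'active' for one log line (now in seconds)."""
    for regex, status, minutes, text, topic in rules:
        match = regex.search(log_line)
        if match is None:
            continue
        message = expand(text, match)
        if status == "clear":
            if message.startswith("+"):
                wanted = re.compile(message[1:])  # '+' marks a regular expression
                doomed = [key for key in active if wanted.search(key[1])]
            else:
                doomed = [key for key in active if key[1] == message]
            for key in doomed:
                del active[key]
        else:
            # remember the status together with the time at which it expires
            active[(topic, message)] = (status, now + minutes * 60)


RULES = [
    parse_rule("host (.*) is down\tred\t20\thost $1 down"),
    parse_rule("host (.*) is up\tclear\t0\thost $1 down"),
]

active = {}
feed(active, RULES, "host gw1 is down", now=0)
print(active)
feed(active, RULES, "host gw1 is up", now=60)
print(active)
```

A real monitor would additionally drop entries whose expiry time has passed before building its report; the sketch leaves that step out.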
Note: The file may be called as you like - adm/display_map.cfg is the default.

This file describes how to build a graphical status display when using the %image statement in bb-display.cfg. It consists of a series of one line statements. Empty lines and lines starting with "#" are ignored. All file names are paths relative to the Big Sister root directory.

Known statements are:

template filename
    read background graphic from file filename (must be GIF format). This is mandatory! No image without "template" ...

name coord coordname
    remember the display position 'coord' under the name 'coordname'. After this statement whenever a display position is expected 'coordname' can be used instead (e.g. in 'at' or 'line' statements).
    Example:
        name 100,150 NewYork
        name 70,200 Dallas
        line NewYork Dallas DALLASWAN
        at NewYork NEW_YORK
        at Dallas DALLAS

red filename
yellow filename
purple filename
green filename
    the graphic to be inserted for red/yellow/purple/green status (GIF).

at coord group
    display status for group "group" (groups as configured in bb-display.cfg) at position "coord" (0,0 is at upper left corner).
    Example:
        at 100,150 NEW_YORK
        at 30,80 WASHINGTON

line coord1 coord2 group
    draws a line from display position coord1 to display position coord2 with the color of the status of the group group

dump filename
    write the image to file filename. This must be the last statement. The file must be in the "www" directory, otherwise browsers will not find it.
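How named display positions are resolved in 'at' and 'line' statements can be sketched as follows. This is a hypothetical reader for illustration, not the code used by bbd; the function name parse_map and the file names in the sample are made up, and the sketch assumes that a name must be defined before it is used.

```python
def parse_map(text):
    """Return the statements of a display map with coordinates resolved."""
    names, items = {}, []

    def coord(token):
        if token in names:
            return names[token]  # a previously remembered position
        x, y = token.split(",")
        return int(x), int(y)

    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # empty lines and '#' lines are ignored
        command, args = words[0], words[1:]
        if command == "name":
            names[args[1]] = coord(args[0])
        elif command == "at":
            items.append(("at", coord(args[0]), args[1]))
        elif command == "line":
            items.append(("line", coord(args[0]), coord(args[1]), args[2]))
        else:
            # template, red/yellow/purple/green, dump: keep the file name as is
            items.append((command, *args))
    return items


# Hypothetical sample; the GIF file names are invented for illustration.
SAMPLE = """\
template www/map.gif
name 100,150 NewYork
name 70,200 Dallas
line NewYork Dallas DALLASWAN
at NewYork NEW_YORK
at 30,80 WASHINGTON
dump www/status_map.gif
"""

items = parse_map(SAMPLE)
for item in items:
    print(item)
```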
bb_event_generator.cfg is the configuration file for the alarm generator (bb_event_generator). The file tells under what conditions alarms should be sent, to whom and with what priority. It consists of one-line rules. As in other files, empty lines and lines starting with '#' are ignored. A '\' at the end of a line tells bb_event_generator that the statement continues on the following line.

Each rule is composed of one or more patterns separated by ';' and a list of variable settings. The variable settings are separated by spaces or tabs. So a rule looks like:

    pattern1;...;patternN var1=text1 var2=text2 ... varN=textN

where pattern1 thru patternN are patterns, var1...varN are variable names and text1...textN are variable values.

Each pattern is composed of a host part and a check part:

    host.check

where 'host' may be one out of the following:
    - a host name as reported to the Big Sister status collector
      e.g. myserver, www, ...
    - an IP host or network address in "[" "]" brackets
      e.g. [139.79.159.1], [192.168.50]
      NOTE: this will only work for host names that can be resolved into an IP address
    - a group as known to the Big Sister status collector with prefix '@'
      e.g. @ALL, @ROUTER
    - an asterisk "*" matching any host

and 'check' may be either an asterisk "*" matching any check or a check as displayed in the columns of the status display.

Since version 0.29, the pattern may be extended by an additional condition. The syntax is

    host.check{condition}

where condition is a boolean expression, e.g.

    *.*{$mail == "test"} mail="nobody"

Special functions are 'daytime' and 'weekday'; they can be used like this:

    *.*{daytime 22:00-06:00 or weekday Sat,Sun} postpone=30
    *.*{daytime 22:00-06:00} postpone_to=06:00

Whenever a status change is detected, bb_event_generator goes through the config file and looks for matching patterns. Each variable associated with the matching patterns is then set as listed. If multiple patterns match, the associated variables are set in order.
Interpreted variables are:

- mail
    mail addresses to send the alarm to (comma separated)
- prio
    priority level (0..100)
- repeat
    if set, bb_event_generator will send the alarm again every x minutes until the alarm condition has cleared
- repeatprio
    the priority level for repeated alarms (see "repeat")
- keep
    the duration in minutes during which the event generator does not clear the alarm after the alarm condition reports that everything is ok again
- norepeat
    the duration in minutes during which no further alarm is sent for the same condition
- delay
    the duration in minutes between the time the alarm is raised and the time it is sent to the user
- check
    a boolean expression that is checked during the 'delay' time; the alarm is aborted if the condition fails at least once during this time
- down
    (one of "green", "purple", "yellow", "red", "never") tells the event generator which status should be interpreted as "down". E.g. "yellow" means that if the status "yellow" or worse ("red") is detected, the corresponding service is down.
- up
    (like down) tells the event generator which status should be considered as "up". E.g. down=yellow up=green means that a service is considered down from the time it changes to yellow or red until the time it goes to "green" again (but not if it goes to "purple"!)
- maxmsg
    a numeric value giving the maximum size of the message sent in the subject line of the alarm mail (e.g. if you send it through a pager gateway ...)
- postpone
    if set, alarms are not sent for an additional x minutes and stay in the queue instead. If the alarm condition clears during the postpone time, the alarm is silently thrown away. Postpone is meant for times when you do not want to be alarmed about a problem that lasts for a limited time only.
- postpone_to
    same as postpone, but the value is expected to be a time of day rather than an interval (e.g. "06:00")
- pager
    use an alternative pager program instead of the default 'log_mail' (e.g. 'notify' is a good choice!)
- trap
    if set, bb_event_generator will raise an SNMP event for any alarm/acknowledgement. The content of trap is a trap destination composed of a community and a host in the form community@host. If the community is missing, "public" is assumed. See also SNMP_AGENT.

Examples:

Usually you will put a general rule with a pattern matching any host/check and the default variable values as your first rule, e.g.:

    # default values
    *.* mail=alarm prio=50 norepeat=20 down=yellow up=green maxmsg=60

If you do not want to get an alarm about e.g. smtp being down when you already know that the connection to the host is down, then you could use the following rule:

    *.smtp delay=5 check="$host.conn"

(semantics: if "conn" goes down within 5 minutes after smtp down is detected, then throw away the smtp alarm, otherwise send it after 5 minutes)

If your very important machines are in a group called "IMPORTANT" then you may wish to do something like:

    @IMPORTANT.* prio=100 repeat=30 repeatprio=60

(semantics: if a service of a machine in the group IMPORTANT goes down, then send an alarm with priority 100 and send a reminder with priority 60 every 30 minutes ("yell for help"))

If the machines in a group EAST are all located in a network connected to router "router-east", then you may get plenty of alarms when "router-east" goes down, since every machine behind it is unreachable. You can avoid this by e.g.:

    @EAST.conn check=router-east.conn delay=5
    router-east.conn check="1" delay=0

(semantics: if a host is in group EAST and the connection to it goes down, wait for five minutes, and if within these five minutes the connection to router-east is lost too, then do not send an alarm for this host. If the host is the router itself, send an alarm immediately)

or

    @EAST.* router=router-east
    *.conn check="($router.conn) or not $router" delay=5
    router-east.conn check="1" delay=0

(semantics: if a host is in group EAST, set the variable "router" to "router-east". If the connection to any host goes down, then wait for five minutes and check whether either there is no router configured for this machine or the connection to the router goes down as well. Discard the alarm if the router goes down, except of course if the machine is the router itself)

NOTE: you cannot use variables in patterns (not yet), so e.g. the example above cannot be written as:

    @EAST.* router=router-east
    $router.conn check=1 delay=0

Postpone is used during times when system failures are less important, e.g. during the night. You can postpone alarms for a time interval:

    *.*{daytime 22:00-06:00} postpone=60

This tells the event generator to keep a newly raised alarm in the postpone queue for 1h before sending an alarm mail. If the alarm condition clears during this time, no alarm is sent at all.

If you never want to be woken up by alarms, then

    *.*{daytime 22:00-06:00} postpone_to=06:00

might be what you want (semantics: when an alarm is detected during the night, send it at 06:00)
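A window such as 22:00-06:00 crosses midnight, so a plain "start <= now < end" comparison is not enough. The following Python sketch shows one way to evaluate such a window and to compute the send time for postpone_to. It is an assumption-laden illustration, not Big Sister code: the function names are invented, the end of the window is treated as exclusive, and an alarm raised exactly at the target time is assumed to wait until the next day.

```python
from datetime import datetime, time, timedelta


def parse_hhmm(text):
    """Convert 'HH:MM' into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_daytime(window, now):
    """Return True if 'now' falls into a window such as '22:00-06:00'."""
    start_text, end_text = window.split("-")
    start, end = parse_hhmm(start_text), parse_hhmm(end_text)
    current = now.time()
    if start <= end:
        # window within a single day, e.g. 08:00-17:00
        return start <= current < end
    # window crossing midnight, e.g. 22:00-06:00
    return current >= start or current < end


def postpone_to(target, now):
    """Return the next point in time at which the clock shows 'target'."""
    candidate = datetime.combine(now.date(), parse_hhmm(target))
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For an alarm raised at 23:30, in_daytime("22:00-06:00", ...) is true and postpone_to("06:00", ...) yields 06:00 on the following day.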
Roland Roberts contributed an add-on to the original 'log_mail' that allows alarms to be sent through methods other than the default 'mail', configurable on a per-user basis. If you'd like to use this extension, put

    *.* pager=notify

into your bb_event_generator.cfg.

Roland Roberts says on notify:

This isn't really a replacement; by default it calls log_mail if there is no alias defined for the user. It also doesn't pay attention to the BATCHABLE flag in notify.cfg and instead sends out each page independently. If you want to use this, you have to modify bb_event_generator to use `notify' instead of `log_mail'. Here is my copy of notify.cfg:

    ---- adm/notify.cfg ----
    # This will reroute mail sent via "mail=userid" in the event generator.
    # All programs need to accept the "standard" command switches of
    #   -D DEBUGLEVEL
    #   -s Subject
    #   -H Extended Subject
    #   -S Severity Level (numeric)
    #   -o Offender (e.g., xyzzy.disk)
    #   -t Type (what's this?)
    #   -c Message Code (what's this)
    #   -h Host
    #   -M Mail host (may be ignored, used for mail relaying via SMTP).
    #
    # The list of user IDs will follow all options.
    #
    #
    # USERID
    #   The user ID specified in the "mail=" setting of bb_event_generator.cfg
    # ALIAS
    #   Replace USERID with ALIAS when calling this program.
    # PROGRAM
    #   The program to run. If it is a relative path, it is relative to the BS
    #   install base directory.
    # BATCHABLE
    #   Boolean (0 or 1) to indicate whether or not the program can handle sending
    #   to multiple user IDs or if the program must be called once for each.
    # OPTIONS
    #   Any extra parameters which PROGRAM needs to run. Note that if these
    #   parameters only apply to one user, you will need to set the BATCHABLE
    #   flag to 0.
    #
    rroberts:1205780:skytel.pl:1:

BTW: -t and -c are currently not used with the event generator. Their meaning is:

-t type
    a string consisting of a message type (the only one defined at the moment is "alert") followed by an "_" and an optional key (a kind of authentication)
-c code
    message code. While the message text should be something human readable, the message code is the message (or message class) in a machine readable form (currently used e.g.: "BOOT" = system boot, "ETHER" = ethernet network error, "SECURITY" = security alert, "DISK" = disk error, ...)
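The single entry in the notify.cfg above can be read as colon-separated fields. The Python sketch below assumes the field order USERID:ALIAS:PROGRAM:BATCHABLE:OPTIONS, which is inferred from the order of the comments and from the example entry rather than stated anywhere; the function name is likewise made up for this illustration and does not come from the notify script.

```python
def parse_notify_cfg(text):
    """Read notify.cfg-style entries into a dictionary keyed by user ID.

    Assumed field order: USERID:ALIAS:PROGRAM:BATCHABLE:OPTIONS
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and empty lines
        # Split at most four times so that OPTIONS may itself contain ':'.
        userid, alias, program, batchable, options = line.split(":", 4)
        entries[userid] = {
            "alias": alias,
            "program": program,
            "batchable": batchable == "1",
            "options": options,
        }
    return entries
```

Under this reading, the entry "rroberts:1205780:skytel.pl:1:" maps the user ID rroberts to the alias 1205780, handled by the program skytel.pl, batchable, with no extra options.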