alarm plugin
Description
The alarm plugin performs basic status checks on the monitored system and reports alarms to WombatOAM when any of the monitored parameters reaches a certain threshold.
Applications it depends on
kernel
Modules
wombat_plugin_alarm
Reports
The plugin reports the following alarms:
process_limit
port_limit
ets_limit
atom_limit
module_limit
export_limit
memory_limit
open_file_limit
open_socket_limit
os_cpu_load
disk_capacity
shell_history_size
process_message_queue
system_information
old_code
Configuration options
The intervals at which the checks are performed are configurable, in case the plugin's (moderate) resource use needs to be regulated:
collection_interval (integer, default: 60000): Specifies how many milliseconds to wait between checks of whether any process-related alarm (e.g. process_message_queue) should be raised or ceased.
interval (integer, default: 60000): Specifies how many milliseconds to wait between checks of whether any system limit-related alarm (e.g. atom_limit) should be raised or ceased.
app_check_interval (integer, default: 60000): Specifies how many milliseconds to wait between checks of whether the version of any application has changed. If any change is detected, WombatOAM raises a different_application_versions alarm.
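The app_check_interval check can be pictured as comparing two successive snapshots of a node's application versions. The following Python sketch only illustrates that idea; the plugin itself is Erlang, and the function name detect_version_changes and the snapshot format (a dict from application name to version string) are hypothetical.

```python
from typing import Dict, List

# Hypothetical snapshot format: application name -> version string.
Snapshot = Dict[str, str]


def detect_version_changes(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return the sorted names of applications whose version differs
    between two snapshots taken app_check_interval milliseconds apart.

    Assumption: only applications present in both snapshots are compared;
    applications that appear or disappear are not reported here.
    """
    return sorted(
        app
        for app in previous.keys() & current.keys()
        if previous[app] != current[app]
    )


if __name__ == "__main__":
    before = {"kernel": "8.5", "stdlib": "4.3", "myapp": "1.0.0"}
    after = {"kernel": "8.5", "stdlib": "4.3", "myapp": "1.1.0"}
    print(detect_version_changes(before, after))  # ['myapp']
```

A non-empty result corresponds to the situation in which, per the description above, WombatOAM would raise a different_application_versions alarm.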
The "node info alarms" are raised by the WombatOAM server, based on the node info reported by this plugin:
node_info_opts/app_version_alarms (default: true): If true, an alarm is raised if nodes in the same family run different versions of the same application, or if an application is not running on all nodes. Applications started or stopped on nodes are logged as notifications.
node_info_opts/time_diff_alarms (default: true): If true, alarms are raised if nodes in the same family are in different time zones.
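The two node info conditions can be sketched as comparisons across the nodes of one family. This Python illustration is not the WombatOAM server's code: the input shapes (node name to application versions, node name to a time zone string) and the function names are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: node name -> {application name -> version}
FamilyApps = Dict[str, Dict[str, str]]


def app_version_problems(family: FamilyApps) -> Tuple[List[str], List[str]]:
    """Return (apps running with different versions, apps missing on some node)."""
    all_apps: Set[str] = set()
    for apps in family.values():
        all_apps.update(apps)  # adds the application names (the dict keys)

    differing: List[str] = []
    missing: List[str] = []
    for app in sorted(all_apps):
        versions = {apps[app] for apps in family.values() if app in apps}
        if len(versions) > 1:
            differing.append(app)
        if any(app not in apps for apps in family.values()):
            missing.append(app)
    return differing, missing


def time_zone_mismatch(time_zones: Dict[str, str]) -> bool:
    """True if the nodes of one family report more than one time zone."""
    return len(set(time_zones.values())) > 1


if __name__ == "__main__":
    family = {
        "node1@host1": {"kernel": "8.5", "myapp": "1.0.0"},
        "node2@host2": {"kernel": "8.5", "myapp": "1.1.0"},
        "node3@host3": {"kernel": "8.5"},
    }
    print(app_version_problems(family))  # (['myapp'], ['myapp'])
    print(time_zone_mismatch({"node1@host1": "UTC", "node2@host2": "CET"}))  # True
```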
The system_checks option is a list of the system checks that the plugin performs. The checks are configured as follows:
process_limit, port_limit, ets_limit, atom_limit, module_limit, export_limit, memory_limit, open_file_limit, open_socket_limit, os_cpu_load, disk_capacity: Each of these checks has a minor alarm limit and a major alarm limit. When a limit is reached, the corresponding alarm is raised. The thresholds are expressed as percentages.
shell_history_size, process_message_queue: Each of these checks has a minor alarm limit and a major alarm limit. When a limit is reached, the corresponding alarm is raised. The thresholds are absolute numbers.
system_information, old_code: These checks only have an alarm severity, which specifies the severity of the alarm raised for them.
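The difference between percentage and absolute thresholds can be illustrated with a short Python sketch. It is not the plugin's Erlang code: the function names, the severity strings "minor" and "major", and the assumption that a limit counts as reached when the value is greater than or equal to it are all hypothetical.

```python
from typing import Optional


def absolute_alarm(value: float, minor: float, major: float) -> Optional[str]:
    """Severity for an absolute-threshold check such as process_message_queue.

    Assumption: a limit is "reached" when value >= limit; major wins over minor.
    """
    if value >= major:
        return "major"
    if value >= minor:
        return "minor"
    return None  # no alarm


def percentage_alarm(used: float, limit: float,
                     minor_pct: float, major_pct: float) -> Optional[str]:
    """Severity for a percentage-based check such as process_limit.

    Assumption: usage is expressed as a percentage of the corresponding limit
    (e.g. current process count relative to the VM's process limit).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return absolute_alarm(100.0 * used / limit, minor_pct, major_pct)
```

For example, with hypothetical thresholds of 70 and 85 percent, 200000 processes out of a limit of 262144 (about 76%) would yield "minor", while a message queue of 20000 messages against hypothetical absolute limits of 1000 and 10000 would yield "major".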
Example wombat.config entries
The system check configuration entries can be overridden individually:
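Overriding individual system check entries amounts to merging the overridden entries onto the defaults, leaving all other checks unchanged. The Python sketch below illustrates only that merge; it is not wombat.config syntax, and the DEFAULT_LIMITS values and the effective_limits name are hypothetical placeholders, not the plugin's real defaults.

```python
from typing import Dict, Tuple

Limits = Tuple[int, int]  # (minor limit, major limit)

# Hypothetical defaults for illustration only; NOT the plugin's real values.
DEFAULT_LIMITS: Dict[str, Limits] = {
    "process_limit": (70, 85),
    "atom_limit": (70, 85),
    "process_message_queue": (1000, 10000),
}


def effective_limits(overrides: Dict[str, Limits]) -> Dict[str, Limits]:
    """Apply per-check overrides; checks that are not overridden keep their defaults."""
    merged = dict(DEFAULT_LIMITS)  # copy so the defaults stay untouched
    for check, limits in overrides.items():
        if check not in merged:
            raise KeyError(f"unknown system check: {check}")
        merged[check] = limits
    return merged
```

For instance, effective_limits({"atom_limit": (50, 75)}) changes only atom_limit and keeps the other entries as they were.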
To disable all system checks or enable only a few of them, list only those that shall be performed:
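Listing only the checks that shall be performed works as a filter over the full set of checks named in this document. In this Python sketch (not the plugin's code), the name enabled_checks is hypothetical, and treating an unset option (None) as "run every check" is an assumption; an empty list disables all checks, as described above.

```python
from typing import List, Optional

# The system checks listed in this document, in the documented order.
ALL_CHECKS: List[str] = [
    "process_limit", "port_limit", "ets_limit", "atom_limit",
    "module_limit", "export_limit", "memory_limit", "open_file_limit",
    "open_socket_limit", "os_cpu_load", "disk_capacity",
    "shell_history_size", "process_message_queue",
    "system_information", "old_code",
]


def enabled_checks(configured: Optional[List[str]]) -> List[str]:
    """Return the checks to perform.

    Assumption: None (option not set) means all checks; otherwise only the
    listed checks run, and an empty list disables every check.
    """
    if configured is None:
        return list(ALL_CHECKS)
    unknown = set(configured) - set(ALL_CHECKS)
    if unknown:
        raise ValueError(f"unknown system checks: {sorted(unknown)}")
    return [check for check in ALL_CHECKS if check in configured]
```

For example, enabled_checks(["os_cpu_load", "atom_limit"]) returns ["atom_limit", "os_cpu_load"], and enabled_checks([]) returns an empty list.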