alarm plugin
Description
The alarm plugin performs basic system status checks on the monitored
system and reports alarms to WombatOAM when any of the monitored
parameters reaches a certain threshold.
Applications it depends on
kernel
Modules
wombat_plugin_alarm
Reports
The plugin reports the following alarms:
process_limit
port_limit
ets_limit
atom_limit
module_limit
export_limit
memory_limit
open_file_limit
open_socket_limit
os_cpu_load
disk_capacity
shell_history_size
process_message_queue
system_information
old_code
Configuration options
The intervals at which the checks are performed are configurable, in case it is necessary to regulate the plugin's moderate resource use:
collection_interval (integer, default: 60000): Specifies how many milliseconds to wait between checking whether any process-related alarm (e.g. process_message_queue) should be raised or ceased.
interval (integer, default: 60000): Specifies how many milliseconds to wait between checking whether any system limit-related alarm (e.g. atom_limit) should be raised or ceased.
app_check_interval (integer, default: 60000): Specifies how many milliseconds to wait between checking whether the version of any application changed. In case of any change, WombatOAM will raise a different_application_versions alarm.
The "node info alarms" are raised by the WombatOAM server, based on the node info reported by this plugin:
node_info_opts/app_version_alarms (default: true): If true, an alarm will be raised if nodes in the same family have different versions of the same application, or if the application is not running on all nodes. Applications started or stopped on nodes will be logged as notifications.
node_info_opts/time_diff_alarms (default: true): If true, alarms will be raised if nodes in the same family are in different time zones.
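The application version comparison described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not how the WombatOAM server implements the check. The function name `mismatched_apps`, the node names, and the version strings are all hypothetical.

```python
from typing import Dict, List


def mismatched_apps(family: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the applications that would trigger a version alarm.

    `family` maps a node name to the {application: version} map reported
    by that node. An application is flagged when its version differs
    between nodes, or when it is not running on every node.
    """
    # Collect every application seen on at least one node.
    all_apps = set().union(*(apps for apps in family.values()))
    flagged = []
    for app in sorted(all_apps):
        # A missing application shows up as None in the set of versions.
        versions = {apps.get(app) for apps in family.values()}
        if len(versions) > 1:
            flagged.append(app)
    return flagged


# Hypothetical node family: myapp differs on node2 and is absent on node3.
family = {
    "node1": {"kernel": "8.5", "myapp": "1.0"},
    "node2": {"kernel": "8.5", "myapp": "1.1"},
    "node3": {"kernel": "8.5"},
}
print(mismatched_apps(family))
```

In this sketch only myapp is flagged, because kernel runs with the same version on every node.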
The system_checks option is a list of system checks that the plugin shall
perform.
process_limit, port_limit, ets_limit, atom_limit, module_limit, export_limit, memory_limit, open_file_limit, open_socket_limit, os_cpu_load, disk_capacity: These system checks have a minor alarm limit and a major alarm limit. When one of these limits is reached, an alarm of the corresponding severity is raised. The thresholds are expressed as percentages.
shell_history_size, process_message_queue: These system checks have a minor alarm limit and a major alarm limit. When one of these limits is reached, an alarm of the corresponding severity is raised. The thresholds are absolute numbers.
system_information, old_code: These system checks only have an alarm severity, which specifies the severity of the alarm that should be raised for them.
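The two-level thresholds described above can be sketched as follows. This is an illustration in Python, not the plugin's Erlang implementation. The function names `classify` and `usage_percent`, the example limits, and the assumption that an alarm is raised once the value is at or above a limit are all hypothetical.

```python
from typing import Optional


def classify(value: float, minor_limit: float, major_limit: float) -> Optional[str]:
    """Return the severity to raise for a measured value, or None for no alarm.

    Assumes minor_limit <= major_limit and that reaching a limit
    (value >= limit) is enough to raise the alarm.
    """
    if value >= major_limit:
        return "major"
    if value >= minor_limit:
        return "minor"
    return None


def usage_percent(used: int, limit: int) -> float:
    """Express a resource count as a percentage of its system limit."""
    return 100.0 * used / limit


# Percentage-based check (e.g. atom_limit), with hypothetical limits 80 and 90.
atom_usage = usage_percent(900_000, 1_048_576)
print(classify(atom_usage, minor_limit=80, major_limit=90))

# Absolute check (e.g. process_message_queue), with hypothetical limits.
print(classify(15_000, minor_limit=1_000, major_limit=10_000))
```

The same comparison serves both groups of checks: for the percentage-based ones the measured value is first converted to a percentage of the system limit, while the absolute ones compare the raw number directly.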
Example wombat.config entries
The system check configuration entries can be overridden individually.
To disable all system checks or enable only a few of them, list only those that shall be performed in the system_checks option.