WombatOAM Plugin Guide

What is a plugin?

Plugins for WombatOAM are user-supplied modules that extend it to monitor new Erlang applications. A plugin can collect application-specific metrics and notifications, raise custom alarms and implement services. Plugins can also be added to a binary WombatOAM release; this requires a restart of the system. Plugins run on the managed node and communicate with the WombatOAM server.

Viewing and managing plugins

The web dashboard shows the plugins that are running on each node. Click Topology → select a node → Agents. You can turn individual agents on or off on a per-node basis.

Configuring & deploying plugins

WombatOAM will start a plugin automatically on the managed node if the node is running the matching versions of the dependant Erlang applications declared for that plugin in one of the WombatOAM config files (sys.config or wombat.config).

For WombatOAM to find a plugin, the compiled BEAM files for the plugin need to be present in the plugins directory of the WombatOAM release when WombatOAM is started. This directory can be overridden: in the wombat.config file, set the environment variable plugin_dir of the wo_utils application:

{set, wo_utils, plugin_dir, "...path/to/plugins"}.

The main module of the plugin should be named wombat_plugin_<APPNAME>.beam. For example, a plugin for the wo_test application should be named wombat_plugin_wo_test.beam.

A plugin's settings need to be declared in WombatOAM's wombat.config file. For example, to enable the plugin for the wo_test application, wombat.config needs the following:

%% {PluginName, DependantApplications, ExtraModules, Options}
{replace, wo_plugins, plugins, wo_test,
 {wo_test,
  [{myapp1, "1\\.3"}, [{myapp2, ".*"}, {myapp3, ".*"}]],
  [wombat_plugin_dummy_module],
  [{test_option, 42},
   {required_wombat_apps,  [{kernel, "^[3-9]\\..*"}]}]}}.

The main plugin module is wombat_plugin_wo_test and there is a library module called wombat_plugin_dummy_module which gets loaded into the managed node. Module names must also have a wombat_plugin_ prefix.

A dependant application is an application that must be running on the managed node in order to be able to use the plugin. A plugin can have more than one dependant application. In the example above, the wo_test plugin depends on {myapp1, "1\\.3"} and [{myapp2, ".*"}, {myapp3, ".*"}]. This means that wo_test can be used if either:

  • version 1.3 of myapp1; or
  • any version of both myapp2 and myapp3

are available on the target node. [{myapp2, ".*"}, {myapp3, ".*"}] is known as a dependant application group, which means that all the dependencies defined within the group must be met in order to be able to use the plugin.
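The "any item, all members of a group" rule described above can be sketched as a small predicate. This is only an illustration of the rule, not WombatOAM's own implementation: the module name, the function names and the use of an unanchored re:run/3 match are assumptions.

```erlang
-module(wombat_plugin_deps_sketch).
-export([usable/2]).

%% Deps has the same shape as the DependantApplications element in
%% wombat.config: each item is either a single {App, VsnRegex} pair or a
%% group (a list) of such pairs.
%% Running is a list of {App, Vsn} pairs describing the managed node.
-spec usable([Dep | [Dep]], [{atom(), string()}]) -> boolean()
      when Dep :: {atom(), string()}.
usable(Deps, Running) ->
    %% The plugin can be used if ANY top-level item is satisfied.
    lists:any(fun(Item) -> satisfied(Item, Running) end, Deps).

%% A group is satisfied only if ALL of its members are satisfied.
satisfied(Group, Running) when is_list(Group) ->
    lists:all(fun(Dep) -> satisfied(Dep, Running) end, Group);
satisfied({App, VsnRegex}, Running) ->
    case lists:keyfind(App, 1, Running) of
        %% Assumption: the version only has to match somewhere in the string.
        {App, Vsn} -> re:run(Vsn, VsnRegex, [{capture, none}]) =:= match;
        false      -> false
    end.
```

With the dependency list from the example, usable/2 returns true for a node running only myapp1 1.3, true for a node running both myapp2 and myapp3, and false for a node running only myapp2.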

Extra options to the plugins can be passed via the Options property list.

Use this list when the plugin's binaries require extra applications to be loaded on the Wombat node. Declare them with the required_wombat_apps property, which lists the dependant applications. In the example above, the wo_test plugin is loaded into Wombat only if Wombat runs on Erlang/OTP 17 or newer, because its binary contains opcodes that older Erlang versions cannot interpret (for example, it uses maps). If the property is not set, Wombat always tries to load the plugin's binaries.
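When writing a version pattern for required_wombat_apps, it can help to try it against a few version strings first. The helper below is a hypothetical sketch that assumes the pattern is applied with a plain re:run/3 match; the guide does not specify how WombatOAM evaluates the pattern.

```erlang
-module(wombat_plugin_vsn_sketch).
-export([vsn_matches/2]).

%% Returns true if the version string matches the regular expression.
%% Assumption: an ordinary re:run/3 match, anchored only where the pattern
%% itself contains anchors such as ^.
-spec vsn_matches(string(), string()) -> boolean().
vsn_matches(Vsn, Regex) ->
    re:run(Vsn, Regex, [{capture, none}]) =:= match.
```

Under that assumption, the kernel pattern "^[3-9]\\..*" from the example matches "3.0" but not "2.16.4". It also does not match a two-digit major version such as "10.0", so a pattern may need revisiting when newer releases must be accepted.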

When developing a plugin, it is often useful to set the verbosity that belongs to the plugin to the debug level:

{set, wo_plugins, plugin_action_verbosity, my_plugin, <<"debug">>}.

This means that notifications will be generated for many plugin actions, such as starting, terminating, sending metrics and alarms to WombatOAM, handling requests, etc. See the Configuration section for more information about this setting.

Note that this setting may have a negative impact on WombatOAM's performance. It should only be used while you are developing a plugin or experiencing issues.

Writing plugins for WombatOAM

The main module of the plugin must implement the wombat_plugin behaviour by implementing its callback functions. Each plugin will run in its own process and it has a state in which it can store information. The callback functions init and terminate are called when the plugin is started/stopped. The callback handle_info/2 is called when the plugin process receives a message.

A plugin can report metrics, notifications, alarms and other information by:

  • Implementing the wombat_plugin callbacks.
  • Using the library functions in wombat_plugin and wombat_plugin_utils.

Reporting metrics is mostly callback-oriented: the capabilities/1 callback returns the list of available metrics and live_metrics2comp_units/2 groups the requested live metrics into computation units, while the callbacks collect_metrics/1 and collect_live_metrics/1 return the metric samples. If the list of available metrics changes, the plugin should call the wombat_plugin:announce_capabilities/1 function.

The notifications, alarms and other information are reported not via callbacks but via calling library functions: wombat_plugin:report_log/2 for reporting notifications, wombat_plugin:raise_alarm/2 and wombat_plugin:clear_alarm/1 for reporting alarms, and wombat_plugin:report_internal_data/2 for reporting other information.
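As a sketch of how these library functions might be used together, the module below raises an alarm when a queue grows past a threshold and clears it again afterwards. Only the function names and arities come from this guide; the argument order (alarm id then additional information, severity then message), the queue_too_long alarm id and the module name are assumptions, and check_queue/2 is assumed to be called from within the plugin process, for example from handle_info/2.

```erlang
-module(wombat_plugin_queue_alarm_sketch).
-export([check_queue/2, alarm_action/3]).

%% Pure decision: compare the queue length with the threshold, taking the
%% current alarm status (raised | cleared) into account.
alarm_action(Len, Threshold, cleared) when Len > Threshold -> raise;
alarm_action(Len, Threshold, raised) when Len =< Threshold -> clear;
alarm_action(_Len, _Threshold, _Status) -> none.

%% Applies the decision and returns the updated {Threshold, Status} pair.
check_queue(Len, {Threshold, Status}) ->
    case alarm_action(Len, Threshold, Status) of
        raise ->
            %% Assumed arguments: alarm id, additional information.
            wombat_plugin:raise_alarm(queue_too_long, [{length, Len}]),
            %% Assumed arguments: severity, notification text.
            wombat_plugin:report_log(<<"warning">>, <<"Queue too long">>),
            {Threshold, raised};
        clear ->
            wombat_plugin:clear_alarm(queue_too_long),
            {Threshold, cleared};
        none ->
            {Threshold, Status}
    end.
```

Keeping the decision in a pure function makes it testable without a running WombatOAM; the library calls happen only when the status actually changes.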

A plugin can also implement services. First it has to report the services it implements as capabilities (see the Capabilities section). Then it has to implement 3 callbacks to serve a request for a service (see the Services callbacks section).

There are many useful library functions too, organised into 3 modules.

  • wombat_plugin contains functions to implement the core wombat_plugin behaviour,

  • wombat_plugin_utils includes general utility functions,

  • wombat_plugin_services provides functions to create new services by implementing the wombat_plugin_services behaviour.

The plugin is started on the managed node, supervised by other WombatOAM processes. Some general guidelines:

  • Plugins are expected to generate a moderate amount of data; currently, WombatOAM doesn't try to throttle plugins.
  • As WombatOAM supervises the plugin processes, start-up notifications for the plugins and their supervisors might show up in logs on the managed node, for example, when SASL is started.

Types

The types used by the plugins are defined in the wombat_types module and exported:

%%%-----------------------------------------------------------------------------
%%% Capabilities
%%%-----------------------------------------------------------------------------

-type capabilities() :: [capability()].
%% A plugin exposes a list of capabilities.

-type capability() :: {capability_id(), capability_info()}.
%% Each capability has an id and a list of information.

-type capability_id() :: [binary()].
%% For metrics, the final component is the name of the metric, the prefix list
%% is the name of the metric group. Notifications (aka logs) are
%% currently not using these ids.

-type capability_info() :: metric_info()
                         | notification_info()
                         | alarm_info()
                         | service_info().

-type capability_tags_item() :: {tags, capability_tags()}.
%% List of tags should be assigned to alarms and metrics in the capabilities.

-type capability_tag() :: binary().
%% A tag is represented as a binary string.

-type capability_tags() :: list(capability_tag()).


%%%-----------------------------------------------------------------------------
%%% Capabilities: Metrics
%%%-----------------------------------------------------------------------------

-type metric_info() :: [metric_info_item()].

-type metric_info_item() :: {type, metric}
                            % UTF-8 binary string (not currently used, could be
                            % tooltip, for example)
                          | {description, binary()}
                          | {metric_type, metric_type()}
                          | {metric_unit, metric_unit()}
                          | capability_tags_item().

-type metric_cap_id_last() :: term().
%% The last element of the capability id of a metric, but as a term and not as a
%% binary. E.g. 'Folsom cpu'.
%%
%% From the plugin's point of view, this data type is used to administer live
%% metrics: e.g. when a live metric is enabled on the dashboard, a list of
%% metric_cap_id_last() items is sent to the plugin so that it starts sending
%% the appropriate metrics periodically.

%%%-----------------------------------------------------------------------------
%%% Capabilities: Notifications
%%%-----------------------------------------------------------------------------

-type notification_info() :: [notification_info_item()].

-type notification_info_item() :: {type, notification}
                                | {description, binary()}.

%%%-----------------------------------------------------------------------------
%%% Capabilities: Alarms
%%%-----------------------------------------------------------------------------

-type alarm_info() :: [alarm_info_item()].

-type alarm_info_item() :: {type, alarm}
                         | {probable_cause, binary()}
                         | {proposed_repair_action, binary()}
                         | {severity, alarm_severity()}
                         | capability_tags_item().

-type alarm_severity() ::
    critical | major | minor | warning | indeterminate | cleared.

%%%-----------------------------------------------------------------------------
%%% Capabilities: Services
%%%-----------------------------------------------------------------------------

-type service_info() :: [service_info_item()].

-type service_info_item() :: {type, service_capability_type()}
                           | {description, binary()}
                           | {label, binary()}
                           | {feature, term()}
                           | {is_internal, boolean()}
                           | {is_exclusive, boolean()}
                           | {priority, integer()}
                           | {arguments, [service_option_name()]}
                           | {options, [service_option()]}.

-type service_capability_type() :: configurator | explorer | executor.

-type service_option() :: [service_option_item()].

-type service_option_item() ::
          %% This key is used in the request.
          {option_name, service_option_name()}

          %% The label of the option on the Dashboard.
        | {option_label, binary()}

          %% The type of the option. It determines whether `option_values' or
          %% `listitem_type' also needs to be specified.
          %%
          %% If the option type is a list, then the listitem_type field contains
          %% the types in the list. A list type will actually form a table. E.g.
          %% if the listitem_type field describes key1 and key2, then the user
          %% can fill in a table with two columns (key1 and key2) and any number
          %% of rows.
        | {option_type, service_option_type()}

          %% The default value of the option.
        | {option_default, binary()}

          %% Whether the option can be seen and directly set by the users.
        | {option_enabled, boolean()}

          %% The list of possible values that the option can have. When type is
          %% not enum, it is an empty list.
        | {option_values, [OptionValue :: binary()]}

          %% This definition is recursive, but in reality only the top service
          %% option can be a list, its children cannot. So the listitem_type
          %% will be empty for the children.
        | {listitem_type, [service_option()]}.

-type service_option_name() :: binary().

-type service_option_type() :: string | number | enum | list.

%%%-----------------------------------------------------------------------------
%%% Reporting
%%%-----------------------------------------------------------------------------

-type capability_data() :: metric_data().

%%%-----------------------------------------------------------------------------
%%% Reporting: Metrics
%%%-----------------------------------------------------------------------------

-type metric_data() :: {metric, capability_id(), metric_type(), metric_value()}.

-type live_metric_data() :: {live_metric, capability_id(),
                             metric_type(), metric_value()}.

-type live_metric_comp_unit() :: term().

-type metric_type() :: gauge | counter | histogram | meter | spiral | duration.
-type metric_unit() :: numeric | byte | percentage.
-type metric_value() :: term().

%%%-----------------------------------------------------------------------------
%%% Reporting: Notifications
%%%-----------------------------------------------------------------------------

-type severity() :: binary().
%% The severity of a notification. The recommended values are the following:
%% critical, error, warning, info, debug.

-type log_message() :: binary().
%% The text of a notification.

%%%-----------------------------------------------------------------------------
%%% Reporting: Alarms
%%%-----------------------------------------------------------------------------

-type alarm_id() :: term().
%% This is the same as alarm_id() in elarm.hrl.

-type alarm_add_info() :: term().
%% This is the same as additional_information() in elarm.hrl.

%%%-----------------------------------------------------------------------------
%%% Implementing: Services
%%%-----------------------------------------------------------------------------

-type request_args() ::
          [{KeyBinStr :: binary(),
            ValueBinStr :: binary() |
                           [[{InnerKeyBinStr :: binary(),
                              InnerValueBinStr :: binary()}]]}].
%% A piece of input given by the user, which is used to execute a certain
%% request.
%%
%% The type of the key (as defined in the service capability of this
%% request_args term) defines the type of the value:
%%
%% * For keys that have string, number, enum type, ValueBinStr is a binary.
%% * For keys that have list type, ValueBinStr is a list that contains inner
%%   lists. Each inner list is a proplist, and each inner list has the same
%%   keys (as defined in the capability).

-type display_info() :: #display_info{}.
%% A display info term describes for the GUI how to display streamed data. It is
%% created by the plugin process.

-type display_info_option_item() :: {is_interactive, boolean()} |
                                    {table_headers, [binary()]}.
%% Modifiers for the display_info().

-type execution_info() :: #execution_info{}.
%% An execution info term describes for the wombat_plugin behaviour how to
%% execute the request. It is created by the plugin process.

-type stream_data() :: stream_data_value() | stream_data_table().
%% Data to be streamed from the plugin to the GUI

%% Stream data constructions (values and tables):
-type stream_data_value() :: plain_value() | interactive_value().
-type stream_data_table() :: stream_data_plain_table()
                           | stream_data_interactive_table().
-type stream_data_plain_table() :: [[plain_value()]].
-type stream_data_interactive_table() :: [[interactive_value()]].

%% Basic building blocks for stream data:
-type plain_value() :: binary().
-type interactive_value() :: #interactive_value{}.
-type action() :: #action{}.

-type from_ref() :: {pid(), wombat_plugin_services:exec_req_ref()}.
%% A from_ref() reference is used by the plugins to identify an execute_request
%% call that will reply later asynchronously.

-type async_reply() :: {continue | close,
                        no_data |
                        {data, StreamData :: wombat_types:stream_data()}} |
                       {error, ReasonBinStr :: binary()}.

%%%-----------------------------------------------------------------------------
%%% Miscellaneous
%%%-----------------------------------------------------------------------------

-type plugin_state() :: term().
%% The plugins usually define and use their own state() type.

Capabilities

The metrics and services plugin interfaces use a list of capabilities to return information back to WombatOAM. When WombatOAM asks for information on available metrics, the capabilities/1 function of all plugins will be queried.

A plugin exposes a list of capabilities (capabilities()). Currently this is used to report the metrics that the plugin can report and services that can be requested from the plugin. Optionally, the alarms the plugin may raise can be reported. Each capability has an id (capability_id()) and a list that contains further information about the capability (capability_info()). An id is made up of a list of binary UTF-8 strings.

Metrics capabilities

In case of metrics capabilities the final component of the id is the name of the metric. The prefix list of the id (i.e. the list containing all elements of the capability id except the last one) becomes the name of the metric group.

Actual metrics samples either have the type metric_data() (in case of so-called collected metrics that are collected automatically) or live_metric_data() (in case of live metrics that are collected on-demand).
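To make the shapes concrete, here is a minimal sketch of one metric capability, a matching sample, and the split of a capability id into metric group and metric name. The terms follow the types listed in the Types section; the module name, the "My app" group and the "Queue length" metric are made up for illustration.

```erlang
-module(wombat_plugin_metric_cap_sketch).
-export([capability/0, sample/1, group_and_name/1]).

%% A wombat_types:capability() describing one gauge metric.
capability() ->
    {[<<"My app">>, <<"Queue length">>],
     [{type, metric},
      {description, <<"Number of queued jobs">>},
      {metric_type, gauge},
      {metric_unit, numeric}]}.

%% A wombat_types:metric_data() sample for the capability above.
sample(Value) ->
    {Id, Info} = capability(),
    {metric_type, Type} = lists:keyfind(metric_type, 1, Info),
    {metric, Id, Type, Value}.

%% Splits a non-empty capability id into the metric group (the prefix list)
%% and the metric name (the final component).
group_and_name(Id) ->
    {Group, [Name]} = lists:split(length(Id) - 1, Id),
    {Group, Name}.
```

For the id above, group_and_name/1 yields the group [<<"My app">>] and the name <<"Queue length">>.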

Note about the order of the entries in the capability list: When WombatOAM presents the metrics to the user, it shows them in the order they were received from the plugin in the capabilities callback or by calling wombat_plugin:announce_capabilities. If new metrics are added and reported later (either by capabilities or by wombat_plugin:announce_capabilities), WombatOAM will insert the new metrics. If metrics are deleted, they will still be shown to the user at least as long as WombatOAM stores samples from the metric. If the metrics are reordered, WombatOAM will prefer the new ordering. The recommendation, however, is to keep a consistent order and not to reorder existing metrics, for two reasons:

  1. It is better user experience to see the metrics always at the same place.
  2. When some metrics are reordered and some are removed, WombatOAM is not always able to locate the correct position of the removed metrics, so they will be moved to unexpected locations.

Adding and removing metrics causes no problems unless the same metric appears in different places at different times.

Services capabilities

For each service it announces, the plugin should declare: its identifier; its priority; its type (configurator, explorer or executor); a label (the name displayed on the dashboard) and a description; whether the service is internal and exclusive; the specification of the service's arguments (label, type, default value; the options field); and the subset of these that are mandatory, given by argument name (the arguments field).

The feature field is the identifier of a service. Multiple plugins can implement the same service. WombatOAM aggregates the announced implementations of a service using the feature field. The generalised interface of the service is then available for users to submit new requests. When a request arrives, WombatOAM tries to initiate it by asking the satisfied implementations in order. The implementations are ordered by priority and by the number of mandatory arguments. This mechanism makes it possible to override built-in services or to implement more specific ones (e.g. a custom configuration service which does not use the OTP application environment).

The plugin should use the wombat_plugin_services:create_capability/6 function to create a service capability (see the wombat_plugin_services API section).

Alarms capabilities

Announcing alarms capabilities is optional. Alarms capabilities define the assigned tags and provide additional information about the severity, the probable cause and the proposed repair action.

Alarm capabilities are matched to alarms using two identifiers: the capability id and the alarm id. If there is no matching alarm capability for a certain alarm, only the information included in the alarm will be available and it will be tagged with the default tags. Although the alarm id can be an arbitrary Erlang term, the matching algorithm works only with atoms and with tuples whose first element is an atom. This atom is converted to a UTF-8 binary string, which is matched against the list item of the capability id.

As examples of the matching, consider the ETS limit and the missing application alarms. The alarm id of the ETS limit alarm is ets_limit, which matches the following capability id: [<<"ets_limit">>]. The missing application alarm has a parametric id, {missing_application, App}, so the correct capability id is [<<"missing_application">>].
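The matching rule can be sketched as follows. This is an illustration under stated assumptions rather than WombatOAM's actual algorithm: the module and function names are hypothetical, and the capability id is assumed to be a single-element list, as in both examples above.

```erlang
-module(wombat_plugin_alarm_match_sketch).
-export([alarm_key/1, matches/2]).

%% Derives the binary used for matching from an alarm id. Only atoms and
%% tuples whose first element is an atom can be matched.
alarm_key(Id) when is_atom(Id) ->
    {ok, atom_to_binary(Id, utf8)};
alarm_key(Id) when is_tuple(Id), tuple_size(Id) > 0, is_atom(element(1, Id)) ->
    {ok, atom_to_binary(element(1, Id), utf8)};
alarm_key(_Other) ->
    error.

%% Assumption: the capability id consists of exactly one binary.
matches(AlarmId, CapabilityId) ->
    case alarm_key(AlarmId) of
        {ok, Key} -> CapabilityId =:= [Key];
        error     -> false
    end.
```

Here matches(ets_limit, [<<"ets_limit">>]) and matches({missing_application, mnesia}, [<<"missing_application">>]) both return true, while an alarm id that is a string or a tuple starting with a non-atom never matches.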

The wombat_plugin_utils:create_alarm_capability/5 utility function (refer to the Useful functions in wombat_plugin_utils section for further detail) should be used to create an alarm capability, which should be returned by the capabilities/1 callback or be announced using the wombat_plugin:announce_capabilities/1 function.

Callbacks of the wombat_plugin behaviour

The following callbacks are defined in the wombat_plugin behaviour:

-callback init(Arguments :: [term()]) -> {ok, wombat_types:plugin_state()} |
                                         {skip, Msg :: binary()} |
                                         {error, _}.

When the plugin is started, its init function is called with the arguments specified for the plugin in sys.config/wombat.config. It returns either the initial state, which is typically a record (just as in a gen_server module), or an error to indicate an unexpected problem. Alternatively, skip can be returned to exit gracefully without generating a crash report on the managed node. Msg will be shown as a notification from the plugin on the Wombat dashboard.

-callback capabilities(wombat_types:plugin_state()) ->
              {wombat_types:capabilities(),
               NewState :: wombat_types:plugin_state()}.

After the plugin is started, WombatOAM will retrieve the list of capabilities provided by this plugin by calling the capabilities function. Currently only metrics, alarms and services are handled as capabilities. In the future, this function might be called on other occasions too. A typical pattern is to calculate the capabilities in init, store them in the state record and simply read and return them in capabilities.

-callback handle_info(Message :: term(),
                      wombat_types:plugin_state()) ->
              {noreply,
               NewState :: wombat_types:plugin_state()}.

This function is called when the plugin process receives a message, just like in case of a gen_server.

-callback terminate(wombat_types:plugin_state()) -> any().

This function is called when the plugin is terminated. This can happen for a number of reasons: the plugin is disabled by the user; WombatOAM is stopped; the node is removed from WombatOAM; the connection between WombatOAM and the node is stopped; etc.

-callback collect_metrics(wombat_types:plugin_state()) ->
              {ok,
               [wombat_types:metric_data()],
               NewState :: wombat_types:plugin_state()}.

This function is called periodically for plugins whose capabilities function reported at least one metric. (A plugin that initially reported no metrics but later finds that it has some can use the wombat_plugin:announce_capabilities function.) The function should return the list of metric samples (i.e. metric values). The order of the samples is irrelevant. This function should not report metrics that have not been announced beforehand via capabilities or wombat_plugin:announce_capabilities.

-callback live_metrics2comp_units([wombat_types:metric_cap_id_last()],
                                  wombat_types:plugin_state()) ->
              {ok,
               [wombat_types:live_metric_comp_unit()],
               NewState :: wombat_types:plugin_state()} |
              {error, term(), wombat_types:plugin_state()}.

When handling live metrics, metrics are divided into "computation units" for the sake of optimization. There will be one collect_live_metrics call for each computation unit (i.e. the metrics in the same computation unit are calculated in one collect_live_metrics call).

As an example, assume that metric a and metric b are computed by calling a costly function that computes a proplist, and a returns one item of the proplist while b returns another. Metric c is a different, independent metric. In this case, we would put a and b into the same computation unit (let's call it ab_group), which would then be calculated in a common collect_live_metrics function call. c could be in a different computation unit (let's call it c_group), so it would be calculated independently.

The live_metrics2comp_units function gets the list of metrics that the user currently wants to monitor as live metrics, and it should return the computation units that include those metrics. The data type of the computation units is up to the plugin.

In the example above, the function would return {ok, [ab_group, c_group], State}.

-callback collect_live_metrics(wombat_types:live_metric_comp_unit()) ->
    {ok, [wombat_types:live_metric_data()]} | {error, term()}.

For each computation unit that is returned by live_metrics2comp_units, a process will be started. Each process will periodically (once per second by default) call collect_live_metrics with one of the computation units.

In the example above, if the user wanted to monitor metrics a, b and c, we would have two processes, one of them calling collect_live_metrics(ab_group) and the other calling collect_live_metrics(c_group).

When the process for a computation unit crashes, the plugin won't be stopped, only the live collection of the metrics handled by the computation unit.
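The a, b and c example could be implemented along the following lines. This is a sketch only: the module name, the "Demo" metric group, the gauge type and the values returned are assumptions chosen so that the code runs on any Erlang node, and the metric names are assumed to arrive as the atoms a, b and c.

```erlang
-module(wombat_plugin_live_sketch).
-export([live_metrics2comp_units/2, collect_live_metrics/1]).

%% Maps the requested live metrics to computation units, without duplicates.
live_metrics2comp_units(Metrics, State) ->
    Units = lists:usort([comp_unit(M) || M <- Metrics]),
    {ok, Units, State}.

comp_unit(a) -> ab_group;
comp_unit(b) -> ab_group;
comp_unit(c) -> c_group.

%% ab_group: one costly call produces both a and b. For simplicity this
%% sketch reports both even if only one of them was requested.
collect_live_metrics(ab_group) ->
    Props = costly_stats(),
    {ok, [{live_metric, [<<"Demo">>, <<"a">>], gauge,
           proplists:get_value(a, Props)},
          {live_metric, [<<"Demo">>, <<"b">>], gauge,
           proplists:get_value(b, Props)}]};
%% c_group: an independent metric, collected on its own.
collect_live_metrics(c_group) ->
    {ok, [{live_metric, [<<"Demo">>, <<"c">>], gauge,
           erlang:system_info(process_count)}]}.

%% Stand-in for the costly function that computes a proplist.
costly_stats() ->
    [{a, erlang:memory(total)}, {b, erlang:memory(processes)}].
```

Calling live_metrics2comp_units([a, b, c], State) returns {ok, [ab_group, c_group], State}, which matches the result described in the example.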

Notes:

  • All the above callbacks are obligatory.
  • If any of the functions throw an exception, the terminate function is called and the plugin is stopped. (Note that terminate will receive the state data that was given to the function that threw the exception; changes applied to the state but not returned by that function are lost.)
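Putting the callbacks together, a minimal plugin module might look like the sketch below. The module name follows the wo_test example used earlier in this guide; the state record, the "Messages seen" metric and its gauge type are invented for illustration, and the plugin offers no live metrics.

```erlang
-module(wombat_plugin_wo_test).
-behaviour(wombat_plugin).

-export([init/1, capabilities/1, handle_info/2, terminate/1,
         collect_metrics/1, live_metrics2comp_units/2,
         collect_live_metrics/1]).

-record(state, {capabilities = [] :: list(),
                seen = 0 :: non_neg_integer()}).

%% Hypothetical capability id: metric group "wo_test", metric "Messages seen".
-define(SEEN_ID, [<<"wo_test">>, <<"Messages seen">>]).

%% Compute the capabilities once and keep them in the state.
init(_Arguments) ->
    Caps = [{?SEEN_ID, [{type, metric},
                        {description, <<"Messages seen by the plugin">>},
                        {metric_type, gauge},
                        {metric_unit, numeric}]}],
    {ok, #state{capabilities = Caps}}.

capabilities(#state{capabilities = Caps} = State) ->
    {Caps, State}.

%% Count every message that the plugin process receives.
handle_info(_Message, #state{seen = N} = State) ->
    {noreply, State#state{seen = N + 1}}.

terminate(_State) ->
    ok.

%% Report one sample for the announced metric.
collect_metrics(#state{seen = N} = State) ->
    {ok, [{metric, ?SEEN_ID, gauge, N}], State}.

%% No live metrics in this sketch.
live_metrics2comp_units(_MetricNames, State) ->
    {ok, [], State}.

collect_live_metrics(_CompUnit) ->
    {ok, []}.
```

Compiling this module outside WombatOAM produces a warning that the wombat_plugin behaviour is undefined, because the behaviour module is shipped with WombatOAM.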

Services callbacks

A request for a service is fulfilled by executing a suitable implementation of the service. A request is identified by ReqId, while the implementation to be executed is specified by CapabilityId. State is the current plugin state; it is shared among the concurrent requests being served by a plugin. The execution consists of the following 3 phases.

init_request -> (execute_request)+ -> cleanup_request.

  1. The process begins with asking the implementations whether they are willing to serve the request (init_request/4). The first implementation which accepts the request will be executed.

  2. Real execution takes place by calling the execute_request/4 callback of the implementation. Data pushed back to WombatOAM is created during this phase. In case of periodic requests this callback will be called multiple times (once every period).

  3. After the execution has finished, the implementation is allowed to clean up. Releasing resources and cleaning the plugin state should be part of cleanup_request/3.

For each phase a callback is defined in the wombat_plugin_services behaviour described below. All 3 callbacks must be implemented to implement a new service.

-spec init_request(ReqId :: binary(),
                   CapabilityId :: wombat_types:capability_id(),
                   ReqArgs :: wombat_types:request_args(),
                   State :: wombat_types:plugin_state()) ->
              {out_of_scope,
               ReasonBinStr :: binary(),
               NewState :: wombat_types:plugin_state()} |
              {error,
               ReasonBinStr :: binary(),
               NewState :: wombat_types:plugin_state()} |
              {ok,
               DisplayInfo :: wombat_types:display_info(),
               ExecutionInfo :: wombat_types:execution_info(),
               NewState :: wombat_types:plugin_state()}.

This callback initializes a request (identified by ReqId) for a service (announced as CapabilityId by the plugin) based on the input arguments. The validation can have the following outcomes:

  • Serving the request is out of the plugin's scope. For instance, consider a special configurator plugin that changes only the configs of the MongooseIM application. If it is asked to change the config of a Riak application, it is simply not capable of performing the change.
  • The plugin is capable of serving such a request (it has all necessary input) but the provided arguments are incorrect. For instance, consider the Etop service that receives the <<"ETC">> binary as the value of its interval argument for which it only accepts binaries that can be converted to non-negative integers.
  • The plugin is capable and willing to serve the request (all mandatory arguments are given, have been checked and considered to be valid). In this case it initialises the request by storing any necessary data in its state, provides information about how to display the result of the request (refer to wombat_plugin_services:create_display_info/3) and how to execute the request (refer to wombat_plugin_services:create_execution_info/3).
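The argument check from the second outcome can be sketched as a helper that init_request/4 might call before accepting a request. The module name, the function name and the <<"interval">> key are assumptions; the input has the wombat_types:request_args() shape.

```erlang
-module(wombat_plugin_args_sketch).
-export([interval/1]).

%% Extracts the interval argument and checks that it is a binary holding a
%% non-negative integer. Returns {ok, Integer} or {error, ReasonBinStr}.
-spec interval([{binary(), term()}]) ->
          {ok, non_neg_integer()} | {error, binary()}.
interval(ReqArgs) ->
    case lists:keyfind(<<"interval">>, 1, ReqArgs) of
        false ->
            {error, <<"Missing argument: interval">>};
        {_, Value} when is_binary(Value) ->
            try binary_to_integer(Value) of
                N when N >= 0 -> {ok, N};
                _Negative -> {error, <<"interval must not be negative">>}
            catch
                error:badarg -> {error, <<"interval must be an integer">>}
            end;
        {_, _ListValue} ->
            {error, <<"interval must be a single value">>}
    end.
```

When the helper returns {error, Reason}, init_request/4 can pass the reason on as {error, Reason, State}; with {ok, N} it can store N in the plugin state and go on to build the display and execution information.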

-spec execute_request(ReqId :: binary(),
                      CapabilityId :: wombat_types:capability_id(),
                      From :: wombat_types:from_ref(),
                      State :: wombat_types:plugin_state()) ->
              {continue | close,
               no_data | {data, StreamData :: wombat_types:stream_data()},
               NewState  :: wombat_types:plugin_state()} |
              {reply_later,
               NewState  :: wombat_types:plugin_state()} |
              {error,
               ReasonBinStr :: binary(),
               NewState  :: wombat_types:plugin_state()}.

The goal of this callback is to execute the request (identified by ReqId), to provide the data to be streamed and then displayed according to the previously given DisplayInfo, and to define what happens to the stream (whether it is closed or kept open to continue the execution).

It can return:

  • continue to continue a periodic request. (Non-periodic requests cannot return this.) Can return stream data or no_data.
  • close to indicate that the request completed. Can return stream data or no_data.
  • reply_later to indicate that the plugin will reply later. This is useful for executing longer jobs in a separate worker process. In this case the plugin can use the wombat_plugin:spawn_worker/1 function to start a worker, and use the From reference received as an input argument together with the wombat_plugin_services:request_reply/2 function to send stream data back to WombatOAM later.
  • error with a human-readable reason to be displayed on the dashboard. In this case, depending on the specified restart strategy and the number of previous retries, the execution can continue or finish.

Data to be pushed to WombatOAM fall into the following 3 categories.

  1. plain_value(). The simplest category. This will be displayed as is.

  2. stream_data_plain_table(). List of lists built up from plain_value(). This will be rendered as a table on the dashboard.

  3. stream_data_interactive_table(). A list of lists built up from interactive_value(). Each value is assigned a list of actions, which are listed in the value's local menu on the dashboard. Such data can be constructed with two general API functions, wombat_plugin_services:create_interactive_value/2 and wombat_plugin_services:create_action/4, and one utility function, wombat_plugin_services:create_process_actions/1.
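
As an illustration of the second category, the sketch below turns a list of {Name, Count} pairs into a list of rows. The module and function names are hypothetical, and the example assumes that binaries are acceptable plain_value() cells.

```erlang
-module(wombat_plugin_sketch_table).
-export([plain_table/1]).

%% Build a plain table: one row per pair, every cell a binary.
%% Assumption: binaries are valid plain_value() terms.
plain_table(Pairs) ->
    [[atom_to_binary(Name, utf8), integer_to_binary(Count)]
     || {Name, Count} <- Pairs].
```

A callback could then return {close, {data, plain_table(Pairs)}, State}, with a display info of type table whose table_headers lists two headers.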

-spec cleanup_request(ReqId :: binary(),
                      CapabilityId :: wombat_types:capability_id(),
                      State :: wombat_types:plugin_state()) ->
              {ok, NewState :: wombat_types:plugin_state()}.

This callback can do any cleanup necessary after the execution of the request has finished. It will always be called, regardless of how the execution finished (successfully completed, failed, or runtime error occurred).
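
A minimal sketch of such a cleanup, assuming the plugin keeps per-request data in a requests list keyed by ReqId (the record layout and the helper forget_request/2 are assumptions made for this example):

```erlang
-module(wombat_plugin_sketch_cleanup).
-export([cleanup_request/3, forget_request/2]).

%% Hypothetical plugin state: data stored by init_request/4, keyed by ReqId.
-record(state, {requests = [] :: [{binary(), term()}]}).

cleanup_request(ReqId, _CapabilityId, State) ->
    {ok, forget_request(ReqId, State)}.

%% Drop whatever was stored for the request; a no-op if nothing was stored.
forget_request(ReqId, #state{requests = Requests} = State) ->
    State#state{requests = lists:keydelete(ReqId, 1, Requests)}.
```

Because the callback is always called, deleting by key keeps the state from growing even when the execution failed.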

Notes:

  • All these callbacks are evaluated in the plugin process. That means while the callbacks are being evaluated the plugin process cannot handle other tasks (i.e.: cannot push metrics, logs, alarms).

  • The execution of periodic requests can always be stopped by the users. A periodic request is stopped by finalising it instead of scheduling its next execution. Requests that are being executed are not affected by stop commands; they are allowed to terminate normally. Non-periodic requests cannot be stopped by the users.

  • Information provided in the capabilities is used by:

      • WombatOAM, to create services by aggregating the capabilities that describe different implementations of the same feature.

      • WombatOAM, to categorise the services. Services will be displayed under their category group (configurator, explorer, executor) on the dashboard.

  • Information provided in the display info (DisplayInfo) is used by:

      • The wombat_plugin behaviour, to control the execution of requests.

      • The dashboard, to display data to be streamed by the plugins.

      • The dashboard, to decide whether users are allowed to stop requests.

The wombat_plugin_services API

The following functions in the wombat_plugin_services module can be used to implement services (for example to create structures).

-spec create_capability(CapabilityID :: binary(),
                        Type :: wombat_types:service_capability_type(),
                        Description :: binary(),
                        Label :: binary(),
                        Feature :: term(),
                        Options :: [wombat_types:service_info_item()]) ->
          wombat_types:capability().

Create a service capability. Properties given in Options override the default properties of the service. These properties together with their defaults are:

  • is_internal (false)
  • is_exclusive (false)
  • priority (0)
  • arguments ([])
  • options ([])

Note

  • The options defined by the same capability should have unique names.
  • The mandatory options specified by listing their names should be defined as options.
-spec create_string_option(Name :: wombat_types:service_option_name(),
                           Label :: binary(),
                           Default :: binary(),
                           IsEnabled :: boolean()) ->
          wombat_types:service_option().
-spec create_number_option(Name :: wombat_types:service_option_name(),
                           Label :: binary(),
                           Default :: binary(),
                           IsEnabled :: boolean()) ->
          wombat_types:service_option().
-spec create_enum_option(Name :: wombat_types:service_option_name(),
                         Label :: binary(),
                         Default :: binary(),
                         IsEnabled :: boolean(),
                         OptionValues :: [binary()]) ->
          wombat_types:service_option().

Each of these three functions creates a scalar option. An empty binary (<<"">>) means no default value. Note that the Default value of an enum option must be a member of OptionValues (or an empty binary).
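
The following sketch shows how the three constructors might be called. The option names, labels and defaults are invented for the example, and option names are assumed to be binaries; valid_enum_default/2 is a hypothetical helper that enforces the rule about enum defaults before the option is created.

```erlang
-module(wombat_plugin_sketch_options).
-export([scalar_options/0, mode_option/1, valid_enum_default/2]).

%% A string option and a number option (all names and values are made up).
scalar_options() ->
    [wombat_plugin_services:create_string_option(
       <<"name">>, <<"Registered name">>, <<"troublemaker">>, true),
     %% The spec takes the default as a binary, even for numbers.
     wombat_plugin_services:create_number_option(
       <<"timeout">>, <<"Timeout (ms)">>, <<"5000">>, true)].

%% Build an enum option, refusing defaults that are not among the values.
mode_option(Default) ->
    Values = [<<"Persistent">>, <<"Temporary">>],
    case valid_enum_default(Default, Values) of
        true ->
            {ok, wombat_plugin_services:create_enum_option(
                   <<"mode">>, <<"Mode">>, Default, true, Values)};
        false ->
            {error, {bad_default, Default}}
    end.

%% An empty binary means "no default" and is always acceptable.
valid_enum_default(<<>>, _Values) -> true;
valid_enum_default(Default, Values) -> lists:member(Default, Values).
```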

-spec create_list_option(Name :: wombat_types:service_option_name(),
                         Label :: binary(),
                         Components :: [wombat_types:service_option()]) ->
          wombat_types:service_option().

Create a list option. The components of the list are themselves specified as options. For instance, suppose users should provide a list of module names. A list option with a single component, a string option, is suitable for this input. For another example, see the built-in configurator service, which allows a batch of configuration values to be changed at once.
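
The module-name example can be sketched as below, together with a capability that uses the option. Several details are assumptions and not confirmed by this guide: that Type is one of the category atoms (executor here), that the arguments property lists the names of the mandatory options, and that option names are binaries. The capability id, feature and labels are hypothetical, and names_unique/1 is a helper added for the example.

```erlang
-module(wombat_plugin_sketch_capability).
-export([capability/0, names_unique/1]).

capability() ->
    %% One component, a string option, so users can enter module names.
    Module = wombat_plugin_services:create_string_option(
               <<"module">>, <<"Module">>, <<>>, true),
    Modules = wombat_plugin_services:create_list_option(
                <<"modules">>, <<"Modules">>, [Module]),
    %% Options of one capability should have unique names.
    true = names_unique([<<"modules">>]),
    wombat_plugin_services:create_capability(
      <<"reload modules">>,            % CapabilityID (hypothetical)
      executor,                        % Type (assumed category atom)
      <<"Reload the given modules">>,  % Description
      <<"Reload modules">>,            % Label
      reload_modules,                  % Feature (hypothetical)
      [{arguments, [<<"modules">>]},   % assumed: names of mandatory options
       {options, [Modules]}]).

%% True if no name occurs twice.
names_unique(Names) ->
    length(Names) =:= length(lists:usort(Names)).
```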

-spec create_display_info(DataStructure :: value | table,
                          Label :: binary(),
                          Options :: [display_info_option_item()]) ->
          wombat_types:display_info().

Create a display info about a service. Properties given in Options override the default properties of display info. These properties together with their defaults are:

  • is_interactive (false)
  • table_headers ([])
-spec create_execution_info(Period :: once | non_neg_integer(),
                            RetryAfter :: never | non_neg_integer(),
                            MaxRetries :: non_neg_integer()) ->
          wombat_types:execution_info().

Create an execution info about a service. Execution info to be returned by the init_request/4 callback is used by the framework to know how to execute a request.

  • Period specifies how often data will be streamed. It can be once or a non-negative integer. once means that data will be streamed only once and users are not allowed to stop the execution of the request. Periodic requests can always be stopped by the users. For such requests, the given value is the number of milliseconds that elapse between two executions.

  • RetryAfter specifies how failures should be handled. It can be never or a non-negative integer. never means that the evaluation of the request should never be retried, whilst a number defines the number of milliseconds after which the evaluation can be retried.

  • MaxRetries defines the maximum number of consecutive attempts to evaluate the request. If the number of attempts reaches this maximum, the plugin process gives up and finalises the request. If its value is 0, the plugin process never retries the evaluation and gives up immediately after the first failure.
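
Since both Period and RetryAfter are given in milliseconds, a small conversion helper can make calls easier to read. The sketch below is illustrative only; the module name, the function names and the chosen retry count are assumptions.

```erlang
-module(wombat_plugin_sketch_execinfo).
-export([one_shot/0, every/2, seconds_to_ms/1]).

%% Execute once and never retry; users cannot stop such a request.
one_shot() ->
    wombat_plugin_services:create_execution_info(once, never, 0).

%% Execute every PeriodSecs seconds. After a failure, allow a retry after
%% RetrySecs seconds, with at most three attempts in a row.
every(PeriodSecs, RetrySecs) ->
    wombat_plugin_services:create_execution_info(
      seconds_to_ms(PeriodSecs), seconds_to_ms(RetrySecs), 3).

seconds_to_ms(Secs) when is_integer(Secs), Secs >= 0 ->
    Secs * 1000.
```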

-spec create_interactive_value(Data :: binary(),
                               Actions :: [wombat_types:action()]) ->
          wombat_types:interactive_value().

Create an interactive value within stream data. It can be used to construct a cell in an interactive table. Data will be displayed on the dashboard as the content of the cell. Actions specifies the content of the local menu. To construct an arbitrary action, use create_action/4. If Data is a pid, use the create_process_actions/1 utility function to define the same local menu that appears for processes in the output of the Etop service.

-spec create_action(Label :: binary(),
                    ObjectType :: node | family,
                    FeatureName :: term(),
                    FeatureArgs :: wombat_types:request_args()) ->
          wombat_types:action().

Create an arbitrary action for an interactive value. Think of an action as a zero-arity fun expression that is evaluated when the user requests it. The body of the fun expression is a complete request for another, already implemented service. The target of the request can be the node creating the action or that node's family; this is specified by ObjectType. FeatureName is the feature identifying the service, which is implemented as a capability by a plugin (the same as the fifth argument passed to create_capability/6). FeatureArgs are the request arguments with which the request will be initialised. Label will be shown as the link of this action in the local menu.

-spec create_process_actions(pid()) -> [wombat_types:action()].

Create a list of actions related to the given process. The list can be used directly as the actions of an interactive value. The actions are: terminate process, process info, process messages, process dictionary, process state and process stack trace.
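
A sketch of how an interactive table of processes might be built from these functions. The module and function names are hypothetical; each row has a single cell whose content is the printed pid and whose local menu holds the process actions.

```erlang
-module(wombat_plugin_sketch_interactive).
-export([process_rows/1, pid_to_bin/1]).

%% One row per process, one interactive cell per row.
process_rows(Pids) ->
    [[wombat_plugin_services:create_interactive_value(
        pid_to_bin(Pid),
        wombat_plugin_services:create_process_actions(Pid))]
     || Pid <- Pids].

%% create_interactive_value/2 takes the cell content as a binary.
pid_to_bin(Pid) when is_pid(Pid) ->
    list_to_binary(pid_to_list(Pid)).
```

The result would be streamed for a request whose display info was created with table as the data structure and {is_interactive, true} among its options.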

-spec request_reply(From :: wombat_types:from_ref(),
                    Reply :: wombat_types:async_reply()) ->
          ok.

This function can be used by a plugin to explicitly send stream data to WombatOAM. When the execute_request/4 callback wants to return and send stream data only later, it can return reply_later and use this function later to send the stream data. The From parameter received in execute_request/4 must be provided to this function. Note well that one From value can be used only once (i.e. it cannot be used to send back multiple stream data messages).
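
The reply_later pattern might look like the sketch below. The capability id and the helper count_loaded_modules/0 are invented for the example. The reply shape {close, {data, Data}} is an assumption made by analogy with the {continue, {data, Data}} reply used in the complete example plugin; check wombat_types:async_reply() before relying on it.

```erlang
-module(wombat_plugin_sketch_async).
-export([execute_request/4, count_loaded_modules/0]).

execute_request(_ReqId, [<<"loaded modules">>], From, State) ->
    %% Do the work in a worker so the plugin process stays responsive.
    wombat_plugin_utils:spawn_worker(
      fun() ->
              Count = integer_to_binary(count_loaded_modules()),
              %% From may be used for exactly one reply.
              wombat_plugin_services:request_reply(
                From, {close, {data, Count}})
      end),
    {reply_later, State}.

count_loaded_modules() ->
    length(code:all_loaded()).
```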

The wombat_plugin API

The following functions in the wombat_plugin module can be used by plugins.

-spec report_log(Severity :: wombat_types:severity(),
                 LogMessage :: wombat_types:log_message()) -> ok.

Report a notification entry.

-spec raise_alarm(AlarmId :: wombat_types:alarm_id(),
                  AddInfo :: wombat_types:alarm_add_info()) -> ok.
-spec clear_alarm(AlarmId :: wombat_types:alarm_id()) -> ok.

Raise/clear an alarm.

-spec announce_capabilities(Capabilities :: wombat_types:capabilities()) -> ok.

Push the list of capabilities to WombatOAM. It needs to be called with the list of all capabilities of the plugin (not only the new ones).
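
Because the full list has to be announced every time, a plugin that discovers a capability at runtime needs to merge it into the list it already holds. A minimal sketch, with hypothetical module and function names:

```erlang
-module(wombat_plugin_sketch_announce).
-export([add_and_announce/2, merge_capability/2]).

%% Announce the complete list and return it, so that the caller can store
%% it in the plugin state. Must be called from the plugin process.
add_and_announce(NewCap, Capabilities) ->
    All = merge_capability(NewCap, Capabilities),
    ok = wombat_plugin:announce_capabilities(All),
    All.

%% Append the capability unless it is already present.
merge_capability(NewCap, Capabilities) ->
    case lists:member(NewCap, Capabilities) of
        true -> Capabilities;
        false -> Capabilities ++ [NewCap]
    end.
```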

Calling the wombat_plugin API from outside of the plugin process

The wombat_plugin API is simple because when its functions are called from the plugin process, WombatOAM's plugin infrastructure knows who the caller is. When other processes call these functions, WombatOAM cannot identify the caller; therefore calling these functions from other processes is not allowed. Instead, those processes need to obtain the counterparts of these functions from the wombat_plugin_utils module:

-spec report_log_cb(Options :: plugin_options()) ->
          fun((Severity :: wombat_types:severity(),
               LogMsg :: wombat_types:log_message()) -> ok) |
          undefined.

Report a notification entry.

The Options parameter that needs to be passed to these functions is the same as the Arguments parameter that is received by the init function of the module.

The following is an example that shows how this function can be used:

init(Options) ->
    LogCB = wombat_plugin_utils:report_log_cb(Options),
    LogCB(<<"error">>, <<"Test notification">>),
    {ok, #state{}}.
-spec raise_alarm_cb(Options :: plugin_options()) ->
    fun((AlarmId :: term(), Message :: term()) -> ok) | undefined.
-spec clear_alarm_cb(Options :: plugin_options()) ->
    fun((AlarmId :: term()) -> ok) | undefined.

Raise/clear an alarm.

-spec announce_capabilities_cb(Options :: plugin_options()) ->
    fun((Capabilities :: wombat_types:capabilities()) -> ok) | undefined.

Push the list of capabilities to WombatOAM. It needs to be called with the list of all capabilities of the plugin (not only the new ones).
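
Since each of these functions may return undefined instead of a fun, code that uses the callbacks from another process should guard against that. The sketch below obtains the alarm callbacks in the plugin process and uses them in a worker. The module name, the one-off check and the helper call_if_defined/2 are assumptions made for the example.

```erlang
-module(wombat_plugin_sketch_cb).
-export([start_checker/1, call_if_defined/2]).

%% Call this from the plugin's init/1 with the Arguments it received.
start_checker(Options) ->
    Raise = wombat_plugin_utils:raise_alarm_cb(Options),
    Clear = wombat_plugin_utils:clear_alarm_cb(Options),
    wombat_plugin_utils:spawn_worker(
      fun() -> check(Raise, Clear) end).

%% Runs in the worker: raise or clear the alarm once, via the callbacks.
check(Raise, Clear) ->
    case erlang:whereis(troublemaker) of
        undefined ->
            call_if_defined(Clear, [there_is_a_troublemaker]);
        Pid ->
            call_if_defined(Raise, [there_is_a_troublemaker, [{pid, Pid}]])
    end.

%% Apply the callback only if one was actually returned.
call_if_defined(undefined, _Args) ->
    skipped;
call_if_defined(Fun, Args) when is_function(Fun, length(Args)) ->
    apply(Fun, Args).
```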

Useful functions in wombat_plugin_utils

-spec binfmt(Fmt :: io:format(), Args :: [term()]) -> binary().

Print the given arguments into a binary.

-spec spawn_worker(fun(() -> any())) -> pid().

Spawn a worker process from a plugin. The return value of the fun is ignored. The process is linked to the plugin process and has special treatment. (Never use plain erlang:spawn_link from a plugin process!)

-spec create_metric_capability(MetricId :: wombat_types:capability_id(),
                               Description :: binary(),
                               Type :: wombat_types:metric_type(),
                               Unit :: wombat_types:metric_unit(),
                               Tags :: wombat_types:capability_tags()) ->
          wombat_types:capability().

Create a metric capability term. Note that the create_metric_capability/4 function is deprecated, kept only for backward compatibility. It uses the dev and the op tags to create the metric capability.

-spec cap_id_to_cap_id_last(wombat_types:capability_id()) ->
          wombat_types:metric_cap_id_last().

Return the last element of the capability id as an atom.

-spec create_alarm_capability(CapabilityId :: wombat_types:capability_id(),
                              Severity :: wombat_types:alarm_severity(),
                              ProbableCause :: binary(),
                              ProposedRepairAction :: binary(),
                              DefaultTags :: wombat_types:capability_tags()) ->
          wombat_types:capability().

Create an alarm capability term.

Starting periodic jobs

The types used are defined in wombat_types.erl:

-type task_fun() :: fun(() -> ok | stop).
-type millisecs() :: non_neg_integer().
-spec periodic(Period :: wombat_types:millisecs(),
               Job :: wombat_types:task_fun()) -> pid().

Start a periodic job from a main WombatOAM plugin module. The process stops either when the fun returns something other than ok or when the plugin is stopped.

-spec stream_task_data(term()) -> ok.

Function to be called by the job (task) processes in order to stream results back to the WombatOAM plugin process. Streamed data format is {'$task_data', TaskPid, Data}.

Using wombat_tracer as a service

If you want to write a plugin that collects trace information, you should use the tracing service provided by the wombat_plugin application. The service is implemented as a server that is locally registered under the name wombat_plugin_tracer.

To subscribe, call wombat_plugin_tracer:subscribe(Who, Topic, Filter), where:

  • Who is the pid of the receiver,
  • Topic is {FlagList, MFA}, where the variables share the types defined in the documentation of erlang:trace_pattern.
  • Filter has the type fun((TraceMsg) -> boolean() | {true, Msg}). It pre-selects the trace messages that are delivered to the receiver. The type of TraceMsg is defined in the documentation of erlang:trace. If the filter returns true, the TraceMsg is forwarded to the subscriber as is. If the filter returns {true, Msg}, the custom Msg is sent to the subscriber instead of the original TraceMsg.

Optionally, a FinishFlag can be passed as a fourth argument: wombat_plugin_tracer:subscribe(Who, Topic, Filter, FinishFlag). It can have the following values:

  • undefined (default): only call trace messages are sent to the tracer.
  • return_trace: in addition to the call messages, a return_from trace message is sent upon return from the traced function.
  • exception_trace: same as return_trace; in addition, if the traced function exits due to an exception, an exception_from trace message is generated, whether or not the exception is caught.

The result of the call can be:

  • ok, meaning the subscription was successful and the tracing is active.
  • {warning, Reason}, meaning the subscription was okay but the tracing is not active.
  • {error, Reason}, meaning the subscription was not made because bad arguments were passed.
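
The sketch below subscribes with a FinishFlag and a filter that returns custom messages. The traced function my_app_db:query/1, the module name and the message terms are hypothetical. The trace message shapes follow the erlang:trace documentation; that return_from messages also pass through the filter is an assumption.

```erlang
-module(wombat_plugin_sketch_trace).
-export([subscribe/0, filter/1]).

%% Subscribe the calling process (the plugin) to calls of the hypothetical
%% my_app_db:query/1, including its returns.
subscribe() ->
    Topic = {[], {my_app_db, query, 1}},
    case wombat_plugin_tracer:subscribe(
           self(), Topic, fun filter/1, return_trace) of
        ok -> active;
        {warning, _Reason} -> inactive;
        {error, Reason} -> {failed, Reason}
    end.

%% Replace the raw trace messages with small custom terms and drop the rest.
filter({trace, Pid, call, {my_app_db, query, [_Query]}}) ->
    {true, {query_started, Pid}};
filter({trace, Pid, return_from, {my_app_db, query, 1}, _Result}) ->
    {true, {query_finished, Pid}};
filter(_Other) ->
    false.
```

Returning small custom terms keeps potentially large arguments and results out of the plugin's mailbox.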

The tracer sends messages that have the form {wombat_plugin_tracer, Msg}, where Msg is one of the following:

  • TraceMsg as defined in the documentation of erlang:trace.
  • tracer_inactived, meaning no trace messages can be expected, tracing is not active.
  • tracer_actived, meaning trace messages can be expected, tracing is active.

A strong recommendation is to link the receivers – the plugins – to the tracer. Hence, the plugins can restart in case the tracer restarts, simplifying the implementation of the plugins.

Notes:

  • There is no need for unsubscribing from the tracer, as the plugin is monitored by the tracer.
  • Tracing calls to the functions of a module that is reloaded or loaded after the trace pattern has been enabled is supported, with one exception: trace patterns that match any module ('_') will not produce traces for modules that were reloaded or loaded after the pattern was activated. Also note that when the loading of a module is triggered by the first call to that module, this first call will not be traced.

Example

Assume you want to keep track of which modules are loaded into the VM. First, subscribe to traces of erlang:load_module/2 calls during init:

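init(_Args) ->
    %% Trace calls to erlang:load_module/2 without extra trace-pattern flags.
    Topic = {[], {erlang, load_module, 2}},
    %% Forward only the call messages of erlang:load_module/2, unchanged.
    Filter = fun({trace, _, call, {erlang, load_module, [_ | _]}}) -> true;
                (_) -> false
             end,

    case wombat_plugin_tracer:subscribe(self(), Topic, Filter) of
        ok ->
            ok;
        {warning, Warning} ->
            WarningMsg =
                wombat_plugin_utils:binfmt("Tracer's response: ~p", [Warning]),
            wombat_plugin:report_log(<<"warning">>, WarningMsg);
        {error, Reason} ->
            ErrorMsg =
                wombat_plugin_utils:binfmt("Tracer's response: ~p", [Reason]),
            wombat_plugin:report_log(<<"error">>, ErrorMsg)
    end,

    %% Link to the tracer so that the plugin restarts if the tracer restarts.
    link(whereis(wombat_plugin_tracer)),
    %% init must return the plugin state (#state{} is the plugin's own record).
    {ok, #state{}}.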

To receive the collected trace messages and other system messages sent by the tracer, add the following function clauses to handle_info/2.

handle_info({wombat_plugin_tracer, tracer_actived}, State) ->
    wombat_plugin:report_plugin_error(<<"info">>, <<"Tracer activated.">>),
    {noreply, State};
handle_info({wombat_plugin_tracer, tracer_inactived}, State) ->
    wombat_plugin:report_plugin_error(<<"warning">>, <<"Tracer is inactive.">>),
    {noreply, State};
handle_info({wombat_plugin_tracer, {trace, _Pid, call, MFA}}, State) ->
    {erlang, load_module, [Module, _Binary]} = MFA,
    Msg = wombat_plugin_utils:binfmt("~p module is loaded", [Module]),
    wombat_plugin:report_log(<<"info">>, Msg),
    {noreply, State};

Example of a complete plugin

%%%=============================================================================
%%% @copyright 2015-2016, Erlang Solutions Ltd
%%% @doc Example WombatOAM plugin.
%%%
%%% This example plugin demonstrates how to write a simple WombatOAM plugin. It does
%%% the following:
%%%
%%% - It provides two metrics: nodes_count and hidden_nodes_count.
%%% - It raises an alarm and sends a notification if there is a process
%%%   registered with the name 'troublemaker'. This is checked once a second.
%%%   (In a real plugin, the check interval would be much longer to avoid
%%%   overloading the system, e.g. one minute.)
%%%
%%% To activate this plugin, the following entry needs to be added to
%%% wombat.config:
%%%
%%% ```
%%% {replace, wo_plugins, plugins, example,
%%%  {example, [{kernel, ".*"}], [], []}}.
%%% '''
%%% @end
%%%=============================================================================
-module(wombat_plugin_example).
-copyright("2015-2016, Erlang Solutions Ltd.").

-behaviour(wombat_plugin).
-behaviour(wombat_plugin_services).

%% wombat_plugin callbacks
-export([init/1, capabilities/1,
         handle_info/2, terminate/1,
         collect_metrics/1, live_metrics2comp_units/2, collect_live_metrics/1]).

%% wombat_plugin_services callbacks
-export([init_request/4, execute_request/4, cleanup_request/3]).

-define(CHECK_INTERVAL, 1000). % 1 second

%%------------------------------------------------------------------------------
%% Types
%%------------------------------------------------------------------------------

-record(state,
        {
         %% The plugin's internal representation of the metrics provided.
         metric_info_tuples = [] :: [metric_info_tuple()],

         %% The WombatOAM representation of the metrics provided.
         capabilities = [] :: [wombat_types:capability()],

         %% True if there is a process called 'troublemaker'.
         troublemaker_exists :: boolean(),

         requests = [] :: [{ReqId :: binary(),
                            ReqInfo :: tm_mode()}]
         }).

-type state() :: #state{}.
%% Plugin state.

-type metric_internal_id() :: atom().
%% An id used by this plugin to identify a metric.

-type metric_info_tuple() :: {MetricInternalId :: metric_internal_id(),
                              MetricNameBin :: binary(),
                              Type :: wombat_types:metric_type(),
                              Unit :: wombat_types:metric_unit(),
                              Tags :: wombat_types:capability_tags()}.
%% A tuple that is used by this plugin to describe a metric.

-type tm_mode() :: binary().
%% the mode of the troublemaker to be started

%%%=============================================================================
%% wombat_plugin callbacks
%%%=============================================================================

%%------------------------------------------------------------------------------
%% @doc Initialise the plugin state.
%% @end
%%------------------------------------------------------------------------------
-spec init(Arguments :: [term()]) -> {ok, state()} | {error, _}.
init(_) ->

    %% Metrics
    Metrics = get_metric_info_tuples(),
    MetricCapabilities =
        [ wombat_plugin_utils:create_metric_capability(
            metric_name_to_capability_id(Name), Name, Type, Unit, Tags)
          || {_Id, Name, Type, Unit, Tags} <- Metrics ],
    ServiceCapabilities =  service_capabilities(),
    AlarmCapabilities = alarm_capabilities(),

    Capabilities =
        MetricCapabilities ++ ServiceCapabilities
            % Note that announcing alarm capabilities is optional.
            ++ AlarmCapabilities,

    %% Alarms and notifications pushed to WombatOAM based on periodic checks
    %% The process started as periodic task will check whether a process
    %% registered as 'troublemaker' exists. The result of each check is
    %% streamed to the plugin process to perform any necessary further actions.
    wombat_plugin:periodic(
      ?CHECK_INTERVAL,
      fun() ->
              %% Determine the current status of the troublemaker process.
              TroubleMaker = erlang:whereis(troublemaker),
              %% Inform the plugin process about the troublemaker process.
              ok = wombat_plugin:stream_task_data(TroubleMaker)
      end),

    %% Perform the initial check.
    TroublemakerExists =
        case erlang:whereis(troublemaker) of
            undefined ->
                wombat_plugin:clear_alarm(there_is_a_troublemaker),
                false;
            Pid ->
                wombat_plugin:raise_alarm(there_is_a_troublemaker,
                                          [{pid, Pid}]),
                true
        end,

    {ok, #state{metric_info_tuples = Metrics,
                capabilities = Capabilities,
                troublemaker_exists = TroublemakerExists}}.

%%------------------------------------------------------------------------------
%% @doc Return the capabilities of the plugin.
%% @end
%%------------------------------------------------------------------------------
-spec capabilities(state()) -> {wombat_types:capabilities(), state()}.
capabilities(#state{capabilities = Capabilities} = State) ->
    {Capabilities, State}.

%%------------------------------------------------------------------------------
%% @doc Handle a message.
%% @end
%%------------------------------------------------------------------------------
-spec handle_info(Message :: term(), state()) -> {noreply, state()}.
handle_info({'$task_data', _Pid, Troublemaker},
            #state{troublemaker_exists = TroublemakerExistsOld} = State) ->
    NewState =
        case {TroublemakerExistsOld, Troublemaker} of
            {false, undefined} ->
                %% No troublemaker.
                State;
            {true, undefined} ->
                %% The troublemaker disappeared.
                wombat_plugin:clear_alarm(there_is_a_troublemaker),
                State#state{troublemaker_exists = false};
            {false, Pid} ->
                %% The troublemaker appeared.
                wombat_plugin:raise_alarm(there_is_a_troublemaker,
                                          [{pid, Pid}]),
                Msg = wombat_plugin_utils:binfmt(
                        "We have a troublemaker: ~p", [Pid]),
                wombat_plugin:report_log(<<"warning">>, Msg),
                State#state{troublemaker_exists = true};
            {true, Pid} ->
                %% The troublemaker is still there.
                Msg = wombat_plugin_utils:binfmt(
                        "The troublemaker is still there: ~p", [Pid]),
                wombat_plugin:report_log(<<"warning">>, Msg),
                State
        end,
    {noreply, NewState};
handle_info(_Message, State) ->
    {noreply, State}.

%%------------------------------------------------------------------------------
%% @doc Terminate the plugin.
%% @end
%%------------------------------------------------------------------------------
-spec terminate(state()) -> any().
terminate(_State) ->
    ok.

%%------------------------------------------------------------------------------
%% @doc Return the metrics' values belonging to the already announced
%% capabilities.
%% @end
%%------------------------------------------------------------------------------
-spec collect_metrics(state()) -> {ok, [wombat_types:metric_data()], state()}.
collect_metrics(#state{metric_info_tuples = Metrics} = State) ->
    Samples = [ {metric, metric_name_to_capability_id(Name), Type,
                 get_metric_value(Id)}
                || {Id, Name, Type, _Unit, _Tags} <- Metrics ],
    {ok, Samples, State}.

%%------------------------------------------------------------------------------
%% @doc Convert live metrics into computation units.
%% @end
%%------------------------------------------------------------------------------
-spec live_metrics2comp_units(LiveMs :: [wombat_types:metric_cap_id_last()],
                              state()) ->
          {ok, [metric_info_tuple()], state()} | {error, term(), state()}.
live_metrics2comp_units(LiveMs, #state{metric_info_tuples = Metrics} = State) ->
    %% Return those metric_info_tuples whose cap_id_last is present in LiveMs
    %% (i.e. those metrics that shall be collected).
    CompUnits = [MetricInfoTuple
                 || MetricInfoTuple <- Metrics,
                    lists:member(
                      metric_info_tuple_to_cap_id_last(MetricInfoTuple),
                      LiveMs)],
    {ok, CompUnits, State}.

%%------------------------------------------------------------------------------
%% @doc Return the values of the given live metric.
%% @end
%%------------------------------------------------------------------------------
-spec collect_live_metrics(MetricInfoTuple :: metric_info_tuple()) ->
          {ok, [wombat_types:live_metric_data()]} | {error, term()}.
collect_live_metrics({Id, Name, Type, _Unit, _Tags}) ->
    {ok, [{live_metric, metric_name_to_capability_id(Name), Type,
           get_metric_value(Id)}]}.

%%------------------------------------------------------------------------------
%% @doc Initialize a service request.
%% @end
%%------------------------------------------------------------------------------
-spec init_request(ReqID :: binary(),
                   CapabilityID :: wombat_types:capability_id(),
                   ReqArgs :: wombat_types:request_args(),
                   State :: wombat_types:plugin_state()) ->
    {ok,
     DisplayInfo :: wombat_types:display_info(),
     ExecutionInfo :: wombat_types:execution_info(),
     NewState :: wombat_types:plugin_state()} |
    {out_of_scope,
     ReasonBinStr :: binary(),
     NewState :: wombat_types:plugin_state()} |
    {error,
     ReasonBinStr :: binary(),
     NewState :: wombat_types:plugin_state()}.
init_request(_ReqID, [<<"troublemaker status">>], _ReqArgs, State) ->
    {ok,
     wombat_plugin_services:create_display_info(
       _DataStructure = value,
       _Label = <<"Troublemaker status">>,
       _Options = []),
     wombat_plugin_services:create_execution_info(
       _Period = once,
       _RetryAfter = never,
       _MaxRetries = 0),
     State};
init_request(_ReqID, [<<"troublemaker watcher">>], _ReqArgs, State) ->
    {ok,
     wombat_plugin_services:create_display_info(
       _DataStructure = table,
       _Label = <<"Troublemaker status">>,
       _Options = [
                   {is_interactive, true},
                   {table_headers, [<<"Status">>]}
                  ]),
     wombat_plugin_services:create_execution_info(
       _Period = 3000,
       _RetryAfter = never,
       _MaxRetries = 0),
     State};
init_request(ReqID, [<<"troublemaker start">>], ReqArgs, State) ->
    case proplists:get_value(<<"mode">>, ReqArgs) of
        undefined ->
            {error, <<"Mandatory argument 'mode' missing.">>, State};
        Mode when Mode =:= <<"Persistent">>; Mode =:= <<"Temporary">> ->
            {ok,
             wombat_plugin_services:create_display_info(
               _DataStructure = table,
               _Label = <<"Troublemaker process id">>,
               _Options = [
                           {is_interactive, true},
                           {table_headers, [<<"Result">>, <<"Pid">>]}
                          ]),
             wombat_plugin_services:create_execution_info(
               _Period = once,
               _RetryAfter = never,
               _MaxRetries = 0),
             add_req_info(ReqID, Mode, State)};
        Mode ->
            {out_of_scope,
             wombat_plugin_utils:binfmt(
               "Unknown value for argument mode: ~p", [Mode]), State}
    end;
init_request(_ReqID, [<<"troublemaker stop">>], _ReqArgs, State) ->
    {ok,
     wombat_plugin_services:create_display_info(
       _DataStructure = value,
       _Label = <<"Result">>,
       _Options = []),
     wombat_plugin_services:create_execution_info(
       _Period = once,
       _RetryAfter = never,
       _MaxRetries = 0),
     State}.

%%------------------------------------------------------------------------------
%% @doc Execute a service request.
%% @end
%%------------------------------------------------------------------------------
-spec execute_request(ReqID :: binary(),
                      CapabilityID :: wombat_types:capability_id(),
                      From :: wombat_types:from_ref(),
                      State :: wombat_types:plugin_state()) ->
    {continue | close,
     no_data | {data, StreamData :: wombat_types:stream_data()},
     NewState :: wombat_types:plugin_state()} |
    {reply_later,
     NewState :: wombat_types:plugin_state()} |
    {error,
     ReasonBinStr :: binary(),
     NewState :: wombat_types:plugin_state()}.
execute_request(_ReqId, [<<"troublemaker status">>], _From, State) ->
    Status = troublemaker_status(),

    {close, {data, Status}, State};
execute_request(_ReqId, [<<"troublemaker watcher">>], From, State) ->
    %% The worker is only spawned for the sake of example
    wombat_plugin_utils:spawn_worker(
      fun() ->
              Data = watch_troublemaker(),
              wombat_plugin_services:request_reply(From, {continue, {data, Data}})
      end),

    {reply_later, State};
execute_request(ReqId, [<<"troublemaker start">>], _From, State) ->
    Mode = get_req_info(ReqId, State),

    {Result, Pid} = start_troublemaker(Mode),

    BinPid = wombat_plugin_utils:binfmt("~p", [Pid]),
    PidActions = wombat_plugin_services:create_process_actions(Pid),
    Data =
        [[
          wombat_plugin_services:create_interactive_value(Result, []),
          wombat_plugin_services:create_interactive_value(BinPid, PidActions)
         ]],

    {close, {data, Data}, State};
execute_request(_ReqId, [<<"troublemaker stop">>], _From, State) ->
    case stop_troublemaker() of
        error ->
            {error, <<"No Troublemaker process running">>, State};
        ok ->
            {close, {data, <<"Done">>}, State}
    end.

%%------------------------------------------------------------------------------
%% @doc Clean up a service request.
%% @end
%%------------------------------------------------------------------------------
-spec cleanup_request(ReqID :: binary(),
                      CapabilityID :: wombat_types:capability_id(),
                      State :: wombat_types:plugin_state()) ->
    {ok, NewState :: wombat_types:plugin_state()}.
cleanup_request(ReqID, [<<"troublemaker start">>], State) ->
    {ok, delete_req_info(ReqID, State)};
cleanup_request(_, _, State) ->
    {ok, State}.

%%==============================================================================
%% Internal functions
%%==============================================================================

alarm_capabilities() ->
    % The capability ID defines the matching alarms.
    % For this CapabilityId, the matching alarms are identified by
    % - the 'there_is_a_troublemaker' atom, or
    % - any tuple of arbitrary size whose first element is the
    % 'there_is_a_troublemaker' atom, for example
    % {there_is_a_troublemaker, Pid} and {there_is_a_troublemaker, [Pid]}.
    CapabilityId = [<<"there_is_a_troublemaker">>],
    Severity = minor,
    ProbableCause =
        <<"A process has been registered with the name troublemaker.">>,
    ProposedRepairAction =
        <<"Use the Stop Troublemaker service to terminate the process.">>,
    % Relevant only for operators
    Tags = [<<"op">>],
    [wombat_plugin_utils:create_alarm_capability(
       CapabilityId, Severity, ProbableCause, ProposedRepairAction, Tags)].

%%------------------------------------------------------------------------------
%% @doc Return the metrics that this plugin provides (in its own internal
%% representation).
%% @end
%%------------------------------------------------------------------------------
-spec get_metric_info_tuples() -> [metric_info_tuple()].
get_metric_info_tuples() ->
    [{nodes_count,
      <<"Number of non-hidden nodes">>,
      counter,
      numeric,
      % Metric is relevant for developers.
      [<<"dev">>]},
     {hidden_nodes_count,
      <<"Number of hidden nodes">>,
      counter,
      numeric,
      % Metric is relevant for both developers and operators.
      [<<"dev">>, <<"op">>]}].

%%------------------------------------------------------------------------------
%% @doc Calculate the value of a given metric.
%% @end
%%------------------------------------------------------------------------------
-spec get_metric_value(MetricInternalId :: metric_internal_id()) -> integer().
get_metric_value(nodes_count) ->
    length(nodes());
get_metric_value(hidden_nodes_count) ->
    length(nodes(hidden)).

%%------------------------------------------------------------------------------
%% @doc Converts a metric name into a capability id.
%% @end
%%------------------------------------------------------------------------------
-spec metric_name_to_capability_id(MetricName :: binary()) ->
          wombat_types:capability_id().
metric_name_to_capability_id(MetricName) ->
    [<<"Example metrics">>, MetricName].

%%------------------------------------------------------------------------------
%% @doc Convert a metric from a metric_info_tuple into a metric_cap_id_last
%% value.
%% @end
%%------------------------------------------------------------------------------
-spec metric_info_tuple_to_cap_id_last(metric_info_tuple()) ->
          wombat_types:metric_cap_id_last().
metric_info_tuple_to_cap_id_last({_Id, Name, _Type, _Unit, _Tags}) ->
    wombat_plugin_utils:cap_id_to_cap_id_last(
      metric_name_to_capability_id(Name)).

%%==============================================================================
%% Internal functions - Services
%%==============================================================================

service_capabilities() ->
    [wombat_plugin_services:create_capability(
       [<<"troublemaker status">>], % CapabilityId
       explorer, % Type
       <<"Return whether the Troublemaker process is alive or not.">>, % Description
       <<"Get Troublemaker status">>, % Label
       troublemaker_status, % Feature
       []), % Options - no arguments, use defaults
     wombat_plugin_services:create_capability(
       [<<"troublemaker watcher">>], % CapabilityId
       explorer, % Type
       <<"Periodically return whether the Troublemaker process is alive or not.">>, % Description
       <<"Watch Troublemaker">>, % Label
       troublemaker_watcher, % Feature
       []), % Options - no arguments, use defaults
     wombat_plugin_services:create_capability(
       [<<"troublemaker start">>], % CapabilityId
       executor, % Type
       <<"Start the Troublemaker process.">>, % Description
       <<"Start Troublemaker">>, % Label
       troublemaker_start, % Feature
       [
        {is_internal, false},
        {options, [wombat_plugin_services:create_enum_option(
                     <<"mode">>, % Name
                     <<"Mode">>, % Label
                     <<"">>, % No Default
                     true, % IsEnabled
                     [<<"Persistent">>, <<"Temporary">>] % EnumValues
                    )]},
        {arguments, [<<"mode">>]}
       ]), % Options
     wombat_plugin_services:create_capability(
       [<<"troublemaker stop">>], % CapabilityId
       executor, % Type
       <<"Stop the Troublemaker process.">>, % Description
       <<"Stop Troublemaker">>, % Label
       troublemaker_stop, % Feature
       []) % Options
    ].

%%------------------------------------------------------------------------------
%% @doc Start a troublemaker process if one is not started already
%% @end
%%------------------------------------------------------------------------------
-spec start_troublemaker(tm_mode()) -> {binary(), pid()}.
start_troublemaker(Mode) ->
    Parent = self(),
    TMPid =
        spawn(
          fun() ->
                  try register(troublemaker, self()) of
                      true ->
                          Parent ! {started, self()},
                          Timeout =
                              case Mode of
                                  <<"Persistent">> ->
                                      infinity;
                                  <<"Temporary">> ->
                                      10000
                              end,
                          receive
                              stop -> ok
                          after
                              Timeout -> ok
                          end
                  catch error:badarg ->
                          Parent ! {already_started, self()}
                  end
          end),
    receive
        {started, TMPid} ->
            {<<"Started">>, TMPid};
        {already_started, TMPid} ->
            {<<"Already started">>, whereis(troublemaker)}
    end.

%%------------------------------------------------------------------------------
%% @doc Stop the troublemaker process
%% @end
%%------------------------------------------------------------------------------
-spec stop_troublemaker() -> ok | error.
stop_troublemaker() ->
    case whereis(troublemaker) of
        undefined ->
            error;
        Pid ->
            exit(Pid, shutdown),
            ok
    end.

%%------------------------------------------------------------------------------
%% @doc Check troublemaker status and build the interactive table data
%% accordingly.
%% @end
%%------------------------------------------------------------------------------
-spec watch_troublemaker() -> wombat_types:stream_data_interactive_table().
watch_troublemaker() ->
    case whereis(troublemaker) of
        undefined ->
            [[wombat_plugin_services:create_interactive_value(
                <<"Not running">>, [])]];
        _Pid ->
            Action = wombat_plugin_services:create_action(
                       <<"Stop Troublemaker">>, %% Label
                       node, %% Object type - this node
                       troublemaker_stop, %% Feature name
                       [] %% No arguments
                      ),
            [[wombat_plugin_services:create_interactive_value(
                <<"Running">>, [Action])]]
    end.

%%------------------------------------------------------------------------------
%% @doc Return troublemaker status as a binstring
%% @end
%%------------------------------------------------------------------------------
-spec troublemaker_status() -> binary().
troublemaker_status() ->
    case whereis(troublemaker) of
        undefined ->
            <<"Not running">>;
        _Pid ->
            <<"Running">>
    end.

%%------------------------------------------------------------------------------
%% @doc Add info about a request to the state.
%% @end
%%------------------------------------------------------------------------------
-spec add_req_info(ReqId :: binary(),
                   ReqInfo :: tm_mode(),
                   State :: state()) -> NewState :: state().
add_req_info(ReqId, ReqInfo, #state{requests = Requests} = State)->
    State#state{requests = [{ReqId, ReqInfo}|Requests]}.

%%------------------------------------------------------------------------------
%% @doc Get the info of a request from the state.
%% @end
%%------------------------------------------------------------------------------
-spec get_req_info(ReqId :: binary(),
                   State :: state()) -> tm_mode().
get_req_info(ReqId, #state{requests = Requests}) ->
    {_ReqId, ReqInfo} = lists:keyfind(ReqId, 1, Requests),
    ReqInfo.

%%------------------------------------------------------------------------------
%% @doc Delete the info of a request from the state.
%% @end
%%------------------------------------------------------------------------------
-spec delete_req_info(ReqId :: binary(),
                      State :: state()) -> NewState :: state().
delete_req_info(ReqId, #state{requests = Requests} = State)->
    NewRequests = lists:keydelete(ReqId, 1, Requests),
    State#state{requests = NewRequests}.

Rules about passing callback functions to non-WombatOAM processes

There are two important rules to keep in mind when passing a callback function to a non-WombatOAM process:

  1. Agent modules (including plugin modules and plugin infrastructure modules) should never pass a reference to an anonymous or local function to a non-agent process, e.g. sys:install(interesting_gen_server, {fun (FuncState, SysMsg, ServerState) -> ... end, FuncState0}) or sys:install(interesting_gen_server, {fun my_dbg/3, FuncState0}). When the agent module is purged, code:purge kills any process that still holds a reference into the unloaded module.

Agent modules should pass only exported functions using the MFA syntax (e.g. sys:install(interesting_gen_server, {fun ?MODULE:my_dbg/3, FuncState0})). This way the non-WombatOAM process keeps only the module name, function name and arity in its memory rather than a reference into the module's code, so it is not affected when the agent module is purged. If the callback is called after the purge, the caller gets an "undefined function" error, which the non-WombatOAM process can easily catch. The plugin developer should check that the process the plugin is observing does indeed catch this error.

  2. Callback functions should be very quick: they should not take more than 1 second even when the system is heavily loaded. If a process is executing a callback function defined in an agent module, WombatOAM gives that function call 1 second to finish before doing a hard purge, which kills the process if it is still executing the callback.

The following snippet demonstrates the problem behind the first rule:

$ cat test.erl

-module(test).
-compile(export_all).

 f() ->
     io:format("Finished f").

 f_fun() ->
     fun() ->
         io:format("Finished f_fun")
     end.

$ erl

% We load the test module.
1> c(test).
{ok,test}

% We create a reference to a function in the test module.
2> F = test:f_fun().
#Fun<test.0.124694843>

% We don't have old code yet (only new code), so check_process_code is
% false when called with the shell process.
3> erlang:check_process_code(self(), test).
false

% We mark the test module as old code.
4> code:delete(test).
true

% Now check_process_code says that we do have old code.
5> erlang:check_process_code(self(), test).
true

% Code purge calls check_process_code on each process to decide if it
% uses the old version of the purged module, and if so, it kills the
% process. In this case it kills the shell process.
6> code:purge(test).
*** ERROR: Shell process terminated! ***
Eshell V5.10.4  (abort with ^G)

If command 2 is replaced with F = fun test:f/0, then this problem will not occur:

% Now F only contains the information that it should call
% test:f/0, and not a real reference that points inside the byte
% code of the test module.
2> F = fun test:f/0.
#Fun<test.f.0>

[...]

% Therefore it doesn't use the test module...
5> erlang:check_process_code(self(), test).
false

% ...and therefore it is not killed by purge.
6> code:purge(test).
false

% If we now call F(), we will simply get an undef error that can be
% caught by 'catch'. Before doing that, let's set the path to an
% empty list, otherwise Erlang would automatically load test.beam
% when we call F.
7> code:set_path([]).
true

8> F().
** exception error: undefined function test:f/0

9> catch F().
{'EXIT',{undef,[{test,f,[],[]},
                {erl_eval,do_apply,5,[{file,"erl_eval.erl"},{line,560}]},
                {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,357}]},
                {shell,exprs,7,[{file,"shell.erl"},{line,674}]},
                {shell,eval_exprs,7,[{file,"shell.erl"},{line,629}]},
                {shell,eval_loop,3,[{file,"shell.erl"},{line,614}]}]}}
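
The difference between the two kinds of fun can also be checked at run time with erlang:fun_info/2, which reports a fun as either local or external. The following is a minimal sketch only: the module name callback_check and the helper is_purge_safe/1 are hypothetical and are not part of WombatOAM.

```erlang
-module(callback_check).
-export([is_purge_safe/1, demo/0, exported_cb/0]).

%% Hypothetical helper (not part of WombatOAM): returns true only for
%% funs written as fun Module:Function/Arity. Such funs hold just the
%% module name, function name and arity, not a reference into loaded code.
-spec is_purge_safe(fun()) -> boolean().
is_purge_safe(Fun) when is_function(Fun) ->
    erlang:fun_info(Fun, type) =:= {type, external}.

exported_cb() ->
    ok.

%% Classifies the three ways of creating a fun discussed above.
demo() ->
    Anonymous = fun() -> ok end,           % closure defined in this module
    Local = fun exported_cb/0,             % local reference, tied to this module's code
    External = fun ?MODULE:exported_cb/0,  % external (MFA-style) reference
    [is_purge_safe(Anonymous), is_purge_safe(Local), is_purge_safe(External)].
```

Calling callback_check:demo() returns [false, false, true]: only the fun ?MODULE:exported_cb/0 form is external, even though exported_cb/0 is exported in all three cases.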

A typical scenario is to pass an MFA (which points to a WombatOAM plugin) to a non-WombatOAM process that will use it as a callback. Examples include:

  • Passing debug functions to gen processes using sys:install/sys:remove. This scenario is analysed below.
  • Passing callback functions to event handler processes.

Let's say the plugin uses sys:install to install a debug function into the interesting_gen_server process. When WombatOAM wants to stop the plugin, the plugin should call sys:remove with a timeout of 0:

terminate(_State) ->
    ...
    catch sys:remove(interesting_gen_server, fun ?MODULE:my_dbg_function/3, 0),
    ...
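
The install and remove cycle can be tried outside WombatOAM with a small stand-alone module. The sketch below is illustrative and makes several assumptions: the module name dbg_callback_sketch, the ping request and the collector message format are hypothetical, and a throwaway gen_server stands in for interesting_gen_server. The debug function is exported, is passed in MFA syntax, and does nothing except forward the event, in line with the two rules above.

```erlang
-module(dbg_callback_sketch).
-behaviour(gen_server).

-export([run/0, my_dbg/3]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Debug function installed into the observed process. It is exported so
%% that it can be passed as fun ?MODULE:my_dbg/3. It only forwards the
%% event to a collector process and returns, which keeps it fast.
my_dbg(Collector, Event, _ProcState) ->
    Collector ! {sys_event, Event},
    Collector.  % the return value becomes the next FuncState

%% Returns {SawEventsWhileInstalled, EventCountAfterRemoval}.
run() ->
    {ok, Server} = gen_server:start(?MODULE, [], []),
    ok = sys:install(Server, {fun ?MODULE:my_dbg/3, self()}),
    pong = gen_server:call(Server, ping),
    %% Same call shape as the terminate example above: timeout 0 and catch.
    catch sys:remove(Server, fun ?MODULE:my_dbg/3, 0),
    %% Messages from one sender arrive in order, so once this call returns
    %% the server has already processed the removal request.
    pong = gen_server:call(Server, ping),
    Installed = count_events(0),
    pong = gen_server:call(Server, ping),
    Removed = count_events(0),
    ok = gen_server:stop(Server),
    {Installed > 0, Removed}.

%% Minimal gen_server standing in for the observed process.
init([]) -> {ok, no_state}.
handle_call(ping, _From, State) -> {reply, pong, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Counts the forwarded events currently waiting in the caller's mailbox.
count_events(N) ->
    receive
        {sys_event, _Event} -> count_events(N + 1)
    after 0 ->
        N
    end.
```

Calling dbg_callback_sketch:run() returns {true, 0}: events are forwarded while the debug function is installed, and none arrive after the removal request has been processed, even though the sys:remove call itself did not wait for a reply.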