Monitoring¶
MetalPipe lets you easily monitor your pipeline, identify bottlenecks, and help diagnose failures.
Logging table¶
While a pipeline is being executed, a table of information will periodically be logged (at the INFO logging level). Each row provide diagnostic information about a single node in the pipeline. This is a typical example:
We’ll go through each column of the table.
The Node column contains the name of a node. This is the name that was
given in the configuration file as a top-level key in the nodes section.
If the name is printed in red (as in contacts_epoch_to_timestamp in the
example), then the node is a “bottleneck”. In order to identify bottlenecks,
MetalPipe periodically polls each node to determine if (1) it input queue is
full and (2) its output queue is not full. If those conditions are frequently
met, then the node is identified as a bottleneck.
Note that being a bottleneck is not necessarily a sign of inefficiency. For any sufficiently long-running pipeline, it is very likely that some node will happen to be the slowest, and it will be considered a bottleneck.
The Class column simply gives the class of the MetalNode object, which
tells you what function it is performing.
The Received, Sent, and Queued columns tell you how many messages
are at various stages of processing. The Received number indicate how
many messages have been procesed by the node, including any message that is
currently being procesed. Sent gives how many messages have been output
by this node. Finally, Queued is the number of messages that are on that
nodes incoming queue(s). If there are several incoming queues, then this number
is the sum. Note that for a source node, the value of Received will always
be zero, and for any sink node, the value of Sent will be zero.
The Status column has three possible values: running, success,
and error. Here, success means that the node has completed its work
and has terminated without raising an error. A node is considered to be
done with its work when its parent nodes (if any) have completed, its incoming
queues are all empty, and it is not processing any messages. An error is
indicated whenever a node raises an Exception. When this happens, the entire
pipeline is shut down automatically. These status messages are colored yellow,
green, and red respectively.
Finally Time is the total amount of time the node has spent running. When
it is in a non-running state (either success or error), the clock stops.
