30. 09. 2021 Damiano Chini Development, NetEye

Tornado: Tracing

How can we allow a Tornado administrator to successfully track down the flow of an event through Filters, Rules and Actions of Tornado, if Tornado is processing thousands of events per second?

Tornado administrators can have a hard time reading Tornado logs to understand why for example an action error comes from. Take this log as example:

Aug 20 11:53:49 neteye tornado[59069]: Aug 20 11:53:49.649 ERROR tornado_executor_smart_monitoring_check_result: SmartMonitoringExecutor - Director host creation action failed with error ActionExecutionError { message: "DirectorExecutor API returned an error. Response status: 500 Internal Server Error. Response body: {\n \"error\": \"Error during the creation of the object at runtime\"\n}\n", can_retry: true, code: None }.

How can I understand why this action was triggered? Which Tornado Action do I have to fix? in which Rule?

What I would need to do is to look back in the logs and try to find a related log, which tells me the Rule that was matched.

But this becomes an impossible task if Tornado is processing thousands of events each second.

This is where Tracing can help us! While “Logging” aims at giving information about a single incident happening in the application, “Tracing” has awareness of whole application flow and can then provide information on which is the journey that an event has gone through before arriving at a certain point in the application.

In the context of our Tornado application, this means that with Tracing, we can not only for example inform the user that an Action failed, but also which is the Event that triggered the Action, and which Filters and Rules the Event went through before triggering the Action.

From NetEye 4.19, Tornado integrates the Tokio Tracing project and from NetEye 4.20 Tornado implements traces for Event flows which depict the journey of an Event in the Tornado application. To see what this actually means in practice you can have a look at the following extract:

Aug 20 13:08:08 neteye tornado[8624]: Aug 20 13:08:08.879 ERROR MatcherActor{trace_id="4c72f980-bae5-4bb3-afb2-0dbbea18d084"}:dispatch_filter{name="root"}:dispatch_ruleset{name="ruleset_01"}:dispatch_rule{name="smart_monitoring_check_result"}:dispatch_action{action=0 action_id="smart_monitoring_check_result"}: tornado_executor_smart_monitoring_check_result: SmartMonitoringExecutor - Director host creation action failed with error ActionExecutionError { message: "DirectorExecutor API returned an error. Response status: 500 Internal Server Error. Response body: {\n    \"error\": \"Error during the creation of the object at runtime\"\n}\n", can_retry: true, code: None, data: {"method":"POST","url":"https://httpd.neteyelocal/neteye/director/host?live-creation=true","payload":{"object_name":"acme-host","object_type":"Object"}} }.

Here you can see that the Event identified by the given “trace_id”, went through the Filter “root”, the Ruleset “ruleset_01”, the rule “smart_monitoring_check_result”, triggered its Action “smart_monitoring_check_result” and this action execution resulted in an error. Having this information, we can now know for sure which action we need to modify to fix the error.

We applied Tracing to this particular use case in Tornado, but we just started exploring its opportunities, so it is very likely that in the next future many more improvements will be brought to Tornado thanks to Tracing!

Damiano Chini

Damiano Chini

Author

Damiano Chini

Leave a Reply

Your email address will not be published. Required fields are marked *

Archive