20. 12. 2022 Attilio Broglio NetEye

How to Parse HTML Email Messages with Tornado

Tornado is a Complex Event Processor (CEP). It receives event reports from data sources such as monitoring systems and email, matches them against preconfigured rules, and executes the actions associated with those rules. Some vendors provide static notification systems that cannot be customized. For example, during one project we had to work with a tool that could only send notifications to NetEye as HTML emails.

REGEXes provide the flexibility needed to parse such messages, but some “special” characters, such as newlines and tabs, can cause problems. Tornado therefore has to handle an email like this one:

{"body":"\n<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html>\n<head>\n  <title>Mail Title</title>\n</head>\n<body bgcolor='#eceded'>\n\t 
...
<span class=\"row\">Field1</span></td>\n\t\t\t<td><span class=\"row\">17,20 MWh</span>
..."}

The value associated with the label “Field1” sits in the SPAN that follows it, and it should be extracted with a REGEX like this:

<span class="row">Field1</span></td>\n\t\t\t<td><span class="row">(.+?)\s

However, this syntax is not handled properly by the Rust regex engine in Tornado. The \n and \t escapes are not interpreted as expected, so matching the \n\t\t\t sequence fails. To work around this, we rewrite these escapes in the match filter and use hexadecimal notation for the newline:

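  • \x0a instead of \n (0x0A is the ASCII code for a line feed)
  • \s+ instead of \t (one or more whitespace characters, which also covers the run of three tabs)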
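The substitution described in the list above can be sketched as a small helper. This is a minimal illustration, not part of Tornado: the function name `to_tornado_pattern` is hypothetical, and it uses Python's `re` module only to perform the text rewrite. It takes the original pattern and produces the hexadecimal variant.

```python
import re


def to_tornado_pattern(pattern: str) -> str:
    """Hypothetical helper: rewrite newline/tab escapes in a REGEX string.

    Simplification: it does not handle escaped backslashes (a doubled
    backslash followed by n or t) in the source pattern.
    """
    # Replace each literal backslash-n escape with the hexadecimal form \x0a.
    pattern = pattern.replace(r"\n", r"\x0a")
    # Collapse any run of literal backslash-t escapes into a single \s+.
    return re.sub(r"(?:\\t)+", lambda _m: r"\s+", pattern)


original = r'<span class="row">Field1</span></td>\n\t\t\t<td><span class="row">(.+?)\s'
print(to_tornado_pattern(original))
```

Collapsing the three tabs into one `\s+` also makes the filter tolerant of a different amount of indentation in the HTML.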

The final result is a match filter that handles these special characters and returns the correct value:

Example of a WITH section used to extract the variable Field1
Example of the extracted value (after some replacements)

The REGEX used to extract this value is:

<span class="row">Field1</span></td>\x0a\s+<td><span class="row">(.+?)\s
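To check what this REGEX captures, here is a minimal sketch that runs it against an abbreviated body modelled on the email excerpt above. The sample payload is a shortened assumption, since the real body is a full HTML document. Python's `re` module stands in for Tornado's Rust engine. The constructs used here (`\x0a`, `\s`, the lazy `+?`) are also supported by Rust's `regex` crate.

```python
import json
import re

# Abbreviated sample modelled on the excerpt above (assumption: the real body
# is much longer). json.loads decodes the JSON escapes \n and \t into real
# newline and tab characters, as they appear in the email body.
raw_event = r'{"body": "<td><span class=\"row\">Field1</span></td>\n\t\t\t<td><span class=\"row\">17,20 MWh</span></td>"}'
body = json.loads(raw_event)["body"]

# The REGEX from the article: \x0a matches the newline, \s+ the run of tabs,
# and the lazy group (.+?) stops at the first whitespace character.
PATTERN = r'<span class="row">Field1</span></td>\x0a\s+<td><span class="row">(.+?)\s'

match = re.search(PATTERN, body)
value = match.group(1) if match else None
print(value)  # 17,20
```

The captured group stops at the first whitespace, so the unit "MWh" is not part of the extracted value.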

By combining Tornado with a REGEX that uses hexadecimal escapes, we can parse and extract fields from HTML emails even when they contain special characters.

These Solutions are Engineered by Humans

Did you like this article? Does it reflect your skills? We often get interesting questions straight from our customers who need customized solutions. In fact, we’re currently hiring for roles like this one, as well as others, here at Würth Phoenix.
