A Custom Query Language
If you recall, our CEP processor currently forces us to write Elixir code to work on our windows of data, and our current implementation stores and operates on that data in the same function.
In a typical CEP application, the storing of incoming data is invisible to the end user and to the code operating on it. Also, most (all?) CEP libraries and frameworks tend to provide a SQL-like query language. In another system, our operation code might be written something like the following (an illustrative sketch; the exact syntax varies by framework):
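select symbol, avg(value)
from TickStream
group by symbol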
If we squint a little we can see that the above code looks a lot like our Elixir Worker module.
CEP frameworks also tend to have the concept of streams of data: your code queries the stream and operates on the values returned. Streams are also usually defined using a SQL-like define statement.
Again, with a bit of squinting, these definitions look a lot like Elixir structs, so let's see what it takes to convert SQL-like statements into Elixir code and add the same functionality to our CEP application.
Lexing and Parsing
Lex and Yacc are two well-known and well-understood pieces of software used by many languages to lex and parse text. I am not going to dive deep into compiler theory in this post (the internet is full of resources); I merely hope to show how to use the Erlang versions of lex (leex) and yacc (yecc) with Elixir.
To turn a statement such as
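define table TickStream (symbol, timestamp, value)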
into an Elixir struct
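defmodule TickStream do
  defstruct symbol: nil, timestamp: nil, value: nil
end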
we need to perform three steps: first, tokenise it using leex; then, convert the tokens into an AST (abstract syntax tree) using yecc; and finally, convert our AST into one that can be compiled by the Elixir compiler.
Leex
A lexer splits our text into a series of tokens and associated metadata. To use leex, we need to create an .xrl file that defines how we create tokens from text. A rule is simply:
regex: erlang code
Note: leex and yecc are Erlang tools and expect code, atom names, etc. to be written in Erlang.
To make it easier to read, we can split our rules file into a Definitions section, where we define our regular expressions, and a Rules section, where we create our token tuples. Our rules must use Erlang syntax and must return a tuple of
{token, Token}
or skip_token for input we want to discard (such as whitespace).
Ok, so let's create our rules file in lib/lexer.xrl:
Definitions.

KEYWORD = [a-zA-Z_]+
WHITESPACE = [\s\t\n\r]

Rules.

define : {token, {define, TokenLine, TokenChars}}.
table : {token, {table, TokenLine, TokenChars}}.
{KEYWORD} : {token, {keyword, TokenLine, TokenChars}}.
\( : {token, {'(', TokenLine}}.
\) : {token, {')', TokenLine}}.
, : {token, {',', TokenLine}}.
{WHITESPACE}+ : skip_token.

Erlang code.
The Erlang code. section can be empty, but the header must be present. Now let's try it out in an iex console:
iex(1)> :leex.file('lib/lexer.xrl')
{:ok, 'lib/lexer.erl'}
iex(2)> c("lib/lexer.erl")
[:lexer]
iex(3)> {:ok, toks, _} = :lexer.string('define table TickStream (symbol, timestamp, value)')
{:ok,
 [{:define, 1, 'define'}, {:table, 1, 'table'}, {:keyword, 1, 'TickStream'},
  {:"(", 1}, {:keyword, 1, 'symbol'}, {:",", 1}, {:keyword, 1, 'timestamp'},
  {:",", 1}, {:keyword, 1, 'value'}, {:")", 1}], 1}
Great, we pass in a text string and get back a data structure. Now that we have our list of tokens, let's convert it into an AST.
Yecc
Our next step is to convert the tokens we received from leex into an AST structure that we can process.
Yecc also expects a rules file. A yecc rule is:
Left_hand_side -> Right_hand_side : Associated_code.
The Erlang docs state: “The left hand side is a non-terminal category. The right hand side is a sequence of one or more non-terminal or terminal symbols with spaces between.”
I want to keep this code focused, and hopefully the example will speak for itself, but basically a non-terminal is a category that is not fully expanded, whereas a terminal is fully expanded (finished, "terminated"). In other words, if you have a non-terminal, you need to keep adding rules until you reach a terminal. Note that the terminal names/tokens are those created by our lexer rules in the previous step.
In our associated code we use $1, $2, $3, etc. to reference the items matched by our rule, the first item being $1. Often we will simply ignore certain matched items, as you will see below; they provide guidance to the human reader and to the parser but are of little use after parsing.
Nonterminals definition name fields field.
Terminals define table keyword '(' ')' ','.
Rootsymbol definition.

definition -> define table name '(' fields ')' : {struct, '$3', {fields, '$5'}}. % We are interested in the table and field names
name -> keyword : {name, extract('$1')}.
fields -> field : ['$1'].
fields -> field ',' fields : ['$1'] ++ '$3'.
field -> keyword : extract('$1').

Erlang code.

extract({_, _, Value}) -> Value.
We declare a top-level non-terminal, definition, as our entry point. We then say we expect a definition to have a defined format: define table name '(' fields ')'.
Ok, let's try it out:
iex(4)> :yecc.file('lib/parser.yrl')
{:ok, 'lib/parser.erl'}
iex(5)> c("lib/parser.erl")
[:parser]
iex(6)> {:ok, ast} = :parser.parse(toks)
{:ok,
 {:struct, {:name, 'TickStream'}, {:fields, ['symbol', 'timestamp', 'value']}}}
Now we have a data structure that contains all we need to create an Elixir struct.
Elixir AST
Elixir has powerful metaprogramming support thanks to its use of macros. We can easily see the AST Elixir is expecting by quoting our expected struct definition:
iex(1)> quote do
...(1)> defmodule TickStream do
...(1)> defstruct symbol: nil, timestamp: nil, value: nil
...(1)> end
...(1)> end
{:defmodule, [context: Elixir, import: Kernel],
[{:__aliases__, [alias: false], [:TickStream]},
[do: {:defstruct, [context: Elixir, import: Kernel],
[[symbol: nil, timestamp: nil, value: nil]]}]]}
We can see that what Elixir expects is slightly more involved than what we currently produce, so we now have two options: we can modify our parser rules to produce what Elixir is expecting, or we can use Elixir code (functions) to modify our structure.
In the long run the code approach gives us more flexibility, so that's what I choose here (and Elixir pattern matching makes it much easier once our transpiler/compiler gets more complex).
Create a file called lib/transformer.ex and add a function to convert the AST structure into an Elixir AST data structure.
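Here is a minimal sketch of such a function. The Transformer module and transform/1 names match the console session below, but the body itself is my reconstruction: it pattern matches on the tuple our parser produces and builds the quoted form by hand, converting the Erlang charlists into atoms along the way.

defmodule Transformer do
  # Convert our parser's output,
  #   {:struct, {:name, Name}, {:fields, Fields}},
  # into the quoted defmodule/defstruct form Elixir expects.
  # leex/yecc hand us charlists, so we convert them to atoms first.
  def transform({:struct, {:name, name}, {:fields, fields}}) do
    field_defaults = for field <- fields, do: {List.to_atom(field), nil}

    {:defmodule, [context: Elixir, import: Kernel],
     [{:__aliases__, [alias: false], [List.to_atom(name)]},
      [do: {:defstruct, [context: Elixir, import: Kernel],
        [field_defaults]}]]}
  end
end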
We can test it in a console
iex(18)> c("lib/transformer.ex")
[Transformer]
iex(20)> exast = Transformer.transform(ast)
{:defmodule, [context: Elixir, import: Kernel],
[{:__aliases__, [alias: false], [:TickStream]},
[do: {:defstruct, [context: Elixir, import: Kernel],
[[symbol: nil, timestamp: nil, value: nil]]}]]}
And we can see that our function has converted our AST into the AST expected by the Elixir compiler. Finally, let's compile and run it:
iex(22)> Code.compile_quoted(exast)
[{TickStream,
<<70, 79, 82, 49, 0, 0, 5, 0, 66, 69, 65, 77, 69, 120, 68, 99, 0, 0, 0, 99, 131, 104, 2, 100, 0, 14, 101, 108, 105, 120, 105, 114, 95, 100, 111, 99, 115, 95, 118, 49, 108, 0, 0, 0, 2, 104, 2, 100, 0, ...>>}]
iex(23)> %TickStream{}
%TickStream{symbol: nil, timestamp: nil, value: nil}
iex(24)> %TickStream{symbol: "AAPL", timestamp: 100, value: 124.56}
%TickStream{symbol: "AAPL", timestamp: 100, value: 124.56}
As easy as that! We can now use the TickStream struct as if it had been written in pure Elixir.
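With everything in place, the whole pipeline fits in a handful of lines. Here is a sketch that assumes the generated :lexer and :parser modules and our Transformer are compiled and loaded (the QueryCompiler name is my own):

defmodule QueryCompiler do
  # Lex, parse, transform, and compile a definition string in one call.
  def compile(definition) when is_binary(definition) do
    # leex-generated lexers expect a charlist, not an Elixir binary.
    {:ok, tokens, _end_line} = definition |> String.to_charlist() |> :lexer.string()
    {:ok, ast} = :parser.parse(tokens)

    ast
    |> Transformer.transform()
    |> Code.compile_quoted()
  end
end

QueryCompiler.compile("define table TickStream (symbol, timestamp, value)")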
Conclusion
In this post we have examined how to lex, parse, and compile a custom language into Elixir. The fact that Erlang comes with leex and yecc as standard, and that Elixir has powerful and easy-to-use code generation capabilities, means we have no excuse not to use these tools if a suitable use case arises.