Tuesday, 15 September 2009

The compilation process

This is the second in a series of posts about performance in XmlPrime 1.0. The previous post in the series can be found here.

In order to better explain some of the optimizations we perform, I will first outline the query compilation process.

XQuery sourceParse
Syntax treeBind & Type Check
Compiled syntax treeOptimize
Optimized syntax tree


Firstly, the XQuery source is parsed to create an syntax tree. The syntax tree is a tree representation of the query, which can be processed further. The syntax tree was constructed mostly following the rules provided in the XQuery 1.0 and XPath 2.0 Formal Semantics specification, although we have started to deviate from the formal semantics to provide better error reporting (by providing more context about where the expression came from) or for increased performance (for example making a general comparison an atomic expression). There is still a lot more we can do in this area (for example not expanding out filter expressions until after type checking has been performed, which reduces the number of expressions we need to create).

The second stage of compilation is binding and type checking. Because of the way namespaces are bound in XQuery we do not necessarily know the namespaces of qualified names as we parse them. Consider the following query:

<e attr1="{$foo:bar}" 
xmlns:foo="http://www.example.org/"
/>

When parsing this query we do not know which namespace the prefix foo refers to until we have parsed the namespace declaration. To avoid parsing the query multiple times, we bind variables and functions in a separate pass. The binding pass works out the in-scope namespaces for any namespace-dependant expressions, and binds variable references and function calls to the variables and functions they refer to.

Once the expression is bound, the static types of all expressions are computed, and where a static type does not match an expected type for an expression, either an error is raised (if the static typing feature is enabled), or an extra run time check is inserted as appropriate.

After binding and type checking, we are left with the compiled syntax tree, which contains all the information we need to run the query.

The final compilation stage is optimization. This stage attempts to transform the compiled syntax tree with the aim to making the end result more efficient. There are many optimizations that XmlPrime performs, too numerous to list here. Since this optimization stage is a large part of what makes XmlPrime so fast, we will outline many of the optimizations we perform in future posts to this blog.

You can see the optimization process in action by using the XmlPrime command line tool. The -e flag indicates that the query plan should be outputted. The query plan is a textual representation of the abstract syntax tree. By using the -c (compile but don't run) and toggling the -o0 (disable optimizations) flag you can see the query plan before and after the optimization pass has taken place.

No comments:

Post a Comment