Friday, 18 September 2009

Unboxed evaluation

This is the third in a series of posts about XmlPrime 1.0 optimizations and performance.

In the previous post I outlined the compilation process used by XmlPrime. I did not however explain how the resultant syntax tree is evaluated.

In XmlPrime 0.9 every expression in the tree had several evaluate methods. Each of these methods took a dynamic context which contains everything needed to evaluate an expression, for example the values of the in-scope variables. These methods returned the result of evaluating the expression. The results returned were all of type XdmItem, and so an XdmItem instance had to be constructed for each item returned from an expression.

Boxing is a term that refers to wrapping a value or object in another object in order to pass it to something else. In .NET, boxing usually refers to wrapping a value type to expose it as a reference type, and happens implicitly in most .NET languages, for example when passing a double to a function that takes an object as an argument.

In XmlPrime 0.9 the values returned were boxed as XdmItems in order to return them, and needed to be unboxed when their contents was examined. In XmlPrime 1.0 we have managed to eliminate this boxing and unboxing in almost all cases at runtime.

XmlPrime already had some rudimentary unboxing for XPathNavigator values, and so the improvements are most noticeable on queries that use a lot of atomic values, for example a lot of arithmetic expressions. Take the following query for example, which finds the sum of the first 10 square numbers.

sum(for $x in 1 to 10 return $x * $x)

In XmlPrime 0.9 each number in the input sequence (1 to 10) is boxed. This item is then placed on the stack. Each reference to $x retrieved the variable from the stack. The multiplication operator took two items as arguments and unboxed them to obtain the values to multiply. The values were multiplied and then boxed again and returned. Finally all these returned values were unboxed, their values summed and the result boxed to return to the outside environment.

This is a total of 21 boxing and 30 unboxing operations for such a simple query.

In XmlPrime 1.0 the only place this boxing occurs is to return the final result from the query. As you can imagine the savings quickly add up.

Aside: Entries on the stack can be sequences of items and can be values or closures (expressions which are evaluated when they are first used). For this reason another kind of boxing still needs to take place when placing items on or retrieving items from the stack. Changing this to avoid first boxing all items that are to be placed on the stack proved suprisingly difficult. More information on how we managed this will be given in a future post.

The implementation details


In XmlPrime 1.0 we changed the evaluation processl; this is probably the biggest change since 0.9. The expressions themselves are no longer responsible for evaluation. Instead another compilation pass occurs (which we refer to as code generation) which transforms the syntax tree into an evaluation tree.

Optimized syntax treeCode generation
Evaluation tree


This code generation phase does not generate code in the traditional sense. Indeed this is something we would like to do in the future, in the vein of XslCompiledTransform, but this is not always practical. For now the evaluation tree consists of pre-written evaluators which consist of methods to evaluate the expression.

In XmlPrime 0.9 the base class for all items was called XdmItem, and all values passed around were wrapped in objects derived from XdmItem. All nodes in XmlPrime were represented as XPathNavigator instances, which had to be wrapped in NavigatorValue (which derives from XdmItem) instances in order to pass them around. The expressions had methods to evaluate themselves as lazy sequences (whose values are returned by an IEnumerator instance), or as single or optional XdmItem instances (which avoids the overhead of an enumerator.

Historically the cost of constructing NavigatorValues (which are effectively XPathNavigators boxed as XdmItems) showed up as a significant performance bottleneck, and so we added alternative methods to evaluate every expression that returned XPathNavigator values instead.

In XmlPrime 1.0 we alleviated the problem somewhat by renaming XdmItem to XPathAtomicValue, and deriving this from XPathItem (which is the base class for XPathNavigator). Since XPathItem is now used as the base class of our hierarchy, it means that XPathNavigator instances never need to be boxed.

The performance boost that was gained from unboxing XPathNavigators in XmlPrime 0.9 was huge, so we wanted to bring this performance enhancement to all items, and not just nodes. XmlPrime uses over 20 different types to represent values, so adding all these evaluation methods (remember that expressions can be evaluated to return a single item, zero or one items, or a sequence of items) would have been impractical. If we added any more methods of evaluation (for example eager evaluation) then this would also be fairly difficult.

The evaluation tree consists of objects implementing IEvaluator<T> which provides methods for evaluating a single item, zero or one items, or a lazy sequence of items. The interface is generic in the type of the values returned and so can be used for unboxed evaluation of most expressions. A lot of the evaluator implementations themselves are implemented as generic classes (for example the evaluator for fn:subsequence), which means that unboxed evaluation is used in almost all cases where the static type of an expression allows it.

Another advantage to seperating out the evaluation tree from the syntax tree is that it has lowered the bar for specialized versions of expressions, as now only a new evaluator class needs to be written, rather than an entire (rather complicated) expression class. When adding a new expression class we also had to ensure that it would not prevent any other optimizations. Adding a new evaluator has none of these difficulties. This has meant that we have been able to add more specialized versions of expressions to be used at run time (for example a typeswitch expression for when all the types being tested against are subtypes of node()?) without harming other optimizations.

No comments:

Post a Comment