Friday, 18 September 2009

Unboxed evaluation

This is the third in a series of posts about XmlPrime 1.0 optimizations and performance.

In the previous post I outlined the compilation process used by XmlPrime. I did not however explain how the resultant syntax tree is evaluated.

In XmlPrime 0.9 every expression in the tree had several evaluate methods. Each of these methods took a dynamic context which contains everything needed to evaluate an expression, for example the values of the in-scope variables. These methods returned the result of evaluating the expression. The results returned were all of type XdmItem, and so an XdmItem instance had to be constructed for each item returned from an expression.

Boxing is a term that refers to wrapping a value or object in another object in order to pass it to something else. In .NET, boxing usually refers to wrapping a value type to expose it as a reference type, and happens implicitly in most .NET languages, for example when passing a double to a function that takes an object as an argument.

In XmlPrime 0.9 the values returned were boxed as XdmItems in order to return them, and needed to be unboxed when their contents was examined. In XmlPrime 1.0 we have managed to eliminate this boxing and unboxing in almost all cases at runtime.

XmlPrime already had some rudimentary unboxing for XPathNavigator values, and so the improvements are most noticeable on queries that use a lot of atomic values, for example a lot of arithmetic expressions. Take the following query for example, which finds the sum of the first 10 square numbers.

sum(for $x in 1 to 10 return $x * $x)

In XmlPrime 0.9 each number in the input sequence (1 to 10) is boxed. This item is then placed on the stack. Each reference to $x retrieved the variable from the stack. The multiplication operator took two items as arguments and unboxed them to obtain the values to multiply. The values were multiplied and then boxed again and returned. Finally all these returned values were unboxed, their values summed and the result boxed to return to the outside environment.

This is a total of 21 boxing and 30 unboxing operations for such a simple query.

In XmlPrime 1.0 the only place this boxing occurs is to return the final result from the query. As you can imagine the savings quickly add up.

Aside: Entries on the stack can be sequences of items and can be values or closures (expressions which are evaluated when they are first used). For this reason another kind of boxing still needs to take place when placing items on or retrieving items from the stack. Changing this to avoid first boxing all items that are to be placed on the stack proved suprisingly difficult. More information on how we managed this will be given in a future post.

The implementation details


In XmlPrime 1.0 we changed the evaluation processl; this is probably the biggest change since 0.9. The expressions themselves are no longer responsible for evaluation. Instead another compilation pass occurs (which we refer to as code generation) which transforms the syntax tree into an evaluation tree.

Optimized syntax treeCode generation
Evaluation tree


This code generation phase does not generate code in the traditional sense. Indeed this is something we would like to do in the future, in the vein of XslCompiledTransform, but this is not always practical. For now the evaluation tree consists of pre-written evaluators which consist of methods to evaluate the expression.

In XmlPrime 0.9 the base class for all items was called XdmItem, and all values passed around were wrapped in objects derived from XdmItem. All nodes in XmlPrime were represented as XPathNavigator instances, which had to be wrapped in NavigatorValue (which derives from XdmItem) instances in order to pass them around. The expressions had methods to evaluate themselves as lazy sequences (whose values are returned by an IEnumerator instance), or as single or optional XdmItem instances (which avoids the overhead of an enumerator.

Historically the cost of constructing NavigatorValues (which are effectively XPathNavigators boxed as XdmItems) showed up as a significant performance bottleneck, and so we added alternative methods to evaluate every expression that returned XPathNavigator values instead.

In XmlPrime 1.0 we alleviated the problem somewhat by renaming XdmItem to XPathAtomicValue, and deriving this from XPathItem (which is the base class for XPathNavigator). Since XPathItem is now used as the base class of our hierarchy, it means that XPathNavigator instances never need to be boxed.

The performance boost that was gained from unboxing XPathNavigators in XmlPrime 0.9 was huge, so we wanted to bring this performance enhancement to all items, and not just nodes. XmlPrime uses over 20 different types to represent values, so adding all these evaluation methods (remember that expressions can be evaluated to return a single item, zero or one items, or a sequence of items) would have been impractical. If we added any more methods of evaluation (for example eager evaluation) then this would also be fairly difficult.

The evaluation tree consists of objects implementing IEvaluator<T> which provides methods for evaluating a single item, zero or one items, or a lazy sequence of items. The interface is generic in the type of the values returned and so can be used for unboxed evaluation of most expressions. A lot of the evaluator implementations themselves are implemented as generic classes (for example the evaluator for fn:subsequence), which means that unboxed evaluation is used in almost all cases where the static type of an expression allows it.

Another advantage to seperating out the evaluation tree from the syntax tree is that it has lowered the bar for specialized versions of expressions, as now only a new evaluator class needs to be written, rather than an entire (rather complicated) expression class. When adding a new expression class we also had to ensure that it would not prevent any other optimizations. Adding a new evaluator has none of these difficulties. This has meant that we have been able to add more specialized versions of expressions to be used at run time (for example a typeswitch expression for when all the types being tested against are subtypes of node()?) without harming other optimizations.

Tuesday, 15 September 2009

Extension methods for XPath 2.0 in .NET (C#)

.NET provides easy use of XPath with the Select(...) and SelectSingleNode(...) methods on XPathNavigator.

This blog post will show how you can use XmlPrime to add similar methods to provide XPath 2.0 support on XPathNavigator also.

This .NET 3.5 is required as this example uses extension methods.

First download and install XmlPrime, then in you project add a reference to the installed XmlPrime dll.

Now create a class with the following code:
using XmlPrime;
using System.Xml.XPath;

namespace xp.ExtensionMethods
{
public static class XPathNavigatorExtensions
{
public static XPathNavigator SelectSingleNode2(
this XPathNavigator navigator,
string expression)
{
var compiledExpression = XPath.Compile(expression, navigator.NameTable);
var item = compiledExpression.EvaluateToItem((XPathItem)navigator);
return (XPathNavigator)item;
}
}
}

This will add a method to XPathNavigator, so you can then use XPath 2.0 in .NET just by typing:

navigator.SelectSingleNode2("/foo/bar[matches(.,'[a-z]')]")

wherever you would have used navigator.SelectSingleNode(...), but now you can use all the XPath 2.0 functionality; for example regular expression functions.

Implementing an equivalent of navigator.Select(...) and similar functions on XmlNode is equally straightforward.

We plan to include such extension methods in a the XmlPrime library in a future release.

The compilation process

This is the second in a series of posts about performance in XmlPrime 1.0. The previous post in the series can be found here.

In order to better explain some of the optimizations we perform, I will first outline the query compilation process.

XQuery sourceParse
Syntax treeBind & Type Check
Compiled syntax treeOptimize
Optimized syntax tree


Firstly, the XQuery source is parsed to create an syntax tree. The syntax tree is a tree representation of the query, which can be processed further. The syntax tree was constructed mostly following the rules provided in the XQuery 1.0 and XPath 2.0 Formal Semantics specification, although we have started to deviate from the formal semantics to provide better error reporting (by providing more context about where the expression came from) or for increased performance (for example making a general comparison an atomic expression). There is still a lot more we can do in this area (for example not expanding out filter expressions until after type checking has been performed, which reduces the number of expressions we need to create).

The second stage of compilation is binding and type checking. Because of the way namespaces are bound in XQuery we do not necessarily know the namespaces of qualified names as we parse them. Consider the following query:

<e attr1="{$foo:bar}" 
xmlns:foo="http://www.example.org/"
/>

When parsing this query we do not know which namespace the prefix foo refers to until we have parsed the namespace declaration. To avoid parsing the query multiple times, we bind variables and functions in a separate pass. The binding pass works out the in-scope namespaces for any namespace-dependant expressions, and binds variable references and function calls to the variables and functions they refer to.

Once the expression is bound, the static types of all expressions are computed, and where a static type does not match an expected type for an expression, either an error is raised (if the static typing feature is enabled), or an extra run time check is inserted as appropriate.

After binding and type checking, we are left with the compiled syntax tree, which contains all the information we need to run the query.

The final compilation stage is optimization. This stage attempts to transform the compiled syntax tree with the aim to making the end result more efficient. There are many optimizations that XmlPrime performs, too numerous to list here. Since this optimization stage is a large part of what makes XmlPrime so fast, we will outline many of the optimizations we perform in future posts to this blog.

You can see the optimization process in action by using the XmlPrime command line tool. The -e flag indicates that the query plan should be outputted. The query plan is a textual representation of the abstract syntax tree. By using the -c (compile but don't run) and toggling the -o0 (disable optimizations) flag you can see the query plan before and after the optimization pass has taken place.

Monday, 7 September 2009

XmlPrime 1.0.2 Released

XmlPrime 1.0.2 is available now

This version fixes some bugs from the initial release.

Here is the full change log
  • Set default type checking mode to optimistic, as per documentation.
  • Fixed compatibility with XmlDocument
    • Fixed NullReferenceException when using data() caused by XmlDocument returning an IXmlSchemaInfo instance with null SchemaType.
    • Fixed infinite loop caused by XPathNodeIterators over XmlDocuments not consistantly returning false after the end of the node set.
  • Fixed the XQuery version declaration for cultures where '.' is not the decimal point.
  • Fixed normalization rules of general comparisons as per erratum XQ.E18.
  • For increased performance, the command line tool now runs in x86 mode by default.

Friday, 4 September 2009

XmlPrime 1.0 performance enhancements

This is the first post in a series of posts about the performance updates in XmlPrime.

First off, some numbers.

In order to demonstrate the performance changes I have run the 3 samples that we provide, with the command line tool in the XmlPrime 0.9 Technical Preview and the command line tool provided with XmlPrime 1.0. All these tests were run on an x64 machine with the x64 VM (under x86 the performance is much better - we will cover this in a later post).

The times are as reported by the command line tool, so the compile time includes parsing, optimization and compilation; the run time includes the time to evaluate the query and serialize the results.

Raytracer

The times here are from running ppm.xq of the final raytracer at a resolution of 640x480. Here is the command used to run this test:
XQuery -t ppm.xq width=640 height=480
compile(ms)run(ms)
0.91182412186
1.0163307593

Currency

These times were computed with a modified version of currency.svg that takes a source document as a context item, supplied as GBPNoon.xml. This is to avoid network latency from affecting the timings.

The command used to run this query was
XQuery -t -s GBPNoon.xml currencysvg.xq
compile(ms)run(ms)
0.911991344
1.064955

iTunes Artist Graph

The iTunes query used was the one from part 3 of the sample. This because the part 4 query is largely IO bound, as it retrieves a lot of data from Wikipedia.

The command used to run this query is:
XQuery -t iTunes3.xml
compile(ms)run(ms)
0.9145893483
1.08098529

This particular query runs slower in 1.0 than in 0.9 which we were not expecting. The reason behind this is that we changed some of the rules around variable dereferencing when we added support for modules to be compiled seperately. In this case global variables cannot be removed as they are part of the public interface of the query. Variables were being dereferenced in the main module, but their declarations were not being removed. Compounded with the fact that currently all global variables are evaluated eagerly meant that the source document was being validated several times. Suffice it to say after discovering this problem, the dereferencing issue has been fixed, and will be available in the next version available in the next couple of weeks. A run with the latest dev version brings the runtime down to 66884 ms.
[UPDATE (09/09/2009)]: These changes have been incorporated in XmlPrime 1.0.2

So how have we acchieved this performance boost? We have implemented a number of significant optimizations since 0.9. The biggest ones are listed here:
  • Unboxed evaluation of expressions
  • Improved strictness analysis
  • Unboxing variables
  • Static type analysis of recursive functions
  • Strictness and unboxing analysis of function arguments
  • Precompiling the DLL
  • Elimination of some quadratic algorithms during common sub-expression elimination
  • New XDM type system implementation, tighter bound to XML Schema
  • Function specialization
  • Specialization of core expressions
Most of these changes will probably sound quite cryptic, so this shall mark the start of a series of posts on how XmlPrime works under the hood, with in-depth articles on some of the different optimizations performed by XmlPrime.

Welcome

As has been announced elsewhere, XmlPrime 1.0 has been released.

We realise that it has been a long time (almost a year) since the 0.9 technical preview of XmlPrime was released and we have published very little information in the meantime.

Now XmlPrime has had a proper release, we have started this blog to provide information about the new and exciting features we are working on. We will start with a series of technical posts explaining some of the new performance enhancements we have been working on for XmlPrime 1.0.