Tuesday, 22 December 2009

XmlPrime 1.2 Released

XmlPrime 1.2 has just been released.

The major new features in this release are push-based evaluation.

Queries can now be evaluated directly to an XmlWriter or directly serialized to a TextWriter or Stream. This prevents an immediate result tree from being constructed, which reduces memory usage, and allows the result to be returned sooner, and quicker.

All expressions returning sequences can now be evaluated via push-based mechanisms. This method of evaluation is more efficient, but only where the entire sequence is consumed. This optimization is applied automatically to any expression that is consumed eagerly.

Expect a post next week with a more detailed analysis of this feature.

See the full change log for more information about the changes in this version, or download the new version now.

Thursday, 17 December 2009

Avoiding StackOverflowExceptions

This post describes several situations that can cause StackOverflowExceptions in XmlPrime, and how to avoid them.

Look at the following fairly innocuous XQuery function:

declare function local:fact($x)
{
if ($x = 0)
then 1
else $x * local:fact($x-1)
};

As you probably expect this function works fine for small values of $x, but as $x grows large then the function will overflow the stack. The stack is overflowed far more quickly than a direct C# implementation of this function, as a function call is made for each expression. In this case the call stack contains one call to prepare an (XmlPrime) stack frame, one call to bind the arguments, one call to evaluate the if expression, and finally a call to evaluate the multiplication operator before recursing. This means that the stack will overflow quite quickly.

So, what can we do about this?

The trick to solving this problem is an optimization called tail recursion. If a function returns the result of a function call as a value, then the runtime can evaluate this new function call in place of the old one, without growing the call stack. Let me try to demonstrate what I mean by this. The factorial function we defined above is not tail recursive, as the value from the recursive function call is not returned directly, but is multiplied by $x after it is returned. We can rewrite the function in order to make it tail recursive, by adding an accumulating argument:

declare function local:fact2($x, $accum)
{
if ($x = 0)
then $accum
else local:fact($x - 1, $accum * $x)
};

declare function local:fact($x)
{
local:fact($x, 1)
};

Note that the value of the recursive call is now returned directly, this means that the tail-recursion optimization can be applied. Evaluating this function is now a lot more like evaluating the following C# function:

int Fact(int x)
{
int accum = 1;
while (x != 0)
{
x = x-1;
accum = accum * x;
}
return accum;
}

As you can see, evaluated in this way the function should no longer trigger a stack overflow exception. Unfortunately though, the function had to be manually rewritten in order to take advantage of this optimization. In the future we would like to try to make this transformation automatically, to widen the set of functions in which tail recursion can be applied.

Does this solve our problem? For this case, yes, but things can get more complicated.

Consider writing our own fold function. fold takes a function f that takes two arguments and returns an accumulated value. fold applies this function to a sequence by applying it to current result and each item in turn to build up a final result. For example if we fold the addition function over a sequence we get the sum of the sequence, and if we fold the multiply function over a sequence we get the product.

In XQuery 1.1, fold can be written like this:

declare function local:fold($func as function(item()*, item()) as item()*,
$seq as item()*,
$accum as item()*)
{
if (empty($seq))
then $accum
else local:fold($func, subsequence($seq, 2), $func($accum, $seq[1]))
};

However with XQuery 1.0, which does not support higher-order functions, we have to fake it by passing around a node to represent the function, and writing an apply function that applies it:

declare function local:apply($func as node(),
$accum as item()*,
$arg as item())
{
typeswitch ($func)
case element(add) return $accum + $arg
case element(multiply) return $accum * $arg
case element(reverse) return ($arg, $accum)
...
default error((), "Function not recognised")
};

declare function local:fold($func as node(),
$seq as item()*,
$accum as item()*)
{
if (empty($seq))
then $accum
else local:fold($func,
subsequence($seq, 2),
local:apply($func, $accum, $seq[1]))
};

local:fold is tail recursive and so we should never get a stack overflow, right?

Unfortanately, this is not the case. The problem occurs because local:apply (or $func in the XQuery case) is not nescassarily strict in the accumulator argument. In the XQuery 1.0 case this is because we might hit the default clause in the local:apply function, and in the XQuery 1.1 case we might have a function like:

declare function last($accum as item()*, $item as item()) as item*
{
$item
};

which when folded over a sequence returns the last item.

Aside: In the XQuery 1.0 version, if you are to call fold with a statically known function, then XmlPrime may specialize the fold function and create a new version inlining the function application. This simplifies the function enough to identify that the argument is strict. This may not always be the case though, so it shouldn't be relied on.

Since $accum may not be evaluated, it is defined lazily. So rather than applying the function and passing the value to the recursive function, a closure is created and the value is computed later.

Consider evaluating the function call

local:fold(<add />, (1,2,3,4,5), 0)

After the first iteration the closure for $accum looks like:

local:apply(<add />, 0, 1)

The next iteration applies the value 2 to $accum, and so the closure for $accum looks like

local:apply(<add />,
local:apply(<add />, 0, 1), 2)

And when the function finally returns $accum it returns a closure that evaluates

local:apply(<add />,
local:apply(<add />,
local:apply(<add />,
local:apply(<add />,
local:apply(<add />, 0, 1), 2), 3), 4), 5)

When $seq is quite large, computing the result of this closure will result in a stack overflow error.

So how can we fix this?

In order to fix this we need to ensure that the argument for the recursive call is evaluated fully. In the general case there is no easy way to fix this beyond ensuring that the argument is used entirely in every eventuality. In order to facilitate this we have added a new xmlprime:eager extension expression that will be available in the upcoming XmlPrime 1.1 release.

To force the argument to be fully evaluated we can annotate the expression like this:

declare namespace xp = "http://www.xmlprime.com/"

declare function local:fold($func as node(),
$seq as item()*,
$accum as item()*)
{
if (empty($seq))
then (# xmlprime:eager #) { $accum }
else local:fold($func,
subsequence($seq, 2),
local:apply($func,
(# xmlprime:eager #) { $accum },
$seq[1])))
};

Now $accum is evaluated fully in both branches. This forces the strictness analysis to identify $accum as a strict variable and hence the function becomes a simple tail recursive function and is more resiliant to stack overflow errors.

This fold function still has a problem that can cause stack overflow errors. The $seq argument is evaluated lazily (as each iteration only looks at the first value of $seq). Whilst technically the function is strict in $seq (as it does inspect every value eventaully), the current analysis engine is not capable of spotting this. It is impossible to always determine the strictness of an argument with 100% certainty as this can be specialized to the halting problem. The value of $seq ends up in a closure that looks something like this:

subsequence(
subsequence(
subsequence(
subsequence(
subsequence(..., 2), 2), 2), 2), 2)

This will eventually cause stack overflow errors.

This case of using the first item of a sequence and recursing with the remainder of the sequence is an important and common one in functional programs. In a future version of XmlPrime we would like to spot this case, read the first few items from the enumerator, and recurse by passing the same enumerator to the next iteration. This would avoid building a closure to evaluate the subsequence operation, and avoid caching the values returned and so should drastically improve performance of this kind of function (probably by a factor of about O(n) where n is the depth of recursion). Unfortunately this feature has not yet been implemented, and so this problem enforces a limit on the number of items in a sequence that can be processed in this manner.

You may be wondering why it is necessary to put the eager extension on the then branch of the if expression. Consider the following query, which reverses a sequence:

local:fold(<reverse />, $seq, ())

This query behaves properly and avoids the problem. However, lets make a simple modification to the query, to return the $nth last item in $seq:

local:fold(<reverse />, $seq, ())[$n]

For large values of $n removing the xmlprime:eager around the then clause this would cause the query to stack-overflow. This is because we do not always use all of the result of the function, and so the variable is evaluated lazily. This causes a tree of enumerators that perform the concatenation to be produced, and can cause a stack overflow when iterating over the sequence.

This case is another case that we would like to fix in a future version of XmlPrime. To fix this we would need to implement a custom MoveNext method that rather than returning a boolean value would return a new IEnumerator to use to enumerate the rest of the sequence (or null if we are at the end of the sequence). This would avoid pushing a new (C#) stack frame for each concatenation in a lazy result.

Monday, 2 November 2009

Using XmlPrime to access SQL databases

One of the uses of XQuery is to query relational databases. This posting is going to demonstrate how one might use XmlPrime to access a SQL database.

To access various data sources you can write a collection resolver implementing the ICollectionResolver interface. To do this you need to implement the following method:
public IXPathNavigable[] Resolve(AnyUri uri,
DocumentSet documentSet,
XmlNameTable nameTable,
XsltWhitespaceSettings whitespaceSettings)

First thing we need is a URI format to indicate we are connecting to the database, for example:

mssql://username:password@server/database/table


We need to convert this URI to a SQL connection string, this can be done as follows:

var connectionString = new StringBuilder();
connectionString.Append("Server=" + uri.Host + ";");
var split = uri.Path.Split('/');
var table = split[2];
connectionString.Append("Database=" + split[1] + ";");
var userInfo = uri.UserInfoItems;
if (userInfo == null)
connectionString.Append("Trusted_Connection=True;");
else
{
connectionString.Append("User ID=" + userInfo[0] +
";Password="+userInfo[1] + ";");
}


We can now connected to the database and generate some XML nodes for our collection. The XML format is going to be:

<row column1="value1" column2="value2"/>
<row column1="value3" column2="value4"/>


We now open the database and get the data from the specified table:

using (var connection = new SqlConnection(connectionString.ToString()))
{
connection.Open();
var command = new SqlCommand("SELECT * FROM " + table, connection);
var collection = new List<IXPathNavigable>();
var doc = new XmlDocument(nameTable);
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
...
}
}
return collection.ToArray();
}


The XML nodes need to be created with the same name table as the query is using, hence an XmlDocument is created with the name table passed into the Resolve method.

Next we need some code to put in the while loop to generate the XML elements for each row. A certain degree of mapping is required between SQL field types and XML types. This code shows a subset of possible field types for demonstration purposes.

var row = doc.CreateElement(table);
for(var i = 0;i<reader.FieldCount;i++) {
if (!reader.IsDBNull(i))
{
var type = reader.GetFieldType(i);
XPathAtomicValue value;
if(type == typeof(Boolean))
value = XPathAtomicValue.Create(reader.GetBoolean(i));
else if(type = typoeof(Byte[])) {
var length = reader.GetBytes(i, 0, null, 0, 0);
var bytes = new byte[length];
reader.GetBytes(i, 0, bytes, 0, (int)length);
value = XPathAtomicValue.CreateBase64Binary(bytes);
}
else if (type == typeof(DateTime))
value = XPathAtomicValue.Create(
new DateTimeWithZone(reader.GetDateTime(i), (TimeSpan?)null)
);
else if (type == typeof(Int32))
value = XPathAtomicValue.Create(reader.GetInt32(i));
else if (type == typeof(String))
value = XPathAtomicValue.Create(reader.GetString(i));
else
// TODO: similar methods will need to be implemented
// for other SQL field types
throw new NotImplementedException();
var attr = d.CreateAttribute(reader.GetName(i));
attr.Value = value.Value;
row.Attributes.Append(attribute);
}
}
collection.Add(row);


So the whole class will be:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;
using System.Xml;
using System.Xml.XPath;

namespace xp.Connectors
{
public class SimpleSqlCollectionResolver : ICollectionResolver
{
public IXPathNavigable[] Resolve(AnyUri uri,
DocumentSet documentSet,
XmlNameTable nameTable,
XsltWhitespaceSettings whitespaceSettings)
{
if (uri != null && uri.Scheme == "mssql" && uri.Path != null)
{
var connectionString = new StringBuilder();
connectionString.Append("Server=" + uri.Host + ";");
var split = uri.Path.Split('/');
var table = split[2];
connectionString.Append("Database=" + split[1] + ";");
var userInfo = uri.UserInfoItems;
if (userInfo == null)
connectionString.Append("Trusted_Connection=True;");
else
{
connectionString.Append("User ID=" + userInfo[0] +
";Password="+userInfo[1] + ";");
}

using (var connection = new SqlConnection(connectionString.ToString()))
{
connection.Open();
var command = new SqlCommand("SELECT * FROM " + table, connection);
var collection = new List<IXPathNavigable>();
var doc = new XmlDocument(nameTable);
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
var row = doc.CreateElement(table);
for (var i = 0; i < reader.FieldCount; i++)
{
if (!reader.IsDBNull(i))
{
var type = reader.GetFieldType(i);
XPathAtomicValue value;
if (type == typeof(Boolean))
value = XPathAtomicValue.Create(reader.GetBoolean(i));
else if (type == typeof(Byte[]))
{
var length = reader.GetBytes(i, 0, null, 0, 0);
var bytes = new byte[length];
reader.GetBytes(i, 0, bytes, 0, (int)length);
value = XPathAtomicValue.CreateBase64Binary(bytes);
}
else if (type == typeof(DateTime))
value = XPathAtomicValue.Create(new DateTimeWithZone(reader.GetDateTime(i), (TimeSpan?)null));
else if (type == typeof(Int32))
value = XPathAtomicValue.Create(reader.GetInt32(i));
else if (type == typeof(String))
value = XPathAtomicValue.Create(reader.GetString(i));
else
throw new NotImplementedException();
var attr = doc.CreateAttribute(reader.GetName(i));
attr.Value = value.Value;
row.Attributes.Append(attr);
}
}
collection.Add(row);
}
}
return collection.ToArray();
}
}
return null;
}
}
}


This class can now be passed into a DocumentSet which can then be set on the DynamicContextSettings of a query.

var documentSet = new DocumentSet(nameTable,null,
new SimpleSqlCollectionResolver(),
null);
var dynamicSettings = new DynamicContextSettings {DocumentSet = documentSet};


We can then run queries accessing the database by using:

collection('mssql://SERVER/Northwind/Employees')


or if we are accessing multiple tables from the same database a nice syntax is

declare base-uri "mssql://SERVER/Northwind/";
collection('Employees')


Of course, the performance of this will be appalling compared to native SQL queries. In a future release of XmlPrime we aim to offer native SQL support with a number of the XQuery operations mapped onto SQL operations and run on the database, rather than loading whole tables as XML documents.

In another post we'll look at implementing ICollectionTypeResolver so that our query optimizer starts taking advantage of the field static type information when compiling the query.

Wednesday, 28 October 2009

Variables and Strictness analysis in XmlPrime 1.0

In the last post I discussed the different ways in which variables are placed on the stack in XmlPrime 0.9, and the strictness analysis that is performed in order to determine which of these methods is most appropriate for each variable.

In this post I will explain how this mechanism has been expanded for XmlPrime 1.0.

In 0.9 variables were considered either to be eager, or lazy. For XmlPrime 1.0 we redefined these terms slightly. A variable is strict if it is always evaluated at least in part. A variable is eager if whenever it is evaluated it is evaluated in its entirety. If a variable is not eager, it is lazy. This provides us with a middle ground of a non-strict eager variable. $x in the code sample below is an example of a non-strict eager variable.

if (...)
then
sum($x)
else
()

These variables are placed on the stack in closures, but when they are needed they are evaluated in their entirety, and the value on the stack is replaced with the strict version. By evaluating the variable in its entirety, the extra layer of caching that was required for lazy variables in XmlPrime 0.9 is not needed.

Both of these conditions propagate throughout the query in a similar way so the only significant barrier for development was annotating all the built in expressions and functions to more accurately describe how they might use their arguments. This may depend on how the function itself is evaluated.

The biggest change to variables in XmlPrime 1.0 is that they can now be unboxed. Recall that unboxing is one of new optimizations in XmlPrime 1.0. In XmlPrime 0.9, all entries on the stack were instances of IEnumerable<T> for items. In order for unboxing to have a significant effect we needed to extend the stack to allow unboxed values. In XmlPrime 1.0 all entries on the stack implement IStackEnumerable which provides IEnumerable<T> implementations for all types that are used in XmlPrime (although in most cases most of the GetEnumerator methods throw an exception).

Implementations of all the types of closure are generic and can be used to store lists or closures of any unboxed type. The implementations for XdmItem
also allow for evaluation as appropriate unboxed types.

Whilst individual items need not be boxed, the value as a whole still needs to be wrapped in a suitable stack entry. This can be mitigated in some cases, for example variables defined in for expressions now share the same stack entry object for every item in their definition.

As with unboxed expressions, a variable can only be unboxed if its dynamic type is not needed to be preserved anywhere that the variable is used. The strictness analysis was extended with a new property - a variable is unboxable if its type will never be inspected. The strictness analysis pass was extended to propagate this property as well, and so all the information required to identify which type of closure is to be created at runtime is described by the static type of the variable, and these three properties. The combination of these three properties is known as the evaluation strategy.

The techniques above fairly thoroughly cover variables introduced by let clauses. Local variables are also introduced by function arguments, and so a lot of work has gone into these for XmlPrime 1.0.

In XmlPrime 0.9 all function arguments were evaluated eagerly. Clearly this is not always beneficial. The strictness analysis rules have been extended to be applied to user defined function arguments.

The first challenge that is that the evaluation strategy of an argument depends on the evaluation strategy of the function itself. The effect of the strict property is easy to determine - if a function is not strict (might not be evaluated) then neither is its arguments. As it turns out strictness and eagerness of a function will never affect the unboxability of its arguments and vice versa. The strictness rules are applied twice to every function - once assuming that it is evaluated eagerly and unboxed, and once assuming that it is evaluated lazily and boxed. The evaluation strategy for each argument in each case is stored, and these can be combined to determine the evaluation strategy of each argument wherever the function is called.

The other big challenge for computing the evaluation strategy for function arguments comes with recursive (or mutually recursive functions). How do we determine whether $y is strict in the following case for example?

declare function local:f($x, $y)
{
if ($x = 0)
then $y
else f($x - 1, $y)
}

The solution to this problem is to assume that every argument to every recursive function is strict, lazy and not unboxable. After the strictness analysis has been applied, if any of the evaluation strategies have been updated then the analysis is run again for any function that calls it. The analysis is repeated until the stratgies do not change. This algorithm ensures that optimal evaluation strategies are always selected.

Another improvement that has been applied to function calls is that if a function calls another function, then often it is passing local variables as arguments. Using the model described above, these arguments are either fully evaluated, or a lazy closure is created wrapping the previous definition. If a variable is fully evaluated then this causes an extra copy of the variables value to be stored in memory every time the function recurses. Adding an extra lazy thunk causes an extra level of caching to be added to the variable each time it recurses. This latter effect can compound causing successive recursions of a function perform increasingly badly. XmlPrime adds a simple check for this case during code generation - if a local variable is passed as an argument, then the stack entry is simply copied into the new stack frame.

This case (setting a variable to be the value of another variable) only occurs for function arguments, as otherwise it would have been rewritten by the optimizer. Another case that is eliminated by the optimizer is local variables whose definitions are values. When a function is called with a constant argument, the stack entry can be constructed ahead of time, and so no boxing is needed to make the call.

There are still many ways in which argument passing could be much improved. One case is for lazy variables whose definitions are only inspected once (this can only occur in function arguments, as otherwise the variable would have been eliminated). Currently the values are cached as the variable is evaluated, but this is unnecessary if the variable will never be inspected again. Look out for improvements like this in future versions of XmlPrime.

Friday, 9 October 2009

Variables and strictness analysis in XmlPrime 0.9

This is the fourth post in a series about performance and optimizations in XmlPrime. The first post in the series is available here. In this post I will look at how we implement variables in XmlPrime.

XmlPrime 1.0 has introduced a number of improvements to the way we use variables, mainly to increase performance of user defined functions and to unbox variable values. See the previous post for more information about unboxing.

All expressions returning sequences in XmlPrime are (currently) evaluated lazily. This means that every expression returns an enumerator, and the values are computed as they are needed.

This post looks at how we implement variables in XmlPrime. All variables are stored in the stack. The stack is implemented as an array of IEnumerable instances. Every variable is allocated an index into the stack, and each function call is allocated its own stack in which to place its variables (this is required in order to support recursive functions - each time the function recurses we need a new set of variables). There is also a separate stack for global variables, which is implemented in much the same way.

The simplest way to place a value on the stack is to evaluate it in its entirety, into a List<T> for example. This list is placed on the stack, and then every variable reference can retrieve and enumerate over this value.
If our value is a single item then we can do slightly better by skipping the list.

However this goes against our design philosophy of evaluating as much as possible lazily; if only the first item of the variable is looked at then we still end up computing the whole sequence.

Another way variables can be stored on the stack is by creating a closure (sometimes known as a lazy thunk) around the expression defining the variable. A closure consists of a copy of the current stack frame, and the expression defining the variable. This closure is placed on the stack. When a reference to the variable is evaluated, the closure is retrieved, and the act of enumerating it causes the expression to be evaluated in its own context, as if it had been evaluated when it was first seen. Although this addresses the issue that only as much of the value that is needed is evaluated, it introduces a new problem that if the variable is evaluated more than once then the result is recomputed. To defeat this an extra level of caching is added to the enumerators. A list of cached values is shared between all enumerators over the closure, and every time the enumerator is moved, an attempt is made to retrieve the value from this list before evaluating it.

This extra level of caching adds a performance hit however, and results in the sequence being stored in memory. Some small optimizations can be made though; once we have enumerated over the entire sequence, the closure on the stack can be replaced with its value, simplifying any further accesses to the variable, but any references still being enumerated still pay a performance penalty.

For this reason XmlPrime 0.9 used both ways of evaluating variables. All variables are classed as eager or lazy. A strict variable is one that is always used in its entirety - for example $x in the expression sum($x). In order to sum a sequence, every value in the sequence must be examined. A lazy variable is one that may not be evaluated in its entirety (or may not be evaluated at all). This is known as strictness analysis. It is performed in a separate pass after optimization, as optimization moves variables around which can invalidate judgements made. Special annotations are needed for every expression and function in order to correctly propagate the strictness rules. The analysis can get quite complex as for example in the following case, which illustrates some of the considerations that need to be made:

let $x := ...
let $y := ...
let $z := sum($y)
return
if (...)
then sum ($x)+ $z
else $z

In this case $x is lazy, since if the else branch is hit $x is never evaluated. $z however is always evaluated in both branches and so is eager. $y is eager, but only because $z is eager.

Eager variables are clearly best suited to having their values placed on the stack (as they will be used in their entirety), and lazy variables are better evaluated using a closure, as they may not be evaluated in their entirety.

As I mentioned at the start, this is the behaviour as implemented in XmlPrime 0.9. XmlPrime 1.0 has dramatically expanded the number of ways that variables are placed on the stack, and the strictness analysis is now a lot more sophisticated, giving more detail on how variables are used. Variables can now be stored as unboxed values, and the strictness analysis has been improved for recursive functions. I will explain these differences in the next post.

Friday, 18 September 2009

Unboxed evaluation

This is the third in a series of posts about XmlPrime 1.0 optimizations and performance.

In the previous post I outlined the compilation process used by XmlPrime. I did not however explain how the resultant syntax tree is evaluated.

In XmlPrime 0.9 every expression in the tree had several evaluate methods. Each of these methods took a dynamic context which contains everything needed to evaluate an expression, for example the values of the in-scope variables. These methods returned the result of evaluating the expression. The results returned were all of type XdmItem, and so an XdmItem instance had to be constructed for each item returned from an expression.

Boxing is a term that refers to wrapping a value or object in another object in order to pass it to something else. In .NET, boxing usually refers to wrapping a value type to expose it as a reference type, and happens implicitly in most .NET languages, for example when passing a double to a function that takes an object as an argument.

In XmlPrime 0.9 the values returned were boxed as XdmItems in order to return them, and needed to be unboxed when their contents was examined. In XmlPrime 1.0 we have managed to eliminate this boxing and unboxing in almost all cases at runtime.

XmlPrime already had some rudimentary unboxing for XPathNavigator values, and so the improvements are most noticeable on queries that use a lot of atomic values, for example a lot of arithmetic expressions. Take the following query for example, which finds the sum of the first 10 square numbers.

sum(for $x in 1 to 10 return $x * $x)

In XmlPrime 0.9 each number in the input sequence (1 to 10) is boxed. This item is then placed on the stack. Each reference to $x retrieved the variable from the stack. The multiplication operator took two items as arguments and unboxed them to obtain the values to multiply. The values were multiplied and then boxed again and returned. Finally all these returned values were unboxed, their values summed and the result boxed to return to the outside environment.

This is a total of 21 boxing and 30 unboxing operations for such a simple query.

In XmlPrime 1.0 the only place this boxing occurs is to return the final result from the query. As you can imagine the savings quickly add up.

Aside: Entries on the stack can be sequences of items and can be values or closures (expressions which are evaluated when they are first used). For this reason another kind of boxing still needs to take place when placing items on or retrieving items from the stack. Changing this to avoid first boxing all items that are to be placed on the stack proved suprisingly difficult. More information on how we managed this will be given in a future post.

The implementation details


In XmlPrime 1.0 we changed the evaluation processl; this is probably the biggest change since 0.9. The expressions themselves are no longer responsible for evaluation. Instead another compilation pass occurs (which we refer to as code generation) which transforms the syntax tree into an evaluation tree.

Optimized syntax treeCode generation
Evaluation tree


This code generation phase does not generate code in the traditional sense. Indeed this is something we would like to do in the future, in the vein of XslCompiledTransform, but this is not always practical. For now the evaluation tree consists of pre-written evaluators which consist of methods to evaluate the expression.

In XmlPrime 0.9 the base class for all items was called XdmItem, and all values passed around were wrapped in objects derived from XdmItem. All nodes in XmlPrime were represented as XPathNavigator instances, which had to be wrapped in NavigatorValue (which derives from XdmItem) instances in order to pass them around. The expressions had methods to evaluate themselves as lazy sequences (whose values are returned by an IEnumerator instance), or as single or optional XdmItem instances (which avoids the overhead of an enumerator.

Historically the cost of constructing NavigatorValues (which are effectively XPathNavigators boxed as XdmItems) showed up as a significant performance bottleneck, and so we added alternative methods to evaluate every expression that returned XPathNavigator values instead.

In XmlPrime 1.0 we alleviated the problem somewhat by renaming XdmItem to XPathAtomicValue, and deriving this from XPathItem (which is the base class for XPathNavigator). Since XPathItem is now used as the base class of our hierarchy, it means that XPathNavigator instances never need to be boxed.

The performance boost that was gained from unboxing XPathNavigators in XmlPrime 0.9 was huge, so we wanted to bring this performance enhancement to all items, and not just nodes. XmlPrime uses over 20 different types to represent values, so adding all these evaluation methods (remember that expressions can be evaluated to return a single item, zero or one items, or a sequence of items) would have been impractical. If we added any more methods of evaluation (for example eager evaluation) then this would also be fairly difficult.

The evaluation tree consists of objects implementing IEvaluator<T> which provides methods for evaluating a single item, zero or one items, or a lazy sequence of items. The interface is generic in the type of the values returned and so can be used for unboxed evaluation of most expressions. A lot of the evaluator implementations themselves are implemented as generic classes (for example the evaluator for fn:subsequence), which means that unboxed evaluation is used in almost all cases where the static type of an expression allows it.

Another advantage to seperating out the evaluation tree from the syntax tree is that it has lowered the bar for specialized versions of expressions, as now only a new evaluator class needs to be written, rather than an entire (rather complicated) expression class. When adding a new expression class we also had to ensure that it would not prevent any other optimizations. Adding a new evaluator has none of these difficulties. This has meant that we have been able to add more specialized versions of expressions to be used at run time (for example a typeswitch expression for when all the types being tested against are subtypes of node()?) without harming other optimizations.

Tuesday, 15 September 2009

Extension methods for XPath 2.0 in .NET (C#)

.NET provides easy use of XPath with the Select(...) and SelectSingleNode(...) methods on XPathNavigator.

This blog post will show how you can use XmlPrime to add similar methods to provide XPath 2.0 support on XPathNavigator also.

This .NET 3.5 is required as this example uses extension methods.

First download and install XmlPrime, then in you project add a reference to the installed XmlPrime dll.

Now create a class with the following code:
using XmlPrime;
using System.Xml.XPath;

namespace xp.ExtensionMethods
{
public static class XPathNavigatorExtensions
{
public static XPathNavigator SelectSingleNode2(
this XPathNavigator navigator,
string expression)
{
var compiledExpression = XPath.Compile(expression, navigator.NameTable);
var item = compiledExpression.EvaluateToItem((XPathItem)navigator);
return (XPathNavigator)item;
}
}
}

This will add a method to XPathNavigator, so you can then use XPath 2.0 in .NET just by typing:

navigator.SelectSingleNode2("/foo/bar[matches(.,'[a-z]')]")

wherever you would have used navigator.SelectSingleNode(...), but now you can use all the XPath 2.0 functionality; for example regular expression functions.

Implementing an equivalent of navigator.Select(...) and similar functions on XmlNode is equally straightforward.

We plan to include such extension methods in a the XmlPrime library in a future release.

The compilation process

This is the second in a series of posts about performance in XmlPrime 1.0. The previous post in the series can be found here.

In order to better explain some of the optimizations we perform, I will first outline the query compilation process.

XQuery sourceParse
Syntax treeBind & Type Check
Compiled syntax treeOptimize
Optimized syntax tree


Firstly, the XQuery source is parsed to create an syntax tree. The syntax tree is a tree representation of the query, which can be processed further. The syntax tree was constructed mostly following the rules provided in the XQuery 1.0 and XPath 2.0 Formal Semantics specification, although we have started to deviate from the formal semantics to provide better error reporting (by providing more context about where the expression came from) or for increased performance (for example making a general comparison an atomic expression). There is still a lot more we can do in this area (for example not expanding out filter expressions until after type checking has been performed, which reduces the number of expressions we need to create).

The second stage of compilation is binding and type checking. Because of the way namespaces are bound in XQuery we do not necessarily know the namespaces of qualified names as we parse them. Consider the following query:

<e attr1="{$foo:bar}" 
xmlns:foo="http://www.example.org/"
/>

When parsing this query we do not know which namespace the prefix foo refers to until we have parsed the namespace declaration. To avoid parsing the query multiple times, we bind variables and functions in a separate pass. The binding pass works out the in-scope namespaces for any namespace-dependant expressions, and binds variable references and function calls to the variables and functions they refer to.

Once the expression is bound, the static types of all expressions are computed, and where a static type does not match an expected type for an expression, either an error is raised (if the static typing feature is enabled), or an extra run time check is inserted as appropriate.

After binding and type checking, we are left with the compiled syntax tree, which contains all the information we need to run the query.

The final compilation stage is optimization. This stage attempts to transform the compiled syntax tree with the aim to making the end result more efficient. There are many optimizations that XmlPrime performs, too numerous to list here. Since this optimization stage is a large part of what makes XmlPrime so fast, we will outline many of the optimizations we perform in future posts to this blog.

You can see the optimization process in action by using the XmlPrime command line tool. The -e flag indicates that the query plan should be outputted. The query plan is a textual representation of the abstract syntax tree. By using the -c (compile but don't run) and toggling the -o0 (disable optimizations) flag you can see the query plan before and after the optimization pass has taken place.

Monday, 7 September 2009

XmlPrime 1.0.2 Released

XmlPrime 1.0.2 is available now

This version fixes some bugs from the initial release.

Here is the full change log
  • Set default type checking mode to optimistic, as per documentation.
  • Fixed compatibility with XmlDocument
    • Fixed NullReferenceException when using data() caused by XmlDocument returning an IXmlSchemaInfo instance with null SchemaType.
    • Fixed infinite loop caused by XPathNodeIterators over XmlDocuments not consistantly returning false after the end of the node set.
  • Fixed the XQuery version declaration for cultures where '.' is not the decimal point.
  • Fixed normalization rules of general comparisons as per erratum XQ.E18.
  • For increased performance, the command line tool now runs in x86 mode by default.

Friday, 4 September 2009

XmlPrime 1.0 performance enhancements

This is the first post in a series of posts about the performance updates in XmlPrime.

First off, some numbers.

In order to demonstrate the performance changes I have run the 3 samples that we provide, with the command line tool in the XmlPrime 0.9 Technical Preview and the command line tool provided with XmlPrime 1.0. All these tests were run on an x64 machine with the x64 VM (under x86 the performance is much better - we will cover this in a later post).

The times are as reported by the command line tool, so the compile time includes parsing, optimization and compilation; the run time includes the time to evaluate the query and serialize the results.

Raytracer

The times here are from running ppm.xq of the final raytracer at a resolution of 640x480. Here is the command used to run this test:
XQuery -t ppm.xq width=640 height=480
compile(ms)run(ms)
0.91182412186
1.0163307593

Currency

These times were computed with a modified version of currency.svg that takes a source document as a context item, supplied as GBPNoon.xml. This is to avoid network latency from affecting the timings.

The command used to run this query was
XQuery -t -s GBPNoon.xml currencysvg.xq
compile(ms)run(ms)
0.911991344
1.064955

iTunes Artist Graph

The iTunes query used was the one from part 3 of the sample. This because the part 4 query is largely IO bound, as it retrieves a lot of data from Wikipedia.

The command used to run this query is:
XQuery -t iTunes3.xml
compile(ms)run(ms)
0.9145893483
1.08098529

This particular query runs slower in 1.0 than in 0.9 which we were not expecting. The reason behind this is that we changed some of the rules around variable dereferencing when we added support for modules to be compiled seperately. In this case global variables cannot be removed as they are part of the public interface of the query. Variables were being dereferenced in the main module, but their declarations were not being removed. Compounded with the fact that currently all global variables are evaluated eagerly meant that the source document was being validated several times. Suffice it to say after discovering this problem, the dereferencing issue has been fixed, and will be available in the next version available in the next couple of weeks. A run with the latest dev version brings the runtime down to 66884 ms.
[UPDATE (09/09/2009)]: These changes have been incorporated in XmlPrime 1.0.2

So how have we acchieved this performance boost? We have implemented a number of significant optimizations since 0.9. The biggest ones are listed here:
  • Unboxed evaluation of expressions
  • Improved strictness analysis
  • Unboxing variables
  • Static type analysis of recursive functions
  • Strictness and unboxing analysis of function arguments
  • Precompiling the DLL
  • Elimination of some quadratic algorithms during common sub-expression elimination
  • New XDM type system implementation, tighter bound to XML Schema
  • Function specialization
  • Specialization of core expressions
Most of these changes will probably sound quite cryptic, so this shall mark the start of a series of posts on how XmlPrime works under the hood, with in-depth articles on some of the different optimizations performed by XmlPrime.

Welcome

As has been announced elsewhere, XmlPrime 1.0 has been released.

We realise that it has been a long time (almost a year) since the 0.9 technical preview of XmlPrime was released and we have published very little information in the meantime.

Now XmlPrime has had a proper release, we have started this blog to provide information about the new and exciting features we are working on. We will start with a series of technical posts explaining some of the new performance enhancements we have been working on for XmlPrime 1.0.