Monday 6 September 2010

XmlPrime 2.0 Beta - XSLT 2.0 for the .NET Framework!

The question we get asked most often is: "When will XmlPrime support XSLT 2.0?"

Well, we finally have an answer: Now!

We are pleased to announce the XmlPrime 2.0 Beta Program. XmlPrime 2.0 includes a native XSLT 2.0 processor for the .NET Framework.

You can download a copy of XmlPrime 2.0 beta now to test the new functionality. Please leave any comments on our forum.

We expect the beta program to finish at some point in the next month, at which time we intend to release a fully supported version.

XmlPrime 1.5

Last week we released XmlPrime 1.5.

The main improvement in this version is to the performance of recursive user-defined functions.

The biggest improvements are for functions that recurse over a sequence. Consider the following user defined function:

declare function local:product2($seq as xs:decimal*,
                                $product as xs:decimal)
  as xs:decimal
{
  if (empty($seq))
  then $product
  else local:product2(subsequence($seq, 2), $seq[1] * $product)
};

declare function local:product($seq as xs:decimal*)
  as xs:decimal
{
  local:product2($seq, 1)
};

In previous versions of XmlPrime, the subsequence operation would create a closure, so each recursive call added another layer of closures to be evaluated for the argument. In XmlPrime 1.5.0, subsequence operations are treated as a special case and a closure is no longer required. Look out for a future blog post explaining this optimization in more detail, and how to take advantage of it.
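
To give a rough idea of what this buys you, here is a purely illustrative C# sketch (none of these types exist in XmlPrime, and this is not how the processor is implemented) of the kind of representation that avoids stacking closures: the "tail" of the sequence is just the same buffer with an offset advanced, rather than a new wrapper built on every call.

using System.Collections.Generic;

internal struct SequenceView<T>
{
    private readonly IList<T> _items;
    private readonly int _offset;

    public SequenceView(IList<T> items, int offset)
    {
        _items = items;
        _offset = offset;
    }

    public bool IsEmpty
    {
        get { return _offset >= _items.Count; }
    }

    public T Head
    {
        get { return _items[_offset]; }
    }

    // The "tail" is the same buffer with the offset advanced by one; no new
    // closure has to be built (or re-evaluated) for each level of recursion.
    public SequenceView<T> Tail
    {
        get { return new SequenceView<T>(_items, _offset + 1); }
    }
}

internal static class ProductFunction
{
    // Mirrors local:product2 above: multiply every item, threading an accumulator.
    public static decimal Of(SequenceView<decimal> seq, decimal product)
    {
        return seq.IsEmpty ? product : Of(seq.Tail, seq.Head * product);
    }
}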

The optimizations in this version particularly help computationally expensive queries - the raytracer sample shows a 20% speed improvement.

As always, there are a large number of other performance improvements and bug fixes, so we highly recommend that all our users update to this version. For full details of what has changed, see the changelog.

As usual, XmlPrime 1.5 is available to download for free for trial and non-commercial use.

Tuesday 25 May 2010

Join optimizations in XmlPrime

XmlPrime has "Join Optimizations" listed on its feature list. What exactly are these join optimizations and how are they used?

The join optimizations performed by XmlPrime attempt to reduce the algorithmic complexity of complex queries. In theory you do not need to do anything to take advantage of these join optimizations - if your query contains a join-like construct then it should be optimized to a join. However, it is good to be aware of the optimizations that are performed, so that you can understand which joins XmlPrime can recognize and why a particular join may not have been spotted.

Tuples


Consider the following query:

for $x in 1 to 10
let $y := $x * 2
return $y

An algebra (at least for our purposes) is a system for modelling a query.
For most of the compilation process we use an algebra in which the previous expression looks something like this (note that the syntax below is made up purely to explain the algebra):

For($x)
  Range
    Constant(1)
    Constant(10)
  Let($y)
    Multiply
      VariableReference($x)
      Constant(2)
    VariableReference($y)

The for expression enumerates over its domain and for each item sets a value of $x and evaluates its return expression (in this case the let expression).

When we perform join optimizations we first transform the query to a different algebra (a tuple algebra). In the tuple algebra, each FLWOR clause is an expression that returns a set of tuples. A tuple assigns values to a set of variables. For example, for $x in 1 to 10 returns a sequence of 10 tuples: the first sets the value of $x to 1, the second sets $x to 2, and so on. Our previous example looks like this (again the notation is made up):

Return
  Concatenate
    For($x)
      Range
        Constant(1)
        Constant(10)
    Let($y)
      Multiply
        VariableReference($x)
        Constant(2)
  VariableReference($y)

The Concatenate expression above takes each tuple from its left input, and uses it to generate a sequence of tuples from the right-hand input. In this case the value of $x from each input tuple is used to generate a single tuple that defines $y. Each tuple returned from the second input is combined with the tuple from the left input, and the combined tuple is returned. So for each tuple that defines $x we generate a tuple that defines $y, and return a single tuple that defines both $x and $y.

Note that this change of algebra is merely a different way of describing the query; so far it hardly affects how the query is run. In practice "returns a tuple" means "sets some variables", and "combining two tuples" means "setting some variables, and then setting some other variables".
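
To make the tuple model a little more concrete, here is a minimal C# sketch of how the example above could be modelled, with a tuple represented simply as a set of variable bindings. The names and types here are invented for illustration and bear no relation to XmlPrime's internals.

using System;
using System.Collections.Generic;
using System.Linq;

// A "tuple" here is simply a set of variable bindings.
using Bindings = System.Collections.Generic.Dictionary<string, object>;

internal static class TupleAlgebraSketch
{
    // For($x in 1 to 10): a sequence of ten tuples, each binding $x.
    public static IEnumerable<Bindings> ForX()
    {
        return Enumerable.Range(1, 10).Select(i => new Bindings { { "x", i } });
    }

    // Let($y := $x * 2): for a given input tuple, a single tuple binding $y.
    public static IEnumerable<Bindings> LetY(Bindings input)
    {
        yield return new Bindings { { "y", (int)input["x"] * 2 } };
    }

    // Concatenate: combine each left tuple with every tuple that the
    // right-hand clause produces for it ("set some variables, then set
    // some other variables").
    public static IEnumerable<Bindings> Concatenate(
        IEnumerable<Bindings> left,
        Func<Bindings, IEnumerable<Bindings>> right)
    {
        foreach (var l in left)
            foreach (var r in right(l))
                yield return l.Concat(r).ToDictionary(p => p.Key, p => p.Value);
    }
}

Evaluating the Return expression then amounts to reading $y from each tuple produced by Concatenate(ForX(), LetY).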

In a future version of XmlPrime we intend to eliminate the first algebra entirely and perform all our optimizations with this join algebra. A number of the new expressions added in XQuery 1.1 (e.g. group by) fit a lot more cleanly with the tuple algebra.

We can perform certain optimizations in this algebra that are not possible (or at least very difficult) in the basic algebra above. These optimizations are the join optimizations.

Products


The simplest join optimization performed by XmlPrime is forming products. A product is constructed when two independent expressions are concatenated. The expressions are independent if none of the variables defined in the first expression are used in the second expression.

Here is an example:

for $row in /rows/row
for $column in /columns/column
return
<intersection row="{$row}" column="{$column}" />

In tuple algebra, this gets converted to something like the following:

Return
  Product
    For($row)
      Step(child::row)
        Step(child::rows)
          Root
    For($column)
      Step(child::column)
        Step(child::columns)
          Root
  Element(intersection)
    Sequence
      Attribute(row)
        VariableReference($row)
      Attribute(column)
        VariableReference($column)

Below is an excerpt from the query plan, showing this product:

product
for $row in
step
$fs:dot_0/child::rows
child::row
with
for $column in
step
$fs:dot_0/child::columns
child::column

From the original query plan it looks as if the columns would be navigated once for each row retrieved from the document. The product prevents this: when the product is evaluated it stores a list of all the tuples from the right-hand side, and then each tuple from the left is combined with each stored tuple from the right, without re-evaluating the inner for loop. In practical terms, all the assignments to $column are recorded and replayed for each value of $row.

In this case the same effect could be achieved by storing the result of the inner for clause in a variable, but in general the product may be storing not just the values of a single for clause, but those of a whole set of FLWOR clauses.
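
As a rough illustration of how a product can be evaluated (again with invented names, not XmlPrime's actual code), the right-hand tuples are materialised once and then replayed for each left-hand tuple:

using System.Collections.Generic;
using System.Linq;

using Bindings = System.Collections.Generic.Dictionary<string, object>;

internal static class ProductSketch
{
    public static IEnumerable<Bindings> Product(
        IEnumerable<Bindings> left,
        IEnumerable<Bindings> right)
    {
        // The independent right-hand side is evaluated exactly once...
        var buffered = right.ToList();

        // ...and its tuples are then replayed for every left-hand tuple.
        foreach (var l in left)
            foreach (var r in buffered)
                yield return l.Concat(r).ToDictionary(p => p.Key, p => p.Value);
    }
}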

Joins


A join is formed when a where clause depends on both sides of a product.

Here is an example query with a join:

for $author in doc("bookshelf.xml")/book/author
for $book in doc("bookshop.xml")/book
where $book/author = $author
return $book

Note that the where expression applies a predicate to a sequence of tuples, returning only those that match. Since the two for expressions are independent they form a product, and since the where clause depends on both sides of the product, this is a join.

In this form, joins are evaluated no differently from a product with a where clause. This is known as a nested-loop join: the predicate is applied to the product of the tuples returned from the two for expressions, which gives quadratic performance.
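
A nested-loop join over tuple streams might be sketched like this (invented names, purely for illustration); the predicate is evaluated once for every pairing of tuples, which is where the quadratic cost comes from:

using System;
using System.Collections.Generic;
using System.Linq;

using Bindings = System.Collections.Generic.Dictionary<string, object>;

internal static class NestedLoopJoinSketch
{
    public static IEnumerable<Bindings> Join(
        IEnumerable<Bindings> left,
        IEnumerable<Bindings> right,
        Func<Bindings, bool> predicate)
    {
        // As with a product, the right-hand tuples are buffered once...
        var buffered = right.ToList();

        // ...but the predicate is then tested against every pairing,
        // which is what makes the nested-loop join quadratic.
        foreach (var l in left)
            foreach (var r in buffered)
            {
                var combined = l.Concat(r).ToDictionary(p => p.Key, p => p.Value);
                if (predicate(combined))
                    yield return combined;
            }
    }
}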


In the query plan a join looks something like the following:

hash join
for $author in doc("bookshelf.xml")/book/author
to
for $book in doc("bookshop.xml")/book
on
$book/author = $author


Hash Join


If the predicate of a join is an equality operator, the join is converted to a hash join. The expressions on either side of the equality operator are known as the keys. When the right (second) list of tuples is constructed, the right key is evaluated for each tuple. The tuples are stored in a dictionary mapping each key value to the list of tuples that match it (this dictionary is known as the join index). Once the dictionary has been built, the left key is evaluated for each left tuple (from the first input). This key is used to look up the matching tuples from the right-hand side, and the results are combined and returned. Assuming that only a few tuples from the right-hand side match each time, this reduces the complexity from O(n²) to O(n) (assuming constant overhead for a dictionary lookup).
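
Ignoring the typing complications discussed below, the hash-join strategy might be sketched like this (again the names are invented and this is not XmlPrime's actual implementation):

using System;
using System.Collections.Generic;
using System.Linq;

using Bindings = System.Collections.Generic.Dictionary<string, object>;

internal static class HashJoinSketch
{
    public static IEnumerable<Bindings> Join(
        IEnumerable<Bindings> left,
        IEnumerable<Bindings> right,
        Func<Bindings, object> leftKey,
        Func<Bindings, object> rightKey)
    {
        // Build the join index: key value -> the right-hand tuples with that key.
        var joinIndex = new Dictionary<object, List<Bindings>>();
        foreach (var r in right)
        {
            var key = rightKey(r);
            List<Bindings> bucket;
            if (!joinIndex.TryGetValue(key, out bucket))
                joinIndex[key] = bucket = new List<Bindings>();
            bucket.Add(r);
        }

        // Probe the index with each left-hand key: one lookup per left tuple
        // instead of re-testing the predicate against every right-hand tuple.
        foreach (var l in left)
        {
            List<Bindings> matches;
            if (!joinIndex.TryGetValue(leftKey(l), out matches))
                continue;
            foreach (var r in matches)
                yield return l.Concat(r).ToDictionary(p => p.Key, p => p.Value);
        }
    }
}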

There were many complications in implementing such a join, especially in the general case where the arguments have type xs:anyAtomicType*. If two types are incomparable, a type error should be raised. xs:untypedAtomic values need to be converted to the base type of every type that appears in the other key, and used for comparison. Numeric types should be compared as if they had been promoted to the lowest base type. Numeric NaN values never match. If a key is a sequence it may match more than one key in the index, but each matching tuple must be returned only once, and in the order it was encountered. Because of these potential issues, there are several implementations of the hash join in XmlPrime, each tackling a different combination of them. All the implementations have the same O(n) basic performance characteristics.

Left Outer Joins


A left outer join is similar to a join, but the matching results are grouped for each tuple from the left input. Here is an example of a query with a left outer join:

for $user in doc("users.xml")/user
return
<user name="{$user/@name}">
{
for $comment in doc("comments.xml")/comment
where $comment/@user = $user/@name
return $comment
}
</user>

When converting this query into the join algebra, XmlPrime extracts each separate FLWOR expression and puts it into a new let clause (with a new variable; in this case we will call it $comments), in order to group all FLWOR clauses together:

for $user in doc("users.xml")/user
let $comments := for $comment in doc("comments.xml")/comment
                 where $comment/@user = $user/@name
                 return $comment
return
  <user name="{$user/@name}">{$comments}</user>

The following pattern is then spotted as a left outer join.

[ClausesA]
let $a := [ClausesB]
where [Predicate]
return [AggregateExpr]

The above pattern is only a left outer join if ClausesA and ClausesB are independent, and if Predicate depends on both ClausesA and ClausesB. Note that in order to reach this form some (let) clauses may need to be moved from ClausesB to ClausesA, and some clauses may need to be moved out of ClausesA and concatenated to the entire expression.

The left outer join is evaluated like a standard join, except that for each tuple stored from the right-hand side we also compute the aggregate expression (or, if the join is evaluated lazily, the variables required to compute it). Then for each left-hand tuple we find all the tuples that match it, get (or compute) the aggregate value for those tuples, concatenate the aggregate values (this is normal sequence concatenation, not tuple concatenation), and store the result in $a.

As with the standard join, both nested-loop and hash variants are supported. Again, in the case of the hash join, assuming only a few tuples match in each case, this reduces an O(n²) operation to O(n).
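
A sketch of the hash variant might look like this (invented names, not XmlPrime's code). The aggregate values for each right-hand tuple are computed as the join index is built, and every left-hand tuple is returned exactly once, bound to the concatenated aggregates of its matches, or to the empty sequence if there are none:

using System;
using System.Collections.Generic;
using System.Linq;

using Bindings = System.Collections.Generic.Dictionary<string, object>;

internal static class LeftOuterHashJoinSketch
{
    public static IEnumerable<Bindings> Join(
        IEnumerable<Bindings> left,
        IEnumerable<Bindings> right,
        Func<Bindings, object> leftKey,
        Func<Bindings, object> rightKey,
        Func<Bindings, IEnumerable<object>> aggregate,
        string resultVariable)
    {
        // Join index: key value -> the aggregate values of the matching tuples.
        var joinIndex = new Dictionary<object, List<object>>();
        foreach (var r in right)
        {
            var key = rightKey(r);
            List<object> values;
            if (!joinIndex.TryGetValue(key, out values))
                joinIndex[key] = values = new List<object>();
            values.AddRange(aggregate(r));
        }

        // Every left-hand tuple is returned exactly once, bound to the
        // concatenated aggregates of its matches (or the empty sequence).
        foreach (var l in left)
        {
            List<object> values;
            if (!joinIndex.TryGetValue(leftKey(l), out values))
                values = new List<object>();

            var result = new Bindings(l);
            result[resultVariable] = values;
            yield return result;
        }
    }
}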

In the query plan a left outer hash join looks something like the following:

left outer hash join
for $user in doc("users.xml")/user
to
for $comment in doc("comments.xml")/comment
on
$comment/@user = $user/@name
aggregate
$comment
as $comments


Group-by


One particularly common form of the left outer join is the group-by, which arises when you want to group all the items in a sequence by a particular key. Here is an example which groups a list of books by their author:

let $books := doc("bookshelf.xml")/books/book
for $author in distinct-values($books/@author)
return
<author name="{$author}">{
$books[@author=$author]
}</author>

Using the join optimizations described so far, and inventing a couple of variables ($book and $matching-books), this will optimize to a left outer join:

let $books := doc("bookshelf.xml")/books/book
left outer hash join
for $author in distinct-values($books/@author)
to
for $book in $books
on
$book/@author = $author
aggregate
$book
as
$matching-books
return
<author name="{$author}">{$matching-books}</author>

The group by is formed when you have a left outer join with the following form:

left outer hash join
for $x in distinct-values(keys($sequence))
to
for $y in $sequence
on
$x = key($y)
aggregate
$y
as
$group

In the above plan, keys and key are general expressions depending on the variables shown. The group by is formed if applying keys to a sequence is the same as applying key to each item in the sequence and concatenating the results. Also, with the current implementation, key must return zero or one items.

Determining whether the condition above holds (that applying keys to a sequence is the same as applying key to each item in the sequence) is not easy in general, and our implementation is not yet perfect, although it improves with every release.

Note also that in practice the left outer join is liable to have many more clauses (in particular common subexpression elimination introduces a lot of let clauses), and these may have to be moved out of the join in order to get it into this form.

If XmlPrime does manage to manipulate the query to be in this form, and all the conditions are satisfied, then the query plan now looks like this:

for $book in doc("bookshelf.xml")/books/book
group by
$book/@author
aggregate
$book
as
$matching-books
return
<author name="{$author}">{$matching-books}</author>

The group by clause combines the join and the distinct-values operation. It builds a join index, storing the aggregate values as it goes; iterating over this join index then provides the values for the aggregate variable. Compared with the left outer join, the group by reduces the amount of memory used and ensures that the input expression is evaluated only once. In the best case this can cut memory usage by more than half, because in the left outer join the input expression is used twice (so all its values are stored in a variable) and an additional hash table is built to perform the distinct-values operation.
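
Conceptually, the group by operator can be sketched as a single pass that builds the join index and the groups at the same time (invented names again, not XmlPrime's code):

using System;
using System.Collections.Generic;

using Bindings = System.Collections.Generic.Dictionary<string, object>;

internal static class GroupBySketch
{
    public static IEnumerable<Bindings> Group(
        IEnumerable<Bindings> input,
        Func<Bindings, object> key,          // e.g. $book/@author
        Func<Bindings, object> aggregate,    // e.g. $book
        string keyVariable,                  // e.g. "author"
        string groupVariable)                // e.g. "matching-books"
    {
        var joinIndex = new Dictionary<object, List<object>>();
        var keysInOrder = new List<object>();

        // A single pass groups the aggregate values by key, so there is no
        // separate distinct-values pass and no buffered copy of the input.
        foreach (var tuple in input)
        {
            var k = key(tuple);
            List<object> group;
            if (!joinIndex.TryGetValue(k, out group))
            {
                joinIndex[k] = group = new List<object>();
                keysInOrder.Add(k);
            }
            group.Add(aggregate(tuple));
        }

        // One tuple per distinct key, binding the key and the grouped values.
        foreach (var k in keysInOrder)
        {
            var result = new Bindings();
            result[keyVariable] = k;
            result[groupVariable] = joinIndex[k];
            yield return result;
        }
    }
}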

Conclusion


Hopefully this post has shed some light on the various join optimizations that are performed in XmlPrime. By examining the query plan of a query you can see which join optimizations have been applied.

Most of the improvements to the join optimizations come from rewriting queries to bring them closer to the forms above. If you have a query which you believe should benefit from one of the above optimizations but does not, then please let us know, as we are keen to spot as many different ways of writing a join as possible.

Tuesday 11 May 2010

XmlPrime 1.4.0 Released

The release of XmlPrime 1.4.0 has arrived.

The biggest new feature in this version is the ability to write modules as classes in .NET, and call them from XQuery or XPath. Here is an example of a module written in C#:

[XdmModule("http://www.mymodule.com/")]
public class MyModule
{
    [XdmFunction("green-bottles")]
    public static XPathAtomicValue GreenBottles(XPathAtomicValue count)
    {
        var builder = new StringBuilder();
        for (var x = count.ValueAsDecimal; x > 0; x--)
        {
            builder.Append(x);
            builder.Append(" green ");
            builder.Append(x == 1 ? "bottle" : "bottles");
            builder.Append(" sitting on the wall\n");

            builder.Append(x);
            builder.Append(" green ");
            builder.Append(x == 1 ? "bottle" : "bottles");
            builder.Append(" sitting on the wall\n");

            builder.Append("And if one green bottle should accidentally fall\n");

            builder.Append("There'll be ");
            builder.Append(x == 1 ? "no" : (x - 1).ToString());
            builder.Append(" green ");
            builder.Append(x == 2 ? "bottle" : "bottles");
            builder.Append(" sitting on the wall\n");

            if (x != 1)
                builder.Append("\n");
        }

        return XPathAtomicValue.Create(builder.ToString());
    }
}

This can then be called from XQuery like so:

import module namespace my="http://www.mymodule.com/";

<lyrics>{my:green-bottles(10)}</lyrics>

More detailed information about the improved modules support can be found here.

There have also been a number of performance improvements, in particular to join analysis.

For the full list of new features see the change log.

As usual, you can download your free copy now (for trial or non-commercial use).

Friday 1 January 2010

Push-based evaluation

The release of XmlPrime 1.2 introduced eager evaluation. We use the term eager to refer to an expression whose result is consumed in its entirety; eager evaluation is an evaluation technique that produces the entire result before returning.

Any expression returning a sequence of items was previously evaluated with lazy evaluation. This method of evaluation returns an IEnumerator<T> instance which computes the values of an expression as they are needed; the values are "pulled" from the result of the expression.

In XmlPrime 1.2, when an expression is eager (all its values will be read), it is evaluated with a push-based strategy. Control flow remains with the expression being evaluated, and the results are "pushed" through to the caller, either by populating a list of values, or by passing each item to a callback function that performs further processing.

A second form of eager evaluation has also been added. The purpose of this form of evaluation is to stream the result of a query to the serializer. Previously only complete nodes could be returned. This new method splits nodes into a series of events (very similar to the methods on XmlWriter). These are then passed directly to the serializer.

Both of the previous methods treat each item as an indivisible unit. Historically that meant that for any XML data to be returned, a tree had to be constructed. The performance was worst with constructed nodes wrapping input nodes, like the following:

<results>{//result}</results>

The path expression itself returns a sequence of navigators, but in order to incorporate them in a result tree they have to be copied into an internal representation. This copying process effectively doubles the memory usage of the process, and causes a lot more work than is necessary, especially if the result is then going to be written out and discarded. With the new serialized evaluation, the copying is bypassed and the navigators are passed directly to the serializer.
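
As a rough illustration of the idea using the standard .NET XPath and XmlWriter APIs (this is not how XmlPrime is implemented internally), selected navigators can be written straight to an XmlWriter without first copying them into a new tree:

using System.Xml;
using System.Xml.XPath;

internal static class SerializeSketch
{
    public static void WriteResults(XPathDocument document, XmlWriter writer)
    {
        writer.WriteStartElement("results");

        // Each selected navigator is written to the output as a series of
        // events; no intermediate copy of the selected subtrees is built.
        foreach (XPathNavigator result in document.CreateNavigator().Select("//result"))
            writer.WriteNode(result, false);

        writer.WriteEndElement();
    }
}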

Being able to write the result directly to the serializer is the most obvious benefit of eager evaluation. However, eager evaluation is faster in almost all circumstances where all the values are consumed. To illustrate these benefits I will have to go into more depth on the implementation details.

The final pass of compilation transforms the query into a tree of IEvaluator instances, which provide methods to evaluate the expression. The IEvaluator interface provides all the different evaluation methods.

For this example I will consider the evaluation of a sequence of expressions that are concatenated together. This is implemented as an evaluator class in XmlPrime. The lazy evaluation method returns an instance of the following enumerator:


internal sealed class FlattenSequenceEnumerator<T> : IEnumerator<T>
{
    private readonly DynamicContext _context;
    private readonly IEvaluator<T>[] _subsequences;
    private ISeekableEnumerator<T> _currentEnumerator;
    private int _nextIndex;

    public FlattenSequenceEnumerator(IEvaluator<T>[] subsequences,
                                     DynamicContext context)
    {
        _subsequences = subsequences;
        _currentEnumerator = _subsequences[0].EvaluateLazySequence(context);
        _nextIndex = 1;
        _context = context;
    }

    public T Current
    {
        get { return _currentEnumerator.Current; }
    }

    public bool MoveNext()
    {
        while (!_currentEnumerator.MoveNext())
        {
            if (_nextIndex >= _subsequences.Length)
                return false;

            _currentEnumerator =
                _subsequences[_nextIndex++].EvaluateLazySequence(_context);
        }

        return true;
    }
}


Note: The code has been simplified and reformatted for this blog, and some members have been omitted.

Every time an item is retrieved, the MoveNext method is called and the Current property is read. Both calls pass execution on to the current enumerator in the chain, but the MoveNext method then needs to check whether that enumerator has been exhausted, and if so obtain the next enumerator in the sequence. The overhead of concatenating sequences is therefore two virtual function calls and some state checking for every item returned. Whilst this doesn't sound like much, remember that this is just one small part of a tree of evaluators; in more complicated expressions (for example the closures for lazy arguments to recursive functions) the overhead of using enumerators can amount to hundreds of virtual calls for each item returned.

Admittedly there are tricks that can be used to reduce the number of virtual calls, for example merging the MoveNext method and Current property into one function, or introducing a method that allows a new enumerator to be returned, which would avoid the extra calls for the final argument. Whilst these would reduce the overhead, they would not change the fact that the overhead grows in proportion to both the complexity of the expression and the number of items returned.


Now let us consider the eager evaluation methods for this unit:


public override void EvaluateEagerSequence(List<T> items,
                                           DynamicContext context)
{
    for (var i = 0; i < _evaluators.Length; i++)
        _evaluators[i].EvaluateEagerSequence(items, context);
}

public override void EvaluateEagerSequence(Action<T> action,
                                           DynamicContext context)
{
    for (var i = 0; i < _evaluators.Length; i++)
        _evaluators[i].EvaluateEagerSequence(action, context);
}


In this case the overhead of enumerating the sequence of items is just the cost of iterating over an array; there are no per-item costs.

So why don't we use this evaluation method all the time? The main advantage of lazy evaluation is that values are not computed until they are needed, and if they are never needed they are never computed. Lazy evaluation leaves the caller in full control of the program flow, whereas eager evaluation keeps control of the program flow and only returns it once all the values have been computed.
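
The difference in control flow can be seen in a toy example (nothing XmlPrime-specific here): a pull-based consumer can stop after the first item, whereas a push-based producer computes everything before returning.

using System;
using System.Collections.Generic;

internal static class PullVersusPush
{
    // Lazy (pull): each value is computed only when the caller asks for it.
    private static IEnumerable<int> LazySquares(int count)
    {
        for (var i = 1; i <= count; i++)
            yield return i * i;
    }

    // Eager (push): every value is computed and pushed before control returns.
    private static void EagerSquares(int count, Action<int> action)
    {
        for (var i = 1; i <= count; i++)
            action(i * i);
    }

    public static void Main()
    {
        // Pull: only the first square is ever computed.
        using (var enumerator = LazySquares(1000000).GetEnumerator())
        {
            enumerator.MoveNext();
            Console.WriteLine(enumerator.Current);
        }

        // Push: all 1,000,000 squares are computed before this call returns.
        EagerSquares(1000000, x => { });
    }
}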

The costs are inherent in the IEnumerator model, and derive from the fact that the enumerator must yield control of what it is doing and be able to continue where it left off.

The ability to abort computation from an eager evaluation could be implemented by raising an exception when the last item required has been processed (although this comes at a cost).
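
A sketch of that idea (purely illustrative, not XmlPrime's implementation) might look like this, with a sentinel exception thrown from the callback once enough items have been seen:

using System;

internal sealed class EnoughItemsException : Exception
{
}

internal static class AbortSketch
{
    // evaluateEager stands in for a push-based evaluation method such as the
    // EvaluateEagerSequence(Action<T>, ...) overload shown earlier.
    public static void TakeFirst(int limit, Action<Action<int>> evaluateEager)
    {
        var seen = 0;
        try
        {
            evaluateEager(item =>
            {
                Console.WriteLine(item);
                if (++seen >= limit)
                    throw new EnoughItemsException();   // abandon the rest
            });
        }
        catch (EnoughItemsException)
        {
            // Unwinding the producer via the exception is the cost referred to above.
        }
    }
}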

The most efficient way to implement the ability to halt and continue processing in eager evaluation is with coroutines, which would combine the benefits of both eager and lazy evaluation. Support for coroutines has recently been added to Mono; however, they are not supported by the Microsoft .NET runtime, and will not be in the near future, so it may be many years before we can try this approach.