Tuesday, 11 May 2010

XmlPrime 1.4.0 Released

The release of XmlPrime 1.4.0 has arrived.

The biggest new feature in this version is the ability to write modules as classes in .NET, and call them from XQuery or XPath. Here is an example of a module written in C#

[XdmModule("http://www.mymodule.com/")]
public class MyModule
{
[XdmFunction("green-bottles")
public static XPathAtomicValue GreenBottles(XPathAtomicValue count)
{
var builder = new StringBuilder();
for (var x=count.ValueAsDecimal; x>0; x--)
{
builder.Append(x);
builder.Append(" green ");
builder.Append(x == 1 ? "bottle" : "bottles");
builder.Append(" sitting on the wall\n")

builder.Append(x);
builder.Append(" green ");
builder.Append(x == 1 ? "bottle" : "bottles");
builder.Append(" sitting on the wall\n")

builder.Append("And if one green bottle should accidently fall\n")

builder.Append("There'll be ");
builder.Append(x == 1 ? "no" : x - 1);
builder.Append(" green ");
builder.Append(x == 2 ? "bottle" : "bottles");
builder.Append(" sitting on the wall\n")

if (x != 1)
builder.Append("\n");
}

return XPathAtomicValue.Create(builder.ToString());
}
}

Which can now be called from XQuery as so:

import module namespace my="http://www.mymodule.com/";

<lyrics>{my:green-bottles(10)}</lyrics>

More detailed information about the improved modules support can be found here.

There have also been a number of performance improvements, in particular to join analysis.

For the full list of new features see the change log.

As usual, you can download your free copy now (for trial or non-commercial use).

Friday, 1 January 2010

Push-based evaluation

The release of XmlPrime 1.2 has introduced eager evaluation. We use the term eager to refer to an expression whose result is consumed in its entirety. Eager evaluation is an evaluation technique that produces the entire result before returning.

Any expression returning a sequence of items was previously evaluated with lazy evaluation. This method of evaluation returns an IEnumerator<T> instance which computes the values of an expression as they are needed, they are "pulled" from the result of the expression.

In XmlPrime 1.2 when an expression is eager (all its values will be read) the expression is evaluated with a push based strategy. Control flow remains with the expression being evaluated, and the results are "pushed" through to the caller, either by populating a list of values, or by passing the result to a callback function that perfoms further processing on each item.

A second form of eager evaluation has also been added. The purpose of this form of evaluation is to stream the result of a query to the serializer. Previously only complete nodes could be returned. This new method splits nodes into a series of events (very similar to the methods on XmlWriter). These are then passed directly to the serializer.

Both the previous methods treat an item as an atomic value. Historically that meant that for any XML data to be returned, a tree had to be constructed. The performance was worst with constructed nodes wrapping input nodes, like the following:

<results>{//result}</results>

The path expression itself returns a sequence of navigators, but in order to incorporate them in a result tree they have to be copied into an internal representation. This copying process effectively doubles the memory usage of the process, and causes a lot more work than is necessary, especially if the result is then going to be written out and discarded. With the new serialized evaluation, the copying is bypassed and the navigators are passed directly to the serializer.

Being able to write the result directly to the serializer is the most obvious benefit of eager evalution. Eager evaluation however is faster in almost all circumstances where all the values are consumed. To illustrate these benefits I will have to go into more depth with the implementation details.

The final pass of compilation transforms the query in to a tree of IEvalauator instances which provide methods to evaluate the expression. The IEvaluator interface provides all the different evaluation methods.

For this example I will consider the evaluation of a sequence of expressions that are concatenated together. This is implemented as an evaluator class in XmlPrime. The lazy evaluation method returns an instance of the following enumerator:


internal sealed class FlattenSequenceEnumerator<T> : IEnumerator<T>
{
private readonly DynamicContext _context;
private readonly IEvaluator<T>[] _subsequences;
private ISeekableEnumerator<T> _currentEnumerator;
private int _nextIndex;

public FlattenSequenceEnumerator(IEvaluator<T>[] subsequences,
DynamicContext context)
{
_subsequences = subsequences;
_currentEnumerator = _subsequences[0].EvaluateLazySequence(context);
_nextIndex = 1;
_context = context;
}

public override T Current
{
get
{
return _currentEnumerator.Current;
}
}

public override bool MoveNext()
{
while (!_currentEnumerator.MoveNext())
{
if (_nextIndex >= _subsequences.Length)
return false;
_currentEnumerator =
_subsequences[_nextIndex++].EvaluateLazySequence(_context);
}

return true;
}
}


Note: The code has been simplified and reformatted for this blog, and some members have been omitted.

Every time an item is retrieved, the MoveNext method and the Current property is retrieved. Both these calls pass execution to the next enumerator in the chain. However the MoveNext method then needs to check if the enumerator has been exhausted, and if so obtain the next enumerator in the sequence. The overhead of concatenating sequences is two virtual function calls and some state checking for every item returned. Whilst this doesn't sound like much remember that this is just a small part of a tree of evaluations and in more complicated expressions (for example the closures to lazy arguments to recursive functions) this can extend to hundreds of virtual calls for each item returned, from overhead from using enumerators.

Admittedly there are tricks that can be used to reduce the number of virtual calls, for example merging the MoveNext and Current method into one function, or introducing a method that allows a new enumerator to be returned, which would avoid the extra calls for the final argument. Whilst these would reduce the overhead, it would not reduce the fact that the overhead grows proportionally to both the complexity of the expression and the number of items returned.


Now let us consider the eager evaluation methods for this unit:


public override void EvaluateEagerSequence(List<T> items,
DynamicContext context)
{
for (var i=0; i<_evaluators.Length; i++)
_evaluators[i].EvaluateEagerSequence(items, context);
}

public override void EvaluateEagerSequence(Action<T> action,
DynamicContext context)
{
for (var i = 0; i < _evaluators.Length; i++)
_evaluators[i].EvaluateEagerSequence(action, context);
}


The overhead in this case of enumerating the sequence of items is just the cost of enumerating over an array. There are no per-item costs.

So why don't we use this evaluation method all the time? The main advantages of lazy evaluation is that the values are not computed until they are needed, and if they are not needed they are not computed. Lazy evaluation leaves the caller in full control of the program flow, but eager evaluation keeps control of the program flow and only returns it once all the values have been computed.

The costs are inherent in the IEnumerator model, and derive from the fact that the enumerator must be yield control of what it is doing and be able to continue where it left off.

The ability to abort computation from an eager evaluation could be implemented by raising an exception when the last item required has been processed (although this comes at a cost).

The most efficient way to implement the ability to halt and continue processing in eager evaluation is with coroutines. This would combine all the benifits of both eager and lazy evaluation. Support for coroutines have recently been added to Mono. They are not supported by the Microsoft .NET runtime however and will not be in the near future so it may be many years before we can try this approach.

Tuesday, 22 December 2009

XmlPrime 1.2 Released

XmlPrime 1.2 has just been released.

The major new features in this release are push-based evaluation.

Queries can now be evaluated directly to an XmlWriter or directly serialized to a TextWriter or Stream. This prevents an immediate result tree from being constructed, which reduces memory usage, and allows the result to be returned sooner, and quicker.

All expressions returning sequences can now be evaluated via push-based mechanisms. This method of evaluation is more efficient, but only where the entire sequence is consumed. This optimization is applied automatically to any expression that is consumed eagerly.

Expect a post next week with a more detailed analysis of this feature.

See the full change log for more information about the changes in this version, or download the new version now.

Thursday, 17 December 2009

Avoiding StackOverflowExceptions

This post describes several situations that can cause StackOverflowExceptions in XmlPrime, and how to avoid them.

Look at the following fairly innocuous XQuery function:

declare function local:fact($x)
{
if ($x = 0)
then 1
else $x * local:fact($x-1)
};

As you probably expect this function works fine for small values of $x, but as $x grows large then the function will overflow the stack. The stack is overflowed far more quickly than a direct C# implementation of this function, as a function call is made for each expression. In this case the call stack contains one call to prepare an (XmlPrime) stack frame, one call to bind the arguments, one call to evaluate the if expression, and finally a call to evaluate the multiplication operator before recursing. This means that the stack will overflow quite quickly.

So, what can we do about this?

The trick to solving this problem is an optimization called tail recursion. If a function returns the result of a function call as a value, then the runtime can evaluate this new function call in place of the old one, without growing the call stack. Let me try to demonstrate what I mean by this. The factorial function we defined above is not tail recursive, as the value from the recursive function call is not returned directly, but is multiplied by $x after it is returned. We can rewrite the function in order to make it tail recursive, by adding an accumulating argument:

declare function local:fact2($x, $accum)
{
if ($x = 0)
then $accum
else local:fact($x - 1, $accum * $x)
};

declare function local:fact($x)
{
local:fact($x, 1)
};

Note that the value of the recursive call is now returned directly, this means that the tail-recursion optimization can be applied. Evaluating this function is now a lot more like evaluating the following C# function:

int Fact(int x)
{
int accum = 1;
while (x != 0)
{
x = x-1;
accum = accum * x;
}
return accum;
}

As you can see, evaluated in this way the function should no longer trigger a stack overflow exception. Unfortunately though, the function had to be manually rewritten in order to take advantage of this optimization. In the future we would like to try to make this transformation automatically, to widen the set of functions in which tail recursion can be applied.

Does this solve our problem? For this case, yes, but things can get more complicated.

Consider writing our own fold function. fold takes a function f that takes two arguments and returns an accumulated value. fold applies this function to a sequence by applying it to current result and each item in turn to build up a final result. For example if we fold the addition function over a sequence we get the sum of the sequence, and if we fold the multiply function over a sequence we get the product.

In XQuery 1.1, fold can be written like this:

declare function local:fold($func as function(item()*, item()) as item()*,
$seq as item()*,
$accum as item()*)
{
if (empty($seq))
then $accum
else local:fold($func, subsequence($seq, 2), $func($accum, $seq[1]))
};

However with XQuery 1.0, which does not support higher-order functions, we have to fake it by passing around a node to represent the function, and writing an apply function that applies it:

declare function local:apply($func as node(),
$accum as item()*,
$arg as item())
{
typeswitch ($func)
case element(add) return $accum + $arg
case element(multiply) return $accum * $arg
case element(reverse) return ($arg, $accum)
...
default error((), "Function not recognised")
};

declare function local:fold($func as node(),
$seq as item()*,
$accum as item()*)
{
if (empty($seq))
then $accum
else local:fold($func,
subsequence($seq, 2),
local:apply($func, $accum, $seq[1]))
};

local:fold is tail recursive and so we should never get a stack overflow, right?

Unfortanately, this is not the case. The problem occurs because local:apply (or $func in the XQuery case) is not nescassarily strict in the accumulator argument. In the XQuery 1.0 case this is because we might hit the default clause in the local:apply function, and in the XQuery 1.1 case we might have a function like:

declare function last($accum as item()*, $item as item()) as item*
{
$item
};

which when folded over a sequence returns the last item.

Aside: In the XQuery 1.0 version, if you are to call fold with a statically known function, then XmlPrime may specialize the fold function and create a new version inlining the function application. This simplifies the function enough to identify that the argument is strict. This may not always be the case though, so it shouldn't be relied on.

Since $accum may not be evaluated, it is defined lazily. So rather than applying the function and passing the value to the recursive function, a closure is created and the value is computed later.

Consider evaluating the function call

local:fold(<add />, (1,2,3,4,5), 0)

After the first iteration the closure for $accum looks like:

local:apply(<add />, 0, 1)

The next iteration applies the value 2 to $accum, and so the closure for $accum looks like

local:apply(<add />,
local:apply(<add />, 0, 1), 2)

And when the function finally returns $accum it returns a closure that evaluates

local:apply(<add />,
local:apply(<add />,
local:apply(<add />,
local:apply(<add />,
local:apply(<add />, 0, 1), 2), 3), 4), 5)

When $seq is quite large, computing the result of this closure will result in a stack overflow error.

So how can we fix this?

In order to fix this we need to ensure that the argument for the recursive call is evaluated fully. In the general case there is no easy way to fix this beyond ensuring that the argument is used entirely in every eventuality. In order to facilitate this we have added a new xmlprime:eager extension expression that will be available in the upcoming XmlPrime 1.1 release.

To force the argument to be fully evaluated we can annotate the expression like this:

declare namespace xp = "http://www.xmlprime.com/"

declare function local:fold($func as node(),
$seq as item()*,
$accum as item()*)
{
if (empty($seq))
then (# xmlprime:eager #) { $accum }
else local:fold($func,
subsequence($seq, 2),
local:apply($func,
(# xmlprime:eager #) { $accum },
$seq[1])))
};

Now $accum is evaluated fully in both branches. This forces the strictness analysis to identify $accum as a strict variable and hence the function becomes a simple tail recursive function and is more resiliant to stack overflow errors.

This fold function still has a problem that can cause stack overflow errors. The $seq argument is evaluated lazily (as each iteration only looks at the first value of $seq). Whilst technically the function is strict in $seq (as it does inspect every value eventaully), the current analysis engine is not capable of spotting this. It is impossible to always determine the strictness of an argument with 100% certainty as this can be specialized to the halting problem. The value of $seq ends up in a closure that looks something like this:

subsequence(
subsequence(
subsequence(
subsequence(
subsequence(..., 2), 2), 2), 2), 2)

This will eventually cause stack overflow errors.

This case of using the first item of a sequence and recursing with the remainder of the sequence is an important and common one in functional programs. In a future version of XmlPrime we would like to spot this case, read the first few items from the enumerator, and recurse by passing the same enumerator to the next iteration. This would avoid building a closure to evaluate the subsequence operation, and avoid caching the values returned and so should drastically improve performance of this kind of function (probably by a factor of about O(n) where n is the depth of recursion). Unfortunately this feature has not yet been implemented, and so this problem enforces a limit on the number of items in a sequence that can be processed in this manner.

You may be wondering why it is necessary to put the eager extension on the then branch of the if expression. Consider the following query, which reverses a sequence:

local:fold(<reverse />, $seq, ())

This query behaves properly and avoids the problem. However, lets make a simple modification to the query, to return the $nth last item in $seq:

local:fold(<reverse />, $seq, ())[$n]

For large values of $n removing the xmlprime:eager around the then clause this would cause the query to stack-overflow. This is because we do not always use all of the result of the function, and so the variable is evaluated lazily. This causes a tree of enumerators that perform the concatenation to be produced, and can cause a stack overflow when iterating over the sequence.

This case is another case that we would like to fix in a future version of XmlPrime. To fix this we would need to implement a custom MoveNext method that rather than returning a boolean value would return a new IEnumerator to use to enumerate the rest of the sequence (or null if we are at the end of the sequence). This would avoid pushing a new (C#) stack frame for each concatenation in a lazy result.

Monday, 2 November 2009

Using XmlPrime to access SQL databases

One of the uses of XQuery is to query relational databases. This posting is going to demonstrate how one might use XmlPrime to access a SQL database.

To access various data sources you can write a collection resolver implementing the ICollectionResolver interface. To do this you need to implement the following method:
public IXPathNavigable[] Resolve(AnyUri uri,
DocumentSet documentSet,
XmlNameTable nameTable,
XsltWhitespaceSettings whitespaceSettings)

First thing we need is a URI format to indicate we are connecting to the database, for example:

mssql://username:password@server/database/table


We need to convert this URI to a SQL connection string, this can be done as follows:

var connectionString = new StringBuilder();
connectionString.Append("Server=" + uri.Host + ";");
var split = uri.Path.Split('/');
var table = split[2];
connectionString.Append("Database=" + split[1] + ";");
var userInfo = uri.UserInfoItems;
if (userInfo == null)
connectionString.Append("Trusted_Connection=True;");
else
{
connectionString.Append("User ID=" + userInfo[0] +
";Password="+userInfo[1] + ";");
}


We can now connected to the database and generate some XML nodes for our collection. The XML format is going to be:

<row column1="value1" column2="value2"/>
<row column1="value3" column2="value4"/>


We now open the database and get the data from the specified table:

using (var connection = new SqlConnection(connectionString.ToString()))
{
connection.Open();
var command = new SqlCommand("SELECT * FROM " + table, connection);
var collection = new List<IXPathNavigable>();
var doc = new XmlDocument(nameTable);
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
...
}
}
return collection.ToArray();
}


The XML nodes need to be created with the same name table as the query is using, hence an XmlDocument is created with the name table passed into the Resolve method.

Next we need some code to put in the while loop to generate the XML elements for each row. A certain degree of mapping is required between SQL field types and XML types. This code shows a subset of possible field types for demonstration purposes.

var row = doc.CreateElement(table);
for(var i = 0;i<reader.FieldCount;i++) {
if (!reader.IsDBNull(i))
{
var type = reader.GetFieldType(i);
XPathAtomicValue value;
if(type == typeof(Boolean))
value = XPathAtomicValue.Create(reader.GetBoolean(i));
else if(type = typoeof(Byte[])) {
var length = reader.GetBytes(i, 0, null, 0, 0);
var bytes = new byte[length];
reader.GetBytes(i, 0, bytes, 0, (int)length);
value = XPathAtomicValue.CreateBase64Binary(bytes);
}
else if (type == typeof(DateTime))
value = XPathAtomicValue.Create(
new DateTimeWithZone(reader.GetDateTime(i), (TimeSpan?)null)
);
else if (type == typeof(Int32))
value = XPathAtomicValue.Create(reader.GetInt32(i));
else if (type == typeof(String))
value = XPathAtomicValue.Create(reader.GetString(i));
else
// TODO: similar methods will need to be implemented
// for other SQL field types
throw new NotImplementedException();
var attr = d.CreateAttribute(reader.GetName(i));
attr.Value = value.Value;
row.Attributes.Append(attribute);
}
}
collection.Add(row);


So the whole class will be:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;
using System.Xml;
using System.Xml.XPath;

namespace xp.Connectors
{
public class SimpleSqlCollectionResolver : ICollectionResolver
{
public IXPathNavigable[] Resolve(AnyUri uri,
DocumentSet documentSet,
XmlNameTable nameTable,
XsltWhitespaceSettings whitespaceSettings)
{
if (uri != null && uri.Scheme == "mssql" && uri.Path != null)
{
var connectionString = new StringBuilder();
connectionString.Append("Server=" + uri.Host + ";");
var split = uri.Path.Split('/');
var table = split[2];
connectionString.Append("Database=" + split[1] + ";");
var userInfo = uri.UserInfoItems;
if (userInfo == null)
connectionString.Append("Trusted_Connection=True;");
else
{
connectionString.Append("User ID=" + userInfo[0] +
";Password="+userInfo[1] + ";");
}

using (var connection = new SqlConnection(connectionString.ToString()))
{
connection.Open();
var command = new SqlCommand("SELECT * FROM " + table, connection);
var collection = new List<IXPathNavigable>();
var doc = new XmlDocument(nameTable);
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
var row = doc.CreateElement(table);
for (var i = 0; i < reader.FieldCount; i++)
{
if (!reader.IsDBNull(i))
{
var type = reader.GetFieldType(i);
XPathAtomicValue value;
if (type == typeof(Boolean))
value = XPathAtomicValue.Create(reader.GetBoolean(i));
else if (type == typeof(Byte[]))
{
var length = reader.GetBytes(i, 0, null, 0, 0);
var bytes = new byte[length];
reader.GetBytes(i, 0, bytes, 0, (int)length);
value = XPathAtomicValue.CreateBase64Binary(bytes);
}
else if (type == typeof(DateTime))
value = XPathAtomicValue.Create(new DateTimeWithZone(reader.GetDateTime(i), (TimeSpan?)null));
else if (type == typeof(Int32))
value = XPathAtomicValue.Create(reader.GetInt32(i));
else if (type == typeof(String))
value = XPathAtomicValue.Create(reader.GetString(i));
else
throw new NotImplementedException();
var attr = doc.CreateAttribute(reader.GetName(i));
attr.Value = value.Value;
row.Attributes.Append(attr);
}
}
collection.Add(row);
}
}
return collection.ToArray();
}
}
return null;
}
}
}


This class can now be passed into a DocumentSet which can then be set on the DynamicContextSettings of a query.

var documentSet = new DocumentSet(nameTable,null,
new SimpleSqlCollectionResolver(),
null);
var dynamicSettings = new DynamicContextSettings {DocumentSet = documentSet};


We can then run queries accessing the database by using:

collection('mssql://SERVER/Northwind/Employees')


or if we are accessing multiple tables from the same database a nice syntax is

declare base-uri "mssql://SERVER/Northwind/";
collection('Employees')


Of course, the performance of this will be appalling compared to native SQL queries. In a future release of XmlPrime we aim to offer native SQL support with a number of the XQuery operations mapped onto SQL operations and run on the database, rather than loading whole tables as XML documents.

In another post we'll look at implementing ICollectionTypeResolver so that our query optimizer starts taking advantage of the field static type information when compiling the query.

Wednesday, 28 October 2009

Variables and Strictness analysis in XmlPrime 1.0

In the last post I discussed the different ways in which variables are placed on the stack in XmlPrime 0.9, and the strictness analysis that is performed in order to determine which of these methods is most appropriate for each variable.

In this post I will explain how this mechanism has been expanded for XmlPrime 1.0.

In 0.9 variables were considered either to be eager, or lazy. For XmlPrime 1.0 we redefined these terms slightly. A variable is strict if it is always evaluated at least in part. A variable is eager if whenever it is evaluated it is evaluated in its entirety. If a variable is not eager, it is lazy. This provides us with a middle ground of a non-strict eager variable. $x in the code sample below is an example of a non-strict eager variable.

if (...)
then
sum($x)
else
()

These variables are placed on the stack in closures, but when they are needed they are evaluated in their entirety, and the value on the stack is replaced with the strict version. By evaluating the variable in its entirety, the extra layer of caching that was required for lazy variables in XmlPrime 0.9 is not needed.

Both of these conditions propagate throughout the query in a similar way so the only significant barrier for development was annotating all the built in expressions and functions to more accurately describe how they might use their arguments. This may depend on how the function itself is evaluated.

The biggest change to variables in XmlPrime 1.0 is that they can now be unboxed. Recall that unboxing is one of new optimizations in XmlPrime 1.0. In XmlPrime 0.9, all entries on the stack were instances of IEnumerable<T> for items. In order for unboxing to have a significant effect we needed to extend the stack to allow unboxed values. In XmlPrime 1.0 all entries on the stack implement IStackEnumerable which provides IEnumerable<T> implementations for all types that are used in XmlPrime (although in most cases most of the GetEnumerator methods throw an exception).

Implementations of all the types of closure are generic and can be used to store lists or closures of any unboxed type. The implementations for XdmItem
also allow for evaluation as appropriate unboxed types.

Whilst individual items need not be boxed, the value as a whole still needs to be wrapped in a suitable stack entry. This can be mitigated in some cases, for example variables defined in for expressions now share the same stack entry object for every item in their definition.

As with unboxed expressions, a variable can only be unboxed if its dynamic type is not needed to be preserved anywhere that the variable is used. The strictness analysis was extended with a new property - a variable is unboxable if its type will never be inspected. The strictness analysis pass was extended to propagate this property as well, and so all the information required to identify which type of closure is to be created at runtime is described by the static type of the variable, and these three properties. The combination of these three properties is known as the evaluation strategy.

The techniques above fairly thoroughly cover variables introduced by let clauses. Local variables are also introduced by function arguments, and so a lot of work has gone into these for XmlPrime 1.0.

In XmlPrime 0.9 all function arguments were evaluated eagerly. Clearly this is not always beneficial. The strictness analysis rules have been extended to be applied to user defined function arguments.

The first challenge that is that the evaluation strategy of an argument depends on the evaluation strategy of the function itself. The effect of the strict property is easy to determine - if a function is not strict (might not be evaluated) then neither is its arguments. As it turns out strictness and eagerness of a function will never affect the unboxability of its arguments and vice versa. The strictness rules are applied twice to every function - once assuming that it is evaluated eagerly and unboxed, and once assuming that it is evaluated lazily and boxed. The evaluation strategy for each argument in each case is stored, and these can be combined to determine the evaluation strategy of each argument wherever the function is called.

The other big challenge for computing the evaluation strategy for function arguments comes with recursive (or mutually recursive functions). How do we determine whether $y is strict in the following case for example?

declare function local:f($x, $y)
{
if ($x = 0)
then $y
else f($x - 1, $y)
}

The solution to this problem is to assume that every argument to every recursive function is strict, lazy and not unboxable. After the strictness analysis has been applied, if any of the evaluation strategies have been updated then the analysis is run again for any function that calls it. The analysis is repeated until the stratgies do not change. This algorithm ensures that optimal evaluation strategies are always selected.

Another improvement that has been applied to function calls is that if a function calls another function, then often it is passing local variables as arguments. Using the model described above, these arguments are either fully evaluated, or a lazy closure is created wrapping the previous definition. If a variable is fully evaluated then this causes an extra copy of the variables value to be stored in memory every time the function recurses. Adding an extra lazy thunk causes an extra level of caching to be added to the variable each time it recurses. This latter effect can compound causing successive recursions of a function perform increasingly badly. XmlPrime adds a simple check for this case during code generation - if a local variable is passed as an argument, then the stack entry is simply copied into the new stack frame.

This case (setting a variable to be the value of another variable) only occurs for function arguments, as otherwise it would have been rewritten by the optimizer. Another case that is eliminated by the optimizer is local variables whose definitions are values. When a function is called with a constant argument, the stack entry can be constructed ahead of time, and so no boxing is needed to make the call.

There are still many ways in which argument passing could be much improved. One case is for lazy variables whose definitions are only inspected once (this can only occur in function arguments, as otherwise the variable would have been eliminated). Currently the values are cached as the variable is evaluated, but this is unnecessary if the variable will never be inspected again. Look out for improvements like this in future versions of XmlPrime.

Friday, 9 October 2009

Variables and strictness analysis in XmlPrime 0.9

This is the fourth post in a series about performance and optimizations in XmlPrime. The first post in the series is available here. In this post I will look at how we implement variables in XmlPrime.

XmlPrime 1.0 has introduced a number of improvements to the way we use variables, mainly to increase performance of user defined functions and to unbox variable values. See the previous post for more information about unboxing.

All expressions returning sequences in XmlPrime are (currently) evaluated lazily. This means that every expression returns an enumerator, and the values are computed as they are needed.

This post looks at how we implement variables in XmlPrime. All variables are stored in the stack. The stack is implemented as an array of IEnumerable instances. Every variable is allocated an index into the stack, and each function call is allocated its own stack in which to place its variables (this is required in order to support recursive functions - each time the function recurses we need a new set of variables). There is also a separate stack for global variables, which is implemented in much the same way.

The simplest way to place a value on the stack is to evaluate it in its entirety, into a List<T> for example. This list is placed on the stack, and then every variable reference can retrieve and enumerate over this value.
If our value is a single item then we can do slightly better by skipping the list.

However this goes against our design philosophy of evaluating as much as possible lazily; if only the first item of the variable is looked at then we still end up computing the whole sequence.

Another way variables can be stored on the stack is by creating a closure (sometimes known as a lazy thunk) around the expression defining the variable. A closure consists of a copy of the current stack frame, and the expression defining the variable. This closure is placed on the stack. When a reference to the variable is evaluated, the closure is retrieved, and the act of enumerating it causes the expression to be evaluated in its own context, as if it had been evaluated when it was first seen. Although this addresses the issue that only as much of the value that is needed is evaluated, it introduces a new problem that if the variable is evaluated more than once then the result is recomputed. To defeat this an extra level of caching is added to the enumerators. A list of cached values is shared between all enumerators over the closure, and every time the enumerator is moved, an attempt is made to retrieve the value from this list before evaluating it.

This extra level of caching adds a performance hit however, and results in the sequence being stored in memory. Some small optimizations can be made though; once we have enumerated over the entire sequence, the closure on the stack can be replaced with its value, simplifying any further accesses to the variable, but any references still being enumerated still pay a performance penalty.

For this reason XmlPrime 0.9 used both ways of evaluating variables. All variables are classed as eager or lazy. A strict variable is one that is always used in its entirety - for example $x in the expression sum($x). In order to sum a sequence, every value in the sequence must be examined. A lazy variable is one that may not be evaluated in its entirety (or may not be evaluated at all). This is known as strictness analysis. It is performed in a separate pass after optimization, as optimization moves variables around which can invalidate judgements made. Special annotations are needed for every expression and function in order to correctly propagate the strictness rules. The analysis can get quite complex as for example in the following case, which illustrates some of the considerations that need to be made:

let $x := ...
let $y := ...
let $z := sum($y)
return
if (...)
then sum ($x)+ $z
else $z

In this case $x is lazy, since if the else branch is hit $x is never evaluated. $z however is always evaluated in both branches and so is eager. $y is eager, but only because $z is eager.

Eager variables are clearly best suited to having their values placed on the stack (as they will be used in their entirety), and lazy variables are better evaluated using a closure, as they may not be evaluated in their entirety.

As I mentioned at the start, this is the behaviour as implemented in XmlPrime 0.9. XmlPrime 1.0 has dramatically expanded the number of ways that variables are placed on the stack, and the strictness analysis is now a lot more sophisticated, giving more detail on how variables are used. Variables can now be stored as unboxed values, and the strictness analysis has been improved for recursive functions. I will explain these differences in the next post.