Entity Tricks for Your XPaths (Part I)

Written by Chriztian Steinmeier.
Got comments? I’m @greystate on Twitter.

This is a simple little trick that’ll help improve the readability as well as the maintainability of the XPaths in your preciousssss XSLT files.

The examples use Umbraco, but the techniques can be applied to any XSLT file.

Let’s say you have an XSLT snippet like the following, where you want to render all the <node> elements that don’t have the umbracoNaviHide property checked:

<xsl:apply-templates select="$currentPage/node[not(data[@alias = 'umbracoNaviHide'] = 1)]" />

This XPath really says: “Apply templates to all the subnodes of the current page, that aren’t hidden”.

Hey, it’d be cool if you could just write that instead!

Well, it turns out you can come pretty close by utilizing a couple of (surprisingly) not-so-well-known facts about XSLT:

  1. An XSLT file is actually an XML file (and thus:)
  2. The XSLT is parsed by an XML parser before being handed to the XSLT processor

One particular task that’s part of that initial parsing step is resolving entities. The parser replaces every occurrence of an ampersand (&), followed by a name and a semicolon (;), with the defined replacement text. For example, the entity &lt; becomes a less-than character (<) and &amp; becomes an ampersand (&).

There are only five predefined entities in XML (&lt;, &gt;, &amp;, &quot; and &apos;, all of which you should know by heart), but you can also create your own custom entities that’ll be parsed and replaced in this step as well.
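Here's a minimal Python sketch that shows the parser doing this outside of any XSLT processor. It uses the standard library's xml.etree.ElementTree, which is backed by the expat parser, as a stand-in for whatever XML parser your XSLT processor uses. That substitution is an assumption for illustration, and the test document is made up.

```python
import xml.etree.ElementTree as ET

# A made-up document using all five predefined XML entities,
# in an attribute value and in element text.
doc = ET.fromstring(
    '<test select="a &lt; b &amp;&amp; c &gt; d">'
    "&quot;quoted&quot; &amp; &apos;apostrophes&apos;"
    "</test>"
)

# The tree we get back already contains the replacement characters;
# the entity references themselves never make it past the parser.
print(doc.get("select"))  # a < b && c > d
print(doc.text)           # "quoted" & 'apostrophes'
```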

Creating your own entities

To create your own entity, you just need to learn some basic DTD syntax. You put a DOCTYPE at the top of your stylesheet, naming the root element. If you’re using the default XSLTs in Umbraco, it’s already there, defining the non-breaking space entity (&nbsp;):

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>

We’ll just add our own entity definition inside, and split it into multiple lines for readability:

<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    <!ENTITY hidden "data[@alias = 'umbracoNaviHide'] = 1">
]>

You can now use the &hidden; entity anywhere you’d otherwise write its replacement text, so the aforementioned apply-templates statement becomes:

<xsl:apply-templates select="$currentPage/node[not(&hidden;)]" />

I like this so much more because it describes the intent instead of the implementation (which, by the way, could change in a future version of Umbraco, *cough* 4.1beta2 *cough*).
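Here's a minimal Python sketch that lets you check what the parser actually turns that select attribute into. It again uses Python's xml.etree.ElementTree as a stand-in for the XML parser in front of your XSLT processor. That is an assumption, since Umbraco itself runs on .NET. The stylesheet is trimmed down to the one apply-templates statement. Reading the select attribute back from the parsed tree gives the full XPath, not the entity.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE from above plus a trimmed-down stylesheet using &hidden;.
STYLESHEET = """<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    <!ENTITY hidden "data[@alias = 'umbracoNaviHide'] = 1">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:apply-templates select="$currentPage/node[not(&hidden;)]" />
    </xsl:template>
</xsl:stylesheet>"""

# ElementTree reports namespaced tags in {namespace-uri}local-name form.
XSL = "{http://www.w3.org/1999/XSL/Transform}"

root = ET.fromstring(STYLESHEET)
apply_templates = root.find(f"{XSL}template/{XSL}apply-templates")

# The parser has already replaced &hidden; with its replacement text.
print(apply_templates.get("select"))
# $currentPage/node[not(data[@alias = 'umbracoNaviHide'] = 1)]
```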

Why/how does this work?

Well, this works because the XSLT processor never saw it happen…

You may or may not be aware that XSLT is itself ‘just’ XML, so what we can do with XML, we can also do with XSLT (including transforming it with XSLT!).

So whenever an XSLT file is loaded, it’s actually first parsed by an XML parser that checks for all the usual stuff (encoding, well-formedness and so on), before handing the document over to the XSLT processor.

Part of that parsing job is resolving any entities defined, so in our case the XML parser finds the instances of &hidden; and substitutes the text data[@alias = 'umbracoNaviHide'] = 1 for them.

When the document is later handed to the XSLT processor, the entities are no longer present. You can see this for yourself by opening the XSLT file directly in Internet Explorer (yes, IE, the bastard child; I just happen to think that IE does the prettiest job of color coding XML). If you do, you’ll see that the entities have been expanded to the actual text. The browser uses a built-in XSLT file to transform XML files for color-coded viewing, so there’s also an initial XML parser step going on, and that step takes care of the entities.
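If you don't have IE at hand, a minimal Python sketch can play the same role. It parses the stylesheet with the standard library's ElementTree and then serializes the tree back out. This only approximates what a real XSLT processor receives, because the processor works on its own in-memory tree rather than on text. It does show that both the entity reference and the DTD that defined it are gone after parsing.

```python
import xml.etree.ElementTree as ET

STYLESHEET = """<!DOCTYPE xsl:stylesheet [
    <!ENTITY hidden "data[@alias = 'umbracoNaviHide'] = 1">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node[&hidden;]" />
</xsl:stylesheet>"""

# Keep the familiar xsl: prefix when serializing (otherwise ElementTree picks ns0:).
ET.register_namespace("xsl", "http://www.w3.org/1999/XSL/Transform")

# Parse (expanding entities), then write the tree back out as text.
seen_by_processor = ET.tostring(ET.fromstring(STYLESHEET), encoding="unicode")

print(seen_by_processor)
print("&hidden;" in seen_by_processor)  # False: the entity reference is gone
print("DOCTYPE" in seen_by_processor)   # False: so is the DTD that defined it
```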

Cool Extra #1

This little trick also helps you get around the fact that you can’t use a variable in the match attribute of a template (in XSLT 1.0). That’s handy for those combos where you apply templates to some special selection that you also have a template for. For instance, you can now write a template to suppress any hidden node by saying:

<xsl:template match="node[&hidden;]" />

…and have it automatically stay in sync with the apply-templates statement above, regardless of what the property is called.
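Here's a minimal Python sketch of why the two stay in sync. A hypothetical helper, predicates(), builds a tiny stylesheet that uses &hidden; in both a select and a match. It parses that stylesheet with Python's ElementTree and returns the two expanded attributes. The alias hideFromMenu is made up to stand in for a renamed property.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

def predicates(hidden_definition):
    """Hypothetical helper: define &hidden; as `hidden_definition`, use it in both
    a select and a match, parse, and return the two expanded attributes."""
    stylesheet = f"""<!DOCTYPE xsl:stylesheet [
    <!ENTITY hidden "{hidden_definition}">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:apply-templates select="$currentPage/node[not(&hidden;)]" />
    </xsl:template>
    <xsl:template match="node[&hidden;]" />
</xsl:stylesheet>"""
    root = ET.fromstring(stylesheet)
    select = root.find(f"{XSL}template/{XSL}apply-templates").get("select")
    match = root.findall(f"{XSL}template")[1].get("match")
    return select, match

# Change the definition in one place and both XPaths follow.
# ("hideFromMenu" is a made-up alias standing in for a renamed property.)
for definition in ("data[@alias = 'umbracoNaviHide'] = 1",
                   "data[@alias = 'hideFromMenu'] = 1"):
    select, match = predicates(definition)
    print(select)
    print(match)
```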

Cool Extra #2

Using this technique, here’s a refactored version of the “ListSubPagesFromCurrentPage.xslt” file, supplied with Umbraco as one of the standard templates:

Original

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
<xsl:stylesheet 
    version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:msxml="urn:schemas-microsoft-com:xslt" 
    xmlns:umbraco.library="urn:umbraco.library"
    exclude-result-prefixes="msxml umbraco.library">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:param name="currentPage"/>

<xsl:template match="/">

<!-- The fun starts here -->
<ul>
<xsl:for-each select="$currentPage/node [string(data [@alias='umbracoNaviHide']) != '1']">
    <li>
        <a href="{umbraco.library:NiceUrl(@id)}">
            <xsl:value-of select="@nodeName"/>
        </a>
    </li>
</xsl:for-each>
</ul>

</xsl:template>

</xsl:stylesheet>

Refactored

<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    <!ENTITY hidden "data[@alias = 'umbracoNaviHide'] = 1">
    <!ENTITY node "node">
]>
<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:umbraco="urn:umbraco.library"
    exclude-result-prefixes="umbraco"
>

    <xsl:output method="xml" omit-xml-declaration="yes" />

    <xsl:param name="currentPage"/>

    <xsl:template match="/">
        <ul>
            <xsl:apply-templates select="$currentPage/&node;" />
        </ul>
    </xsl:template>

    <!-- Template for a single node -->
    <xsl:template match="&node;">
        <li>
            <a href="{umbraco:NiceUrl(@id)}">
                <xsl:value-of select="@nodeName" />
            </a>
        </li>
    </xsl:template>

    <!-- No output for hidden nodes -->
    <xsl:template match="&node;[&hidden;]" />

</xsl:stylesheet>

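My changes include:

  1. Defining the &hidden; and &node; entities in the DOCTYPE and using them in the XPaths
  2. Replacing the xsl:for-each with xsl:apply-templates and a separate template for a single node
  3. Handling hidden nodes with an empty template instead of a filter in the select
  4. Removing the unused msxml namespace and shortening the umbraco.library prefix to umbraco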

Now, why on earth would I define an entity for “node”, just to have it expand to “node”? Doesn’t make sense, or …?

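Well, what happens if we replace the two entity definitions with these?

<!ENTITY hidden "umbracoNaviHide = 1">
<!ENTITY node "*[@isDoc]">

Yep, you guessed it: instant 4.1 compatibility, out of the box! In the new XML schema, documents are elements carrying an isDoc attribute, and each property is a child element named after its alias, so these two definitions are all the stylesheet needs.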
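Here's a minimal Python sketch for double-checking the claim without an Umbraco install. It prepends each set of entity definitions to the body of the refactored stylesheet, trimmed to the parts that use the entities. It then lists every select and match attribute after parsing. It only shows what the XPaths expand to and doesn't run them against a 4.0 or 4.1 XML cache. The helper name xpaths() is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two sets of entity definitions from the article.
UMBRACO_40 = {"hidden": "data[@alias = 'umbracoNaviHide'] = 1", "node": "node"}
UMBRACO_41 = {"hidden": "umbracoNaviHide = 1", "node": "*[@isDoc]"}

# The refactored stylesheet's body, trimmed to the parts that use the entities.
BODY = """
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <ul><xsl:apply-templates select="$currentPage/&node;" /></ul>
    </xsl:template>
    <xsl:template match="&node;"><li /></xsl:template>
    <xsl:template match="&node;[&hidden;]" />
</xsl:stylesheet>"""

def xpaths(entities):
    """Hypothetical helper: prepend a DOCTYPE declaring `entities`, parse,
    and collect every select/match attribute in document order."""
    decls = "".join(f'<!ENTITY {name} "{value}">' for name, value in entities.items())
    root = ET.fromstring(f"<!DOCTYPE xsl:stylesheet [{decls}]>{BODY}")
    return [value
            for element in root.iter()
            for key, value in element.attrib.items()
            if key in ("select", "match")]

for label, entities in (("4.0", UMBRACO_40), ("4.1", UMBRACO_41)):
    print(label, xpaths(entities))
```

The stylesheet body is identical in both runs. Only the DOCTYPE changes, which is the whole point of the trick.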

Now’s a good time to go read Part II…