
NEH Institute materials

July 2017


XPath navigation

Why XPath?

XPath is a technology for traversing an XML document to perform tasks like “find all speeches by Hamlet” or “find all chapters that contain only one paragraph”. Developers use XPath to explore their XML and as a means to interact with it, but XPath is also important as an ancillary technology used by XSLT (to transform XML), XQuery (to … er … query XML), and Schematron (to validate XML against constraint rules). Everything interesting that we can do with XML starts from XPath!

We’ll use XPath in our Institute to examine what it means to interact with documents that have been modeled as ordered trees. It turns out that some traversals are easier than others, not only because of the distance and route, but also because XPath deals more idiomatically with some types of traversal than others.

As you will learn, XPath depends on XML’s tree structure. Working with XPath, then, is a good way to understand how your XML file is structured and to get a better grip on the relationships between the different nodes in the XML tree. The syntax of XPath paths resembles that of file paths, because a file system is also a tree:

[Figure: folders in a file system, arranged as a tree]
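You can try out the file-path analogy with Python’s built-in xml.etree.ElementTree. This is only an approximation: ElementTree supports a small subset of XPath and none of the XPath 3.0 functions. The sketch also runs on a tiny, hypothetical, namespace-free fragment shaped like a TEI play, not on hamlet.xml. In real TEI files, the elements live in the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment shaped like a TEI play (not hamlet.xml).
PLAY = """<TEI>
  <text>
    <body>
      <div type="act" n="1">
        <div type="scene" n="1"/>
        <div type="scene" n="2"/>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(PLAY)  # the <TEI> element; the paths below start from it

# Each "/" steps down one level, like moving into a subfolder:
# text -> body -> div finds only the <div> elements that are children of <body>.
acts = root.findall("text/body/div")

# One more step reaches the <div> children of those acts (the scenes).
scenes = root.findall("text/body/div/div")

print(len(acts), len(scenes))        # 1 2
print([s.get("n") for s in scenes])  # ['1', '2']
```

In a namespace-free document, the full XPath equivalents would be /TEI/text/body/div and /TEI/text/body/div/div. ElementTree paths are relative to the element you call findall() on, which is why they leave out the leading /TEI.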

Still, you can also use XPath in ways that ignore where things sit in the tree. For example, distinct-values(//speaker) returns each distinct value of the speaker elements in the document, wherever those elements occur.
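Here is a rough Python analogue of distinct-values(//speaker). It uses Python’s built-in xml.etree.ElementTree on a hypothetical, namespace-free fragment. iter() visits every speaker element however deeply it is nested, and the duplicates are then dropped. One difference to keep in mind: XPath leaves the order of distinct-values() results implementation-dependent, whereas this sketch keeps first-occurrence order.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment (not hamlet.xml): Bernardo speaks twice.
PLAY = """<TEI><text><body>
  <div type="scene" n="1">
    <sp><speaker>Bernardo</speaker><l>Who's there?</l></sp>
    <sp><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
    <sp><speaker>Bernardo</speaker><l>Long live the king!</l></sp>
  </div>
  <div type="scene" n="2">
    <sp><speaker>Hamlet</speaker><l>A little more than kin, and less than kind.</l></sp>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)

# Like //speaker: every <speaker> element anywhere below the root.
names = [el.text for el in root.iter("speaker")]

# Like distinct-values(): keep each value once (dict keys keep first-seen order).
distinct = list(dict.fromkeys(names))

print(names)     # ['Bernardo', 'Francisco', 'Bernardo', 'Hamlet']
print(distinct)  # ['Bernardo', 'Francisco', 'Hamlet']
```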

XPath components

Overview

XPath consists of three principal types of components: path expressions, functions, and predicates.

Path expressions: walking the tree

Open hamlet.xml in your XML editor of choice. Note that the Institute uses <oXygen/> XML Editor and the following instructions may differ if you use a different editor.

You’ll find the file hamlet.xml in schedule/week_2 in your local copy of the Institute repo. In the upper left of your <oXygen/> editor, configure the XPath browser box to use XPath 3.0. You’ll type XPath expressions into the browser box and hit Return to navigate within the document.

This document is a TEI-encoded edition of a play. To navigate within an XML document you need to know how it has been marked up. If it’s marked up according to the TEI guidelines, you can rely on what you know about TEI. If not, skim through the document in <oXygen/> to explore how it tags different components.
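Before writing paths, it helps to see which element names a file actually uses. In the XPath box, an XPath 3.0 expression such as distinct-values(//*/name()) lists them. The Python sketch below does a similar survey with the built-in xml.etree.ElementTree and also counts how often each element occurs. The fragment is hypothetical. For a namespaced TEI file, ElementTree would report names in {namespace}local form, e.g. {http://www.tei-c.org/ns/1.0}sp.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment standing in for a document whose markup we don't know yet.
PLAY = """<TEI><text><body>
  <div type="scene" n="1">
    <stage>Enter Bernardo and Francisco</stage>
    <sp><speaker>Bernardo</speaker><l>Who's there?</l></sp>
    <sp><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)

# root.iter() with no argument walks every element, the root included.
tag_counts = Counter(el.tag for el in root.iter())

# Most frequent element names first.
for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")
```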

Path basics:
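The sketch below shows a few of the basic path steps. It uses Python’s built-in xml.etree.ElementTree, which supports only a subset of XPath, on a hypothetical, namespace-free fragment. The comments give the XPath notation each call approximates, relative to the TEI element. ElementTree cannot select attribute nodes with an @ step, so attribute values are read with get() instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment (not hamlet.xml).
PLAY = """<TEI><text><body>
  <div type="scene" n="1">
    <sp who="#Bernardo"><speaker>Bernardo</speaker><l>Who's there?</l></sp>
    <sp who="#Francisco"><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
  </div>
  <div type="scene" n="2">
    <sp who="#Hamlet"><speaker>Hamlet</speaker><l>A little more than kin, and less than kind.</l></sp>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)  # context node: <TEI>

# Child steps separated by "/" (XPath: text/body)
body = root.find("text/body")

# "*" matches a child element of any name (XPath: text/body/*)
body_children = body.findall("*")

# ".//" reaches descendants at any depth (XPath: .//sp)
speeches = root.findall(".//sp")

# ".." steps back up to the parent (XPath: .//sp/..); each parent is returned once
sp_parents = root.findall(".//sp/..")

# Attribute values (XPath: .//sp/@who); no @ step in ElementTree, so use get()
who = [sp.get("who") for sp in speeches]

print(len(body_children), len(speeches), len(sp_parents))  # 2 3 2
print(who)  # ['#Bernardo', '#Francisco', '#Hamlet']
```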

Sample paths:
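Here are a handful of sample paths and how many nodes each selects. The sketch uses Python’s built-in xml.etree.ElementTree (an XPath subset) on a hypothetical, namespace-free fragment. Each path is written relative to the TEI element, and the comment beside it gives the full XPath it approximates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment (not hamlet.xml).
PLAY = """<TEI><text><body>
  <div type="scene" n="1">
    <stage>Enter Bernardo and Francisco</stage>
    <sp><speaker>Bernardo</speaker><l>Who's there?</l></sp>
    <sp><speaker>Francisco</speaker><l>Nay, answer me.</l><l>Stand and unfold yourself.</l></sp>
  </div>
  <div type="scene" n="2">
    <sp><speaker>Hamlet</speaker><l>A little more than kin, and less than kind.</l></sp>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)

SAMPLE_PATHS = [
    "text/body/div",    # /TEI/text/body/div   : the scene <div>s
    "text/body/div/*",  # /TEI/text/body/div/* : every child of a scene
    ".//speaker",       # //speaker            : every speaker label, at any depth
    ".//sp/l",          # //sp/l               : lines that are children of speeches
    ".//div/stage",     # //div/stage          : stage directions directly inside a <div>
]

# Map each path to the number of elements it selects.
counts = {path: len(root.findall(path)) for path in SAMPLE_PATHS}
for path, n in counts.items():
    print(f"{path:18} {n}")
```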

XPath functions
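Python’s built-in xml.etree.ElementTree has no XPath functions. The sketch below therefore pairs a few XPath 3.0 function calls (shown in comments) with plain-Python counterparts, run on a hypothetical, namespace-free fragment. It is meant only to show what each function computes, not how an XPath processor implements it. The helper string_value is hypothetical and roughly mirrors XPath’s string value of an element: all of its text, descendants included.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment (not hamlet.xml).
PLAY = """<TEI><text><body>
  <div type="scene" n="1">
    <sp><speaker>Bernardo</speaker><l>Who's there?</l></sp>
    <sp><speaker>Francisco</speaker><l>Nay, answer me.</l><l>Stand and unfold yourself.</l></sp>
    <sp><speaker>Bernardo</speaker><l>Long live the king!</l></sp>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)
lines = root.findall(".//l")


def string_value(elem):
    """Hypothetical helper: all text inside elem, descendants included."""
    return "".join(elem.itertext())


# count(//l): how many nodes the path selects
line_count = len(lines)

# //l/string-length(): number of characters in each line
lengths = [len(string_value(line)) for line in lines]

# //l[contains(., 'king')]: lines whose text contains a substring
king_lines = [string_value(line) for line in lines if "king" in string_value(line)]

print(line_count)  # 4
print(lengths)     # [12, 15, 26, 19]
print(king_lines)  # ['Long live the king!']
```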

Filtering with predicates
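Python’s built-in xml.etree.ElementTree accepts some simple predicates: attribute tests, child-text tests, and positions. The sketch below uses them to filter a hypothetical, namespace-free fragment, including the introduction’s “find all speeches by Hamlet”. The comments give the XPath each call approximates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment (not hamlet.xml).
PLAY = """<TEI><text><body>
  <div type="act" n="1">
    <div type="scene" n="1">
      <sp who="#Bernardo"><speaker>Bernardo</speaker><l>Who's there?</l></sp>
      <sp who="#Francisco"><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
    </div>
    <div type="scene" n="2">
      <sp who="#Claudius"><speaker>King</speaker><l>Though yet of Hamlet our dear brother's death</l></sp>
      <sp who="#Hamlet"><speaker>Hamlet</speaker><l>A little more than kin, and less than kind.</l></sp>
    </div>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)

# //div[@type='scene']: only <div> elements whose type attribute is "scene"
scenes = root.findall(".//div[@type='scene']")

# //sp[speaker='Hamlet']: speeches whose <speaker> child has the text "Hamlet"
hamlet = root.findall(".//sp[speaker='Hamlet']")

# //sp[1]: the first <sp> child of EACH parent, so one per scene here
first_in_each_scene = root.findall(".//sp[1]")

# //sp[last()]: the last <sp> child of each parent
last_in_each_scene = root.findall(".//sp[last()]")

print(len(scenes), len(hamlet))                       # 2 1
print([sp.get("who") for sp in first_in_each_scene])  # ['#Bernardo', '#Claudius']
print([sp.get("who") for sp in last_in_each_scene])   # ['#Francisco', '#Hamlet']
```

Note the difference between //sp[1] and (//sp)[1]. The first selects the first speech in each parent; the second selects only the first speech in the whole document. ElementTree cannot express the parenthesized form, but root.find(".//sp") returns that single first match.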

Your turn

Here are some XPath expressions you can use for practice (they get progressively more complex or advanced)—or you can invent your own!
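Here is one practice question in the spirit of the introduction’s “chapters that contain only one paragraph”, adapted to a play: which scenes contain exactly one speech? In XPath 3.0 this could be written //div[@type='scene'][count(sp) = 1]. Python’s built-in xml.etree.ElementTree does not support count() in predicates. The hypothetical sketch below therefore lets findall() do the attribute test and counts sp children in Python. Scene 3 has two children but only one of them is an sp, so count(sp) is still 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment (not hamlet.xml).
PLAY = """<TEI><text><body>
  <div type="scene" n="1">
    <sp><speaker>Bernardo</speaker><l>Who's there?</l></sp>
    <sp><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
  </div>
  <div type="scene" n="2">
    <sp><speaker>Hamlet</speaker><l>A little more than kin, and less than kind.</l></sp>
  </div>
  <div type="scene" n="3">
    <stage>Enter Laertes and Ophelia</stage>
    <sp><speaker>Laertes</speaker><l>My necessaries are embarked. Farewell.</l></sp>
  </div>
</body></text></TEI>"""

root = ET.fromstring(PLAY)

# XPath: //div[@type='scene'][count(sp) = 1]
# ElementTree handles the attribute test; counting <sp> children is done in Python.
single_speech_scenes = [
    div.get("n")
    for div in root.findall(".//div[@type='scene']")
    if len(div.findall("sp")) == 1
]

print(single_speech_scenes)  # ['2', '3']
```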

Further reading