NEH Institute materials

July 2017

XQuery 3: eXist-db and webapps

We continue with XQuery and get into how to create webapps in eXist-db.

httpclient

The httpclient module provides functions for interacting with resources through a REST API. The following example connects to a server at Clarin and returns information about technical contact persons. The httpclient:get() function takes three parameters: the URL to retrieve, a Boolean value that indicates whether the HTTP state (cookies, credentials, etc.) should persist for the life of the query (we set this to false()), and optional request headers (in this case, the empty sequence). To construct the URLs dynamically we append different numbers to the base URL of the Clarin REST interface, https://centres.clarin.eu/restxml/, using the XPath concat() function, and then cast the string to a URI with xs:anyURI(). The httpclient:get() function returns the server response as XML, to which we append the XPath path expression //cmd:TechnicalContact//cmd:Person to navigate to and return the information we need.

xquery version "3.0";
declare namespace cmd="http://www.clarin.eu/cmd/";
<techContacts>{
    for $center-pos in ("1", "3", "4", "5", "6", "10", "11", "13", "20", "24", "25")
        (: (1 to 29) :)
    return
        httpclient:get(
            xs:anyURI(concat("https://centres.clarin.eu/restxml/", $center-pos)),
            false(),
            ()
        )//cmd:TechnicalContact//cmd:Person
}</techContacts>

Index configuration

eXist-db constructs persistent indexes that support quick search and retrieval, much as a back-of-the-book index in a printed volume helps readers find specific content without having to look at every page. eXist-db will execute XQuery scripts with or without index support, but unindexed queries will be slow. For professional-quality results, you must configure indexes. In addition to the official Configuring database indexes documentation, see the indexing section of Tuning the database for guidelines.

Types of indexes

eXist-db constructs a structural index and an xml:id index for all XML documents automatically, but the other index types listed below have to be configured explicitly.

Location of configuration files

New documents are automatically indexed according to whatever indexes are in place when the documents are uploaded, but existing documents are not automatically reindexed when you update collection.xconf. To apply a revised configuration to existing documents, you must call xmldb:reindex("/db/project/...") in XQuery or use the Java admin client to reindex.
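
The reindexing step described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the collection path /db/project/data is hypothetical, and the call typically has to be run by a user with administrative rights.

```xquery
xquery version "3.1";
(: Hypothetical collection path; replace it with the collection whose
   collection.xconf you have just changed. :)
let $collection := "/db/project/data"
(: Rebuild the indexes for every document in the collection,
   using the configuration that is currently in place. :)
return xmldb:reindex($collection)
```

Reindexing a large collection can take a while, so it is usually run once after the configuration has settled rather than after every small change.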

Sample index configuration

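The index configuration file is typically called collection.xconf, and you’ll need to read Configuring database indexes to learn how to configure it. As a brief illustration, the following example configures a Lucene full-text index (with a default analyzer, plus a whitespace analyzer used for tei:head), a range index on @n attributes typed as xs:string, and ngram indexes on tei:head and tei:p: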

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:tei="http://www.tei-c.org/ns/1.0" 
           xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
        <lucene>
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <analyzer id="ws" class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
            <text qname="tei:w"/>
            <text qname="tei:head" analyzer="ws"/>
            <text qname="tei:p"/>
            <text qname="tei:lg"/>
            <text qname="tei:cell"/>
            <text qname="@atMost"/>
            <text field="age" qname="@atLeast"/>
            <ignore qname="tei:sic"/>
        </lucene>
        <range>
            <create qname="@n" type="xs:string"/>
        </range>
        <ngram qname="tei:head"/>
        <ngram qname="tei:p"/>
    </index>
</collection>

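Namespaces must be declared and referenced correctly: every prefix used in a qname attribute (here, tei:) must be declared in the configuration file and must map to the same namespace URI that the indexed documents use.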

New range index

The new range index is used by:

The new range index requires you to specify a datatype, e.g., xs:integer, xs:decimal, or other numerical types; xs:string; xs:dateTime, or other date-time types; or xs:boolean. If you define an index with a specific type, all values in the indexed collections must be valid instances of this type.
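
As a sketch of how a typed range index comes into play, the query below compares @n with a string literal, which matches the xs:string type declared for @n in the sample configuration. The collection path and the choice of tei:div are assumptions made for illustration only.

```xquery
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
(: Hypothetical collection path. Assumes a range index on @n of type
   xs:string, as in the sample collection.xconf. :)
collection('/db/project/data')//tei:div[@n = "3"]
```

Because the index was declared as xs:string, the comparison value is written as "3" rather than 3. Keeping the type of the comparison value aligned with the declared index type is the safe choice if you want the optimizer to be able to use the index.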

Lucene full-text index

The following example applies the Lucene full-text index to <para> elements anywhere and to <section> elements that are children of <book> elements. Within <para> elements, <note> descendants are not indexed and <prefix> descendants are treated as part of their parents. We specify <inline> to handle situations like <para>... <prefix>un</prefix>clear ...</para>. By default, the Lucene indexer assumes that words end at element boundaries, and therefore would not recognize “unclear” as an indexed word. By specifying that <prefix> is inline, we can force the system to index “unclear” as a single word.

...
<lucene>
  <text qname="para">
    <ignore qname="note"/>
    <inline qname="prefix"/>
  </text>
  <text match="//book/section"/>
</lucene>
...

We use the Lucene full-text index in queries with ft:query(), as in the following example:

xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare variable $ham as document-node() := doc('/db/apps/shakespeare/data/ham.xml');
ft:query($ham//tei:sp,'son')

If we have defined a Lucene full-text index on <sp> elements in this play, the query will return the <sp> elements that contain the word “son”. If we have not defined the index, though, it will return an empty sequence (not an error!). The query target above is expressed as a string, but it may also be expressed as an XML fragment using Lucene-specific markup (see https://exist-db.org/exist/apps/doc/lucene.xml#D3.23.24 for more details):

xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare variable $ham as document-node() := doc('/db/apps/shakespeare/data/ham.xml');
let $query := <query><term>son</term></query>
return ft:query($ham//tei:sp, $query)

The preceding query finds all speeches that contain “son”. The following query finds those that contain either “son” or “daughter”:

xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare variable $ham as document-node() := doc('/db/apps/shakespeare/data/ham.xml');
let $query := 
    <query>
        <bool>
            <term>son</term>
            <term>daughter</term>
        </bool>
    </query>
return ft:query($ham//tei:sp, $query)

Lucene supports wildcard queries. The following finds all <p> elements that contain words that begin with the characters “ha”:

$data-collection//tei:p[ft:query(.,<query><wildcard>ha*</wildcard></query>)]

(Queries that begin with a wildcard can be slow and require special handling, and are best avoided.)

Lucene supports several string-query types:

All of these have XML counterparts, e.g.:

<bool>
  <term occur="must">caspian</term>
  <term occur="should">sea</term>
  <term occur="not">tibet</term>
</bool>
<near>caspian sea</near>
<phrase>caspian sea</phrase>
<bool>
  <near occur="must">caspian sea</near>
  <term occur="must">tibet</term>
</bool>
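
To show how such an XML query is used in practice, here is a sketch that plugs a <bool> query with occur attributes into the earlier Hamlet example and orders the hits by relevance. It assumes, as before, that a Lucene index has been defined on <sp> elements; the choice of search terms is purely illustrative.

```xquery
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare variable $ham as document-node() := doc('/db/apps/shakespeare/data/ham.xml');
(: Speeches that must contain "son" and must not contain "daughter" :)
let $query :=
    <query>
        <bool>
            <term occur="must">son</term>
            <term occur="not">daughter</term>
        </bool>
    </query>
for $sp in ft:query($ham//tei:sp, $query)
(: ft:score() reports the Lucene relevance score of each hit :)
order by ft:score($sp) descending
return $sp
```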

Webapps

Recap of the typeswitch expression

The typeswitch expression lets XQuery mimic the declarative template architecture of XSLT. The example below adds a UUID to every element in Hamlet that does not already have a uid attribute. It works by applying the local:add-uid-attributes-to-fragment() function to the document (see the last line of the code). That function tests the type of its argument and treats the document node in one way (it applies itself to all of the children), elements in another way (it copies the existing attributes, adds a UUID, and then applies itself to all of the children), and other nodes (text, comments, etc.) in a third way (it just returns them). You can think of each case clause as comparable to an XSLT template that matches a specific type of node.

xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
(: Build a UUID string, prefixed with the element name when one is supplied :)
declare function local:gen-uuid($prefix as xs:string?) as xs:string {
    concat($prefix, if (empty($prefix)) then "" else "-", util:uuid())
};
(: Returns item()* because a document node may have more than one child :)
declare function local:add-uid-attributes-to-fragment($item as item()) as item()* {
    typeswitch($item)
        case document-node() return
            for $child in $item/node()
            return local:add-uid-attributes-to-fragment($child)
        case element() return
            element {node-name($item)} {
                $item/@*,
                if ($item[not(@uid)])
                then attribute {"uid"} {local:gen-uuid(name($item))}
                else (),
                for $child in $item/node()
                return local:add-uid-attributes-to-fragment($child)
            }
        default return $item
};
let $hamlet := doc('/db/apps/shakespeare/data/ham.xml')
return local:add-uid-attributes-to-fragment($hamlet)

See https://en.wikibooks.org/wiki/XQuery/Typeswitch_Transformations for more information.

It is also possible to use XSLT within XQuery. For example, you could use XQuery to retrieve information from the database with its original markup, assemble it into an XML document, and then, still within eXist-db, pipe it through XSLT to transform it. eXist-db has long supported XSLT with the transform:transform() function. XPath 3.1 introduces transform() as a regular (core) function, but it has not been incorporated into eXist-db as of v. 4.1, which therefore still requires the version in the transform: namespace.
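
A minimal sketch of the transform:transform() approach follows. The input element and the inline stylesheet are invented for illustration; in a real application the input would come from a database query and the stylesheet would more likely be loaded with doc().

```xquery
xquery version "3.1";
(: Hypothetical input; normally the result of a query :)
let $input := <greeting>Hello</greeting>
(: Hypothetical stylesheet that wraps the greeting in an HTML paragraph :)
let $stylesheet :=
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
        <xsl:template match="greeting">
            <p><xsl:value-of select="."/></p>
        </xsl:template>
    </xsl:stylesheet>
(: Third argument: stylesheet parameters, here none :)
return transform:transform($input, $stylesheet, ())
```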

Serialization

declare option exist:serialize "method=html5 enforce-xhtml=yes";

Serialization controls how query output is sent to the browser. Serialization parameters are ignored in the output of “Eval” in eXide, but you can configure the output rendering there in a dropdown list within eXide itself.

See https://exist-db.org/exist/apps/doc/xquery.xml#serialization for more information.
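
As an alternative to exist:serialize, serialization can also be declared with options in the standard W3C serialization namespace. The sketch below assumes an eXist-db version that supports these options; the HTML content is a placeholder.

```xquery
xquery version "3.1";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
(: Serialize the result as HTML5 and label it as text/html for the browser :)
declare option output:method "html5";
declare option output:media-type "text/html";
<html>
    <head><title>Serialization test</title></head>
    <body><p>Hello from eXist-db</p></body>
</html>
```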

Request

It is possible to pass values from a web form into eXist-db and use them as input into a query. The first argument to the request:get-parameter() function is the name of the parameter in the web form from which to retrieve the value, and the second is a default value to use if no value is passed in from the form. The following example:

declare variable $title as xs:string := request:get-parameter("title", "Hamlet");

retrieves the value of the title input parameter from a web form and assigns it to a variable $title in the XQuery. If no title value is passed from the form, the value “Hamlet” is used instead.
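
To illustrate how the parameter might feed into a query, the sketch below looks for TEI titles that contain the submitted value. The collection path and the assumption that each play records its title in tei:titleStmt/tei:title are illustrative and may not match your data.

```xquery
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
(: Falls back to "Hamlet" when the form does not supply a title :)
declare variable $title as xs:string := request:get-parameter("title", "Hamlet");
<results>{
    (: Hypothetical collection path and header structure :)
    collection('/db/apps/shakespeare/data')//tei:titleStmt/tei:title[contains(., $title)]
}</results>
```

Saved in the database and requested with ?title=Macbeth appended to its URL, this query would search for “Macbeth” instead of the default.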

For information about related functions, see the documentation for the request and response modules.

Structure your app

The EXPath packaging system is an effort to standardize XQuery extension modules across implementations. It includes:

eXist-db application repository

eXist-db is not just an XQuery processor and not just an XML database. It can also serve as a content management system (CMS), which makes it possible to build and distribute an entire application, including data and the means to interact with the data, as an eXist-db application package. The eXist-db application repository:

eXide builds in support for app development, and can generate an app template that guides the provision of data, metadata, and functionality. Inside eXide, select Application → New application. Choose a name, fill in the metadata, and an application skeleton, ready for development, will be installed inside /db/apps/. The collection structure is:

eXist applications implement the Model-View-Controller concept:

For more information and tutorial instruction see Getting started with web application development (text) and Getting started with app development in eXist-db 2.0 (video).

HTML templating

<div class="templates:surround?with=templates/page.html&amp;at=content">

calls

declare function templates:surround($node as node(), 
	$params as element(parameters)?, $model as item()*)

Parameters:

URL Rewriting

Typical tasks:
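
One common task, redirecting bare requests to a start page, can be sketched as a small controller.xql. This is modeled on the kind of controller that an application skeleton provides, not a definitive implementation; index.html is assumed to be the application's start page.

```xquery
xquery version "3.1";
(: Supplied by eXist-db's URL rewriting framework:
   the part of the request path that follows the application root :)
declare variable $exist:path external;

if ($exist:path eq "") then
    (: No trailing slash: redirect so that relative links resolve correctly :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <redirect url="{request:get-uri()}/"/>
    </dispatch>
else if ($exist:path eq "/") then
    (: Application root: send the browser to the start page :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <redirect url="index.html"/>
    </dispatch>
else
    (: Everything else is passed through unchanged :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </dispatch>
```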