XPath

(Much material below was taken from the excellent tutorials at:
http://www.zvon.org/xxl/XPathTutorial/General/examples.html and
http://www.tei-c.org.uk/Talks/OUCS/2005-02/talk-access.pdf)

What is XPath?

It is a syntax for accessing elements of an XML document.

XPath: You Already Understand Paths

Assume we have a document that contains:

<?xml version="1.0" encoding="UTF-8"?>
<person sex="m" age="78">
    <persName>
        <foreName>Hans</foreName>
        <surname>von Hülsen</surname>
    </persName>
    <birth date="1890-04-05"/>
    <death date="1968-04-14"/>
    <nationality code="Z_"/>
</person>

What do these paths get us?

/person/persName/surname
/person/@sex
/person/birth/@date
/person/persName/foreName/../../death/@date
//surname
persName/forename

If an XPath starts with /, it is an absolute path that begins at the root of the document. If an XPath starts with //, every element in the document that fulfills the criteria is selected, wherever it occurs.
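As a rough check of the paths above, the following sketch evaluates them with Python's standard xml.etree.ElementTree module. This is an assumption of convenience, not the tool used in this tutorial. ElementTree implements only a subset of XPath: it has no document node and cannot return attribute nodes, so the sketch wraps the document in a stand-in element (hypothetical name doc) that plays the role of the leading /, and it reads attributes with .get(). The relative path persName/forename is evaluated with person as the assumed context node.

```python
import xml.etree.ElementTree as ET

PERSON = """
<person sex="m" age="78">
    <persName>
        <foreName>Hans</foreName>
        <surname>von Hülsen</surname>
    </persName>
    <birth date="1890-04-05"/>
    <death date="1968-04-14"/>
    <nationality code="Z_"/>
</person>
"""

# ElementTree has no document node, so wrap the root element in a
# stand-in element (hypothetical name "doc") that plays the role of "/".
doc = ET.Element("doc")
doc.append(ET.fromstring(PERSON.strip()))

# /person/persName/surname -> the surname element
surname = doc.find("./person/persName/surname").text

# /person/@sex and /person/birth/@date -> attribute values, read with .get()
sex = doc.find("./person").get("sex")
birth = doc.find("./person/birth").get("date")

# /person/persName/foreName/../../death/@date -> each ".." climbs to the parent
death = doc.find("./person/persName/foreName/../../death").get("date")

# //surname -> every surname element anywhere in the document
all_surnames = [e.text for e in doc.findall(".//surname")]

# persName/forename, relative to person: XML names are case-sensitive,
# so "forename" does not match <foreName> and nothing is selected.
person = doc.find("./person")
wrong_case = person.findall("persName/forename")
right_case = person.findall("persName/foreName")

print(surname, sex, birth, death, all_surnames, len(wrong_case), len(right_case))
```

A full XPath 1.0 processor would accept the original expressions unchanged, including the leading / and the attribute steps.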

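Ending a step of an XPath with an expression in square brackets (a predicate) filters the elements selected so far: only those for which the predicate is true are kept. A predicate can test an element's position, its attributes, or its child elements.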

To run an XPath expression, just type an expression into the box and hit the ENTER key. The results will be returned in the lower pane. Click on the results to highlight them in the main document window.

Another example

Select a root element:

/aaa

Select all elements, regardless of location:

//bbb

Select an exact path:

/aaa/bbb/eee/fff

Select elements within an element:

/aaa/bbb/eee/fff/*

/aaa/bbb/ccc/ddd/*

//fff/*
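The wildcard step can be tried out with a short Python sketch. It assumes the standard xml.etree.ElementTree module, which supports only part of XPath and has no document node, so a stand-in element (hypothetical name doc) replaces the leading /. SAMPLE is a compact copy of the sample document given at the end of this example, and describe is a hypothetical helper used only for readable output.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document used in this example.
SAMPLE = """
<aaa><bbb>
  <ccc>
    <ddd type="A"/><ddd type="B"/>
    <ddd type="C" f="F1">eat</ddd>
    <ddd type="C" f="F2">sleep</ddd>
    <ddd type="C" f="F2"><ggg id="1"/><ggg id="2" seq="1"/></ddd>
    <ddd/>
  </ccc>
  <eee>
    <fff id="3"><ggg/><ggg/></fff>
    <fff id="4"><hhh/></fff>
    <fff id="5"/>
  </eee>
  <ggg/>
</bbb></aaa>
"""

# Stand-in for the document node, so "./aaa/..." mirrors "/aaa/...".
doc = ET.Element("doc")
doc.append(ET.fromstring(SAMPLE.strip()))

def describe(elements):
    """Label each element with its tag, plus '#id' when it has an id."""
    return [e.tag + (f"#{e.get('id')}" if e.get("id") else "") for e in elements]

# /aaa/bbb/eee/fff/* -> every child element of the fff elements on that path
fff_children = describe(doc.findall("./aaa/bbb/eee/fff/*"))

# /aaa/bbb/ccc/ddd/* -> only one ddd has child elements
ddd_children = describe(doc.findall("./aaa/bbb/ccc/ddd/*"))

# //fff/* -> children of every fff, wherever it occurs
any_fff_children = describe(doc.findall(".//fff/*"))

print(fff_children)      # ['ggg', 'ggg', 'hhh']
print(ddd_children)      # ['ggg#1', 'ggg#2']
print(any_fff_children)  # ['ggg', 'ggg', 'hhh']
```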

Select an element at a certain position:

/aaa/bbb/eee/fff[2]

//fff/*[1]

Note this behavior! //fff/*[1] selects the first child of every fff element, not only the first match in the document.

Select the last element:

/aaa/bbb/ccc/ddd[last()]

Note that no first() exists, because [1] already selects the first element.
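The position predicates can be sketched in Python as follows. The sketch assumes the standard xml.etree.ElementTree module, which accepts [n] and [last()] only after a tag name, so the *[1] case is emulated with ordinary Python rather than run as XPath. The wrapper element doc is a hypothetical stand-in for the document node, and SAMPLE is a compact copy of the sample document given at the end of this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document used in this example.
SAMPLE = """
<aaa><bbb>
  <ccc>
    <ddd type="A"/><ddd type="B"/>
    <ddd type="C" f="F1">eat</ddd>
    <ddd type="C" f="F2">sleep</ddd>
    <ddd type="C" f="F2"><ggg id="1"/><ggg id="2" seq="1"/></ddd>
    <ddd/>
  </ccc>
  <eee>
    <fff id="3"><ggg/><ggg/></fff>
    <fff id="4"><hhh/></fff>
    <fff id="5"/>
  </eee>
  <ggg/>
</bbb></aaa>
"""

# Stand-in for the document node, so "./aaa/..." mirrors "/aaa/...".
doc = ET.Element("doc")
doc.append(ET.fromstring(SAMPLE.strip()))

# /aaa/bbb/eee/fff[2] -> the second fff child of eee
second_fff = doc.find("./aaa/bbb/eee/fff[2]").get("id")

# //fff/*[1] -> the first child of EACH fff that has children.
# Emulated in Python: fff[0] is the first child element of fff.
first_children = [fff[0].tag for fff in doc.iter("fff") if len(fff) > 0]

# /aaa/bbb/ccc/ddd[last()] -> the last ddd child of ccc (it has no attributes)
last_ddd = doc.find("./aaa/bbb/ccc/ddd[last()]")

# ddd[1] is the counterpart of last() for the first element
first_ddd = doc.find("./aaa/bbb/ccc/ddd[1]").get("type")

print(second_fff)       # 4
print(first_children)   # ['ggg', 'hhh']
print(last_ddd.attrib)  # {}
print(first_ddd)        # A
```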

Select attributes:

//@id

Note we are NOT selecting elements!

Select elements with attributes:

//ggg[@id]

//ddd[@*]

//ddd[@*] drops every ddd element that has no attributes at all.

Query the value of attributes:

//ddd[@f='F2']

//ddd[@f='*']

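Note that this fails! Inside quotes, * is a literal asterisk, not a wildcard, so the expression matches only an attribute whose value is exactly *.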

//ddd[@type=C]

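Fails! Without quotes, C is read as the name of a child element, so the attribute is compared with a C child that does not exist.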
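The attribute queries can be sketched in Python under a few assumptions. The standard xml.etree.ElementTree module cannot return attribute nodes and does not accept @* in a predicate, so those two cases are emulated with plain Python. The sample document at the end of this example spells the attribute f, and the sketch uses that name. The wrapper element doc is a hypothetical stand-in for the document node.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document used in this example.
SAMPLE = """
<aaa><bbb>
  <ccc>
    <ddd type="A"/><ddd type="B"/>
    <ddd type="C" f="F1">eat</ddd>
    <ddd type="C" f="F2">sleep</ddd>
    <ddd type="C" f="F2"><ggg id="1"/><ggg id="2" seq="1"/></ddd>
    <ddd/>
  </ccc>
  <eee>
    <fff id="3"><ggg/><ggg/></fff>
    <fff id="4"><hhh/></fff>
    <fff id="5"/>
  </eee>
  <ggg/>
</bbb></aaa>
"""

# Stand-in for the document node.
doc = ET.Element("doc")
doc.append(ET.fromstring(SAMPLE.strip()))

# //@id -> the id attributes themselves; emulated by collecting their values
ids = [e.get("id") for e in doc.iter() if "id" in e.attrib]

# //ggg[@id] -> ggg elements that carry an id attribute
ggg_with_id = [e.get("id") for e in doc.findall(".//ggg[@id]")]

# //ddd[@*] -> ddd elements with at least one attribute (emulated)
ddd_with_attr = [d for d in doc.iter("ddd") if d.attrib]

# //ddd[@f='F2'] -> ddd elements whose f attribute equals the string F2
f2 = doc.findall(".//ddd[@f='F2']")

# //ddd[@f='*'] -> '*' is a literal string here, so nothing matches
star = doc.findall(".//ddd[@f='*']")

print(ids)                 # ['1', '2', '3', '4', '5']
print(ggg_with_id)         # ['1', '2']
print(len(ddd_with_attr))  # 5
print(len(f2), len(star))  # 2 0
```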

Selecting elements by count of sub-elements:

//*[count(ggg)=2]

Combining paths using the | operator:

//ccc | //hhh

//hhh | /aaa/bbb/ccc/ddd/ggg

//hhh | //*[count(ggg)=2]
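Python's standard xml.etree.ElementTree module supports neither count() nor the | operator, so the sketch below only emulates them; it is meant to illustrate what the expressions select, not how an XPath processor works. The names two_ggg and union are hypothetical, doc is a stand-in for the document node, and SAMPLE is a compact copy of the sample document given at the end of this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document used in this example.
SAMPLE = """
<aaa><bbb>
  <ccc>
    <ddd type="A"/><ddd type="B"/>
    <ddd type="C" f="F1">eat</ddd>
    <ddd type="C" f="F2">sleep</ddd>
    <ddd type="C" f="F2"><ggg id="1"/><ggg id="2" seq="1"/></ddd>
    <ddd/>
  </ccc>
  <eee>
    <fff id="3"><ggg/><ggg/></fff>
    <fff id="4"><hhh/></fff>
    <fff id="5"/>
  </eee>
  <ggg/>
</bbb></aaa>
"""

# Stand-in for the document node.
doc = ET.Element("doc")
doc.append(ET.fromstring(SAMPLE.strip()))

# //*[count(ggg)=2] -> elements with exactly two ggg children (emulated)
two_ggg = [e for e in doc.iter() if len(e.findall("ggg")) == 2]

def union(*node_lists):
    """Emulate the | operator: no duplicates, results in document order."""
    wanted = {e for nodes in node_lists for e in nodes}
    return [e for e in doc.iter() if e in wanted]

# //ccc | //hhh
u1 = [e.tag for e in union(doc.findall(".//ccc"), doc.findall(".//hhh"))]

# //hhh | /aaa/bbb/ccc/ddd/ggg
u2 = [e.tag for e in union(doc.findall(".//hhh"),
                           doc.findall("./aaa/bbb/ccc/ddd/ggg"))]

# //hhh | //*[count(ggg)=2]
u3 = [e.tag for e in union(doc.findall(".//hhh"), two_ggg)]

print([e.tag for e in two_ggg])  # ['ddd', 'fff']
print(u1)                        # ['ccc', 'hhh']
print(u2)                        # ['ggg', 'ggg', 'hhh']
print(u3)                        # ['ddd', 'fff', 'hhh']
```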

Children, descendants vs. parents, ancestors:

/child::aaa/bbb == /aaa/bbb

/descendant::*

/descendant::fff/*

//ggg/parent::*

//ccc/ancestor::*

Siblings:

//ccc/following-sibling::*

//ggg/preceding-sibling::*
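The axes above can be illustrated with a small Python sketch. Python's standard xml.etree.ElementTree module does not implement named axes, so the sketch emulates them with a child-to-parent map; the helper names ancestors and siblings are hypothetical, and SAMPLE is a compact copy of the sample document given at the end of this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document used in this example.
SAMPLE = """
<aaa><bbb>
  <ccc>
    <ddd type="A"/><ddd type="B"/>
    <ddd type="C" f="F1">eat</ddd>
    <ddd type="C" f="F2">sleep</ddd>
    <ddd type="C" f="F2"><ggg id="1"/><ggg id="2" seq="1"/></ddd>
    <ddd/>
  </ccc>
  <eee>
    <fff id="3"><ggg/><ggg/></fff>
    <fff id="4"><hhh/></fff>
    <fff id="5"/>
  </eee>
  <ggg/>
</bbb></aaa>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """Emulate ancestor::* for one element, outermost ancestor first."""
    chain = []
    while elem in parent_of:
        elem = parent_of[elem]
        chain.append(elem)
    return chain[::-1]

def siblings(elem, following=True):
    """Emulate following-sibling::* or preceding-sibling::* for one element."""
    family = list(parent_of[elem])
    i = family.index(elem)
    return family[i + 1:] if following else family[:i]

# /descendant::* -> every element in the document, root element included
all_elements = list(root.iter())

ccc = root.find(".//ccc")
ccc_ancestors = [e.tag for e in ancestors(ccc)]   # //ccc/ancestor::*
ccc_following = [e.tag for e in siblings(ccc)]    # //ccc/following-sibling::*

# //ggg/parent::* -> every element that has a ggg child, in document order
ggg_parents = [e.tag for e in root.iter() if e.find("ggg") is not None]

# preceding-sibling::* for the one ggg that is a direct child of bbb
outer_ggg = root.find("bbb/ggg")
outer_preceding = [e.tag for e in siblings(outer_ggg, following=False)]

print(ccc_ancestors)    # ['aaa', 'bbb']
print(ccc_following)    # ['eee', 'ggg']
print(ggg_parents)      # ['bbb', 'ddd', 'fff']
print(outer_preceding)  # ['ccc', 'eee']
```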

The expressions in this example are written against the following document:

<?xml version="1.0" encoding="UTF-8"?>
<aaa>
    <bbb>
        <ccc>
            <ddd type="A"></ddd>
            <ddd type="B"></ddd>
            <ddd type="C" f="F1">eat</ddd>
            <ddd type="C" f="F2">sleep</ddd>
            <ddd type="C" f="F2">
                <ggg id="1"></ggg>
                <ggg id="2" seq="1"></ggg>
            </ddd>
            <ddd></ddd>
        </ccc>
        <eee>
            <fff id="3">
                <ggg></ggg>
                <ggg></ggg>
            </fff>
            <fff id="4">
                <hhh></hhh>
            </fff>
            <fff id="5"></fff>
        </eee>
        <ggg />
    </bbb>
</aaa>

References

Explore

In the Lucy Larcom diary entry TEI document, select the following:

Possible solutions. Note that these expressions would not work with alternative taggings of this document!