XPath is a syntax for addressing parts of an XML document: its elements, attributes, and text.
Assume we have a document that contains:
<?xml version="1.0" encoding="UTF-8"?>
<person sex="m" age="78">
<persName>
<foreName>Hans</foreName>
<surname>von Hülsen</surname>
</persName>
<birth date="1890-04-05"/>
<death date="1968-04-14"/>
<nationality code="Z_"/>
</person>
What do these paths get us?
/person/persName/surname
/person/@sex
/person/birth/@date
/person/persName/foreName/../../death/@date
//surname
persName/forename
If an XPath starts with a single / then it is an absolute path that begins at the root of the document. If an XPath starts with // then all elements in the document that fulfill the criteria will be selected, wherever they occur.
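The example paths for the person document can be tried out in code. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree. That module implements only a subset of XPath: paths are written relative to the root element, attribute values are read with get() rather than with an @ step, and // is approximated with iter(). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The sample document from above (XML declaration omitted for brevity).
PERSON_XML = """<person sex="m" age="78">
  <persName>
    <foreName>Hans</foreName>
    <surname>von Hülsen</surname>
  </persName>
  <birth date="1890-04-05"/>
  <death date="1968-04-14"/>
  <nationality code="Z_"/>
</person>"""

person = ET.fromstring(PERSON_XML)  # the root element, <person>

# /person/persName/surname : the text of the <surname> element
surname = person.find("persName/surname").text

# /person/@sex : the value of the sex attribute
sex = person.get("sex")

# /person/birth/@date
birth_date = person.find("birth").get("date")

# /person/persName/foreName/../../death/@date : each ".." steps up to the parent
death_date = person.find("persName/foreName/../../death").get("date")

# //surname : every <surname>, wherever it occurs in the document
all_surnames = [e.text for e in person.iter("surname")]

# persName/forename : matches nothing, because element names are case-sensitive
forename = person.find("persName/forename")

print(surname, sex, birth_date, death_date, all_surnames, forename)
```

The last lookup returns None: the document contains foreName, not forename.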
Following a step in an XPath with a condition in square brackets (a predicate) filters the elements selected by that step, for example by position, by attribute, or by the presence of a child element.
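Square-bracket filters can be illustrated on the person document. This is a sketch under the same assumption as before, Python's standard-library xml.etree.ElementTree, which supports attribute and child-element tests in brackets but only part of full XPath. The element name middleName is hypothetical and does not occur in the document.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<person sex="m" age="78">
  <persName>
    <foreName>Hans</foreName>
    <surname>von Hülsen</surname>
  </persName>
  <birth date="1890-04-05"/>
  <death date="1968-04-14"/>
  <nationality code="Z_"/>
</person>"""

person = ET.fromstring(PERSON_XML)

# /person/*[@date] : children of <person> that carry a date attribute
dated = [e.tag for e in person.findall("*[@date]")]

# /person/*[@code='Z_'] : children whose code attribute has the value Z_
coded = [e.tag for e in person.findall("*[@code='Z_']")]

# /person/persName[surname] : <persName> elements that contain a <surname>
named = [e.tag for e in person.findall("persName[surname]")]

# /person/persName[middleName] : filtered out, no such child (hypothetical name)
unnamed = person.findall("persName[middleName]")

print(dated)    # ['birth', 'death']
print(coded)    # ['nationality']
print(named)    # ['persName']
print(unnamed)  # []
```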
To run an XPath expression, just type an expression into the box and hit the ENTER key. The results will be returned in the lower pane. Click on the results to highlight them in the main document window.
Select the root element:
  /aaa
Select all elements with a given name, regardless of location:
  //bbb
Select an exact path:
  /aaa/bbb/eee/fff
Select the elements within an element:
  /aaa/bbb/eee/fff/*
  /aaa/bbb/ccc/ddd/*
  //fff/*
Select an element at a certain position:
  /aaa/bbb/eee/fff[2]
  //fff/*[1]
Note this behavior! The position is counted within each parent, so the second expression returns the first child of every fff.
Select the last element:
  /aaa/bbb/ccc/ddd[last()]
Note that no first() exists, for obvious reasons: [1] already does that job.
Select attributes:
  //@id
Note that we are NOT selecting elements!
Select elements with attributes:
  //ggg[@id]
  //ddd[@*]
Note: the second expression drops any ddd without an attribute.
Query the value of attributes:
  //ddd[@f='F2']
  //ddd[@f='*']
Note that this fails! The * is compared as a literal string, not as a wildcard, so nothing is returned.
  //ddd[@type=C]
Fails! Without quotes, C is read as the name of a child element, so nothing is returned.
Select elements by count of sub-elements:
  //*[count(ggg)=2]
Combine paths using the | operator:
  //ccc | //hhh
  //hhh | /aaa/bbb/ccc/ddd/ggg
  //hhh | //*[count(ggg)=2]
Children and descendants vs. parents and ancestors:
  /child::aaa/bbb is the same as /aaa/bbb
  /descendant::*
  /descendant::fff/*
  //ggg/parent::*
  //ccc/ancestor::*
Siblings:
  //ccc/following-sibling::*
  //ggg/preceding-sibling::*
<?xml version="1.0" encoding="UTF-8"?>
<aaa>
  <bbb>
    <ccc>
      <ddd type="A"></ddd>
      <ddd type="B"></ddd>
      <ddd type="C" f="F1">eat</ddd>
      <ddd type="C" f="F2">sleep</ddd>
      <ddd type="C" f="F2">
        <ggg id="1"></ggg>
        <ggg id="2" seq="1"></ggg>
      </ddd>
      <ddd></ddd>
    </ccc>
    <eee>
      <fff id="3">
        <ggg></ggg>
        <ggg></ggg>
      </fff>
      <fff id="4">
        <hhh></hhh>
      </fff>
      <fff id="5"></fff>
    </eee>
    <ggg />
  </bbb>
</aaa>
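Several of the expressions can be checked against this sample document outside an XML editor. The following is a rough sketch, assuming Python's standard-library xml.etree.ElementTree, which covers only part of XPath. Paths are written relative to the root aaa, and features it lacks (attribute selection, count(), the | operator, positions on *) are emulated in plain Python. The attribute tested is f, as it is named in the sample document. The helper tags() and all variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<aaa><bbb>
  <ccc>
    <ddd type="A"/>
    <ddd type="B"/>
    <ddd type="C" f="F1">eat</ddd>
    <ddd type="C" f="F2">sleep</ddd>
    <ddd type="C" f="F2"><ggg id="1"/><ggg id="2" seq="1"/></ddd>
    <ddd/>
  </ccc>
  <eee>
    <fff id="3"><ggg/><ggg/></fff>
    <fff id="4"><hhh/></fff>
    <fff id="5"/>
  </eee>
  <ggg/>
</bbb></aaa>"""

aaa = ET.fromstring(SAMPLE_XML)  # the root element, <aaa>

def tags(elements):
    """Return the element names of a sequence of elements."""
    return [e.tag for e in elements]

# /aaa/bbb/eee/fff/* : all children of every fff
fff_children = tags(aaa.findall("bbb/eee/fff/*"))

# /aaa/bbb/eee/fff[2] : the second fff
second_fff = [e.get("id") for e in aaa.findall("bbb/eee/fff[2]")]

# //fff/*[1] : the first child of EACH fff (emulated in Python)
first_children = [f[0].tag for f in aaa.iter("fff") if len(f) > 0]

# /aaa/bbb/ccc/ddd[last()] : the last ddd, which has no attributes
last_ddd = aaa.findall("bbb/ccc/ddd[last()]")

# //@id : attribute values, not elements (emulated in Python)
all_ids = [e.get("id") for e in aaa.iter() if "id" in e.attrib]

# //ggg[@id] : only the ggg elements that have an id
ggg_with_id = [e.get("id") for e in aaa.findall(".//ggg[@id]")]

# //ddd[@f='F2'] matches; //ddd[@f='*'] does not, '*' is a literal string
f2 = aaa.findall(".//ddd[@f='F2']")
star = aaa.findall(".//ddd[@f='*']")

# //*[count(ggg)=2] : elements with exactly two ggg children (emulated)
two_ggg = tags(e for e in aaa.iter() if len(e.findall("ggg")) == 2)

# //ccc | //hhh : union of two selections (emulated by concatenation)
union = tags(aaa.findall(".//ccc") + aaa.findall(".//hhh"))

# //ggg/parent::* : ".." is the abbreviated form of the parent axis
ggg_parents = sorted(tags(aaa.findall(".//ggg/..")))

print(fff_children, second_fff, first_children, all_ids)
print(ggg_with_id, len(f2), star, two_ggg, union, ggg_parents)
```

A full XPath 1.0 processor would accept the expressions exactly as written in the comments, so the emulations here are only a stand-in.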
In the Lucy Larcom diary entry TEI document, select the following:
Possible Solutions; note that alternative taggings of this document would not work with these expressions!