A Tag Finder is used to find a tag on an HTML/XML page. Tag Finders are used in steps, where they define how to find the tag(s) to which the step should be applied. The list of Tag Finders of the current step is located in the "Finders" tab in the Step View. Steps that work on spreadsheet content use Range Finders rather than Tag Finders.
To use Tag Finders, you need to understand the concept of a tag path. A tag path is a compact text representation of where a tag is located on a page. Consider this tag path:
    html.body.div.a
This tag path refers to an <a>-tag inside a <div>-tag inside a <body>-tag inside an <html>-tag.
A tag path can match more than one tag on the same page. For example, the above tag path will match all of the <a>-tags on this page, except the third one:
    <html>
      <body>
        <div>
          <a href="url...">Link 1</a>
          <a href="url...">Link 2</a>
        </div>
        <p>
          <a href="url...">Link 3</a>
        </p>
        <div>
          <a href="url...">Link 4</a>
          <a href="url...">Link 5</a>
          <a href="url...">Link 6</a>
        </div>
      </body>
    </html>
You can use indexes to refer to specific tags among tags of the same type at that level. Consider this tag path:
    html.body.div[1].a[0]
This tag path refers to the first <a>-tag in the second <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the "Link 4" <a>-tag. Note that indexes start from 0. If no index is specified for a given tag on a tag path, the path matches any tag of that type at that level, as we saw in the first tag path above. If the index is negative, the matching tags are counted backwards, starting with the last matching tag, which corresponds to index -1. Consider this tag path:
    html.body.div[-1].a[-2]
This tag path refers to the second-to-last <a>-tag in the last <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the "Link 5" <a>-tag.
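The index rules above can be illustrated with a small Python sketch. It is not the product's implementation: the function name find_by_path is hypothetical, the sketch assumes that indexes are written in square brackets after the tag name (as in div[1]), and it only handles plain dot-separated paths that start at the top-level tag.

```python
import re
import xml.etree.ElementTree as ET

# The sample page from the text, with placeholder href values.
PAGE = """<html><body>
<div><a href="u1">Link 1</a><a href="u2">Link 2</a></div>
<p><a href="u3">Link 3</a></p>
<div><a href="u4">Link 4</a><a href="u5">Link 5</a><a href="u6">Link 6</a></div>
</body></html>"""

# One step of a path: a tag name with an optional [index] (assumed syntax).
STEP = re.compile(r"^([A-Za-z][\w-]*)(?:\[(-?\d+)\])?$")

def find_by_path(root, tag_path):
    """Return the elements matched by a dot-separated path such as 'html.body.div[1].a[0]'."""
    steps = []
    for part in tag_path.split("."):
        m = STEP.match(part)
        if m is None:
            raise ValueError("unsupported step: " + part)
        steps.append((m.group(1), None if m.group(2) is None else int(m.group(2))))

    # The first step has to match the single top-level tag.
    name, index = steps[0]
    if root.tag != name or index not in (None, 0, -1):
        return []

    current = [root]
    for name, index in steps[1:]:
        matched = []
        for parent in current:
            # Indexes count only tags of the same type at this level.
            same_type = [child for child in parent if child.tag == name]
            if index is None:
                matched.extend(same_type)          # no index: every tag of that type
            elif -len(same_type) <= index < len(same_type):
                matched.append(same_type[index])   # negative indexes count from the end
        current = matched
    return current

root = ET.fromstring(PAGE)
print([a.text for a in find_by_path(root, "html.body.div.a")])
# ['Link 1', 'Link 2', 'Link 4', 'Link 5', 'Link 6']
print([a.text for a in find_by_path(root, "html.body.div[1].a[0]")])    # ['Link 4']
print([a.text for a in find_by_path(root, "html.body.div[-1].a[-2]")])  # ['Link 5']
```

Python list indexing already treats -1 as the last item, so the negative-index rule maps directly onto same_type[index].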
You can use an asterisk ('*') to mean any number of tags of any type. For example, the tag path
    html.*.table.*.a
refers to an <a>-tag located anywhere inside a <table>-tag, which itself can be located anywhere inside an <html>-tag. There is an implicit asterisk in front of every tag path, so you can simply write "table" instead of "*.table" to refer to any <table>-tag on the page. The only exception is a tag path that starts with a period ('.'): such a path has no implicit asterisk in front of it, so it must match from the first (i.e. top-level) tag of the page.
With asterisks, you can create tag paths that are more robust against changes in the page, since you can leave out insignificant tags that are liable to change over time, such as layout related tags. However, using asterisks also increases the risk of accidentally locating the wrong tag.
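As a rough illustration of the asterisk rules, the following Python sketch matches a path against the chain of tag names leading from the top-level tag down to each element. It is a sketch under stated assumptions, not the product's algorithm: the names path_matches and find_all are hypothetical, indexes are left out, and an asterisk is assumed to match zero or more tags.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div><table><tr><td><a>In table</a></td></tr></table></div>
<p><a>Outside</a></p>
</body></html>"""

def path_matches(steps, chain):
    """True if the steps match the whole chain of tag names, from top-level tag to target."""
    if not steps:
        return not chain
    head, rest = steps[0], steps[1:]
    if head == "*":
        # Assumption of this sketch: '*' may absorb zero or more tags.
        return any(path_matches(rest, chain[i:]) for i in range(len(chain) + 1))
    return bool(chain) and chain[0] == head and path_matches(rest, chain[1:])

def find_all(root, tag_path):
    """Return all elements under root (inclusive) that the tag path matches, in page order."""
    if tag_path.startswith("."):
        steps = tag_path[1:].split(".")       # leading period: match from the top-level tag
    else:
        steps = ["*"] + tag_path.split(".")   # otherwise: implicit asterisk in front

    hits = []

    def walk(elem, chain):
        chain = chain + [elem.tag]
        if path_matches(steps, chain):
            hits.append(elem)
        for child in elem:
            walk(child, chain)

    walk(root, [])
    return hits

root = ET.fromstring(PAGE)
print([a.text for a in find_all(root, "html.*.table.*.a")])  # ['In table']
print([a.text for a in find_all(root, "a")])                 # ['In table', 'Outside']
print(len(find_all(root, ".body")))                          # 0, the top-level tag is html
```

The last call shows the effect of the leading period: ".body" cannot match because the path must then start at the top-level tag.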
You can provide a list of possible tags by separating them with '|', as in this tag path:
    html.*.p|div|td.a
This tag path refers to an <a>-tag inside a <p>-, <div>-, or <td>-tag located anywhere inside an <html>-tag.
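One way to picture how alternatives combine with asterisks is to translate a tag path into a regular expression over strings such as "/html/body/p/a". The Python sketch below does this; the translation is an illustrative technique chosen here, not how the product works, the function names path_to_regex and find_all are hypothetical, and indexes are not supported.

```python
import re
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<p><a>in p</a></p>
<div><a>in div</a></div>
<table><tr><td><a>in td</a></td></tr></table>
<ul><li><a>in li</a></li></ul>
</body></html>"""

ANY_TAGS = "(?:/[^/]+)*"   # zero or more tags of any type (assumed meaning of '*')

def path_to_regex(tag_path):
    """Translate a tag path (names, '|' alternatives, '*') into a compiled regex."""
    anchored = tag_path.startswith(".")
    steps = (tag_path[1:] if anchored else tag_path).split(".")
    parts = [] if anchored else [ANY_TAGS]   # implicit asterisk unless the path starts with '.'
    for step in steps:
        if step == "*":
            parts.append(ANY_TAGS)
        else:
            names = "|".join(re.escape(name) for name in step.split("|"))
            parts.append("/(?:" + names + ")")
    return re.compile("".join(parts))

def find_all(root, tag_path):
    """Return all elements whose chain of tag names is fully matched by the path."""
    pattern = path_to_regex(tag_path)
    hits = []

    def walk(elem, prefix):
        chain = prefix + "/" + elem.tag
        if pattern.fullmatch(chain):
            hits.append(elem)
        for child in elem:
            walk(child, chain)

    walk(root, "")
    return hits

root = ET.fromstring(PAGE)
print([a.text for a in find_all(root, "html.*.p|div|td.a")])
# ['in p', 'in div', 'in td']
```

The <a>-tag inside the <li>-tag is not matched, because its parent is none of the listed alternatives.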
In a tag path, text on a page is referred to just like any other tag, using the keyword "text". Although text is not technically a tag, it is treated and viewed as such in a tag path. For example, consider this HTML:
    <html>
      <body>
        <a href="url...">Link 1</a>
        <a href="url...">Link 2</a>
      </body>
    </html>
The tag path "html.body.a[1].text" would refer to the text "Link 2". Without the index, "html.body.a.text" would match the text of both links.
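To make the idea of text as a pseudo-tag concrete, here is a minimal Python sketch that builds a small tree in which every piece of text becomes a child node named "text". The Node and TreeBuilder classes and the resolve function are hypothetical helpers for this illustration. The sketch assumes well-formed input in which every tag is closed, square-bracket index syntax, and that whitespace-only text is ignored.

```python
import re
from html.parser import HTMLParser

PAGE = """<html><body>
<a href="url1">Link 1</a>
<a href="url2">Link 2</a>
</body></html>"""

class Node:
    def __init__(self, name, data=None):
        self.name = name        # tag name, or "text" for a text node
        self.data = data        # the text itself, for text nodes
        self.children = []

class TreeBuilder(HTMLParser):
    """Builds a Node tree; text is stored as child nodes named 'text'."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:   # assumes every start tag has a matching end tag
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():          # skip whitespace between tags
            self.stack[-1].children.append(Node("text", data))

STEP = re.compile(r"^(\w+)(?:\[(-?\d+)\])?$")

def resolve(root, tag_path):
    """Resolve a dot-separated path; 'text' is handled like any other tag name."""
    current = [root]
    for part in tag_path.split("."):
        m = STEP.match(part)
        if m is None:
            raise ValueError("unsupported step: " + part)
        name, index = m.group(1), m.group(2)
        matched = []
        for parent in current:
            same = [c for c in parent.children if c.name == name]
            if index is None:
                matched.extend(same)
            elif -len(same) <= int(index) < len(same):
                matched.append(same[int(index)])
        current = matched
    return current

builder = TreeBuilder()
builder.feed(PAGE)
builder.close()
print([n.data for n in resolve(builder.root, "html.body.a[1].text")])  # ['Link 2']
print([n.data for n in resolve(builder.root, "html.body.a.text")])     # ['Link 1', 'Link 2']
```

Because the text is an ordinary child node in this tree, the path resolver needs no special case for the "text" keyword.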
A Tag Finder can be configured using the following properties.
As an example, if you set the Tag Path property to "table", the Attribute Name property to "align", the Attribute Value property to Fixed Text where the text must be "center", and the Tag Pattern property to ".*Business News.*", then the Tag Finder would locate the first <table>-tag that is center aligned and that contains the text "Business News".
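The combination of properties in this example can be sketched in Python as follows. This is only an approximation for illustration: the function find_first and its parameters are hypothetical, the Tag Pattern is assumed to be a regular expression that must match the entire source of the tag (with '.' also matching line breaks), and the sample page is invented.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample page with three tables.
PAGE = """<html><body>
<table align="left"><tr><td>Business News</td></tr></table>
<table align="center"><tr><td>Sports News</td></tr></table>
<table align="center"><tr><td>Today's Business News headlines</td></tr></table>
</body></html>"""

def find_first(root, tag, attr_name, attr_value, tag_pattern):
    """Return the first <tag> element with the given attribute value whose source matches the pattern."""
    pattern = re.compile(tag_pattern, re.DOTALL)   # assumption: '.' also matches line breaks
    for elem in root.iter(tag):                    # any depth, like the tag path "table"
        if elem.get(attr_name) != attr_value:      # Attribute Name / Attribute Value (Fixed Text)
            continue
        source = ET.tostring(elem, encoding="unicode")
        if pattern.fullmatch(source):              # assumption: the pattern must match the whole tag
            return elem
    return None

root = ET.fromstring(PAGE)
hit = find_first(root, "table", "align", "center", ".*Business News.*")
print(hit.find("tr/td").text)  # Today's Business News headlines
```

The first table is skipped because it is left aligned, and the second because its content does not match the pattern, so the third table is the one found.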