Design Studio has six step actions for extracting content from a tag in an HTML page:
The Extract action is used to extract text content from the tag, optionally including the HTML tags.
The Extract URL action is used to extract a URL from a tag attribute containing a URL and make that URL absolute.
The Extract Tag Attribute action is used to extract the value of a tag attribute.
The Extract Target action is used to extract binary data. It is typically used for images and PDF files, but it handles any kind of binary data.
The Extract Form Parameter action is used to extract a form parameter from a form URL in the found tag and then store its value in a variable.
The Extract Selected Option action is used to extract the selected option from a <select> tag and then store it in a variable.
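Design Studio performs these extractions through configured step actions rather than code. As a rough illustration of what two of them amount to (making a URL absolute and reading a parameter from a form URL), here is a minimal Python sketch using only the standard library. The function names, the page URL, and the sample values are assumptions made for this example and are not part of Design Studio.

```python
from urllib.parse import urljoin, urlparse, parse_qs

def make_absolute(page_url, attribute_value):
    """Resolve a possibly relative URL against the URL of the page it came from."""
    return urljoin(page_url, attribute_value)

def form_parameter(form_url, name):
    """Return the first value of the named query parameter, or None if absent."""
    values = parse_qs(urlparse(form_url).query).get(name)
    return values[0] if values else None

# Hypothetical values, for illustration only.
page_url = "http://example.com/catalog/index.html"
link = make_absolute(page_url, "../images/logo.png")
item_id = form_parameter("http://example.com/search?id=42&lang=en", "id")
```

In this sketch `link` and `item_id` play the role of the variables in which the corresponding actions store their results.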
Often you need to reformat (or normalize) the extracted content, and the Extract and Extract Tag Attribute actions allow you to do this by configuring a list of data converters.
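The idea of a list of data converters can be pictured as a chain of small text transformations. The sketch below is plain Python, not Design Studio configuration, and it assumes the converters run in list order, each receiving the output of the previous one. The converter functions and the sample text are hypothetical.

```python
import re

def strip_tags(text):
    """Remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def collapse_whitespace(text):
    """Replace runs of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

def apply_converters(text, converters):
    """Feed the output of each converter into the next, in list order."""
    for convert in converters:
        text = convert(text)
    return text

# Hypothetical extracted content, for illustration only.
raw = "  <b>Price:</b>\n   12.50  EUR "
normalized = apply_converters(raw, [strip_tags, collapse_whitespace])
```

Keeping each converter small and single-purpose makes it easy to reorder or drop one without touching the others.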
There are also two actions for extracting data from binary formats such as PDF or Flash. These differ from the actions above in that they extract the data and produce an HTML page that contains the data in a structured form your robot can access. They are therefore used in an initial step before the actual data extraction, in which you may loop over the produced HTML and extract text from it.
The Extract Text from PDF action is used to extract text from a PDF document contained as binary data in a selected attribute.
The Extract from Flash action is used to extract data from a Flash object in a found tag.
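The two-stage pattern described here (first produce HTML, then loop over it and extract text) can be sketched outside Design Studio as follows. The markup in `produced_html` is a made-up stand-in, since the structure of the page these actions actually produce is not specified here, and the `ParagraphText` class is a hypothetical helper built on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Collect the text of each <p> element in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._inside = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.paragraphs[-1] += data

# Hypothetical stand-in for a page produced from a PDF; the real markup may differ.
produced_html = "<html><body><p>Invoice 1001</p><p>Total: 99.00</p></body></html>"
parser = ParagraphText()
parser.feed(produced_html)
parser.close()
lines = [p.strip() for p in parser.paragraphs]
```

In a robot, the equivalent would be a loop over the tags of the produced page, with an Extract step inside the loop.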