AtomServer, Using the AutoTagger


Chris Berry, Bryon Jacob. Updated 05/01/08

This document describes specific details about using the AtomServer AutoTagger.

This document does not explain the underlying concepts behind AtomServer: REST, Atom, and OpenSearch. That information can be found in the AtomServer General Introduction document, which we highly recommend that you read first.

Nor does this document explain the basics of XML, namespaces, syndicated feeds, the GET, POST, PUT, and DELETE requests in HTTP, or HTTP's concept of a "resource." For more information about those topics, see the Additional resources section of this document.


Auto Tagging

AtomServer can automatically apply Atom categories to entries as they are written to the ContentStorage. This is called "auto tagging". AtomServer is built so that you can wire in any AutoTagger: you simply implement the EntryAutoTagger interface. The most commonly used AutoTagger, however, is the built-in XPathAutoTagger.
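This section does not show the EntryAutoTagger interface itself, so the sketch below uses a hypothetical, simplified stand-in (SimpleAutoTagger, with a single tag method that receives the entry's content and a mutable list of categories) to illustrate what a pluggable tagger does. SimpleAutoTagger, the Category record, and KeywordAutoTagger are assumptions made for illustration, not AtomServer's API; check the real EntryAutoTagger interface before writing an implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordAutoTaggerDemo {

    /** Hypothetical category value: a scheme plus a term (not AtomServer's class). */
    record Category(String scheme, String term) {}

    /**
     * Hypothetical, simplified stand-in for AtomServer's EntryAutoTagger:
     * given the entry content, update the entry's category list in place.
     */
    interface SimpleAutoTagger {
        void tag(String entryContent, List<Category> categories);
    }

    /** Example tagger: adds {scheme}keyword when the content mentions the keyword. */
    static class KeywordAutoTagger implements SimpleAutoTagger {
        private final String scheme;
        private final String keyword;

        KeywordAutoTagger(String scheme, String keyword) {
            this.scheme = scheme;
            this.keyword = keyword.toLowerCase(Locale.ROOT);
        }

        @Override
        public void tag(String entryContent, List<Category> categories) {
            Category category = new Category(scheme, keyword);
            // Case-insensitive substring check; never add the same category twice.
            boolean mentioned = entryContent.toLowerCase(Locale.ROOT).contains(keyword);
            if (mentioned && !categories.contains(category)) {
                categories.add(category);
            }
        }
    }

    public static void main(String[] args) {
        List<Category> categories = new ArrayList<>();
        SimpleAutoTagger tagger = new KeywordAutoTagger("urn:example.flags", "urgent");
        tagger.tag("<widget>URGENT: restock needed</widget>", categories);
        System.out.println(categories); // [Category[scheme=urn:example.flags, term=urgent]]
    }
}
```

The point of the design is that the storage layer only needs to call one method per written entry; what the tagger inspects and which categories it writes are entirely up to the implementation.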

XPathAutoTagger

This EntryAutoTagger implementation tags entries based on matches against XPath expressions.

You can configure the XPathAutoTagger in your Workspace bean in Spring in the usual way: set the "namespaceMap" property to a map from prefixes to namespace URIs, and set the "actions" property to a list of actions, each an instance of one of the inner classes of XPathAutoTagger. The available actions are:

    * delete all (DeleteAllAction): clears all of the categories for the entry.
    * delete scheme (DeleteSchemeAction): clears all of the categories in the given scheme for the entry.
    * match (XPathMatchAction): evaluates an XPath expression against the entry contents and writes a category for each match. When the expression matches, the text content of each matched node (an element or an attribute) is stored in a variable called $. The termPattern is then evaluated to set the category's term, and the optional labelPattern is evaluated to set its label.
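As a rough illustration of these actions, the following sketch models delete all, delete scheme, and match with the JDK's javax.xml.xpath API. The names Category, Action, DeleteAll, DeleteScheme, Match, and xpathFor are hypothetical; they only loosely mirror DeleteAllAction, DeleteSchemeAction, and XPathMatchAction, and expanding $ by plain literal string replacement is an assumption about how termPattern and labelPattern are evaluated. The map passed to xpathFor plays the role of the namespaceMap property.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathTaggingSketch {

    /** Hypothetical category: scheme, term, and an optional (nullable) label. */
    record Category(String scheme, String term, String label) {}

    /** Hypothetical action type; AtomServer's real inner classes may look different. */
    interface Action {
        void apply(Document doc, List<Category> categories) throws Exception;
    }

    /** Analogous to "delete all": clear every category on the entry. */
    record DeleteAll() implements Action {
        public void apply(Document doc, List<Category> categories) { categories.clear(); }
    }

    /** Analogous to "delete scheme": clear the categories in one scheme. */
    record DeleteScheme(String scheme) implements Action {
        public void apply(Document doc, List<Category> categories) {
            categories.removeIf(c -> c.scheme().equals(scheme));
        }
    }

    /** Analogous to "match": one category per matched node, with $ bound to its text. */
    record Match(XPath xpath, String expression, String scheme,
                 String termPattern, String labelPattern) implements Action {
        public void apply(Document doc, List<Category> categories) throws Exception {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getTextContent(); // element or attribute text
                String term = termPattern.replace("$", value);  // literal replacement
                String label = labelPattern == null ? null : labelPattern.replace("$", value);
                categories.add(new Category(scheme, term, label));
            }
        }
    }

    /** XPath evaluator resolving prefixes from a map, like the "namespaceMap" property. */
    static XPath xpathFor(Map<String, String> namespaceMap) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return namespaceMap.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }
            public String getPrefix(String uri) {
                return null; // reverse lookup is not needed to evaluate expressions
            }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        return xpath;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefixed XPath expressions
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://schemas.foo.com/widgets/v1/rev0";
        Document doc = parse("<widget xmlns='" + ns + "'>"
                + "<brand isMaster='true'>acme</brand><brand>zenith</brand></widget>");
        XPath xpath = xpathFor(Map.of("widgets", ns));
        // Start with a stale category that DeleteScheme should remove.
        List<Category> categories = new ArrayList<>(
                List.of(new Category("urn:foo.brands", "old", null)));
        List<Action> actions = List.of(
                new DeleteScheme("urn:foo.brands"),
                new Match(xpath, "//widgets:brand", "urn:foo.brands", "$", null),
                new Match(xpath, "//widgets:brand[@isMaster='true']", "urn:foo.brands",
                        "MASTER:$", "Entry has master brand $"));
        for (Action action : actions) {
            action.apply(doc, categories);
        }
        // Prints terms acme, zenith, and MASTER:acme (the last with a label).
        categories.forEach(System.out::println);
    }
}
```

Applying the actions in list order matters: deleting the scheme first and then matching means the entry ends up with exactly the categories derived from its current contents, rather than accumulating stale ones.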

To make configuration easier, XPathAutoTagger also defines a small scripting language, which you can set via the "script" bean property. Scripts are defined by the following grammar. Nonterminals are in all caps, terminals are in title case, and character and string literals are enclosed in single quotes; parentheses group components, an asterisk means zero or more repetitions, and a question mark marks an optional component.

nonterminals:
    SCRIPT       ==> STATEMENT (';'* STATEMENT ';'*)*
    STATEMENT    ==> NAMESPACE | DELETEALL | DELETESCHEME | MATCH
    NAMESPACE    ==> 'namespace' Prefix '=' Uri
    DELETEALL    ==> 'delete' 'all'
    DELETESCHEME ==> 'delete' ('scheme')? '{'Scheme'}'
    MATCH        ==> 'match' '"'Xpath'"' '{'Scheme'}' Termpattern ('['Labelpattern']')?
terminals:
    Prefix       ==> the namespace prefix to use in XPath expressions
    Uri          ==> a namespace URI
    Xpath        ==> the XPath expression to match
    Scheme       ==> a category scheme
    Termpattern  ==> the replacement pattern used to generate category terms
    Labelpattern ==> the replacement pattern used to generate category labels
Keywords (delete, match, namespace, etc.) are not case-sensitive. The quoted XPath expression can contain double quotes if they are escaped with a backslash (that is, \"). For example, a script could look like this:
    NAMESPACE widgets = http://schemas.foo.com/widgets/v1/rev0;
    DELETE SCHEME {urn:foo.brands};
    MATCH "//widgets:brand" {urn:foo.brands}$;
    MATCH "//widgets:brand[@isMaster='true']" {urn:foo.brands}MASTER:$[Entry has master brand $]
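To make the grammar concrete, here is a small parser sketch that turns a script like the one above into a list of statements. AutoTagScriptParser and its record types are hypothetical and illustrative, not AtomServer classes. The sketch assumes that statements are separated by semicolons (stricter than the grammar's optional ';'*), that a term pattern does not itself contain '[', and that the only escape inside the quoted XPath expression is \".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutoTagScriptParser {

    // Hypothetical statement types mirroring the grammar's STATEMENT alternatives.
    interface Statement {}
    record Namespace(String prefix, String uri) implements Statement {}
    record DeleteAll() implements Statement {}
    record DeleteScheme(String scheme) implements Statement {}
    record Match(String xpath, String scheme, String termPattern, String labelPattern)
            implements Statement {}

    // Keywords are matched case-insensitively, as the text above specifies.
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern NAMESPACE =
            Pattern.compile("namespace\\s+([^\\s=]+)\\s*=\\s*(\\S+)", FLAGS);
    private static final Pattern DELETE_ALL = Pattern.compile("delete\\s+all", FLAGS);
    private static final Pattern DELETE_SCHEME =
            Pattern.compile("delete(?:\\s+scheme)?\\s*\\{([^}]*)\\}", FLAGS);
    // match "Xpath" {Scheme} Termpattern [Labelpattern]; \" is allowed inside the quotes.
    private static final Pattern MATCH = Pattern.compile(
            "match\\s+\"((?:[^\"\\\\]|\\\\.)*)\"\\s*\\{([^}]*)\\}\\s*([^\\[]*?)\\s*(?:\\[(.*)\\])?",
            FLAGS);

    // Splits on ';' outside double-quoted XPath strings, keeping \" escapes intact.
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < script.length()) {
                current.append(c).append(script.charAt(++i)); // escaped char never ends a string
            } else if (c == ';' && !inQuotes) {
                if (!current.toString().isBlank()) statements.add(current.toString().trim());
                current.setLength(0);
            } else {
                if (c == '"') inQuotes = !inQuotes;
                current.append(c);
            }
        }
        if (!current.toString().isBlank()) statements.add(current.toString().trim());
        return statements;
    }

    public static List<Statement> parse(String script) {
        List<Statement> result = new ArrayList<>();
        for (String s : splitStatements(script)) {
            Matcher m;
            if ((m = NAMESPACE.matcher(s)).matches()) {
                result.add(new Namespace(m.group(1), m.group(2)));
            } else if (DELETE_ALL.matcher(s).matches()) {
                result.add(new DeleteAll());
            } else if ((m = DELETE_SCHEME.matcher(s)).matches()) {
                result.add(new DeleteScheme(m.group(1).trim()));
            } else if ((m = MATCH.matcher(s)).matches()) {
                String xpath = m.group(1).replace("\\\"", "\""); // unescape \"
                result.add(new Match(xpath, m.group(2).trim(), m.group(3), m.group(4)));
            } else {
                throw new IllegalArgumentException("Unrecognized statement: " + s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String script = String.join("\n",
                "NAMESPACE widgets = http://schemas.foo.com/widgets/v1/rev0;",
                "DELETE SCHEME {urn:foo.brands};",
                "MATCH \"//widgets:brand\" {urn:foo.brands}$;",
                "MATCH \"//widgets:brand[@isMaster='true']\" {urn:foo.brands}MASTER:$[Entry has master brand $]");
        parse(script).forEach(System.out::println);
    }
}
```

Running main prints one line per statement; the second MATCH yields a term pattern of MASTER:$ and a label pattern of Entry has master brand $. Matching each statement against a regular expression keeps the sketch short; a production parser would more likely tokenize the script and report error positions.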

Additional resources

You may find the following third-party documents useful:

    * Overview of Atom (from IBM)
    * HTTP/1.1 method definitions: specification for GET, POST, PUT, and DELETE
    * HTTP/1.1 status code definitions
    * Atom Syndication Reference (from Atom-enabled)
    * Getting to know the Atom Publishing Protocol (from IBM)