Full Text Search Query Syntax
From alfrescowiki
--briana 10:10, 25 May 2010 (BST)
The Alfresco Full Text Search (FTS) query text can be used standalone or embedded in CMIS-SQL using the contains() predicate function. The CMIS specification supports a subset of Alfresco FTS, so the full power of Alfresco FTS cannot be used while maintaining portability between CMIS repositories.
Alfresco Full Text Search is exposed directly by Share, which adds its own template that is also used as its default field. The examples provided on the Search page apply to Alfresco FTS. For example, PATH and TYPE queries are the same. The main differences are the required escaping and that the leading "@" for properties is optional. "@" and ":" do not need to be escaped; the FTS parser can work it out.
The default template (see Full_Text_Search_Query_Syntax#Query_Templates) for Share V3.4 is:
%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
Embedded in CMIS-SQL, only CMIS-SQL style property identifiers (cmis:name) and aliases, CMIS-SQL column aliases, and the special fields listed can be used to identify fields. The SQL query defines tables and table aliases after the "from" and "join" clauses. If the SQL query references more than one table, the contains() function must specify a single table to use by its alias. All properties in the embedded FTS query are bound to this table, and all column aliases used in the FTS query must refer to the same table. For a single table, the table alias is not required as part of the contains() function.
Used standalone, fields can also be identified using prefix:local-name and {uri}local-name styles.
Search for a single term
Single terms are tokenized before the search according to the appropriate data dictionary definition(s). If no field is specified, the default TEXT field is used. This is a shortcut for searching all properties of type content. If the appropriate data dictionary definition(s) for the field support both FTS and untokenized search, then the FTS search will be used. FTS will include synonyms if the analyzer generates them. Terms can not contain whitespace.
banana
TEXT:banana
Both these queries will find any nodes with the word "banana" in any property of type d:content.
Search for a phrase
Phrases are enclosed in double quotes. Any embedded quotes may be escaped using \". If no field is specified, then the default TEXT field will be used, as with searches for a single term. The whole phrase will be tokenized before the search according to the appropriate data dictionary definition(s).
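As a minimal sketch of the quoting rule above, the following helper wraps arbitrary text as a phrase and escapes embedded double quotes. The name ftsPhrase is hypothetical and not part of any Alfresco API; escaping backslashes as well is an assumption based on the general rule that any character may be escaped with "\".

```javascript
// Hypothetical helper (not an Alfresco API): build an FTS phrase.
function ftsPhrase(text, field) {
  // Escape backslashes first so the escapes added for quotes stay intact.
  var escaped = String(text).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  var phrase = '"' + escaped + '"';
  // Without a field, the parser applies the default TEXT field.
  return field ? field + ":" + phrase : phrase;
}
```

Calling ftsPhrase with only the text yields a phrase for the default field; passing a second argument such as "cm:title" prefixes the phrase with that field.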
"big yellow banana"
Search for an exact term
To search for an exact term, prefix the term with "=". This ensures the term will not be tokenized. If both FTS and ID-based search are supported for a specified or implied property, then exact matching will be used where possible.
=running
This will match "running", and the term will not be tokenized. If you are using stemming, it may not match anything. For an untokenized property, all will be well.
Term expansion
To force tokenization and term expansion, prefix the term with "~". For a property with both ID and FTS indexes where the ID index is the default, this forces the use of the FTS index.
~running
Conjunctions
Single terms, phrases, etc can be combined using "AND" in upper, lower, or mixed case. If not otherwise specified, the search fragments will be ANDed together by default.
big yellow banana
big AND yellow AND banana
TEXT:big TEXT:yellow TEXT:banana
TEXT:big and TEXT:yellow and TEXT:banana
All of these search for nodes that contain the terms "big", "yellow", and "banana" in any content.
Disjunctions
Single terms, phrases, etc can be combined using "OR" in upper, lower, or mixed case.
big OR yellow OR banana
TEXT:big OR TEXT:yellow OR TEXT:banana
Negation
Single terms, phrases, etc can be combined using "NOT" in upper, lower, or mixed case, or prefixed with "!" or "-".
yellow NOT banana
yellow !banana
yellow -banana
NOT yellow banana
-yellow banana
!yellow banana
Specifically indicate optional, mandatory, and excluded elements of a query
Sometimes AND and OR are not enough. If you want to find documents that must contain the term "car", score those with the term "red" higher, but not match those just containing "red", you are stuck...
| Operator | Meaning |
|---|---|
| "\|" | The field, phrase, group, etc, is optional; a match increases the score. |
| "+" | The field, phrase, group, etc, is mandatory (this differs from Google - see "=") |
| "-", "!" | The field, phrase, group, etc, must not match |
Returning to our example:
+car |red
This finds documents that contain the term "car" and scores those that also contain the term "red" higher, but does not match those containing only "red".
At least one element of a query must match (or not match) for there to be any results.
All AND and OR constructs can be expressed with these operators.
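The three prefixes can be applied mechanically. The sketch below assembles a query from lists of mandatory, optional, and excluded elements; prefixedQuery is a hypothetical name used only for illustration, and the elements are assumed to be already valid terms, phrases, or groups.

```javascript
// Hypothetical helper: prefix each element with "+", "|" or "-".
function prefixedQuery(must, should, mustNot) {
  var parts = [];
  var i;
  for (i = 0; i < must.length; i++) {
    parts.push("+" + must[i]);     // mandatory
  }
  for (i = 0; i < should.length; i++) {
    parts.push("|" + should[i]);   // optional, a match raises the score
  }
  for (i = 0; i < mustNot.length; i++) {
    parts.push("-" + mustNot[i]);  // must not match
  }
  return parts.join(" ");
}
```

For the example in the text, prefixedQuery(["car"], ["red"], []) produces the query +car |red.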
Fields
Search specific fields rather than the default. Terms, phrases, etc can all be preceded by a field. If not, the default field TEXT is used.
field:term
field:"phrase"
=field:exact
~field:expand
Fields fall into three types:
- Property fields evaluate the search term against a particular property
- Data type fields evaluate the search term against all properties of the given type
- Special fields are described as follows:
| Description | Type | Example | FTS | CMIS contains() |
|---|---|---|---|---|
| Fully qualified property | Property | {http://www.alfresco.org/model/content/1.0}name:apple | Yes | No |
| Fully qualified property | Property | @{http://www.alfresco.org/model/content/1.0}name:apple | Yes | No |
| CMIS style property | Property | cmis:name:apple | Yes (case insensitive) | Yes (case sensitive) |
| Prefix style property | Property | cm:name:apple | Yes | Yes |
| Prefix style property | Property | @cm:name:apple | Yes | No |
| TEXT | Special | TEXT:apple | Yes | 3.4.3 onwards |
| ID | Special | ID:"NodeRef" | Yes | 3.4.3 onwards |
| ISROOT | Special | ISROOT:T | Yes | 3.4.3 onwards |
| TX | Special | TX:"TX" | Yes | 3.4.3 onwards |
| PARENT | Special | PARENT:"NodeRef" | Yes | 3.4.3 onwards |
| PRIMARYPARENT | Special | PRIMARYPARENT:"NodeRef" | Yes | 3.4.3 onwards |
| QNAME | Special | QNAME:"app:company_home" | Yes | 3.4.3 onwards |
| CLASS | Special | CLASS:"qname" | Yes | 3.4.3 onwards |
| EXACTCLASS | Special | EXACTCLASS:"qname" | Yes | 3.4.3 onwards |
| TYPE | Special | TYPE:"qname" | Yes | 3.4.3 onwards |
| EXACTTYPE | Special | EXACTTYPE:"qname" | Yes | 3.4.3 onwards |
| ASPECT | Special | ASPECT:"qname" | Yes | 3.4.3 onwards |
| EXACTASPECT | Special | EXACTASPECT:"qname" | Yes | 3.4.3 onwards |
| ALL | Special | ALL:"text" | Yes | 3.4.3 onwards |
| ISUNSET | Special | ISUNSET:"property-qname" | Yes | 3.4.3 onwards |
| ISNULL | Special | ISNULL:"property-qname" | Yes | 3.4.3 onwards |
| ISNOTNULL | Special | ISNOTNULL:"property-qname" | Yes | 3.4.3 onwards |
| TAG | Special | TAG:"SomeTag" | 3.4.3 onwards | 3.4.3 onwards |
| Fully qualified data type | Data Type | {http://www.alfresco.org/model/dictionary/1.0}content:apple | Yes | No |
| Prefixed data type | Data Type | d:content:apple | Yes | No |
Note: Most of the special fields not available in CMIS can be expressed in other ways: type constraints by from, aspects by join, is not null, ID = ..., IN_FOLDER(), ...
Wildcards
Wildcards are supported in terms, phrases, and exact phrases, using "*" to match zero, one, or more characters and "?" to match a single character. "*" may appear on its own and implies Google style "anywhere after". Wildcard patterns can be combined with the "=" prefix for identifier-based pattern matching.
TEXT:app?e
TEXT:app*
TEXT:*pple
appl?
*ple
=*ple
"ap*le"
"***le"
"?????"
All the previous examples find the term apple.
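To make the wildcard semantics concrete, the sketch below translates a wildcard pattern into a JavaScript regular expression and checks it against a single token. This only illustrates what "*" and "?" mean; it is not how the search engine evaluates wildcards, and wildcardToRegExp is a hypothetical name.

```javascript
// Illustration only: "*" matches zero or more characters, "?" exactly one.
function wildcardToRegExp(pattern) {
  var source = "";
  for (var i = 0; i < pattern.length; i++) {
    var ch = pattern.charAt(i);
    if (ch === "*") {
      source += ".*";
    } else if (ch === "?") {
      source += ".";
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Anchor both ends so the whole token has to match.
  return new RegExp("^" + source + "$");
}

var patterns = ["app?e", "app*", "*pple", "appl?", "*ple", "ap*le", "***le", "?????"];
var allMatch = true;
for (var p = 0; p < patterns.length; p++) {
  allMatch = allMatch && wildcardToRegExp(patterns[p]).test("apple");
}
```

With the patterns from the examples above (field prefixes, "=" and quotes removed), allMatch stays true because every pattern matches the token "apple".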
Ranges
Inclusive ranges can be specified in Google style. There is an extended syntax for more complex ranges. Unbounded ranges can be defined using MIN and MAX for numeric and date types and "\u0000" and "\uFFFF" for text (anything that is invalid will do ....).
Range queries are not supported for all properties of type d:content or d:mltext. Properties of type d:text are supported. However, if the d:text property is tokenised then you may get unexpected results (unless the property is both tokenised and untokenised in the index).
TEXT by default does not support range queries as it expands to all properties of type d:content.
| Lucene | Google | Description | Example |
|---|---|---|---|
| [#1 TO #2] | #1..#2 | The range #1 to #2 inclusive: #1 <= x <= #2 | 0..5, [0 TO 5] |
| <#1 TO #2] | | The range #1 to #2 including #2 but not #1: #1 < x <= #2 | <0 TO 5] |
| [#1 TO #2> | | The range #1 to #2 including #1 but not #2: #1 <= x < #2 | [0 TO 5> |
| <#1 TO #2> | | The range #1 to #2 exclusive: #1 < x < #2 | <0 TO 5> |
my:text:apple..banana
my:int:[0 TO 10]
my:float:2.5..3.5
my:float:0..MAX
mt:text:[l TO "\uFFFF"]
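The extended range syntax can be generated from two bounds and two inclusiveness flags. The following is a sketch under that assumption; ftsRange is a hypothetical helper name, and the bounds are inserted as given, without validation or escaping.

```javascript
// Hypothetical helper: "[" / "]" mark inclusive bounds, "<" / ">" exclusive ones.
function ftsRange(field, lower, upper, includeLower, includeUpper) {
  var open = includeLower ? "[" : "<";
  var close = includeUpper ? "]" : ">";
  return field + ":" + open + lower + " TO " + upper + close;
}

var inclusive = ftsRange("my:int", 0, 10, true, true);       // my:int:[0 TO 10]
var openLower = ftsRange("my:float", 0, "MAX", false, true); // my:float:<0 TO MAX]
```

The field names my:int and my:float are the placeholder properties used in the examples above.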
Fuzzy matching
Fuzzy matching is not currently implemented. The default Lucene implementation is Levenshtein Distance, which is expensive to evaluate.
Postfix terms with "~float"
apple~0.8
Proximity
Google style proximity is supported. To specify proximity for fields, you must use grouping.
big * apple
TEXT:(big * apple)
big *(3) apple
TEXT:(big *(3) apple)
Currently unimplemented:
- "phrase"~3
- phrase slop
Boosts
Query time boosts allow matches on certain parts of the query to influence the score more than others. All query elements can be boosted: terms, phrases, exact terms, expanded terms, proximity (only in field groups), ranges, and groups.
term^2.4
"phrase"^3
term~0.8^4
=term^3
~term^4
cm:name:(big * yellow)^4
1..2^2
[1 TO 2]^2
yellow AND (car OR bus)^3
Grouping
Groupings of terms are made using "(" and ")". Groupings of all query elements are supported in general. Groupings are also supported after a field - field group. The query elements in field groups all apply to the same field and can not include a field.
(big OR large) AND banana
title:((big OR large) AND banana)
Spans and positions
Spans and positions are not currently implemented. Positions will depend on tokenization. Anything more detailed than one *(2) two seems arbitrarily dependent on the tokenization. An identifier and pattern matching, or dual FTS and ID tokenization, may well be the answer in these cases.
- Initial ideas
- term[^] - start
- term[$] - end
- term[position]
These are of possible future use, but excluded for now.
- Lucene surround extensions
- and(terms etc)
- 99w(terms etc )
- 97n(terms etc)
Escaping
- Any character may be escaped using "\" in TERMS, IDS (field identifiers) and PHRASES
- Java unicode escape sequences are supported
- Whitespace can be escaped in terms and IDs; the following are supported:
- cm:my\ content:my\ name
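As a small sketch of the whitespace rule, the helper below escapes spaces so that a field identifier or term containing them stays a single token. escapeFtsSpaces is a hypothetical name, and only the space character is handled here.

```javascript
// Hypothetical helper: put a backslash in front of every space.
function escapeFtsSpaces(text) {
  return String(text).replace(/ /g, "\\ ");
}

// Rebuilds the example above from its unescaped parts.
var query = "cm:" + escapeFtsSpaces("my content") + ":" + escapeFtsSpaces("my name");
```

Here query holds the same text as the example above, with one backslash before each space.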
Ordering and Control
- Ordering and paging
- This is not part of the query language but is available in the API
- constraints for field expansion and available fields
- context - type/aspect, name space(s)
- Set default conjunction or disjunction
Mixed FTS ID behavior
- priority as defined on properties in the data dictionary
- explicit priority by prefixing with "=" for id pattern matches
- "~" can be used to force tokenization
Operator precedence
Operator precedence is SQL-like (not Java-like). When there is more than one logical operator in a statement, and they are not explicitly grouped using parentheses, NOT is evaluated first, then AND, and finally OR.
- Operator precedence from highest to lowest
- "
- [, ], <, >
- ()
- ~ (prefix and postfix), =
- ^
- +, |, -
- NOT, !
- AND
- OR
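The NOT, AND, OR ordering can be made visible by adding the parentheses that the precedence rules imply. The sketch below is deliberately simplified: it assumes single-word terms joined only by explicit NOT, AND, and OR, with no prefixes, fields, phrases, or existing groups, and parenthesize is a hypothetical name rather than part of the parser.

```javascript
// Illustration only: OR binds loosest, then AND, then NOT.
function parenthesize(query) {
  var orParts = query.split(/\s+OR\s+/i);
  var orOut = [];
  for (var i = 0; i < orParts.length; i++) {
    var andParts = orParts[i].split(/\s+AND\s+/i);
    var andOut = [];
    for (var j = 0; j < andParts.length; j++) {
      // NOT applies to the single term that follows it.
      andOut.push(andParts[j].replace(/^NOT\s+(\S+)$/i, "(NOT $1)"));
    }
    orOut.push(andOut.length > 1 ? "(" + andOut.join(" AND ") + ")" : andOut[0]);
  }
  return orOut.length > 1 ? "(" + orOut.join(" OR ") + ")" : orOut[0];
}

var grouped = parenthesize("a OR b AND NOT c"); // (a OR (b AND (NOT c)))
```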
AND and OR can be combined with +, |, - with the following meanings:
AND (no prefix is the same as +)
- big AND dog
- big and dog must occur
- +big AND +dog
- big and dog must occur
- big AND +dog
- big and dog must occur
- +big AND dog
- big and dog must occur
- big AND |dog
- big must occur and dog should occur
- |big AND dog
- big should occur and dog must occur
- |big AND |dog
- both big and dog should occur, and at least one must match
- big AND -dog
- big must occur and dog must not occur
- -big AND dog
- big must not occur and dog must occur
- -big AND -dog
- both big and dog must not occur
- |big AND -dog
- big should occur and dog must not occur
OR (no prefix is the same as +)
- dog OR wolf
- dog and wolf should occur, and at least one must match
- +dog OR +wolf
- dog and wolf should occur, and at least one must match
- dog OR +wolf
- dog and wolf should occur, and at least one must match
- +dog OR wolf
- dog and wolf should occur, and at least one must match
- dog OR |wolf
- dog and wolf should occur, and at least one must match
- |dog OR wolf
- dog and wolf should occur, and at least one must match
- |dog OR |wolf
- dog and wolf should occur, and at least one must match
- dog OR -wolf
- dog should occur and wolf should not occur, one of the clauses must be valid for any result
- -dog OR wolf
- dog should not occur and wolf should occur, one of the clauses must be valid for any result
- -dog OR -wolf
- dog and wolf should not occur, one of the clauses must be valid for any result
APIs
Embedded in CMIS contains()
- strict queries
SELECT * FROM Document WHERE CONTAINS('\'zebra\'')
SELECT * FROM Document WHERE CONTAINS('\'quick\'')
- Alfresco extensions
SELECT * FROM Document D WHERE CONTAINS(D, 'cmis:name:\'Tutorial\'')
SELECT cmis:name as BOO FROM Document D WHERE CONTAINS('BOO:\'Tutorial\'')
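When the FTS text is embedded in a CMIS-SQL string literal, single quotes inside it have to be escaped. A minimal sketch of that step follows; cmisContains is a hypothetical helper, it escapes only single quotes, and it assumes the FTS text contains no backslashes of its own.

```javascript
// Hypothetical helper: wrap an FTS query in a CMIS-SQL CONTAINS() call.
function cmisContains(ftsQuery, tableAlias) {
  // Escape single quotes so they do not end the SQL string literal.
  var literal = "'" + String(ftsQuery).replace(/'/g, "\\'") + "'";
  // The table alias is only needed when the query references several tables.
  return tableAlias
    ? "CONTAINS(" + tableAlias + ", " + literal + ")"
    : "CONTAINS(" + literal + ")";
}

var sql = "SELECT * FROM Document D WHERE " + cmisContains("cmis:name:'Tutorial'", "D");
```

The value of sql is a statement of the same shape as the extension examples above, with the FTS phrase 'Tutorial' escaped inside the outer literal.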
Search Service
ResultSet results = searchService.query(storeRef, SearchService.LANGUAGE_FTS_ALFRESCO, "quick");
SearchService.LANGUAGE_FTS_ALFRESCO = "fts-alfresco"
Node Browser
Alfresco FTS is supported in the node browser.
JavaScript
search
{
query: string, mandatory, in appropriate format and encoded for the given language
store: string, optional, defaults to 'workspace://SpacesStore'
language: string, optional, one of: lucene, xpath, jcr-xpath, fts-alfresco - defaults to 'lucene'
templates: [], optional, Array of query language template objects (see below) - if supported by the language
sort: [], optional, Array of sort column objects (see below) - if supported by the language
page: object, optional, paging information object (see below) - if supported by the language
namespace: string, optional, the default namespace for properties
defaultField: string, optional, the default field for query elements when not explicit in the query
onerror: string, optional, result on error - one of: exception, no-results - defaults to 'exception'
}
sort
{
column: string, mandatory, sort column in appropriate format for the language
ascending: boolean, optional, defaults to false
}
page
{
maxItems: int, optional, max number of items to return in result set
skipCount: int, optional, number of items to skip over before returning results
}
template
{
field: string, mandatory, custom field name for the template
template: string, mandatory, query template replacement for the template
}
For example:
var def =
{
query: "cm:name:test*",
language: "fts-alfresco"
};
var results = search.query(def);
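A fuller definition object might combine the optional sort, page, and onerror settings described above. The following is a sketch, not a reference: buildSearchDefinition is a hypothetical helper, the sort column format "cm:name" is an assumption for the fts-alfresco language, and the search root object is only available inside the repository's script environment, so the call is guarded.

```javascript
// Hypothetical helper: assemble a search definition with sorting and paging.
function buildSearchDefinition(query, maxItems, skipCount) {
  return {
    query: query,
    store: "workspace://SpacesStore",
    language: "fts-alfresco",
    // Assumed column format; check what the chosen language expects.
    sort: [{ column: "cm:name", ascending: true }],
    page: { maxItems: maxItems, skipCount: skipCount },
    // Return no results on error instead of throwing an exception.
    onerror: "no-results"
  };
}

var pagedDef = buildSearchDefinition("cm:name:test*", 25, 0);
// "search" exists only when the script runs inside the repository.
var pagedResults = (typeof search !== "undefined") ? search.query(pagedDef) : [];
```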
Templates
Alfresco FTS is not supported in FreeMarker.
Query Templates
The FTS query language supports query templates. These are intended to help when building application specific searches.
A template is a query as defined previously, but with additional support to specify template substitution.
- %field
- insert the parse tree for the current ftstest and replace all references to fields in the current parse tree with the supplied field.
- %(field1, field2) -- the comma is optional
- %(field1 field2)
- create a disjunction, and for each field add the parse tree for the current ftstest to the disjunction and replace all references to fields in the current parse tree with the current field from the list.
| Name | Template | Example Query | Expanded Query |
|---|---|---|---|
| t1 | %cm:name | t1:n1 | cm:name:n1 |
| t1 | %cm:name | t1:"n1" | cm:name:"n1" |
| t1 | %cm:name | ~t1:n1^4 | ~cm:name:n1^4 |
| t2 | %(cm:name, cm:title) | t2:"woof" | (cm:name:"woof" OR cm:title:"woof") |
| t2 | %(cm:name, cm:title) | ~t2:woof^4 | (~cm:name:woof OR ~cm:title:woof)^4 |
| t3 | %cm:name AND my:boolean:true | t3:banana | (cm:name:banana AND my:boolean:true) |
Templates may refer to other templates.
nameAndTitle -> %(cm:name, cm:title)
nameAndTitleAndDescription -> %(nameAndTitle, cm:description)
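The two simplest template forms can be imitated with string substitution. The sketch below is a rough approximation only: the real expansion works on the parse tree, whereas this hypothetical expandTemplate handles just %field and %(field1, field2) applied to a single term or phrase, without prefixes, boosts, nested templates, or extra clauses such as the t3 example.

```javascript
// Illustration only: expand "%field" or "%(f1, f2)" for one term or phrase.
function expandTemplate(template, value) {
  var body = template.substring(1); // drop the leading "%"
  var group = /^\((.*)\)$/.exec(body);
  // Field lists may be separated by commas, whitespace, or both.
  var names = group ? group[1].split(/[\s,]+/) : [body];
  var clauses = [];
  for (var i = 0; i < names.length; i++) {
    if (names[i] !== "") {
      clauses.push(names[i] + ":" + value);
    }
  }
  // Several fields become a disjunction wrapped in parentheses.
  return clauses.length > 1 ? "(" + clauses.join(" OR ") + ")" : clauses[0];
}

var single = expandTemplate("%cm:name", "n1");
var multi = expandTemplate("%(cm:name, cm:title)", '"woof"');
```

These two calls reproduce the first t1 row and the first t2 row of the table above.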
Literals
Everything is really a term or a phrase. The string representation you type in will be transformed to the appropriate type for each property when executing the query. For convenience, there are numeric literals but string literals may also be used.
String literals for phrases may be enclosed in double quotes or single quotes. Java single character and \uXXXX based escaping are supported within these literals.
Integer and decimal literals conform to the Java definitions.
Dates, like any other literal, can be expressed as a term or phrase. Dates are in the format ...... Any or all of the time may be truncated. All of the date must be present.
The date type also supports NOW and may be extended in the future to support some date math - perhaps akin to what is used by Solr.
In range queries, strings, terms, or phrases that do not parse to a valid type instance for the property are treated as open ended.
test:integer[ 0 TO MAX] - matches anything positive
Date ranges do not currently respect the truncated resolution that may be presented in range queries. Follow: [ALF-695]