CMIS Query Language

From alfrescowiki

Jump to: navigation, search

Back to CMIS.


Contents

Introduction

The CMIS standard defines a query language based on the SQL-92 SELECT statement. To support this, a read-ony relational view of the CMIS data model is also defined.

This page is based on cmis-spec-v1.0.

It is written as a reference, drawing together the relevant information from different parts of the CMIS specification. There are simple examples for most types of query.

There are two parts to understanding the CMIS query language:

  • the relational model that defines the available tables, their columns, which tables are queryable, and which columns can be used in WHERE and ORDER BY clauses of the query; and
  • the query language, and if you already have experience of SQL-92, how it differs from SQL-92.

The CMIS query language is generally available in 3.3. Earlier releases (3.1 and 3.2) contain earlier versions with partial compliance to early versions of the CMIS specification. For v1.0 of the CMIS specification, use Alfresco 3.3 or higher.

Relational view

For all object types, the CMIS data model defines the attributes queryName, queryable, and includedInSuperType. The queryName of a queryable object type is used to identify the object type in the FROM clause of query statement. Non-queryable object types are excluded from the relational view and can not appear in the FROM clause of a query statement. The attribute includedInSuperType determines if an object sub-type is included in a query for any of its super-types. So it may be that all sub-types are not included in the query for a type. If an object type is not includedInSuperType, a direct query for the type is still supported if it is defined as queryable. For example, Alfresco internally models renditions as a sub-type of cmis:document. Renditions are marked as not includedInSuperType and so will not appear in queries for cmis:document.

SELECT * FROM cmis:document
- selects all properties for all documents but does not include thumbnails (cm:thumbnail). 

SELECT * FROM cm:thumbnail
- includes cm:thumbnail and any sub-types that are includedInSuperType

All object-type property definitions have the attributes queryName, queryable, and orderable. The property queryName is used to identify the related column in the relational table view for the object type. If a property is queryable then it may be identified using its queryName in the WHERE clause of the query statement. Only single valued properties may be orderable. Orderable properties can appear in the ORDER BY clause of the query statement. Any property may be specified in the select list for the query. However, not all properties are mandatory for the base CMIS types.

The Document, Folder, Relationship and Policy object types do not have a common super-type although there are properties that apply to them all. It is not possible to query for one of these common properties, such as cmis:name, across all types; only types descended from one of the four base types.

The relational view of the CMIS data model is made up of a collection of virtual tables, each named after the queryName, for each queryable object-type definition. Each virtual table has virtual columns, named after the queryName, for each property defined or inherited by the object-type. If an object-type does not inherit a property defined on one of its super-types, the property appears in the virtual table as a column filled with NULLs. Unset properties are treated as null. Multi-valued properties are represented by a column that contains a single list value for each row. The virtual table for an object type includes a row for each object of that type or descended from that type and for which includedInSuperTypeQuery is true.

Content can only be queried via the CONTAINS() predicate function.

cmis:document

M
Indicates there is a mapping within Alfresco for the property and it can be selected.
Most properties are not required according to the specification.
U
Indicates the property is 'unmapped' in Alfresco.
Unmapped properties will be set to null when selected.
Q
Indicates the property is queryable in Alfresco.
O
Indicates the property is orderable in Alfresco.

According to the specification, the virtual table cmis:document has the following columns:

  • cmis:name (MQO)
  • cmis:objectId (MQO)
  • cmis:baseTypeId (MQ)
  • cmis:objectTypeId (MQO)
  • cmis:createdBy (MQO)
  • cmis:creationDate (MQO)
  • cmis:lastModifiedBy (MQO)
  • cmis:lastModificationDate (MQO)
  • cmis:changeToken (U)
  • cmis:isImmutable (M)
  • cmis:isLatestVersion (M)
  • cmis:isMajorVersion (M)
  • cmis:isLatestMajorVersion (M)
  • cmis:versionLabel (M)
  • cmis:versionSeriesId (M)
  • cmis:isVersionSeriesCheckedOut (M)
  • cmis:versionSeriesCheckedOutBy (M)
  • cmis:versionSeriesCheckedOutId (M)
  • cmis:checkinComment (M)
  • cmis:contentStreamLength (MQO)
  • cmis:contentStreamMimeType (MQO)
  • cmis:contentStreamFileName (MQO)
  • cmis:contentStreamId (M)

cmis:folder

  • cmis:name (MQO)
  • cmis:objectId (MQO)
  • cmis:baseTypeId (MQ)
  • cmis:objectTypeId (MQO)
  • cmis:createdBy (MQO)
  • cmis:creationDate (MQO)
  • cmis:lastModifiedBy (MQO)
  • cmis:lastModificationDate (MQO)
  • cmis:changeToken (U)
  • cmis:parentId (MQO)
  • cmis:path (M)
  • cmis:allowedChildObjectTypeIds (U)

cmis:relationship

Alfresco does not support queries for this type.

cmis:policy

Alfresco does not support queries for this type.

Query structure

As with SQL-92, the CMIS Query is made up of:

  • A mandatory SELECT clause
  • A mandatory FROM clause
  • An optional WHERE clause
  • An optional ORDER BY clause

The CMIS query language does not support GROUP BY or HAVING.

SELECT clause

The SELECT clause identifies which virtual columns to return in the result set. It can be either a comma-separated list of one or more queryNames of properties that are defined by queryable object types or * for all virtual columns. The Alfresco relational view includes all single valued and multi-valued properties for *. All columns for an aliased table can be referred to as ALIAS.*.

The function SCORE() may also appear in the select list.

FROM clause

The FROM clause determines which virtual table(s) to use to fullfill the query and implies which virtual columns are available in the SELECT, WHERE and ORDER BY clauses. If more than one table is specified table aliases must be used to avoid any ambiguity. Only the queryName of queryable object-types may appear in the FROM clause.

The Alfresco implementation currently does not support JOIN except as an extension to support aspects. This extension is Alfresco specific.


Simple select examples

To select all properties for all documents:

SELECT * FROM cmis:document

To select all properties for all folders using a tables alias:

SELECT F.* FROM cmis:folder F

To select specific properties for all documents:

SELECT cmis:name, cmis:objectId FROM cmis:document

To select specific properties for all folders using a table alias and assigning aliases for the slected columns:

SELECT F.cmis:name AS name, F.cmis:objectId AS id FROM cmis:folder AS F


WHERE clause

Only rows that satisfy all the constraints specfied in the WHERE clause are returned from the query.

The WHERE clause is made up of a slightly restricted SQL-92 <search condition>. Compared with SQL-92, the CMIS Query <boolean test> maps to SQL-92 <boolean primary>. So OR, AND, NOT and parenthesised expressions work as you would expect. The SQL-92 <boolean test> is not supported. In Alfresco, predicates evaluate to TRUE or FALSE: UNKNOWN is not supported.


CMIS Query supports the following SQL-92 predicates:

  • comparison predicate
  • in predicate
  • like predicate
  • null predicate
  • quantified comparison predicate


It does not support the SQL-92:

  • between predicate
  • exists predicate
  • unique predicate
  • match predicate
  • overlaps predicate

It extends SQL-92 to support additional predicates:

  • quantified in predicate
  • text search predicate
  • folder predicate

Each predicate form is discussed in detail in the following sections.


comparison predicate

All SQL comparisons (equals =, not equals <>, less than <, greater than >, less than or equals <=, and greater than or equals >=) are supported but not for all data types. Comparison predicates are only supported for single valued properties.

The BNF for the CMIS 1.0 query statement mandates that a value expression (column reference or SCORE()) should be on the left side of the comparison and a literal on the right side.

String
Strings collation is case sensitive and space/pad sensitive
Collation is the default Java String collation.
"woof" <> "woof "
Supports =, <>
Additionally, Alfresco supports <, >, <=, >=
Alfresco MLText properties will select the locale specific value: predicates will evaluate against all locales.
SELECT * from cmis:document where cmis:name =  'test'
SELECT * from cmis:document where cmis:name <> 'test'
SELECT * from cmis:document where cmis:name <  'test'
SELECT * from cmis:document where cmis:name >  'test'
SELECT * from cmis:document where cmis:name <= 'test'
SELECT * from cmis:document where cmis:name >= 'test'
Decimal
Integer
Supports =, <>, <, >, <=, >=
SELECT * FROM some:type WHERE some:integer = 0
SELECT * FROM some:type WHERE some:decimal = 0.2
Boolean
Supports =
SELECT * FROM some:type WHERE some:boolean = TRUE
SELECT * FROM some:type WHERE some:boolean = false


DateTime
Date collation ignores time by default (see below)
Supports =, <>, <, >, <=, >=
The comparison is chronological
SELECT * FROM cmis:document WHERE cmis:creationDate = TIMESTAMP '2010-03-31T12:52:12.123Z'
SELECT * FROM cmis:document WHERE cmis:creationDate < TIMESTAMP '2010-04-01T00:00:00.000+00:00'

ID
Supports =, <>
SELECT * from cmis:document where cmis:objectId = 'ID-1'
SELECT * from cmis:document where cmis:objectId <> 'ID-1'


URI
Not supported in Alfresco

IN predicate

IN is supported for single valued properties of type:

  • String
  • Integer
  • Decimal
  • DateTime
  • ID
SELECT cmis:name FROM cmis:folder where cmis:name IN ('test', 'tube', 'Company Home')
SELECT * FROM some:type WHERE some:integer IN (0, 1, 2, 3)
SELECT * FROM some:type WHERE some:decimal IN (0.1, 0.2)
SELECT * FROM cmis:document WHERE cmis:creationDate IN ( TIMESTAMP '2010-03-31T12:52:12.123Z' )
SELECT cmis:objectId FROM cmis:folder where cmis:objectId IN ('ID-1', 'ID-2')


In is not supported for properties of type:

  • Boolean
  • URI

LIKE predicate

LIKE is only supported for single-valued properties of type String. The behaviour is as SQL-92. However, see the later section on escaping.

SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE '%test%'
SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE 'test'
SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE 't%t'
SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE 't__t' 
- will all match cmis:name set to 'test'
SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE 't\%t'
SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE 't\_\_t'
- will match cmis:name set to 't%t' and 't__t' respectively.

NULL predicate

CMIS does not support null values for properties. Properties may be set or not set. The null predicate tests if the property is set or unset. This applies to both single valued and multi-valued properties. For multi-valued properties if the property is set it is NOT NULL: the values of the multi-valued properties are not important. This differs from what you would expect from the SQL-92 specification.

SELECT * from cmis:document where cmis:name IS NULL
SELECT * from cmis:document where cmis:name IS NOT NULL

quantified comparison predicate

The quantified comparison predicate only applies to multi-valued properties: it can not be used for single valued properties. Only the equality operator is supported. The only quantifier supported is ANY (ALL and SOME are not supported).

SELECT * FROM ext:doc WHERE 'test' = ANY ext:multiValuedStringProperty
where ext:multiValuedStringProperty is a multi-valued String property
SELECT * FROM ext:doc WHERE 1 = ANY ext:multiValuedIntegerProperty

Compared with the SQL-92 <quantified comparison predicate> the <row value constuctor> must be a literal and the <table subquery> has been replaced with the CMIS query language <multi-valued-column reference>.

quantified in predicate

This is an extension to SQL-92 and defines a new IN predicate for use only with multi-valued properties. The only qualifier support is ANY. The predicate is true if any of the values of a multi-valued property match any of those in the IN list. The quantified IN predicate is only supported for multi-valued properties of types with a data type that supports IN, as described above.

SELECT * from ext:doc WHERE ANY ext:multiValuedStringProperty IN ('test', 'tube')

text search predicate

Text search predicates are defined using the CONTAINS() predicate function. This is a CMIS extension to SQL-92. The relevance of a row can be found using the SCORE() function.

The CONTAINS() function accepts two arguments:

  • optional qualifier
  • full text search expression

The qualifier, if provided, must be the name or alias of one of the tables in the FROM clause. The full text search expression should then only match objects of this type. A qualifier must be provided if there is any ambiguity as to which object a full text search expression should match - i.e. there is a JOIN.

The full text search expression is a String literal containing the search expression. The simplest expression is a single term.

SELECT * FROM cmis:document WHERE CONTAINS('test')
SELECT * FROM cmis:document D WHERE CONTAINS(D, 'test')
 

Phrases may also be used but the quotes have to be escaped:

SELECT * FROM cmis:document WHERE CONTAINS('\'test tube\'')
SELECT * FROM cmis:document D WHERE CONTAINS(D, '\'test tube\'')

Both terms and phrases can be preceded with '-' for negation.

SELECT * FROM cmis:document WHERE CONTAINS('-test')
SELECT * FROM cmis:document D WHERE CONTAINS(D, '-test')
SELECT * FROM cmis:document WHERE CONTAINS('-\'test tube\'')
SELECT * FROM cmis:document D WHERE CONTAINS(D, '-\'test tube\'')

Terms and phrases separated by white space are AND'ed together, those separated by OR are OR'ed.

SELECT * FROM cmis:document WHERE CONTAINS('-test tube')
SELECT * FROM cmis:document WHERE CONTAINS('-test OR tube')

Note: Negation has higher precedence than OR, which in turn has higher precedence than implicit AND.

For strict CMIS, it is not possible to escape single quotes in terms or phrases. In non-strict mode, Single quotes can be escaped within phrases but not terms.

SELECT * FROM cmis:document WHERE CONTAINS('\'bob\\\'s\'')
- non-strict only

There may only be one CONTAINS() function in a query and it may only be ANDed with the result of all other predicates.


The SCORE() function does not have any arguments. It returns a decimal value between 0.0 (no relevance) and 1.0 (complete relevance) with respect to the CONTAINS() function specified in the query. SCORE() can only be used in conjunction with CONTAINS(): CONTAINS() may be used without SCORE(). SCORE() can only be used in the select clause of a query. If no alias is provided for SCORE() it will be aliased to SEARCH_SCORE. The SCORE() function can not be used in the WHERE or ORDER BY clauses of the query. The SCORE() function may be referred to by alias in the ORDER BY clause.

SELECT SCORE() FROM cmis:document WHERE CONTAINS('\'test\'')
SELECT SCORE() FROM cmis:document WHERE CONTAINS('\'test\'') ORDER BY SEARCH_SCORE
SELECT SCORE() myscore FROM cmis:document WHERE CONTAINS('\'test\'') ORDER BY myscore

folder predicate

Folder predicate is supported by two predicate functions IN_FOLDER and IN_TREE() which have the same arguments. An optional qualifier with the same meaning as the optional qualifier in the CONTAINS() predicate function and a folder id. The folder id must be the id of a folder in the repository. IN_FOLDER() matches the immediate children of a folder, IN_TREE matches any object beneath the folder.

SELECT cmis:name FROM cmis:document WHERE IN_FOLDER('folder id')
SELECT cmis:name FROM cmis:folder F WHERE IN_TREE(F, 'folder id')

More complex search conditions

SELECT D.*, SCORE() FROM cmis:document D 
   WHERE CONTAINS('test') AND (IN_FOLDER('F-1') OR IN_FOLDER('F-2')) AND D.cmis:name LIKE 't%'
   ORDER BY SEARCH_SCORE DESC, D.cmis:name ASC

ORDER BY clause

The order by clause is a comma separated list of one or more column names. Columns can be referred to directly using their queryName or using their alias as defined in the query. All the properties specified must be orderable. Only columns that are in the SELECT clause may be used for ordering. Collation for properties and property types are repository-specific. You should not rely on this for portability.

Ordering (and querying) may not work as expected for cmis:name if you upgraded from an Alfresco install prior to 3.2 Community Edition. A FULL reindex will resolve this. Before adding content, ordering will be based on all the tokens generated from the content and may fail. Once any new content is added, this will sort as expected. However, all all old content will be unordered at the start or end of the sequence (treated as null).

Ordering may not work as expected for datetime. By default, sort will only use the date. This can be resolved by using the date time analyzer in place of the date analyzer. If you have old data, a FULL reindex will be required.

Date time resolution can be enabled by overriding the property:

d_dictionary.datatype.d_datetime.analyzer=org.alfresco.repo.search.impl.lucene.analysis.DateTimeAnalyser

in preference to the default in alfresco/model/dataTypeAnalyzers.properties

Alfresco specific query capabilities

capabilityQuery
bothcombined - full text content and properties can be used to filter rows
capabilityJoin
none - JOIN is not supported
Note cmis-alfresco supports extensions to join with aspects (a self join)
capabilityPWCSearchable
true - checked out documents are included in the search
capabilityAllVersionsSearchable
false - only the latest versions of documents are searchable.

Differences between cmis-strict and cmis-alfresco

If you use the CMIS API, you will always get the strict version of the language.

To use the alfresco extensions, you can use the SearchService direct (the select list is ignored). Alternatively, you can use the CMIS QueryService direct and control the behavior using the options on the SearchParameter object. You can control a whole lot more than strict vs non-strict.

CMIS SQL and Alfresco extensions are also available via the "Node Browser" and the JavaScript API.

In general, cmis-alfresco will not be portable to other CMIS repositories and is not available via standard CMIS APIs (as it does not conform to the standard).

cmis-alfresco

cmis-alfresco additionally supports:

  • full FTS query language for the CONTAINS() query
SELECT * from cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
- phrase match against cmis:name

SELECT * from cmis:document WHERE CONTAINS('~cmis:name:test')
- FTS match against cmis:name

SELECT * from cmis:document WHERE CONTAINS('~cmis:name:test*')
- FTS wildcard against cmis:name

SELECT * from cmis:document D WHERE CONTAINS(D, 'cmis:name:\'test*\'')
- phrase wildcard against cmis:name

SELECT * from cmis:document WHERE CONTAINS('cmis:createdBy:\'ad*\'')
- phrase wildcard against cmis:createdBy 

SELECT * from cm:ownable WHERE CONTAINS('cm:owner:\'ad*\'')
- phrase wildcard against cm:owner 


  • the default connective for Alfresco FTS can be AND or OR
  • Inner joins to node aspects
    • a self join for aspects applied to Alfresco types
SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId

cmis-alfresco relaxes the following restrictions:

  • SCORE() may be used without CONTAINS()
  • There can be more then one CONTAINS() - SCORE() is the overall score
  • The restrictions for IN, comparisons and LIKE are dropped
    • mvps can use svp syntax
  • You can order by unselected columns
  • queryable and orderable are not checked
  • escaping of any character is allowed in CONTAINS()
  • identifiers may be double-quoted key words
  • the parser accepts query parameters as defined in early version of the specification, but there is no implementation to support it

compliance

Alfresco does not comply fully with the following:

  • CONTAINS() can appear anywhere in the WHERE clause - not just ANDed with the result of all other predicates
  • comparison predicates do not have to have the column reference on the LHS and the literal on the right
  • Alfresco supports NULL property values
    • these can not be distinguished from unset

Alfresco extended relational mapping

Extended mapping is supported for Alfrecso types derived from cm:content and cm:folder, which are mapped to cmis:document and cmis:folder, respectively. It is not possible to refer to cm:content and cm:folder directly: you must use cmis:document and cmis:folder.

The behavior of extended types is governed by their data dictionary definitions for indexing. All properties may be selected. If a property is indexed, it is queryable. If it is untokenized, both tokenized and untokenized, or of one of the orderable types, then it is orderable.

The orderable types are:

  • d:boolean
  • d:datetime
  • d:double
  • d:float
  • d:integer
  • d:long
  • d:path
SELECT * FROM CM:OWNABLE
- Aspects live a half-life as policies in the non-strict world

SELECT cm:thumbnailName from cm:thumbnail 
- In the strict world you can refer to subtypes of cm:content and cm:folder and their properties as you would expect,
  using prefix:localName style notation

Literals

The implementation maps all literals to String before conversion to the required type. As a result, a String literal representing a number is no different from a(n unquoted) numeric literal.

String Literals

String literals are enclosed in single quotes. Escaping does not follow SQL-92 escaping. Two single quotes within a string literal do not represent a quote character: '''' is not a single quote literal.

CMIS defines backslash-based escaping with the available escape characters, determined by context.

'bob\'s'      is used in place of           'bob''s'
SELECT * from cmis:document WHERE cmis:name = 'bob\'s'

Basic escaping:

  • \\ represents \
  • \' represents '

In addition to basic escaping, in LIKE expressions

  • \% represents %
  • \_ represents _

In addition to basic escaping, in CONTAINS() phrases double escaping is required (non-strict only)

  • CONTAINS('\'bob\\\'s\'')

numeric literals

As SQL-92 <signed numeric literal> (but implemeted the Java way ...)

decimal integers
0
100
-123
+123
exact decimal floating point literals
10.
10.0
-10.0
+10.0
.3
approximate floating point literals
1E-2
1.E+2
-1E-2
+1.0E+2
.1E2

Note: Alfresco currently accepts "e" in place of "E" which does not conform to SQL-92.

Note: String literal may also be used.

datetime literals

Based on SQL-92 <timestamp literal> with 'T' replacing <space> in <timestamp string>, <time value> extended to include milli-seconds, and <time zone interval> modified to include 'Z'.

TIMESTAMP '2010-04-01T12:15:00.000Z'  
TIMESTAMP '2010-04-01T12:15:00.000+01:00' 
TIMESTAMP '2010-04-01T12:15:00.000-02:00'
In Java SimpleDateFormat terms:
yyyy-MM-dd'T'HH:mm:ss.SSS<time zone> 
with the <time zone> as Z, +hh:mm, or -hh:mm which does not quite match up :-)
Also, you can use:
org.alfresco.util.CachingDateFormat.getCmisSqlDatetimeFormat()

boolean literals

  • TRUE
  • true
  • FALSE
  • false

Note: String literals may also be used. Alfresco also currently accepts mixed case literals.

Upgrading to 3.2 or 3.3 from previous versions of Alfresco and CMIS query

The cm:name Alfresco property is mapped to cmis:name. We introduced dual tokenization for cm:name to support CMIS query exact matching and ordering. This is an index change compared with previous versions of Alfresco. If you upgrade from previous versions of Alfresco you will have to fully rebuild your indexes to get the correct behaviour for ordering and quantified predicates using cmis:name.

Clean 3.3 install will behave correctly.

If the index field to support correct ordering is not present it will order by token which can throw odd errors. (e.g. "Search failed due to: java.lang.RuntimeException: there are more terms than documents in field "@{http://www.alfresco.org/model/content/1.0}name"). If you add new stuff, then the new stuff will be ordered and the old stuff will be at one end or the other unordered.

Alfresco does not have a patch to rebuild the index for this support; it is a customer decision if they require ordering then they rebuild.

For the same reason, comparison predicates may fail if the index has not been rebuilt.

Configuring DateTime resolution

To get full DateTime resolution for quantified predicates and ordering the analyzer for date time in Alfresco needs to changed from date to date time. This will also require an index rebuild if the change is not made at installation time.

Note: this was raised as a Jira in https://issues.alfresco.com/jira/browse/ALF-5378

In classes/alfresco/model/dataTypeAnalyzers.properties

change:

d_dictionary.datatype.d_datetime.analyzer=org.alfresco.repo.search.impl.lucene.analysis.DateAnalyser

into:

 d_dictionary.datatype.d_datetime.analyzer=org.alfresco.repo.search.impl.lucene.analysis.DateTimeAnalyser
Personal tools
Download and go
© 2014 Alfresco Software, Inc. All Rights Reserved. Legal | Privacy | Accessibility