Content Store Configuration

From AlfrescoWiki

Introduction

In Alfresco, the content binaries are stored separately from the metadata, which is always found in the database. The primary metadata that acts as a reference to the binaries takes the form contentUrl=store://.........|mimetype=...(etc). The abstraction that takes care of mapping the store://... part of the reference to a physical location is the ContentStore interface.

Basics

Default Configuration

The default configuration beans are:

  • <configRoot>/alfresco/content-services-context.xml
    • fileContentStore
    • deletedContentStore
    • deletedContentBackupListener
    • contentStoreCleaner
    • contentService
  • <configRoot>/alfresco/scheduled-jobs-context.xml
    • contentStoreCleanerTrigger

Classes that can be configured:

  • org.alfresco.repo.content.filestore.FileContentStore
  • org.alfresco.repo.content.cleanup.DeletedContentBackupCleanerListener
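
As a rough sketch of how the store-related beans listed above fit together, the fragment below wires one FileContentStore for live content, a second one for deleted content, and the backup listener that points at the second store. The placeholders ${dir.contentstore} and ${dir.contentstore.deleted} and the listener's store property are assumptions based on a typical installation; check the content-services-context.xml shipped with your version before relying on them.

```xml
<!-- Sketch only: the property placeholders and the listener's "store" property are assumed -->
<bean id="fileContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
   <constructor-arg>
      <value>${dir.contentstore}</value>
   </constructor-arg>
</bean>
<bean id="deletedContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
   <constructor-arg>
      <value>${dir.contentstore.deleted}</value>
   </constructor-arg>
</bean>
<!-- Assumed wiring: the listener copies orphaned content into deletedContentStore -->
<bean id="deletedContentBackupListener" class="org.alfresco.repo.content.cleanup.DeletedContentBackupCleanerListener">
   <property name="store">
      <ref bean="deletedContentStore" />
   </property>
</bean>
```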


Content Binaries' Lifecycle

Content Writes

  1. A request is made to the ContentStore for a ContentWriter.
  2. The ContentStore creates the appropriate storage for the new content. This will either be free space or a copy of an existing binary, depending on the type of access required by the client.
  3. The ContentWriter hooks appropriate listeners to the content streams.
  4. The ContentService wires the ContentWriter up to the current transaction.
  5. The client writes to the stream using one of several convenience methods or the raw NIO Channel.
  6. Upon stream closure, the metadata is written to the database.

Content Reads

  1. A request is made to the ContentStore for a ContentReader.
  2. The ContentStore opens the underlying NIO Channel.
  3. The client reads the content using methods on the ContentReader.

Copying, Moving and Versioning Files

Once a write Channel has been closed, the content is never modified by any high-level processes. Moving, copying and versioning a file merely affects the content metadata. It is possible to end up with several references to the same underlying raw binary content.

Deleting Files

When a file node (or anything containing a reference to raw content) is permanently deleted, there is simply one less reference to the raw content. When no references to some raw content remain, it is called orphaned. If nothing further were done, the content stores would just irreversibly fill up with content.

Cleaning up Orphaned Content (Purge)

Once all references to a content binary have been removed from the metadata, the content is said to be orphaned. Orphaned content can be deleted or purged from the content store while the system is running. Identifying and either sequestering or deleting the orphaned content is the job of the contentStoreCleaner.

In the default configuration, the contentStoreCleanerTrigger fires the contentStoreCleaner bean.

  <bean id="contentStoreCleaner" class="org.alfresco.repo.content.cleanup.ContentStoreCleaner" >
     ...
     <property name="protectDays" >
        <value>14</value>
     </property>
     <property name="stores" >
        <list>
           <ref bean="fileContentStore" />
        </list>
     </property>
     <property name="listeners" >
        <list>
           <ref bean="deletedContentBackupListener" />
        </list>
     </property>
  </bean>

  • protectDays

Use this property to set the minimum time that content binaries are kept in the content store. In the above example, if a file is created and immediately deleted, it will not be cleaned from the content store for at least 14 days. Adjust the value to account for backup strategies, average content size and available disk space. Setting this value to zero produces a system warning: it breaks the transaction model, and content can be lost if the orphaned content cleaner runs while content is being loaded into the system. If the system backup strategy is simply to make regular copies, this value should also be greater than the number of days between successive backup runs.

  • stores

This is a list of ContentStore beans to scour for orphaned content.

  • listeners

When orphaned content is located, these listeners are notified. In this example, the deletedContentBackupListener copies the orphaned content to a separate deletedContentStore.

Note that this configuration does not actually remove the files from the file system; it moves them to the designated deletedContentStore, usually contentstore.deleted. The files can be removed from the deletedContentStore via script or cron job once an appropriate backup has been performed.
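
For orientation, the contentStoreCleanerTrigger that fires the cleaner might look roughly like the fragment below. This is a sketch from memory of a typical scheduled-jobs-context.xml, not a copy of the shipped file: the class names org.alfresco.util.CronTriggerBean and org.alfresco.repo.content.cleanup.ContentStoreCleanupJob, the schedulerFactory bean and the cron expression are all assumptions and should be verified against your installation.

```xml
<!-- Sketch only: trigger class, job class, scheduler bean and schedule are assumed -->
<bean id="contentStoreCleanerTrigger" class="org.alfresco.util.CronTriggerBean">
   <property name="jobDetail">
      <bean class="org.springframework.scheduling.quartz.JobDetailBean">
         <property name="jobClass">
            <value>org.alfresco.repo.content.cleanup.ContentStoreCleanupJob</value>
         </property>
         <property name="jobDataAsMap">
            <map>
               <!-- hands the cleaner bean to the scheduled job -->
               <entry key="contentStoreCleaner">
                  <ref bean="contentStoreCleaner" />
               </entry>
            </map>
         </property>
      </bean>
   </property>
   <property name="scheduler">
      <ref bean="schedulerFactory" />
   </property>
   <!-- hypothetical schedule: every day at 04:00 -->
   <property name="cronExpression">
      <value>0 0 4 * * ?</value>
   </property>
</bean>
```

Running the job at a quiet time of day keeps the scan for orphaned content away from peak load; the protectDays setting, not the schedule, determines how old content must be before it is touched.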

Eager Content Cleanup

If you have an appropriate backup strategy, usually involving a ReplicatingContentStore, then the content can be removed after a day and need not be sent to a backup deletedContentStore. In your custom configuration context, override the contentStoreCleaner bean as follows:

  <bean id="contentStoreCleaner" class="org.alfresco.repo.content.cleanup.ContentStoreCleaner" >
     <property name="dictionaryService">
        <ref bean="dictionaryService" />
     </property>
     <property name="nodeDaoService" >
        <ref bean="nodeDaoService" />
     </property>
     <property name="avmNodeDAO">
        <ref bean="avmNodeDAO" />
     </property>
     <property name="transactionService" >
        <ref bean="transactionComponent" />
     </property>
     <property name="protectDays" >
        <value>1</value>
     </property>
     <property name="stores" >
        <list>
           <ref bean="fileContentStore" />
        </list>
     </property>
  </bean>


Changing the ContentStore Implementation

The ContentService deals with a single ContentStore injected into the store property. Assuming an alternative implementation of a ContentStore is written (com.x.y.MyDBStore), then the fileContentStore bean must be overridden as follows:

  <bean id="fileContentStore" class="com.x.y.MyDBStore">
     ... properties ...
  </bean>


Content Replication

As described under Content Writes, listeners are hooked onto the content write stream so that any number of tasks can be performed, in the same transaction, when the stream is closed. This makes it possible to replicate content between the primary fileContentStore and any number of secondary content stores upon stream closure. The component that handles this is the org.alfresco.repo.content.replication.ReplicatingContentStore.

For example, assume that your server has a big, fast local disk that stores content at /var/alfresco/content-store. For backup purposes, however, the content is best stored on a network filesystem accessible at /share/alfresco/content-store. To keep storage costs down, really old content is archived to a tape drive that is accessible at /tape/alfresco/content-store-archives.

   <bean id="localDriveContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
      <constructor-arg>
         <value>/var/alfresco/content-store</value>
      </constructor-arg>
   </bean>
   <bean id="networkContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
      <constructor-arg>
         <value>/share/alfresco/content-store</value>
      </constructor-arg>
   </bean>
   <bean id="tapeDriveContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
      <constructor-arg>
         <value>/tape/alfresco/content-store-archives</value>
      </constructor-arg>
      <property name="readOnly">
         <value>true</value>
      </property>
   </bean>
   <bean id="fileContentStore" class="org.alfresco.repo.content.replication.ReplicatingContentStore" >
      <!-- the preferred store for reads and writes -->
      <property name="primaryStore">
         <ref bean="localDriveContentStore" />
      </property>
      <!-- fastest to slowest, including any read-only stores -->
      <property name="secondaryStores">
         <list>
            <ref bean="networkContentStore" />
            <ref bean="tapeDriveContentStore" />
         </list>
      </property>
      <!-- enable content missing from the primary store to be pulled in from the secondary stores -->
      <property name="inbound">
         <value>true</value>
      </property>
      <!-- enable replication from the primary to the secondary stores -->
      <property name="outbound">
         <value>true</value>
      </property>
      <!-- This is required for proper transactional behaviour during outbound replication -->
      <property name="retryingTransactionHelper">
         <ref bean="retryingTransactionHelper"/>
      </property>
      <!-- set this to force outbound replication to be asynchronous -->
      <!-- Not normally used.  See class javadocs.
      <property name="outboundThreadPoolExecutor">
         <ref bean="threadPoolExecutor" />
      </property>
      -->
   </bean>
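
If only a backup copy is wanted, a smaller variation of the example above may be enough. The sketch below reuses the localDriveContentStore and networkContentStore beans and only the properties already shown; setting inbound to false is an assumption about what a backup-only setup would want, inferred from the property comments above, not a documented recommendation.

```xml
<!-- Sketch only: one primary store, one backup store, no tape archive -->
<bean id="fileContentStore" class="org.alfresco.repo.content.replication.ReplicatingContentStore" >
   <property name="primaryStore">
      <ref bean="localDriveContentStore" />
   </property>
   <property name="secondaryStores">
      <list>
         <ref bean="networkContentStore" />
      </list>
   </property>
   <!-- assumed choice: do not pull missing content back from the backup store -->
   <property name="inbound">
      <value>false</value>
   </property>
   <!-- copy every new binary to the backup store when the write stream closes -->
   <property name="outbound">
      <value>true</value>
   </property>
   <property name="retryingTransactionHelper">
      <ref bean="retryingTransactionHelper"/>
   </property>
</bean>
```

With inbound disabled, content missing from the primary store is not recovered automatically; restoring it from the backup store becomes a manual step.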