Content Store Configuration

From alfrescowiki


Latest revision as of 16:09, 15 July 2014

Warning: Page Needs Work
This page has been identified as one that needs specific attention to maintain, expand, correct, or improve. The specific needs should be listed on the discussion page. Please consider making the requested changes.



Introduction

In Alfresco, content binaries are stored separately from their metadata, which always resides in the database. The metadata that references a binary takes the form contentUrl=store://...|mimetype=... (and so on). The ContentStore interface is the abstraction that maps the store://... part of this reference to a physical location.

Basics

Default Configuration

The default configuration beans are:

  • <configRoot>/alfresco/content-services-context.xml
    • fileContentStore
    • deletedContentStore
    • deletedContentBackupListener
    • contentStoreCleaner
    • contentService
  • <configRoot>/alfresco/scheduled-jobs-context.xml
    • contentStoreCleanerTrigger

Configurable classes:

  • org.alfresco.repo.content.filestore.FileContentStore
  • org.alfresco.repo.content.cleanup.DeletedContentBackupCleanerListener
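As a rough guide, the default store and backup-listener beans in content-services-context.xml look something like the sketch below. This is an assumption based on 3.x-era releases, not a copy of any particular version's file. The ${dir.contentstore} and ${dir.contentstore.deleted} placeholders are resolved from the repository properties. Check the listener's store property name against your own installation.

```xml
<!-- Sketch only: compare with <configRoot>/alfresco/content-services-context.xml in your release -->
<!-- Primary store: binaries live under the directory passed as the constructor argument -->
<bean id="fileContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
   <constructor-arg>
      <value>${dir.contentstore}</value>
   </constructor-arg>
</bean>
<!-- Store that receives orphaned content instead of it being deleted outright -->
<bean id="deletedContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
   <constructor-arg>
      <value>${dir.contentstore.deleted}</value>
   </constructor-arg>
</bean>
<!-- Listener notified by contentStoreCleaner; copies orphaned content into deletedContentStore -->
<bean id="deletedContentBackupListener" class="org.alfresco.repo.content.cleanup.DeletedContentBackupCleanerListener">
   <property name="store">
      <ref bean="deletedContentStore" />
   </property>
</bean>
```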

Content Binaries' Lifecycle

Content Writes

  1. A request is made to the ContentStore for a ContentWriter.
  2. The ContentStore creates the appropriate storage for the new content. This will be either a new binary or a copy of an existing binary, depending on the type of access required by the client.
  3. The ContentWriter hooks appropriate listeners to the content streams.
  4. The ContentService wires the ContentWriter up to the current transaction.
  5. The client writes to the stream using one of several convenience methods or the raw NIO Channel.
  6. Upon stream closure, the metadata is written to the database in the context of the current transaction.

Content Reads

  1. A request is made to the ContentStore for a ContentReader.
  2. The ContentStore opens the underlying NIO Channel.
  3. The client reads the content using methods on the ContentReader.

Copying, Moving and Versioning Files

Once a write Channel has been closed, the content is never modified by any high-level process. Moving, copying and versioning a file only affect the content metadata, so several references can end up pointing to the same underlying raw binary content.

Content Binaries and Transactions

Because existing binaries are never modified, newly written content does not become visible until the metadata referencing it has been committed to the database. If the transaction fails or is rolled back, the metadata is left in its pre-transaction state (that is, still referencing the older binary), and the new content binary is left orphaned for later cleanup.

Deleting Files

When a file node (or anything else containing a reference to raw content) is permanently deleted, there is one fewer reference to the raw content. Raw content with no remaining references is called orphaned. If nothing further were done, the content stores would keep filling up with content that is never reclaimed.

Cleaning up Orphaned Content (Purge)

Once all references to a content binary have been removed from the metadata, the content is said to be orphaned. Orphaned content can be deleted or purged from the content store while the system is running. Identifying and either sequestering or deleting the orphaned content is the job of the contentStoreCleaner.

In the default configuration, the contentStoreCleanerTrigger fires the contentStoreCleaner bean.

  <bean id="contentStoreCleaner" class="org.alfresco.repo.content.cleanup.ContentStoreCleaner" >
     ...
     <property name="protectDays" >
        <value>14</value>
     </property>
     <property name="stores" >
        <list>
           <ref bean="fileContentStore" />
        </list>
     </property>
     <property name="listeners" >
        <list>
           <ref bean="deletedContentBackupListener" />
        </list>
     </property>
  </bean>
  • protectDays

Use this property to set the minimum time, in days, that content binaries are kept in the content store. In the above example, if a file is created and immediately deleted, its binary will not be removed from the content store for at least 14 days. Adjust the value to account for your backup strategy, average content size and available disk space. If the backup strategy is simply to make regular copies, the value should also be greater than the number of days between successive backup runs. Setting the value to zero produces a system warning. Zero breaks the transaction model, and content can be lost if the orphaned content cleaner runs while content is being loaded into the system.

  • stores

This is a list of ContentStore beans to scour for orphaned content.

  • listeners

When orphaned content is located, these listeners are notified. In this example, the deletedContentBackupListener copies the orphaned content to a separate deletedContentStore.

Note that this configuration does not actually remove the files from the file system. Instead, it moves them to the designated deletedContentStore, usually contentstore.deleted. Once an appropriate backup has been taken, the files can be removed from the deletedContentStore by a script or cron job.
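The schedule on which contentStoreCleanerTrigger fires the cleaner is defined in scheduled-jobs-context.xml. The sketch below shows how such a trigger could be overridden to run the cleaner at 04:00 every day. It assumes the 3.x-era classes org.alfresco.util.CronTriggerBean and org.alfresco.repo.content.cleanup.ContentStoreCleanupJob, Spring's Quartz JobDetailBean, and a scheduler bean named schedulerFactory. Verify all of these against the scheduled-jobs-context.xml shipped with your version before relying on it.

```xml
<!-- Sketch only: class and bean names are assumptions to verify against your release -->
<bean id="contentStoreCleanerTrigger" class="org.alfresco.util.CronTriggerBean">
   <property name="jobDetail">
      <!-- Quartz job that hands the contentStoreCleaner bean to the cleanup job -->
      <bean class="org.springframework.scheduling.quartz.JobDetailBean">
         <property name="jobClass">
            <value>org.alfresco.repo.content.cleanup.ContentStoreCleanupJob</value>
         </property>
         <property name="jobDataAsMap">
            <map>
               <entry key="contentStoreCleaner">
                  <ref bean="contentStoreCleaner" />
               </entry>
            </map>
         </property>
      </bean>
   </property>
   <property name="scheduler">
      <ref bean="schedulerFactory" />
   </property>
   <!-- Quartz cron fields: seconds minutes hours day-of-month month day-of-week -->
   <!-- "0 0 4 * * ?" fires at 04:00:00 every day -->
   <property name="cronExpression">
      <value>0 0 4 * * ?</value>
   </property>
</bean>
```

The schedule only controls how often the cleaner runs. protectDays still decides how long content must be kept before the cleaner will touch it.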

Eager Content Cleanup

If you have an appropriate backup strategy, usually involving a ReplicatingContentStore, orphaned content can be removed after a day and need not be sent to a backup deletedContentStore. In your custom configuration context, override the contentStoreCleaner bean as follows. Because no listeners are configured, orphaned content is deleted rather than moved:

  <bean id="contentStoreCleaner" class="org.alfresco.repo.content.cleanup.ContentStoreCleaner" >
     <property name="dictionaryService">
        <ref bean="dictionaryService" />
     </property>
     <property name="nodeDaoService" >
        <ref bean="nodeDaoService" />
     </property>
     <property name="avmNodeDAO">
        <ref bean="avmNodeDAO" />
     </property>
     <property name="transactionService" >
        <ref bean="transactionComponent" />
     </property>
     <property name="protectDays" >
        <value>1</value>
     </property>
     <property name="stores" >
        <list>
           <ref bean="fileContentStore" />
        </list>
     </property>
  </bean>


Changing the ContentStore Implementation

The ContentService works with a single ContentStore, injected into its store property. In the default configuration this property references the fileContentStore bean. Given an alternative ContentStore implementation (for example, com.x.y.MyDBStore), override the fileContentStore bean as follows:

  <bean id="fileContentStore" class="com.x.y.MyDBStore">
     ... properties ...
  </bean>
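The same override mechanism also covers the simplest case, where no new Java class is needed: pointing the standard FileContentStore at a different directory. The constructor argument is the store's root directory, as in the replication example later on this page. The path below is a hypothetical example.

```xml
<!-- Hypothetical path: replace with the directory that should hold the content binaries -->
<bean id="fileContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
   <constructor-arg>
      <value>/data/alfresco/contentstore</value>
   </constructor-arg>
</bean>
```

Existing binaries have to be moved to the new directory as well. The store://... references in the metadata are resolved relative to the store's root.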


Content Caching

See CachingContentStore

Content Replication

As described under Content Writes, listeners are hooked onto the content write stream, so any number of tasks can be performed in the same transaction when the stream is closed. One such task is replicating content from the primary fileContentStore to any number of secondary content stores. The component that handles this is org.alfresco.repo.content.replication.ReplicatingContentStore.

For example, assume your server has a large, fast local disk for storing content at /var/alfresco/content-store. For backup purposes, however, the content is best also stored on a network filesystem accessible at /share/alfresco/content-store. To keep storage costs down, very old content is archived to a tape drive accessible at /tape/alfresco/content-store-archives.

   <bean id="localDriveContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
      <constructor-arg>
         <value>/var/alfresco/content-store</value>
      </constructor-arg>
   </bean>
   <bean id="networkContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
      <constructor-arg>
         <value>/share/alfresco/content-store</value>
      </constructor-arg>
   </bean>
   <bean id="tapeDriveContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
      <constructor-arg>
         <value>/tape/alfresco/content-store-archives</value>
      </constructor-arg>
      <property name="readOnly">
         <value>true</value>
      </property>
   </bean>
   <bean id="fileContentStore" class="org.alfresco.repo.content.replication.ReplicatingContentStore" >
      <!-- the preferred store for reads and writes -->
      <property name="primaryStore">
         <ref bean="localDriveContentStore" />
      </property>
      <!-- fastest to slowest, including any read-only stores -->
      <property name="secondaryStores">
         <list>
            <ref bean="networkContentStore" />
            <ref bean="tapeDriveContentStore" />
         </list>
      </property>
      <!-- enable content missing from the primary store to be pulled in from the secondary stores -->
      <property name="inbound">
         <value>true</value>
      </property>
      <!-- enable replication from the primary to the secondary stores -->
      <property name="outbound">
         <value>true</value>
      </property>
      <!-- This is required for proper transactional behaviour during outbound replication -->
      <property name="retryingTransactionHelper">
         <ref bean="retryingTransactionHelper"/>
      </property>
      <!-- set this to force outbound replication to be asynchronous -->
      <!-- Not normally used.  See class javadocs.
      <property name="outboundThreadPoolExecutor">
         <ref bean="threadPoolExecutor" />
      </property>
      -->
   </bean>

See Content Store Replication for configuration of a replicator.
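The inbound and outbound flags can also be used separately. The sketch below assumes content is being migrated from an old store, mounted read-only at the hypothetical path /mnt/old-alfresco/content-store. With inbound replication only, a binary missing from the new primary store is pulled in from the old store when it is read, and nothing is ever written back to the old location. localDriveContentStore is the bean defined in the example above; oldContentStore is a hypothetical name.

```xml
<!-- Hypothetical: the previous server's store, mounted read-only -->
<bean id="oldContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
   <constructor-arg>
      <value>/mnt/old-alfresco/content-store</value>
   </constructor-arg>
   <property name="readOnly">
      <value>true</value>
   </property>
</bean>
<bean id="fileContentStore" class="org.alfresco.repo.content.replication.ReplicatingContentStore" >
   <property name="primaryStore">
      <ref bean="localDriveContentStore" />
   </property>
   <property name="secondaryStores">
      <list>
         <ref bean="oldContentStore" />
      </list>
   </property>
   <!-- pull content missing from the primary store in from oldContentStore -->
   <property name="inbound">
      <value>true</value>
   </property>
   <!-- never replicate new content out to the old store -->
   <property name="outbound">
      <value>false</value>
   </property>
   <!-- kept as in the example above -->
   <property name="retryingTransactionHelper">
      <ref bean="retryingTransactionHelper"/>
   </property>
</bean>
```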

© 2014 Alfresco Software, Inc. All Rights Reserved.