Cluster Configuration V2.1.3 and Later
From AlfrescoWiki
Back to Server Configuration
Contents |
[edit] Introduction
This page aims to describe some of the components and configuration options for running multiple server instances. Sample configurations or configuration snippets that arise from real-world solutions will be given. There will be many combinations of configurations and levels of complexity left out. Feel free to contribute further examples.
The configurations here apply to Alfresco V2.1.3 onwards, and does not apply to WCM installations (which cannot be clustered in the 2.x series).
NOTE: This document assumes knowledge of how to extend the server configuration. (Repository Configuration) The following naming convention applies:
- <configRoot> - the location of the default configuration files for a standard Alfresco installation (e.g. file <configRoot>/alfresco/application-context.xml)
- <extConfigRoot> - the externally located configuration location (e.g. file <extConfigRoot>/alfresco/extension/custom-repository-context.xml)
[edit] High Availability Components
[edit] Client Session Replication
This is supported in Alfresco V2.2.0 onwards.
[edit] Content Store Replication
The underlying content binaries are distributed by either sharing a common content store between all machines in a cluster or by replicating content between the clustered machines and a shared store(s).
[edit] Index Synchronization
The indexes provide searchable references to all nodes in Alfresco. The index store is transaction-aware and cannot be shared between servers. The indexes can be recreated from references to the database tables. In order to keep the indexes up to date (indicated by boxes ‘Index A’ and ‘Index B’) a timed thread updates each index directly from the database. When a node is created (this may be a user node, content node, space node, etc) metadata is indexed and the transaction information is persisted in the database. When the synchronization thread runs, all other indexes in the cluster are updated using the transaction information stored in the database.
[edit] Cache Replication
The Level 2 cache provides out-of-transaction caching of java objects inside the alfresco system. Alfresco only provides support for EHCache. This guide describes the synchronisation of EHCache across clusters. Using EHCache does not restrict the Alfresco system to any particular application server, so it is completely portable.
[edit] Database Synchronisation
It is possible to have co-located databases which synchronize with each other. This guide describes the setup for a shared database. If you wish to use co-located databases then refer to your database vendor’s documentation to do this.
[edit] Scenarios
[edit] Simple Repository Clustering
- Single, shared content store
- Sticky client sessions
In this scenario, we have a single repository database and filesystem (for the content store) and multiple web app servers accessing the content simultaneously. This configuration does not guard against repository filesystem or database failure, but allows multiple web servers to share the web load, and provides redundancy in case of a web server failure. Each web server has local indexes (on the local filesystem).
For this example we will utilize a hardware load balancer to balance the web requests among multiple web servers. The load balancer must support 'sticky' sessions so that each client always connects to the same server during the session. The filesystem and database will reside on separate servers, which allows us to use alternative means for filesystem and database replication. The configuration in this case will consist of L2 Cache replication and index synchronization.
[edit] L2 Cache Replication
We will use EHCache replication to update every cache in the cluster with changes from each server. This is accomplished by overriding the default EHCache configuration in <extConfigRoot/alfresco/extension/ehcache-custom.xml The complete configuration is given in the sample extension file ehcache-custom.xml.sample.cluster.
ehcache-custom.xml
<ehcache>
<diskStore
path="java.io.tmpdir"/>
<cacheManagerPeerProviderFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
properties="peerDiscovery=automatic,
multicastGroupAddress=230.0.0.1,
multicastGroupPort=4446"/>
<cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="port=40001, socketTimeoutMillis=2000"/>
...
</ehcache>
[edit] Lucene Index Synchronization
The above configuration has been pulled into internal context files. Properties changing the reindex behaviour are defined and available for overriding in the general repository properties file:
<conf>/alfresco/repository.properties
# Set the frequency with which the index tracking is triggered.
# By default, this is effectively never, but can be modified as required.
# Examples:
# Once every five seconds: 0/5 * * * * ?
# Once every two seconds : 0/2 * * * * ?
# See http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
index.tracking.cronExpression=? ? ? ? ? ? 2099
index.tracking.adm.cronExpression=${index.tracking.cronExpression}
index.tracking.avm.cronExpression=${index.tracking.cronExpression}
# Other properties.
index.tracking.maxTxnDurationMinutes=60
index.tracking.reindexLagMs=1000
index.tracking.maxRecordSetSize=1000
The triggers for ADM (document management) and AVM (web content management, where applicable) index tracking are combined into one property for simplicity. These can be set separately, if required. The following properties should typically be modified in the clustered environment:
<ext-conf>/alfresco/extension/custom-repository.properties:
index.tracking.cronExpression=0/5 * * * * ? index.recovery.mode=AUTO
Setting the recovery mode to AUTO will ensure that the indexes are fully recovered if missing or corrupt, and will top the indexes up during bootstrap in the case where the indexes are out of date. This happens frequently when a server node is introduced to the cluster. AUTO will ensure that backup, stale or no indexes can be used for the server.
NOTE: The index tracking relies heavily on the approximate commit time of transactions. This means that machines in a cluster need to be time-synchronized, the more accurately the better. The default configuration only triggers tracking every 5 seconds and enforces a minimum age of transaction of 1 second. This is controlled by the property index.tracking.reindexLagMsFor example, if the clocks of the machines in the cluster can only be guaranteed to within 5 seconds, then the tracking properties might look like this:
<ext-conf>/alfresco/extension/custom-repository.properties:
index.tracking.cronExpression=0/5 * * * * ? index.recovery.mode=AUTO index.tracking.reindexLagMs=10000
[edit] Local and Shared Content Stores
[edit] Scenario
- Local content store replicated to shared store
Read the Content Store Configuration as an introduction.
This scenario extends the previous examples by showing how to replicate the content stores to have a content store local to each machine that replicates data to and from a shared location. This may be required if there is high latency when communicating with the shared device, if the shared device doesn't support fast random access read/write file access.
[edit] Content Replication
Assume that both Server A and Server B (and all servers in the cluster) store their content locally in /var/alfresco/content-store. The Shared Backup Store is visible to all servers as /share/alfresco/content-store. The following configuration override must be applied to all servers: <ext-config>/alfresco/extension/replicating-content-services-context.sample:
<bean id="localDriveContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
<constructor-arg>
<value>/var/alfresco/content-store</value>
</constructor-arg>
</bean>
<bean id="networkContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
<constructor-arg>
<value>/share/alfresco/content-store</value>
</constructor-arg>
</bean>
<bean id="fileContentStore" class="org.alfresco.repo.content.replication.ReplicatingContentStore" >
<property name="primaryStore">
<ref bean="localDriveContentStore" />
</property>
<property name="secondaryStores">
<list>
<ref bean="networkContentStore" />
</list>
</property>
<property name="inbound">
<value>true</value>
</property>
<property name="outbound">
<value>true</value>
</property>
<property name="retryingTransactionHelper">
<ref bean="retryingTransactionHelper"/>
</property>
</bean>
[edit] Read-only Clustered Server
[edit] Scenario
- Some servers are read-only
It is possible to bring a server up as part of the cluster, but force all transactions to be read-only. This effectively prevents any database writes.
[edit] Custom Repository Properties
<ext-config>/alfresco/extension/custom-repository.properties:
# the properties below should change in tandem server.transaction.mode.default=PROPAGATION_REQUIRED, readOnly server.transaction.allow-writes=false #server.transaction.mode.default=PROPAGATION_REQUIRED #server.transaction.allow-writes=true
[edit] Verifying the Cluster
This section addresses the steps required to start the clustered servers and test the clustering after the the necessary configuration changes have been made to the servers.
[edit] Installing and Upgrading on a Cluster
[edit] Clean Installation
- Start Alfresco on one of the clustered machines and ensure that the bootstrap process proceeds normally.
- Test the system's basic functionality:
- Login as different users
- Access some content
- Perform a search
- Modify some properties
[edit] Upgrades
- Upgrade the WAR and config for all machines in the cluster.
- Start Alfresco on one of the clustered machines and ensure that the upgrade proceeds normally.
- Test the system's basic functionality.
- Start each machine in the cluster one
[edit] Testing the Cluster
There are a set of steps that can be done to verify that clustering is working for the various components involved. You will need direct web client access to each of the machines in the cluster. The operation is done on machine M1 and verified on the other machines Mx. The process can be switched around with any machine being chosen as M1.
[edit] Cache Clustering
- M1: Login as admin.
- M1: Browse to the Guest Home space, locate the tutorial PDF document and view the document properties.
- Mx: Login as admin.
- Mx: Browse to the Guest Home space, locate the tutorial PDF document and view the document properties.
- M1: Modify the tutorial PDF's description field, adding 'abcdef'.
- M2: Refresh the document properties view of the tutorial PDF document.
- M2: The description field must have changed to include 'abcdef'.
[edit] Index Clustering
- ... Repeat Cache Clustering.
- M1: Perform an advanced search of the description field for 'abcdef' and verify that the tutorial document is returned.
- Mx: Search the description field for 'abcdef'. As long as enough time was left for the index tracking (10s or so), the document must show up in the search results.
[edit] Content Replication / Sharing
- M1: Add a text file to Guest Home containing 'abcdef'.
- M2: Perform a simple search for 'abcdef' and ensure that the new document is retrieved.
- Mx: Refresh the view of Guest Home and verify that the document is visible. This relies on index tracking, so it may take a few seconds for the document to be visible.
- Mx: Open the document and ensure that the correct text is visible.
- Mx: Perform a simple search for 'abcdef' and ensure that the new document is retrieved.
[edit] If It Isn't Working ...
The following log categories can be enabled to help track issues in the cluster.
- log4j.logger.net.sf.ehcache.distribution=DEBUG: Check that heartbeats are receieved from from live machines.
- log4j.logger.org.alfresco.repo.node.index.IndexTransactionTracker=DEBUG: Remote index tracking for ADM.
- log4j.logger.org.alfresco.repo.node.index.AVMRemoteSnapshotTracker=DEBUG: Remote index tracking for AVM.
If cache clustering isn't working, the EHCache website describes some common problems: EHCache Documentation. The remote debugger can be downloaded as part of the EHCache distribution files and executed:
> java -jar ehcache-1.3.0-remote-debugger.jar Command line to list caches to monitor: java -jar ehcache-remote-debugger.jar path_to_ehcache.xml Command line to monitor a specific cache: java -jar ehcache-remote-debugger.jar path_to_ehcache.xml cacheName
Back to Server Administration Guide



