Clustering techniques

Sankar Melethat

Sunday 26 July 2009 9:09:35 pm

Hello,

We have a question about eZ Publish clustering mode. From our research, we found that eZ Publish supports clustering by storing the cache files in the database, with a reverse proxy set up in Apache for image files.

In our scenario this would not scale, because we are dealing with a huge number of static files (not only images). Storing everything at the DB level is therefore out of the question.

Our setup has two Apache web servers for the eZ Publish installation. Our proposed approach is below.

1) Install a file server.

2) Have a file system mount from the file server to both the web servers.

3) Install Apache on both web servers, but point the docroot to the mounted file system.

4) In this way, the cache, storage and other directories would be shared across both the webservers.

5) The instances of ezPublish on both the webservers would point to the same DB.

Is this a good approach when dealing with a large number of files? I have heard that NFS mounts and PHP do not go well together, and that eZ Publish versions before 4.1 had a lot of issues because of this. Has this been resolved in eZ Publish 4.1.x?

Can we use this approach? Please validate it and share your thoughts.

Thomas Koch

Sunday 26 July 2009 11:44:36 pm

YMC uses an alternative strategy for clustering eZ Publish. Have a look at
http://www.ymc.ch/weblog/some_due_modifications_to_the_ez_cluster
Please ask my colleague Daniel Beyer for technical details.

---
Thomas Koch | http://koch.ro
YMC - eZ Publish in Switzerland | http://ymc.ch

Paul Borgermans

Monday 27 July 2009 12:47:48 am

Hello Sankar

For clustering mode with lots of "static" content (I presume you mean PDFs, images, ...), a new cluster handler is available in trunk (and will therefore be part of eZ Publish 4.2, due for release at the end of September).

This uses NFS (or any distributed file system such as GFS) as a transport between cluster nodes. The caches are still served by each node individually from local storage.

The metadata on whether files have expired, and so on, is handled by a DB layer to avoid latencies and the resulting cache inconsistencies.

You can read more here (specification):

http://pubsvn.ez.no/nextgen/trunk/doc/specifications/trunk/db_nfs_cluster_handler/dbnfsclusterhandler.txt

Regards
Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Gaetano Giunta

Monday 27 July 2009 1:29:48 am

Best practices for using NFS mounts and clusters of web servers with eZ Publish are:

- get a hardware-based NFS server (e.g. EMC, NetApp, HP), not a Linux-based PC
- make sure locking over NFS has been enabled
- have somebody in-house who is knowledgeable about NFS setup and tuning
- minimize the use of files over NFS shares as much as you can, since eZ Publish generates a large number of NFS calls (NB: NFS calls != bandwidth) when dealing with the cache
That might include:
-- store eZ Publish files and settings locally, and use rsync to deploy them
-- store eZ Publish (and Apache) log files locally, and use symlinks to get a working setup
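
As an illustration of the symlink suggestion above, here is a minimal Python sketch. It assumes the log directory lives at a path such as var/log under the shared mount; that path, the function name relocate_to_local and its parameters are all hypothetical and not part of eZ Publish.

```python
import os
import shutil
import tempfile


def relocate_to_local(shared_root: str, relative_dir: str, local_root: str) -> str:
    """Move shared_root/relative_dir onto local disk and leave a symlink behind.

    Hypothetical helper: the directory layout is an assumption, not an
    eZ Publish convention. Returns the local path the symlink points to.
    """
    shared_path = os.path.join(shared_root, relative_dir)
    local_path = os.path.join(local_root, relative_dir)

    # Nothing to do if the directory was already replaced by a symlink.
    if os.path.islink(shared_path):
        return os.readlink(shared_path)

    # Refuse to overwrite an existing local directory.
    if os.path.exists(local_path):
        raise FileExistsError(local_path)

    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    if os.path.isdir(shared_path):
        # Keep the existing log files by moving the whole directory.
        shutil.move(shared_path, local_path)
    else:
        os.makedirs(local_path)
        os.makedirs(os.path.dirname(shared_path), exist_ok=True)

    # The symlink lives on the share; each node resolves it to its own disk.
    os.symlink(local_path, shared_path, target_is_directory=True)
    return local_path
```

Since the symlink itself sits on the share, every other web server needs the same local directory to exist (for example created once by hand) before it can write logs through the link.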

Principal Consultant International Business
Member of the Community Project Board

Ivo Lukac

Monday 27 July 2009 5:21:23 am

The best solution is to use a SAN box and a file system with integrated locking, such as GFS.

Best => most expensive of course :)

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Björn Dieding@xrow.de

Monday 27 July 2009 4:57:59 pm

I voted SAN on GFS in the past.

Now I vote NFS on XFS, eZ 4.2.

Fibre Channel / SAN is just too expensive.

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Alexandre Bulté

Thursday 30 July 2009 5:52:31 am

@Gaetano

What exactly do you mean by "locking-over-nfs has been enabled"? Is it a specific NFS server configuration?

From what you say, I assume it's a bad idea to put the entire eZ Publish instance (not only storage and cache) on the NFS share?

Thanks.

Gaetano Giunta

Thursday 30 July 2009 6:12:55 am

NFS versions up to 3 do not support locking by default; it was not built into the protocol.
You can overcome this limitation by using an external locking mechanism, and yes, that mechanism can be enabled or disabled.
Or you can use NFS 4, which has locking built in. But the hardware/OS will usually dictate which version of NFS you can use.
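
A quick way to check whether lock requests succeed on a given mount is to try taking one. The following is a minimal Python sketch of such a smoke test, assuming a POSIX system; the function name advisory_locking_works is hypothetical. Note that a successful result does not prove the lock is visible to other NFS clients (for example, with the Linux nolock mount option, locks are only enforced locally on one client), and on a misconfigured mount the call may block rather than fail.

```python
import errno
import fcntl
import os
import tempfile


def advisory_locking_works(directory: str) -> bool:
    """Try to take and release an exclusive advisory lock on a temp file.

    Hypothetical smoke test: returns False when the file system reports
    that locks are unavailable, True when the lock request succeeds.
    """
    fd, path = tempfile.mkstemp(prefix=".locktest-", dir=directory)
    try:
        try:
            # Non-blocking exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # ENOLCK / EOPNOTSUPP mean the lock could not be provided at all.
            if exc.errno in (errno.ENOLCK, errno.EOPNOTSUPP):
                return False
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling advisory_locking_works on the NFS mount point from each web server gives a first indication; a real verification would have two different nodes contend for the same file.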

http://nfs.sourceforge.net/ is a good source of nfs information.

About sharing the complete eZ Publish install or not: YMMV.

Knowing that NFS traffic can easily become the bottleneck in such a configuration, I'd play it safe and try to limit the number of files mounted over NFS to the strict minimum. But you might prefer a simplified administration/deployment scheme instead. If you can afford it, test your setup by executing the 'worst case scenario': delete the complete cache dirs (by hand: a real, cold, hard delete) while using ab to simulate the peak load of your site.
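
The worst-case test described above could be scripted roughly as follows. This is a minimal Python sketch; the function names, the default request counts and the cache directory you pass in are assumptions, and only the ab options -n (total requests) and -c (concurrency) are used.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List


def purge_cache_dir(cache_dir: str) -> int:
    """Delete everything inside cache_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    with os.scandir(cache_dir) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.unlink(entry.path)
    return len(entries)


def ab_command(url: str, requests: int = 10000, concurrency: int = 100) -> List[str]:
    """Build an ApacheBench command line (the URL must include a path, e.g. a trailing /)."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def run_cold_cache_test(cache_dir: str, url: str) -> int:
    """Start the load first, then wipe the cache while requests are in flight."""
    load = subprocess.Popen(ab_command(url))
    purge_cache_dir(cache_dir)
    return load.wait()  # ab's exit status
```

Run this against a staging copy rather than production, and watch NFS call rates on the file server while ab is running.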

Paul Borgermans

Saturday 26 September 2009 6:58:53 am

The above observations in Gaetano's post are not relevant to the new NFS/DFS cluster handler introduced in eZ Publish 4.2 (which has also been installed successfully on very large media sites as a backport to previous versions).

In the new cluster handler, NFS-based locking is not used at all, and it works with other distributed file systems too. File metadata is handled in a dedicated DB table instead.

The basic idea is that NFS is used as a transport for cache files, images, and so on between cluster nodes. The cluster nodes still serve the files from their local file system.
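
To make the idea concrete, here is a minimal Python sketch of the pattern described above. It is not the actual eZ Publish handler (which is written in PHP); the class ClusterNode, the table file_meta and its columns are hypothetical, an explicit integer version stands in for whatever freshness marker the real handler uses, and sqlite3 stands in for the shared database.

```python
import os
import shutil
import sqlite3
import tempfile

# Hypothetical metadata table; the real handler's schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_meta (
    name    TEXT PRIMARY KEY,
    version INTEGER NOT NULL,
    expired INTEGER NOT NULL DEFAULT 0
)
"""


class ClusterNode:
    """One web server: shared mount for transport, local disk for serving."""

    def __init__(self, db: sqlite3.Connection, shared_dir: str, local_dir: str):
        self.db, self.shared_dir, self.local_dir = db, shared_dir, local_dir
        self.db.execute(SCHEMA)
        self.local_versions = {}  # name -> version of the local copy

    def store(self, name: str, data: bytes, version: int) -> None:
        """Write the file to the shared mount, then record it in the DB."""
        with open(os.path.join(self.shared_dir, name), "wb") as fh:
            fh.write(data)
        self.db.execute(
            "INSERT OR REPLACE INTO file_meta (name, version, expired) VALUES (?, ?, 0)",
            (name, version),
        )
        self.db.commit()

    def expire(self, name: str) -> None:
        """Mark a file as expired in the DB only; no file-system locking involved."""
        self.db.execute("UPDATE file_meta SET expired = 1 WHERE name = ?", (name,))
        self.db.commit()

    def fetch(self, name: str):
        """Return a local path to a valid copy, or None if unknown or expired."""
        row = self.db.execute(
            "SELECT version, expired FROM file_meta WHERE name = ?", (name,)
        ).fetchone()
        if row is None or row[1]:
            return None
        local_path = os.path.join(self.local_dir, name)
        # Copy over the shared mount only when the local copy is missing or stale.
        if self.local_versions.get(name) != row[0] or not os.path.exists(local_path):
            shutil.copyfile(os.path.join(self.shared_dir, name), local_path)
            self.local_versions[name] = row[0]
        return local_path
```

In this sketch the database is the single source of truth for validity, so a node never has to trust NFS attribute caching or NFS locks to decide whether its local copy is still good.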

If you need more clarifications, just ask ;)

Paul