Using MySQL 5 cluster with eZ publish

Daniel Beyer

Monday 19 December 2005 4:24:19 am

As eZ publish is used for more and more business-critical applications, it's very important to install it in a fail-safe and redundant environment. To achieve this you need, among other things, a redundant and fail-safe database such as MySQL 5 cluster.

So this topic is dedicated to experiences with running eZ publish on a MySQL 5 cluster.

Any input is welcome, so please share your information.

Daniel Beyer
_________________________________
YMC AG
Kreuzlingen, Switzerland
web: www.ymc.ch
____________________________________

Daniel Beyer

Monday 19 December 2005 4:56:19 am

Last weekend we set up a MySQL 5 cluster to test eZ publish on it.

At the moment the primary problem seems to be getting eZ publish to run on MySQL 5 cluster at all.
I reported this problem in the bug tracker here on ez.no. See:
http://ez.no/bugs/view/7580

The secondary problem we discovered is the speed of MySQL 5 cluster together with eZ publish. This might not be a problem of eZ publish at all, but one of MySQL 5 cluster itself or, more likely, of insufficient resources on the hardware used.

Here is an overview of the cluster setup:
Two servers for the MySQL cluster:
-Pentium 4, 3 GHz, Single-CPU
-1 GB RAM
-SATA RAID
-Debian GNU Linux
-MySQL 5.0.16
-Gigabit LAN

One Webserver:
-Xeon 3.6 GHz Irwindale, Dual-CPU
-2 GB RAM
-SCSI Ultra320 RAID (caching, battery-backed)
-Debian GNU Linux
-PHP 4.3.10-16
-Apache 2.0.54
-Gigabit LAN

At the moment one of the two cluster servers is also used as the management node (ndb_mgmd), which doesn't make sense for a redundant solution but should not affect the speed of the cluster. Both servers run NDB data nodes and MySQL servers. The webserver sends its MySQL requests via TCP, using a MySQL client (v4.0.24), to the cluster's MySQL servers, which shouldn't be a problem at all.

The two nodes are called matrix-01 and matrix-02. The webserver's name is zeus.
The eZ publish installation used is a standard eZ publish 3.6.4 (with ezodcsm) and has around 15k objects.

Loading the content overview page in the admin interface takes about 33 s. This is because the 953 MySQL requests take 32 s of that time, meaning requests are served by the MySQL cluster with an average response time of about 34 ms. Using a local MySQL server on zeus with the same installation and the same data, the same page takes less than 2 seconds to load.
I don't know why the cluster is this slow, but entering one of the 953 MySQL queries made by eZ publish on the local mysql console of either matrix-01 or matrix-02 takes just as long as when eZ publish sends it.
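The per-query figure follows from dividing 32 s by 953 queries, roughly 34 ms each. A small sketch of the arithmetic (variable names are illustrative); it also derives an upper bound for the local server, since a page under 2 s means its queries averaged under about 2.1 ms:

```python
# Figures from the post: one admin page load against the cluster.
PAGE_SECONDS = 33.0
QUERY_SECONDS = 32.0
QUERY_COUNT = 953


def average_latency_ms(total_seconds: float, count: int) -> float:
    """Average time per query, in milliseconds."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total_seconds * 1000.0 / count


cluster_avg = average_latency_ms(QUERY_SECONDS, QUERY_COUNT)
# The local server renders the whole page in under 2 s, so its per-query
# average must be below this bound (an upper bound, not a measurement).
local_bound = average_latency_ms(2.0, QUERY_COUNT)

print(f"cluster: {cluster_avg:.1f} ms/query, "
      f"{QUERY_SECONDS / PAGE_SECONDS:.0%} of page time in SQL")
print(f"local server: under {local_bound:.1f} ms/query")
```

So each query on the cluster is at least 16 times slower than on the local server, which points at per-query cost rather than the number of queries.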

I discovered that an enormous amount of traffic is exchanged between the two NDB nodes. Here is an overview:

zeus <-> matrix-01: 4 MB of traffic (zeus uses matrix-01 as its SQL server)

matrix-01 <-> matrix-02: 337 MB of traffic

From matrix-01's point of view:
To   matrix-02: 12 MB
From matrix-02: 325 MB

To   zeus: 1.8 MB
From zeus: 2.2 MB

Using matrix-02 as the MySQL server doesn't change anything except the direction of the traffic. I wonder why this much traffic is exchanged between the nodes, as eZ publish only issues SELECT queries when loading the page. Could there be a configuration error in the MySQL cluster?

Does anyone else have experience with MySQL 5 cluster and eZ publish?

Bertrand Dunogier

Wednesday 21 December 2005 10:38:23 am

As far as I know, MySQL Cluster requires LOTS of RAM to run properly. And when I say lots, I don't mean 2 GB... on mysql.com they suggest something like 16 GB... So maybe you have a bottleneck here ;-)

We are going to set up a big website using MySQL Cluster in a few days; I really hope it's going to work correctly. I'll post information about that later on!

Gabriel Ambuehl

Wednesday 21 December 2005 10:55:19 am

I don't think it necessarily needs a lot of RAM. It just needs enough RAM to fit the whole DB in memory at once. For a site with a few MB of data, 1 GB per server should be plenty.
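To put a number on "enough RAM", here is a sketch of the rough sizing rule of thumb the MySQL Cluster documentation gives (database size times number of replicas times about 1.1, divided by the number of data nodes). Treat the 1.1 overhead as approximate; the 500 MB database size is hypothetical, since the thread doesn't state the real size:

```python
def ram_per_data_node_mb(db_size_mb: float, replicas: int, data_nodes: int,
                         overhead: float = 1.1) -> float:
    """Rough memory needed per NDB data node (rule of thumb, not exact).

    Every replica of the data is held in RAM and spread over the data
    nodes, plus roughly 10% overhead.
    """
    if replicas < 1 or data_nodes < 1:
        raise ValueError("need at least one replica and one data node")
    return db_size_mb * replicas * overhead / data_nodes


# Hypothetical 500 MB database with 2 replicas on 2 data nodes
# (the matrix-01/matrix-02 layout): each node holds a full copy.
print(f"{ram_per_data_node_mb(500, 2, 2):.0f} MB per data node")
# Adding two more data nodes halves the per-node requirement.
print(f"{ram_per_data_node_mb(500, 2, 4):.0f} MB per data node")
```

With two nodes and two replicas every node must hold the entire database, which is why the whole DB has to fit in one machine's RAM in this setup.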

Visit http://triligon.org

Daniel Beyer

Thursday 22 December 2005 2:16:24 am

Hi,

I actually don't think it's the RAM. I configured the whole thing to work with the single GB on each machine without using swap. But just to be sure the RAM isn't the bottleneck, we have already ordered another 4 GB, so each machine will have 3 GB next week.

Btw: MySQL Cluster 5.0 is known to have problems with joins, doing a full table scan for each single join. Maybe that's the problem...
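Why a join can be this expensive on a cluster can be shown with a deliberately simplified cost model. It assumes the SQL node evaluates the join row by row, fetching from remote data nodes, and ignores how NDB actually batches requests. Round-trip time, row size and table sizes are all hypothetical; the 15,000 inner rows only echo the ~15k objects mentioned above:

```python
from typing import Tuple


def nested_loop_join_cost(outer_rows: int, inner_rows: int, inner_indexed: bool,
                          round_trip_ms: float = 0.5, row_bytes: int = 200,
                          link_mbit: float = 1000.0) -> Tuple[float, int]:
    """Toy estimate of (milliseconds, rows shipped) for one two-table join
    evaluated row by row in the SQL node against remote data nodes."""
    # One request to scan the outer table, then one request per outer row.
    round_trips = 1 + outer_rows
    # An index lookup returns ~1 matching row; without a usable index the
    # whole inner table is scanned and shipped for every outer row.
    per_outer = 1 if inner_indexed else inner_rows
    rows_shipped = outer_rows + outer_rows * per_outer
    # link_mbit * 1000 = bits per millisecond on the link.
    transfer_ms = rows_shipped * row_bytes * 8 / (link_mbit * 1000.0)
    return round_trips * round_trip_ms + transfer_ms, rows_shipped


# Hypothetical join: 100 outer rows against a 15,000-row inner table.
for indexed in (True, False):
    ms, rows = nested_loop_join_cost(100, 15_000, inner_indexed=indexed)
    label = "index lookup" if indexed else "full scan"
    print(f"{label}: ~{ms:.1f} ms, {rows} rows shipped")
```

Under these assumptions the scanning join ships thousands of times more rows than the indexed one, which would match the large node-to-node traffic seen even though eZ publish only sends SELECTs.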

Below is the output of atop 1.14 during an eZ publish page load:

PRC | sys 3050 ms | user 7980 ms | #thr     106 | #exits     0 | #zombie    0 |
CPU | sys     16% | user     52% | irq       6% | idle    123% | wait      2% |
cpu | sys      8% | user     28% | irq       5% | idle     57% | cpu000 w  1% |
cpu | sys      7% | user     24% | irq       1% | idle     66% | cpu001 w  1% |
MEM | tot  883.7M | free    8.5M | cache 184.8M | buff   42.4M | slab   28.4M |
SWP | tot    2.6G | free    2.6G | vmcom   1.3G | swin       0 | swout      0 |
DSK |         sda | busy      4% | read       0 | write     69 | avio    5 ms |

Gabriel Ambuehl

Thursday 22 December 2005 2:26:25 am

Could it be that the join is done across the cluster? That would explain the heavy traffic, I guess?

Daniel Beyer

Thursday 22 December 2005 3:45:37 am

Hi Gabriel.

Yes, the joins are causing the traffic. That's because every full table scan is done across the cluster, which in this case means the table data has to be exchanged between the nodes.

One might now think about running only a single node, which is exactly what came to my mind. If you do so, no more heavy traffic is exchanged, but the speed doesn't increase much (it's about 5 seconds faster, which is the time the gigabit network needs to transport the data). So in our setup the bottleneck can't be the network.
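The ~5 second saving is consistent with the 337 MB measured earlier. A quick sketch of the transfer-time arithmetic, where the 60% usable-throughput figure is a hypothetical allowance for protocol overhead:

```python
def transfer_seconds(megabytes: float, link_mbit: float = 1000.0,
                     efficiency: float = 1.0) -> float:
    """Seconds to move `megabytes` (10**6 bytes) over a link, latency ignored."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = megabytes * 1_000_000 * 8
    return bits / (link_mbit * 1_000_000 * efficiency)


# 337 MB was exchanged between matrix-01 and matrix-02 for one page load.
print(f"ideal gigabit: {transfer_seconds(337):.1f} s")
# Hypothetical: only 60% of the nominal bandwidth is usable in practice.
print(f"at 60% efficiency: {transfer_seconds(337, efficiency=0.6):.1f} s")
```

Even the pessimistic estimate accounts for only a few seconds of the 33 s page load, which supports the conclusion that the network is not the main bottleneck.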

BTW: The response time using the MyISAM engine on a cluster's mysqld is just great, so the bottleneck isn't the link between the webserver and the cluster either. But using MyISAM on a cluster's mysqld won't give you any redundancy, as the data is stored on a single machine.

Gabriel Ambuehl

Thursday 22 December 2005 4:11:55 am

You can use master-slave replication with normal tables though (which could be safer, considering the data still resides on the hard disk then).

Daniel Beyer

Thursday 22 December 2005 5:16:04 am

You're right, replication should be used for business-critical sites today. But it lacks redundancy and easy scalability, as you usually have only a single master. If the master goes down you probably won't lose data, but your eZ publish isn't available anymore. And if your master becomes too slow for its job, you need to upgrade the hardware of this single machine. In a cluster you just have to add another machine.

Those facts are why we want to use a cluster, as it is the only really redundant and easily scalable solution.

Gabriel Ambuehl

Thursday 22 December 2005 6:34:51 am

AFAIK, eZ publish already knows about the concept of slave servers. This means you can have all writes go to the master and fetch reads from a number of slaves. Theoretically (I haven't tested it), this should allow the site to at least stay up in read-only mode when the master goes down.

But you're right, proper clustering would likely be more stable. I've even seen some reports about MySQL two-way replication, but I don't know how safe that is. As long as there's exactly ONE master writing at any given time, it might work without trouble.

Daniel Beyer

Thursday 22 December 2005 7:46:02 am

If you enable the use of a slave server in eZ publish, every read query is sent to it and every write query is sent to the master. If you enable more than one slave, eZ publish only uses one randomly chosen slave. If your master goes down, eZ publish won't answer anymore (someone correct me if this behavior changed recently). So simple master/slave replication won't give you redundancy.
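The routing described above can be sketched as follows. This is a hypothetical illustration, not eZ publish code: the host names, the `is_up` health check and the crude SELECT-prefix test are assumptions. It shows where the single point of failure sits: reads go to one randomly picked slave, while every write depends on the one master:

```python
import random
from typing import Callable, Optional, Sequence


class ReadWriteRouter:
    """Hypothetical master/slave query router (illustration only)."""

    def __init__(self, master: str, slaves: Sequence[str],
                 is_up: Callable[[str], bool],
                 rng: Optional[random.Random] = None) -> None:
        self.master = master
        self.is_up = is_up
        rng = rng or random.Random()
        # One slave is chosen at random and used for all reads.
        self.read_server = rng.choice(list(slaves)) if slaves else master

    def route(self, sql: str) -> str:
        # Crude classification: anything starting with SELECT is a read.
        if sql.lstrip().upper().startswith("SELECT"):
            return self.read_server
        # Every write goes to the single master; there is no failover target.
        if not self.is_up(self.master):
            raise ConnectionError(f"master {self.master!r} is down, write rejected")
        return self.master


# Usage with hypothetical host names; the master is simulated as down.
up = {"db-master": False, "db-slave-1": True, "db-slave-2": True}
router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"],
                         is_up=lambda host: up[host], rng=random.Random(0))
print(router.route("SELECT 1"))
try:
    router.route("UPDATE some_table SET x = 1")
except ConnectionError as exc:
    print(exc)
```

In this sketch reads would survive a master failure, as Gabriel suggests in theory, but writes cannot; in practice, as described above, eZ publish stops answering altogether.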

There are setups with multiple masters, meaning a slave is promoted to be the new master if the primary master goes down. But with replication you always have a single active master (otherwise it would be a cluster). Personally I don't trust the master-switching approach, as there is always a risk that something goes wrong with it. That could lead, e.g., to an inconsistent database, which may be much worse than having the site down for a while.

I hope MySQL is working on the join issue, as I think it is the cause of the current weak performance.

Gabriel Ambuehl

Thursday 22 December 2005 10:27:18 am

Personally I wouldn't trust the cluster, seeing that the DBs currently live only in memory...

Daniel Beyer

Friday 23 December 2005 10:59:10 am

Hi Gabriel,

you're right again. Having the DB only in memory is a problem, but this will hopefully change with 5.1. Still, I don't think it's a problem with the right hardware. Today's servers normally use ECC memory, so smaller memory errors can be handled without any data corruption. The servers we plan to use for the cluster in future are of the "zeus" type, and those machines are capable of memory mirroring. This means memory is handled in a RAID-1 style: if one whole bank fails, the other takes over. And besides the redundancy inside a single machine (RAID, dual power supply, dual CPU, dual anything), you can plug as many machines into the cluster as you want. So the redundancy inside a single machine may no longer be the main issue. The only thing you have to worry about is a complete power failure of the whole cluster (no power, no more data in the RAM of any machine).

So personally I think the cluster is the right way forward. But it seems like the whole cluster thing needs some more development on MySQL's side.

Thanks a lot for your input so far and have a merry Christmas!

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.
