Sharding the eZ Publish database


Remigijus Kiminas

Tuesday 02 June 2009 10:01:38 pm

Hello,

I'm wondering whether it is possible to implement sharding in the eZ Publish database model. I mean splitting the main tables into smaller ones, as YouTube, Facebook and many others do. The idea is simple: instead of storing all records in one monolithic database, split the records across smaller tables or databases.

Why is this needed?
It would give practically unlimited scalability. Currently I really don't know how a single MySQL server could handle a database with millions of content object attribute records...

How can this be achieved?
There are some ideas here. Actually, I have already implemented range sharding in one of my extensions; it's quite easy.
http://blog.maxindelicato.com/2008/12/scalability-strategies-primer-database-sharding.html
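Range sharding gives each shard a contiguous range of IDs, so routing a query comes down to a lookup in a sorted list of range boundaries. Here is a minimal Python sketch of that idea. It is not the implementation from the extension mentioned above. The boundaries, the DSN strings and the choice of `contentobject_id` as the shard key are all hypothetical.

```python
import bisect

# Hypothetical shard map: exclusive upper bounds of contentobject_id ranges
# and the database holding each range. Values are illustrative only.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_DSNS = [
    "mysql://db-shard-0/ezpublish",
    "mysql://db-shard-1/ezpublish",
    "mysql://db-shard-2/ezpublish",
]


def shard_for(contentobject_id: int) -> str:
    """Return the DSN of the shard whose ID range contains contentobject_id."""
    if contentobject_id < 0:
        raise ValueError("IDs must be non-negative")
    # bisect_right returns the index of the first bound strictly greater than
    # the ID, i.e. the range [previous bound, bound) that contains it.
    index = bisect.bisect_right(SHARD_UPPER_BOUNDS, contentobject_id)
    if index == len(SHARD_UPPER_BOUNDS):
        raise LookupError(f"no shard configured for id {contentobject_id}")
    return SHARD_DSNS[index]
```

For example, `shard_for(1_500_000)` returns the `db-shard-1` DSN. There is one trade-off with range sharding. Adding capacity for new content only means appending a range. However, IDs grow monotonically, so most new writes land on the last shard.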
If it were implemented, I think eZ Publish would be just perfect :)

Any ideas ?

---------------------------------------------
Remigijus Kiminas

Christian Rößler

Wednesday 03 June 2009 12:37:02 am

Hi,

It would be simple to try out partitioning.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html

Instead of storing millions of ezcontentobject_attribute rows in one physical table, partitioning lets you split the table into multiple 'virtual' tables, each holding a subset of all the data.
Nothing has to be changed on the eZ Publish side, as the partitioned table looks like any other table to the application; the DBMS (i.e. MySQL) takes care of managing the data. It's a pretty complex topic, so do some reading first. But it should make things a bit faster, especially for queries that MySQL can restrict to a single partition :)

Partitioning means, for example: store ezcontentobject_attribute rows with IDs a to c in partition A, IDs d to e in partition B, and so on.
It's like partitioning a hard disk...
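MySQL 5.1 and later express this idea as `PARTITION BY RANGE`. The Python sketch below only builds the DDL string and does not execute anything. The range boundaries are illustrative assumptions. MySQL requires the partitioning column to be part of every unique key on the table, including the primary key. Before running the generated statement, check the `ezcontentobject_attribute` schema of your installation. This sketch assumes that `id` is part of its primary key.

```python
def range_partition_ddl(table: str, column: str, upper_bounds: list[int]) -> str:
    """Build a MySQL ALTER TABLE ... PARTITION BY RANGE statement.

    Each bound is an exclusive upper limit; a final MAXVALUE partition
    catches every row above the last bound. Table and column names are
    trusted input here and are not escaped.
    """
    if not upper_bounds or upper_bounds != sorted(set(upper_bounds)):
        raise ValueError("bounds must be non-empty and strictly increasing")
    parts = [
        f"PARTITION p{i} VALUES LESS THAN ({bound})"
        for i, bound in enumerate(upper_bounds)
    ]
    parts.append("PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        f"ALTER TABLE {table} PARTITION BY RANGE ({column}) (\n    "
        + ",\n    ".join(parts)
        + "\n);"
    )
```

Calling `range_partition_ddl("ezcontentobject_attribute", "id", [1000000, 2000000])` yields a statement with partitions `p0`, `p1` and a catch-all `pmax`. When a query's `WHERE` clause restricts `id`, MySQL can skip the partitions that cannot match.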

----------

The second thing you could try is the sharding you mentioned. Sharding has to be implemented in the model layer of MVC, so eZ Publish itself would need to be modified. This is much more complex and nearly impossible, as it breaks a lot of code and logic (for example, every query that joins tables whose rows end up on different shards).
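Here is one way to see why sharding breaks existing logic. A lookup by the shard key touches a single shard. Any query that filters on something else, such as all objects of one class, has to be sent to every shard. Sorting and joining then move into application code. The following Python sketch uses in-memory lists as stand-ins for shard databases. The fixed-width ID ranges and the row fields are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# In-memory stand-ins for three shard databases, split by contentobject_id
# range; real shards would be separate database connections (assumption).
SHARDS: List[List[Row]] = [
    [{"contentobject_id": 10, "class": "article"}],
    [{"contentobject_id": 1_000_010, "class": "folder"}],
    [{"contentobject_id": 2_000_010, "class": "article"}],
]
SHARD_SIZE = 1_000_000  # hypothetical fixed-width ID ranges


def fetch_by_id(contentobject_id: int) -> List[Row]:
    """Shard key known: only one shard has to be queried."""
    shard = SHARDS[contentobject_id // SHARD_SIZE]
    return [r for r in shard if r["contentobject_id"] == contentobject_id]


def fetch_where(predicate: Callable[[Row], bool]) -> List[Row]:
    """No shard key: fan the query out to every shard and merge the results."""
    results: List[Row] = []
    for shard in SHARDS:  # one round trip per shard
        results.extend(r for r in shard if predicate(r))
    # Ordering (and any join) now has to happen in application code.
    return sorted(results, key=lambda r: r["contentobject_id"])
```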

----------

A third solution would be to use the clustering feature or the master-slave feature.
The master-slave feature is simple to activate, as the code for it already seems to be present in eZ Publish: one database server is used for read operations, the other for write operations, which are replicated to the read-only server.
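A read/write split like this can be sketched as a small router. It sends plain `SELECT` statements to the slave and everything else to the master. This is a hypothetical Python illustration, not eZ Publish's own implementation. The `execute(sql)` connection interface and the `FakeConnection` class are assumptions that let the example run on its own.

```python
class ReadWriteRouter:
    """Route plain reads to a slave and all other statements to the master.

    `master` and `slave` are any objects with an execute(sql) method
    (an assumption for this sketch).
    """

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave
        self.in_transaction = False

    def connection_for(self, sql: str):
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        if keyword in ("BEGIN", "START"):  # BEGIN / START TRANSACTION
            self.in_transaction = True
        # Inside a transaction everything stays on the master, so the
        # transaction can read its own writes; replication to the slave
        # may lag behind.
        if keyword == "SELECT" and not self.in_transaction:
            conn = self.slave
        else:
            conn = self.master
        if keyword in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
        return conn

    def execute(self, sql: str):
        return self.connection_for(sql).execute(sql)


class FakeConnection:
    """Stand-in connection that just reports which server got the query."""

    def __init__(self, name: str):
        self.name = name

    def execute(self, sql: str) -> str:
        return f"{self.name}: {sql}"


if __name__ == "__main__":
    router = ReadWriteRouter(FakeConnection("master"), FakeConnection("slave"))
    print(router.execute("SELECT id FROM ezcontentobject"))
    print(router.execute("UPDATE ezcontentobject SET modified = 0 WHERE id = 1"))
```

This sketch keeps every statement inside a transaction on the master, which is one common design choice. MySQL replication is asynchronous by default, so a slave may not yet contain rows the master has just written.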

But you seem to be interested in sharding, so the first solution (partitioning) is an option, while sharding itself is nice but very complicated to implement on a system as complex as eZ Publish. Also remember that eZ Publish already exists: sharding is easier to implement when starting a new project, because then you don't have to deal with any upgrade/downgrade issues...

One thing that just came to mind: memcache. It could significantly improve performance, but it would also require changes to the eZ Publish models (the persistent database layer)...
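The usual way to put memcache in front of a persistence layer is the cache-aside pattern. Reads check the cache first and fall back to the database. Writes update the database and then invalidate the cached entry. In this Python sketch, `InMemoryCache` is a hypothetical stand-in for a real memcached client. The `contentobject:<id>` key scheme and the 300-second TTL are also assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for a memcached client: get/set with a time-to-live in seconds."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any, ttl: int) -> None:
        self._data[key] = (time.monotonic() + ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def fetch_object(cache: InMemoryCache, load_from_db: Callable[[int], Any],
                 object_id: int, ttl: int = 300) -> Any:
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"contentobject:{object_id}"  # hypothetical key scheme
    value = cache.get(key)
    if value is None:
        value = load_from_db(object_id)
        cache.set(key, value, ttl)
    return value


def store_object(cache: InMemoryCache, save_to_db: Callable[[int, Any], None],
                 object_id: int, value: Any) -> None:
    """Write to the database, then invalidate so the next read reloads it."""
    save_to_db(object_id, value)
    cache.delete(f"contentobject:{object_id}")
```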

Just my 2 cents.

If you intend to write such modifications, let me know. I'm interested (I don't need it, but I'm extremely interested in how you'll solve it).

Christian

Hannover, Germany
eZ-Certified http://auth.ez.no/certification/verify/395613

Gaetano Giunta

Wednesday 03 June 2009 12:48:47 am

Maybe not as cheap as MySQL to install, configure or maintain, but Oracle has had table partitioning and server clustering (RAC) for ages.
They also claim that their handling of BLOBs is excellent, which should make it a good platform for eZ Publish "cluster mode".
It might be worth a try, if you're going to have a huge eZ Publish installation and money is not a problem.

I think that, in general, it is a good idea to let the database do the scaling instead of pushing more complexity into the web layer...

Principal Consultant International Business
Member of the Community Project Board

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.
