How to speed up import of 1M objects?

zurgutt -

Wednesday 22 December 2010 3:38:19 pm

I have to migrate a lot of content from one eZ Publish installation to another (4.0 -> 4.3). It is not a straight upgrade: custom scripts convert objects to new classes, etc.

The problem is that there are nearly a million objects, so while the export runs at a reasonable speed, the import/publish operations are slow and by my estimates would take days to finish.

I can dedicate a server to this operation and tune it specifically. It is a reasonably fast box with a Xeon E5520 @ 2.27GHz and 12 GB of RAM.

Can you suggest any specific tune-ups or tricks to temporarily speed up insert/publish operations for the duration of the import?

Certified eZ developer looking for projects.
zurgutt at gg.ee

Jérôme Vieilledent

Wednesday 22 December 2010 10:03:20 pm

Hi Zurgutt

SQLIImport tunes some performance settings for imports, such as:

  • View cache deactivation (only for the script)
  • Delayed indexing

Once the import process is over, a cleanup cronjob runs to clear the cache and trigger indexing.
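
The delayed-indexing idea can be illustrated independently of SQLIImport. Below is a minimal Python sketch of the pattern, not SQLIImport's actual code: `publish_object`, `index_batch` and the batch size are hypothetical stand-ins for the real publish and search-index operations.

```python
from typing import Callable, Iterable, List


def import_with_deferred_indexing(
    object_ids: Iterable[int],
    publish_object: Callable[[int], None],
    index_batch: Callable[[List[int]], None],
    batch_size: int = 500,
) -> int:
    """Publish every object first, then index them in batches afterwards.

    publish_object and index_batch are hypothetical callbacks (assumptions),
    standing in for the real publish and search-index operations.
    Returns the number of index batches that were run.
    """
    pending: List[int] = []
    for object_id in object_ids:
        publish_object(object_id)  # no indexing inside the import loop
        pending.append(object_id)  # remember the object for the cleanup pass

    batches = 0
    for start in range(0, len(pending), batch_size):
        index_batch(pending[start:start + batch_size])
        batches += 1
    return batches
```

The point of the split is that the expensive indexing step no longer sits inside the per-object publish loop and can run as a separate pass once the import is done.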

If you're not using this extension, maybe you should consider it. You could do your transformations in your import handler :)

Ivo Lukac

Thursday 23 December 2010 4:41:51 am

I second everything Jérôme wrote, with a few additional notes:

1. The most important thing is to spread nodes over a lot of parent nodes. We had a lot of bad experiences importing thousands of objects under the same node, as a single publish gets a bit slower with every new sibling. I didn't have time to investigate why that is; maybe it can be avoided somehow...

2. To reduce the time of a single publish, try temporarily hacking the "publish" operation definition in kernel/content/operation_definition.php and removing every method that is not crucial, like:
post_publish, remove-temporary-drafts, create-notification, register-search-object, generate-object-view-cache, clear-object-view-cache, pre_publish.
Maybe even some others. You need to know exactly what you are doing, of course. Try different hacks with a couple of thousand objects and measure the average time of a single publish...
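
For point 1, one way to spread nodes is to derive a container path from each object's numeric ID, so that no parent ever gets more than a bounded number of children. This is only a sketch in Python under assumed values: `bucket_path`, the fanout of 100 and the depth of 2 are illustrative choices, and creating the container nodes themselves is left out.

```python
from typing import Tuple


def bucket_path(object_id: int, fanout: int = 100, depth: int = 2) -> Tuple[int, ...]:
    """Return container indexes, from the top level down, for one object.

    fanout is the number of containers per level and depth is the number of
    container levels; both are assumed values to be tuned for the import.
    """
    path = []
    remaining = object_id
    for _ in range(depth):
        path.append(remaining % fanout)  # which container at this level
        remaining //= fanout             # move on to the next level
    return tuple(path)
```

With these values there are 100 x 100 = 10,000 leaf containers, so one million sequential IDs end up with 100 objects under each leaf and no node has more than 100 siblings.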

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

gilles guirand

Thursday 23 December 2010 1:22:55 pm

I agree,

@Ivo: When you say "hack", you mean executing a specific static PHP method and/or unsetting some INI values before importing data, I guess :) ?

--
Gilles Guirand
eZ Community Board Member
http://twitter.com/gandbox
http://www.gandbox.fr

Ivo Lukac

Tuesday 28 December 2010 3:18:31 am

"I agree, @Ivo: When you say 'hack', you mean executing a specific static PHP method and/or unsetting some INI values before importing data, I guess :) ?"

No, by "hack" I mean going to kernel/content/operation_definition.php and commenting out some parts of the publish operation :) temporarily, just for the import.
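
To judge whether a temporary change like this pays off, the average publish time can be measured on a sample of a few thousand objects before and after. A rough Python sketch follows; `publish_object` is a hypothetical callback standing in for whatever performs one publish, and the extrapolation assumes constant cost per object.

```python
import time
from typing import Callable, Iterable


def average_publish_time(
    publish_object: Callable[[int], None],
    sample_ids: Iterable[int],
) -> float:
    """Return the mean wall-clock seconds per publish call over the sample."""
    count = 0
    started = time.perf_counter()
    for object_id in sample_ids:
        publish_object(object_id)  # hypothetical: performs one publish
        count += 1
    elapsed = time.perf_counter() - started
    if count == 0:
        raise ValueError("sample_ids must not be empty")
    return elapsed / count


def estimated_hours(avg_seconds: float, total_objects: int) -> float:
    """Extrapolate a per-object average to a full import, in hours."""
    return avg_seconds * total_objects / 3600.0
```

For example, an average of 0.5 seconds per publish extrapolates to roughly 139 hours for one million objects. Since publishing tends to slow down as siblings accumulate under one node, a linear estimate like this is likely optimistic.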

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Ivo Lukac

Tuesday 28 December 2010 5:43:44 am

Additionally, it could pay off performance-wise to hack out some features (e.g. browserecent), but generally I think it should be possible to disable those through INI settings.

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac
