How to speed up import of 1M objects?

zurgutt -

Wednesday 22 December 2010 3:38:19 pm

I have to migrate a lot of content from one eZ Publish installation to another (4.0 -> 4.3). It is not a straight upgrade; there are custom scripts to convert objects to new classes, etc.

The problem is that there are nearly a million objects, so while the export runs at a reasonable speed, the import/publish operations are slow and by my estimate would take days to finish.

I can dedicate a server to this operation and tune it specifically. It is a reasonably fast box with a Xeon E5520 @ 2.27GHz and 12 GB of RAM.

Can you suggest any specific tune-ups or tricks to temporarily speed up insert/publish operations for the duration of the import?

Certified eZ developer looking for projects.
zurgutt at gg.ee

Jérôme Vieilledent

Wednesday 22 December 2010 10:03:20 pm

Hi Zurgutt

SQLIImport tunes some performance settings for imports, such as:

  • View cache deactivation (only for the script)
  • Delayed indexing

Once the import process is over, a cleanup cronjob runs to clear the cache and trigger indexing.
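
The general idea behind delayed indexing can be sketched in a few lines. This is only an illustration of the pattern (publish now, index later in one pass), not SQLIImport's actual code; the names `DeferredIndexer`, `run_import`, `publish` and `index_one` are hypothetical and do not correspond to any eZ Publish API.

```python
from typing import Callable, Iterable, List


class DeferredIndexer:
    """Remembers object IDs during an import and indexes them afterwards."""

    def __init__(self, index_one: Callable[[int], None]) -> None:
        # index_one is a hypothetical callback that indexes a single object.
        self._index_one = index_one
        self._pending: List[int] = []

    def defer(self, object_id: int) -> None:
        # Cheap bookkeeping only; no search engine work happens here.
        self._pending.append(object_id)

    def flush(self) -> int:
        # Index everything that was deferred and return how many were indexed.
        count = 0
        for object_id in self._pending:
            self._index_one(object_id)
            count += 1
        self._pending.clear()
        return count


def run_import(object_ids: Iterable[int],
               publish: Callable[[int], None],
               indexer: DeferredIndexer) -> None:
    # Publish each object immediately, but only queue it for indexing.
    for object_id in object_ids:
        publish(object_id)
        indexer.defer(object_id)
```

Calling `indexer.flush()` after `run_import(...)` plays the role of the cleanup job that runs once the import is over.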

If you're not using this extension, maybe you should consider it. You could do your transformations in your import handler :)

Ivo Lukac

Thursday 23 December 2010 4:41:51 am

I second everything Jérôme wrote, with a few additional notes:

1. The most important thing is to spread the nodes over a lot of parent nodes. We had a lot of bad experiences importing thousands of objects under the same node, as a single publish gets a bit slower with every new sibling. I didn't have time to investigate why that is; maybe it can be avoided somehow...

2. To reduce the time of a single publish, try temporarily hacking the "publish" operation definition in kernel/content/operation_definition.php and removing every method that is not crucial, like:
post_publish, remove-temporary-drafts, create-notification, register-search-object, generate-object-view-cache, clear-object-view-cache, pre_publish.
Maybe even some others. You need to know exactly what you are doing, of course. Try different hacks with a couple of thousand objects and measure the average time of a single publish...
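
A rough sketch of both notes, assuming numeric object IDs and a fixed number of pre-created container nodes. The helper names `bucket_for` and `average_publish_time`, and the `publish` callback, are made up for illustration and are not part of eZ Publish; the real publish call would go where the callback is invoked.

```python
import time
from typing import Callable, Iterable


def bucket_for(object_id: int, bucket_count: int) -> int:
    """Pick a container index so siblings are spread evenly over the parents."""
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    return object_id % bucket_count


def average_publish_time(publish: Callable[[int, int], None],
                         object_ids: Iterable[int],
                         bucket_count: int) -> float:
    """Publish a sample of objects and return the mean seconds per publish."""
    total = 0.0
    count = 0
    for object_id in object_ids:
        start = time.perf_counter()
        # publish(object_id, bucket) is a hypothetical stand-in for the real call.
        publish(object_id, bucket_for(object_id, bucket_count))
        total += time.perf_counter() - start
        count += 1
    return total / count if count else 0.0
```

With 1,000 containers, a million objects end up at roughly 1,000 siblings per parent. Running `average_publish_time` on the same sample before and after each change to the publish operation gives a number to compare.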

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

gilles guirand

Thursday 23 December 2010 1:22:55 pm

I agree,

@Ivo: when you say "hack", you mean executing a specific static PHP method and/or unsetting some INI values before importing the data, I guess :) ?

--
Gilles Guirand
eZ Community Board Member
http://twitter.com/gandbox
http://www.gandbox.fr

Ivo Lukac

Tuesday 28 December 2010 3:18:31 am

"

I agree,

@Ivo: when you say "hack", you mean executing a specific static PHP method and/or unsetting some INI values before importing the data, I guess :) ?

"

No, by "hack" I mean going to kernel/content/operation_definition.php and commenting out some parts of the publish operation :) Temporarily, just for the import.


Ivo Lukac

Tuesday 28 December 2010 5:43:44 am

Additionally, it could pay off performance-wise to hack out some features (e.g. browserecent), but generally I think it should be possible to disable those through INI settings.

