eZFind 2.0 very slow

H-Works Agency

Tuesday 22 September 2009 1:50:24 am

Hello,

We installed eZFind on a medium-sized website (300,000 content objects, 2.5 million rows in the content attributes table).

We tried to enable eZFind on this site and it's a complete failure:

- Indexing takes 7 or 8 hours
- The site becomes so slow it can't be used

It's not a Solr problem, because querying the Solr server directly is very quick.

Are these problems related to the fact that my DB uses MyISAM?

Which aspects of the eZ Publish connector to Solr could cause such performance problems?

Thanks for any help!

EZP is Great

Sander van den Akker

Tuesday 22 September 2009 2:34:39 am

Do you have many attributes of type ezkeyword (which automatically relates objects to each other)? This is a known performance hog because of infinite loops.

eZ Publish certified developer
http://auth.ez.no/certification/verify/392313

H-Works Agency

Tuesday 22 September 2009 2:46:56 am

Uhm, yes, we have a large number of related objects and ezkeywords. Is there a way to fix this bug/problem?

Paul Borgermans

Tuesday 22 September 2009 5:11:11 am

Hi Martin

What ezp version are you using? MyISAM is a bad choice in every respect and is not supported.

How does the performance problem relate to eZ Find:

- is it with indexing
- or browsing/searching
- or both?

Otoh, the slow indexing may be due to:
- not using the dedicated Solr indexing script ( extension/ezfind/bin/php/updatesearchindexsolr.php )
- a bug in eZ Find 2.0 where the spell checker is rebuilt on every "commit" to Solr

...

But I need to know more details to help you out on this

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

H-Works Agency

Tuesday 22 September 2009 6:23:21 am

Hi Paul, and thanks for your answer,

OK for the table engine, we're going to switch to InnoDB! (Though I had a very bad experience with this engine: a corrupted ezcontentobject_attribute table that could never be repaired, so we had to restore from backup and lost data.)

eZFind indexing does reach 100%, although I find it very slow (5 to 6 hours) with the provided indexing script (extension/ezfind/bin/php/updatesearchindexsolr.php).

Once indexed, search is 10 times slower than with the default search engine, and the whole site is slow.

The problem seems to be related to the modification/publication of objects: it looks like every time someone modifies something, eZFind is doing some work.

When running "top" while trying to display a page, we identified a "java" Solr process taking 120% of CPU.

But we don't understand what it is doing.

There are a few points about eZFind we don't understand:

- How are new objects indexed?
- Do we have to remove our "updatesearchindex" cron job and install the eZFind one instead?
- After indexing, this script always restarts from scratch, so how can it index new content?

I hope this is clear enough for you to lead me to a few solutions.

Thanx anyway for your time.

Martin

Ivo Lukac

Tuesday 22 September 2009 6:55:49 am

First of all enable DelayedIndexing.

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Sander van den Akker

Tuesday 22 September 2009 7:01:41 am

I have the same problem. When eZ Find is enabled, publishing content is much slower. By disabling the 'searchable' flag on some class attributes I managed to reduce the response time. However, this is not always an option.

You might also want to read this thread: http://ez.no/developer/forum/extensions/ez_find/ezfind2_indexing_speed_incredibly_low_er/re_ezfind2_indexing_speed_incredibly_low_er__2

Paul Borgermans

Tuesday 22 September 2009 9:42:36 am

A few things to improve speed on large sites:

- in [IndexOptions] set OptimizeOnCommit=disabled. Then configure an additional cron that calls the script extension/cronjobs/ezfoptimizeindex.php once a day or so (during low traffic hours)

This is most probably going to solve most performance problems
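
As a minimal sketch, the override could look like the fragment below. The file location (settings/override/ezfind.ini.append.php) is an assumption about a typical setup; only the [IndexOptions] block and the OptimizeOnCommit value come from the advice above.

```ini
# Assumed location: settings/override/ezfind.ini.append.php
[IndexOptions]
# Stop sending an optimize command to Solr on every commit;
# the optimize step is then left to a separate, infrequent cronjob.
OptimizeOnCommit=disabled
```

Remember to clear the ini cache after changing the file, otherwise the old value may still be in effect.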

For additional performance gains:

- patch the solr.war file with the one from trunk in extension/ezfind/java/webapps/solr.war
- check that the spellcheck configuration is set to rebuild upon optimize in extension/ezfind/java/solr/conf/solrconfig.xml (this is the default, but it may not work until you have patched solr.war)

Otherwise this also causes high CPU load, as the spellcheck index would be rebuilt on every object update (and with several hundred thousand objects, that takes a lot of time)

About delayed indexing: this should not be much of a concern with the above changes, but in any case, you should use the eZ Find provided cronjob ezfindexcontent.php (to be run rather regularly, like every 10 minutes) in combination with the daily optimize cronjob

hth
Paul

Paul Borgermans

Tuesday 22 September 2009 9:47:35 am

One additional note for delayed indexing:

There is a bug in eZ Publish 4.1.3 that causes delayed indexing to work only when edits are done once between runs of the ezfindexcontent.php cronjob.

Please use the patched file from svn:

http://pubsvn.ez.no/nextgen/release/4.1.4/kernel/content/ezcontentoperationcollection.php

to avoid this bug as well

hth
Paul

Carlos Revillo

Wednesday 23 September 2009 12:35:14 pm

I second Paul on the OptimizeOnCommit setting. We had this problem until we disabled it. Since then, we have kept delayed indexing disabled and still get good performance.

H-Works Agency

Tuesday 11 January 2011 9:25:22 am

Great !!! Disabling this variable resolved my problem.

Thanx Paul.

H-Works Agency

Thursday 27 January 2011 3:59:09 am

Hello everyone,

I still have a problem with this: my speed problems were completely resolved by the modifications to ezfind.ini mentioned above, but the Solr index isn't updated after running:

php runcronjobs.php -q -s siteaccess ezfindexcontent 2>&1

My question is: which exact cronjob script is supposed to run to update the Solr index every 10 minutes?

The documentation says "run a cronjob if you delay indexing" but never mentions explicitly which cronjob.

When I read the ezfindexcontent code, it seems to look for rows in the ezpending_actions SQL table to know which objects to process.

Given this, my second question is: why is this table always empty after publishing content from the admin interface when "OptimizeOnCommit=disabled"?

My third question is: do I need to manually insert a row into ezpending_actions to trigger a Solr index update?

I can't find any documentation on this; any help would be great.

Cheers

Gaetano Giunta

Saturday 29 January 2011 2:37:38 am

q: "What exact cronjob script is supposed to run to update the solr index every 10 minutes ?"

a: the ezfindexcontent one

The pending actions table should be filled by eZP automatically when objects are published (this only happens if DelayedIndexing is enabled; otherwise they get indexed immediately). So you do not need to touch it.

OptimizeOnCommit only affects whether optimize commands are sent to Solr; it has no impact on when or by what the indexing is done.

So to debug, I recommend that you:

- make sure you have no cronjobs running at all

- publish an object

- look at the ezpending_actions table. No rows in there => you have not actually enabled DelayedIndexing correctly

- run the ezfindexcontent cronjob, and check that the row in ezpending_actions is gone

Principal Consultant International Business
Member of the Community Project Board

H-Works Agency

Saturday 29 January 2011 3:02:23 am

Thank you Gaetano for this clear debugging process.

It works after setting DelayedIndexing=enabled in site.ini.
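
For anyone landing here later, a sketch of that setting. The [SearchSettings] block name and the override file location are what a standard eZ Publish 4.x install normally uses, but treat both as assumptions and check them against your own site.ini:

```ini
# Assumed location: settings/override/site.ini.append.php
[SearchSettings]
# Queue published objects in ezpending_actions instead of indexing them
# at publish time; the ezfindexcontent cronjob then processes the queue.
DelayedIndexing=enabled
```

With this in place, publishing an object adds a row to ezpending_actions, and the row disappears once the ezfindexcontent cronjob has run.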

Thanx !
