
Exclude objects from indexing


Andrew Kelly

Tuesday 26 June 2007 2:27:13 am

Hello all,

Occasionally I'd like to exclude objects from being indexed, so that they cannot and will not appear in any search results. Is this at all possible, and if so, how?
Also, how can I remove objects from the existing search index?

Andy

André R.

Tuesday 26 June 2007 2:44:53 am

One way would be to create a new class where all attributes are marked as not indexed, and then use the 'change class' extension to convert the existing ones.

You might have to republish the objects you convert.

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom

Andrew Kelly

Tuesday 26 June 2007 3:00:41 am

Hi André,

thanks for such a quick reply.

Unfortunately, this isn't a viable alternative. The class is well defined and heavily used, and changing things would create a lot of unnecessary work.
Here's an analogy for what I need:
"Folder" objects are used heavily throughout the site, and their title and description should always be indexed and findable through searches. However, 3 of those folders are completely meaningless (and in fact stupid) when they show up in a search, and should be left out of the results.
That's not exactly my case, but it's directly analogous.
Just filtering those items out in the search template seems like a fine idea, until they're part of a result set large enough to need pagination, which then messes everything up (pages come back short and the result count is wrong).
The cleanest solution for me is to treat the items as non-existent in the context of a search.

Unless and until I'm told of a better approach, I think what I'm going to do is turn on delayed indexing and then hack the reindexing script to skip items based on certain criteria.

So, I'm still open for a better solution on how NOT to index a normally indexable object, as well as a solution for removing individual objects which have already been indexed.

Andy

André R.

Tuesday 26 June 2007 3:07:00 am

Well, that's not a hack if you make a copy of the delayed indexing script and do what you want there.
If you want to do it more cleanly, you can add a checkbox attribute to the class called something like 'Do not add to search engine', and then check for it in the indexing script.

EDIT: That is, of course, if you want your editors to control this, so you don't have to be bothered with it again.


Andrew Kelly

Tuesday 26 June 2007 3:26:39 am

Yes,
I've been thinking about the checkbox approach, but I think this will come up far too infrequently to be worth it.

So, is there a scripted way to remove already indexed objects, or am I going to need to modify the DB directly?

(and thanks again for your assistance in all this)
Andy

Andrew Kelly

Wednesday 27 June 2007 3:36:39 am

Forgive the self follow-up,

I just wanted to present a resolution in case anybody else ever has the same issue come up.

I was able to handle everything within cronjobs/indexcontent.php.

To stop an object from being added to the search index, I wrapped the two lines:

eZSearch::removeObject( $object );
eZSearch::addObject( $object );

in a conditional statement and skipped them when certain criteria were met. Basically, I grabbed the object's data map and checked the value of one of its attributes.

To remove already indexed objects, I cheated a little bit. I commented out the line:

eZSearch::addObject( $object );

and made sure that the only entries in ezpending_actions were for the IDs of the objects that needed to be removed. Running the cronjob then removed just those objects from the index without re-adding them.
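A minimal sketch of the attribute check described above, for use in a copy of cronjobs/indexcontent.php. This is not code from the thread: the function name shouldIndexObject and the attribute identifier 'exclude_from_search' are hypothetical, and it assumes the flag is a checkbox (ezboolean) attribute, which stores its value in data_int.

```php
<?php
// Sketch for a *copy* of cronjobs/indexcontent.php (not the stock script).
// Assumption: the class has a checkbox (ezboolean) attribute with the
// hypothetical identifier 'exclude_from_search'. Adjust to your class.
function shouldIndexObject( $object )
{
    // dataMap() returns the object's attributes keyed by identifier
    $dataMap = $object->dataMap();

    // Objects whose class has no such attribute are indexed as usual
    if ( !isset( $dataMap['exclude_from_search'] ) )
    {
        return true;
    }

    // ezboolean keeps its value in data_int: 1 = checked, 0 = unchecked
    $excluded = (int)$dataMap['exclude_from_search']->attribute( 'data_int' );
    return $excluded !== 1;
}
```

One possible design choice: guard only eZSearch::addObject( $object ); with shouldIndexObject( $object ) and leave eZSearch::removeObject( $object ); unconditional. A flagged object that comes through ezpending_actions is then removed from the index and not re-added, which would also cover objects that were indexed before the flag was set, without having to comment lines out.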

Andy

dexen deVries

Wednesday 27 June 2007 3:56:23 am

IIRC, search results only include objects you have read permission for. I'm not 100% sure, but about 90%. So if you restrict access to those objects, you should be safe there.

André R.

Wednesday 27 June 2007 7:16:30 am

I think he still wanted them to be accessible, just not indexable.

But this could also be solved by using the limitation parameter on search: basically, search as a user with fewer rights than anonymous, for instance with no rights to read content in a 'Not indexable' section. But the limitation parameter is not well documented, so you're a bit on your own there.
