Search for 'seb' instead of 'sebastiaan'


Sebastiaan van der Vliet

Wednesday 29 September 2010 2:31:14 am

I'm using the 2.1.0-final version of eZ Find. I want to search for part of a word or name instead of the complete word, e.g., when I search for 'seb', I also want to find 'sebastiaan'. This is possible using the wildcard (*), but I want to keep things simple for the end users.

In ezfind/java/solr/conf/schema.xml I made a copy of the field info for:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">

and called it:

<fieldType name="text_staff" class="solr.TextField" positionIncrementGap="100">

In this new fieldtype called 'text_staff' I then changed:

<analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

to

<analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
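
To make the effect of this tokenizer change concrete, here is a minimal Python sketch of character n-gram generation. It is not Solr code: the function name `char_ngrams` is hypothetical, and it only approximates what an n-gram tokenizer with minGramSize="3" and maxGramSize="15" would emit for a single value (lowercasing and other filters are assumed to happen elsewhere).

```python
def char_ngrams(text, min_size=3, max_size=15):
    """Return every substring of text with a length from min_size to max_size."""
    grams = []
    # Gram sizes cannot exceed the length of the text itself.
    for size in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


grams = char_ngrams("sebastiaan")
print("seb" in grams)         # the partial term exists as an indexed token
print("se" in grams)          # shorter than the minimum gram size
print("sebastiaan" in grams)  # the full word is still one of the grams
```

Because 'seb' is stored as a token of its own, a plain query for 'seb' can match without the user typing a wildcard.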

I then defined a new field in the <fields> section:

<field name="ezf_staff_text" type="text_staff"  indexed="true"  stored="true" multiValued="true" termVectors="true"/>

I then added the following lines to the copyFields:

<copyField source="attr_firstname_s" dest="ezf_staff_text"/>
<copyField source="attr_lastname_s" dest="ezf_staff_text"/>

Finally, in the file ezfind\classes\ezfezpsolrquerybuilder.php I changed the following code around line 262 from:

$highLightFields = $queryFields;
$queryFields[] = eZSolr::getMetaFieldName( 'name' );
$queryFields[] = eZSolr::getMetaFieldName( 'owner_name');

to

$highLightFields = $queryFields;
$queryFields[] = eZSolr::getMetaFieldName( 'name' );
$queryFields[] = eZSolr::getMetaFieldName( 'owner_name' );
$queryFields[] = eZSolr::getFieldName( 'ezf_staff_text' );

And it works. However, it seems like a lot of changes to get the search for partial words working. Am I overdoing it? Did I miss something easier?

Thanks,
Sebastiaan

Certified eZ publish developer with over 9 years of eZ publish experience. Available for challenging eZ publish projects as a technical consultant, project manager, trouble shooter or strategic advisor.

Matthieu Sévère

Wednesday 29 September 2010 4:32:19 am

This is a smart workaround, but as you said, it is quite complicated. I'm also interested to see if there is a simpler solution.

--
eZ certified developer: http://ez.no/certification/verify/346216

Ivo Lukac

Wednesday 29 September 2010 6:38:51 am

Why not just copy into the "ezf_df_text" Solr field? Then you don't need to hack the eZ Find code.

Changing schema.xml is normal ;)

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Sebastiaan van der Vliet

Wednesday 29 September 2010 7:05:56 am

Hi Ivo, I don't like changing the default/standard Solr field settings. For one thing, I am not sure what it would do to the size of the entire index if all text fields were indexed like the staff_text. I also think that using the ezf_df_text Solr field would still require adding the line below to ezfezpsolrquerybuilder.php.

$queryFields[] = eZSolr::getFieldName( 'ezf_df_text' );


Paul Borgermans

Wednesday 29 September 2010 11:21:53 am

"

Hi Ivo, I don't like changing the default/standard Solr field settings. For one thing, I am not sure what it would do to the size of the entire index if all text fields were indexed like the staff_text. I also think that using the ezf_df_text Solr field would still require adding the line below to ezfezpsolrquerybuilder.php.

$queryFields[] = eZSolr::getFieldName( 'ezf_df_text' );
"

Hi all

Ngram tokenisation is something that can inflate your index pretty badly.
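
As a rough illustration of that inflation, the Python sketch below counts how many n-grams a value of a given length produces with a minimum gram size of 3 and a maximum of 15. The helper `ngram_count` is hypothetical, and the sketch assumes the tokenizer runs over the whole field value, spaces included; real index growth also depends on how many distinct grams repeat across documents.

```python
def ngram_count(length, min_size=3, max_size=15):
    """Number of character n-grams produced for a value of the given length."""
    largest = min(max_size, length)
    # A value of length n contains (n - size + 1) substrings of each size.
    return sum(length - size + 1 for size in range(min_size, largest + 1))


for value in ("seb", "sebastiaan", "sebastiaan van der vliet"):
    whitespace_tokens = len(value.split())
    print(value, whitespace_tokens, ngram_count(len(value)))
```

A ten-letter name that would be a single whitespace token turns into dozens of grams, which is why limiting n-gram analysis to a small dedicated field keeps the cost contained.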

@Sebastiaan: I'll patch eZ Find soon so you can have better control over which field types are used (along with some more changes in this area).

Take care to do the n-gram tokenisation only at index time, not at query time.
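
The difference can be sketched in Python. This is a simplified model rather than Solr's actual matching logic: the helper names are hypothetical, and the query-time variant assumes the grams of the query are combined so that any shared gram counts as a hit.

```python
def char_ngrams(text, min_size=3, max_size=15):
    """Return the set of substrings of text with lengths from min_size to max_size."""
    grams = set()
    for size in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# Index time: the stored name is split into n-grams.
indexed = char_ngrams("sebastiaan")


def matches_whole_query(query):
    # Query analyzer leaves the term intact, so it must equal an indexed gram.
    return query in indexed


def matches_ngrammed_query(query):
    # Query analyzer also splits into n-grams (assumed: any shared gram is a hit).
    return bool(char_ngrams(query) & indexed)


print(matches_whole_query("seb"))         # partial name matches
print(matches_whole_query("sebring"))     # unrelated word does not match
print(matches_ngrammed_query("sebring"))  # matches via the shared gram 'seb'
```

With n-grams applied on both sides, an unrelated query such as 'sebring' would hit 'sebastiaan' through the shared gram 'seb', which is the kind of confusing result to avoid.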

Another approach that is sometimes taken is to use a synonym filter (at query time) for fine-grained control. N-grams are sometimes very useful, but they can lead to many confusing search results as well.

hth

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans
