Facets : Truncated values on ezstring

H-Works Agency

Monday 12 July 2010 3:33:20 am

Hello,

I am trying to configure eZ Find on a website, but when I ask for a "class/attribute" facet on an ezstring datatype, all results are truncated.

For example, if you have the keyword "my-house", you get these results:

  • my
  • house
  • hous

I understand the possible benefit of this in certain cases, but how can I modify this behavior? Is there a way to tell Solr "Hey, please disable this word truncation feature"?

EZP is Great

Ivo Lukac

Monday 12 July 2010 9:18:31 am

The problem (or feature) you are experiencing is the result of how Solr tokenizes text. A word delimiter filter is applied during indexing, and it breaks up words that contain a dash. These tokens are then used for faceting.
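
To make that concrete, here is a rough Python sketch (not Solr code) of why faceting on a tokenized field returns fragments. `split_on_delimiters` is a hypothetical stand-in for the word delimiter filter, and `facet_counts` simply counts indexed terms per document, which is roughly what a field facet does. The extra "hous" entry in the original post most likely comes from a stemming filter, which this sketch leaves out.

```python
import re
from collections import Counter


def split_on_delimiters(value):
    """Hypothetical stand-in for a word delimiter filter:
    split on non-alphanumeric characters and lowercase the parts."""
    return [part.lower() for part in re.split(r"[^0-9A-Za-z]+", value) if part]


def facet_counts(values, analyze):
    """Count, for every indexed term, how many documents contain it."""
    counts = Counter()
    for value in values:
        # set(): a term counts once per document, however often it occurs
        for term in set(analyze(value)):
            counts[term] += 1
    return counts


keywords = ["my-house", "my-garden", "my-house"]

# Facet over a tokenized ("text"-like) field: the values are broken up.
tokenized = facet_counts(keywords, split_on_delimiters)
print(sorted(tokenized.items()))  # [('garden', 1), ('house', 2), ('my', 3)]

# Facet over a verbatim ("string"-like) field: the values stay whole.
verbatim = facet_counts(keywords, lambda value: [value])
print(sorted(verbatim.items()))  # [('my-garden', 1), ('my-house', 2)]
```

With the verbatim analyzer the facet returns "my-house" intact, which is the behaviour you normally want for facets.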

There is new functionality in eZ Find 2.2 for this (using special fields for faceting), but I haven't explored it yet.

But you can always tune schema.xml :)

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Paul Borgermans

Monday 12 July 2010 3:20:34 pm

Indeed, in eZ Find 2.2 you can define dedicated field types for attributes in a facet context. This was introduced exactly so that you can have both meaningful search results (where you usually want this "break up") and facets/sorting (where you want verbatim strings).

What datatype is used for keywords? Judging from your results, you are using either eZ Find 2.0, or eZ Find 2.1+ with a text field.

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

H-Works Agency

Monday 26 July 2010 6:17:47 am

Thank you for this information.

This Solr query syntax looks very powerful.

Sebastiaan van der Vliet

Tuesday 27 July 2010 5:45:30 am

In case you do want to tune schema.xml, here is the information you need. Leave in the line:

<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

but add a definition underneath that one for your own field, and replace type="text" with type="long", e.g.:

<field name="attr_dc_coverage_t" type="long" indexed="true" stored="true" multiValued="true"/>

Certified eZ publish developer with over 9 years of eZ publish experience. Available for challenging eZ publish projects as a technical consultant, project manager, trouble shooter or strategic advisor.

H-Works Agency

Tuesday 05 October 2010 10:18:22 am

Hello everyone and thank you for the answers.

For example, the facet results for my city attribute are "Paris, Pari". But the letter "s" is not a word separator, is it?

I tried Sebastiaan's answer by adding:

<field name="attr_ville_t" type="long" indexed="true" stored="true" multiValued="true"/>

just after the mentioned line, but it doesn't change anything :(
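
A possible explanation, offered as an assumption because nobody in this thread confirms it: the missing "s" looks like stemming rather than word splitting. Solr's stock "text" field type usually includes a Porter-style stemming filter, and "Paris" becoming "Pari" is what such a stemmer does with a trailing "s". The Python sketch below implements only step 1a of the published Porter algorithm (plural endings) to show the effect. It is an illustration, not Solr's implementation, and the function name is made up.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemming algorithm (plural endings)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


print(porter_step_1a("paris"))  # pari
```

A stemmer has no idea that "Paris" is a proper noun, which is why a verbatim field type is the better fit for facets on city names.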

Sebastiaan van der Vliet

Tuesday 05 October 2010 11:18:26 am

This looks interesting too: on http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory check out the entry for solr.WordDelimiterFilterFactory, which has an option preserveOriginal="1" that causes the original token to be indexed without modifications (in addition to the tokens produced due to other options).

For example:

<fieldtype name="subword" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            />
etc...

Patrick Kaiser

Tuesday 05 October 2010 5:09:25 pm

You can control the way content is indexed by defining a mapping between ez-datatypes and solr field-types. This can be configured in ezfind.ini[.append.php] independently for searching, sorting, faceting and filtering. For faceting the solr-field-type "string" is probably what you want.

[SolrFieldMapSettings]
# this is the configuration for searching
DatatypeMap[ezstring]=text
...

# for sorting
DatatypeMapSort[]
DatatypeMapSort[ezstring]=string
...

# for faceting 
DatatypeMapFacet[]
DatatypeMapFacet[ezstring]=string
...

# for filtering
DatatypeMapFilter[]
DatatypeMapFilter[ezstring]=string
...

Remember to run updatesearchindexsolr.php after you make these changes. Hope this helps.


Best regards,

Patrick

H-Works Agency

Wednesday 06 October 2010 1:47:02 am

Damn, still not working :(

I added these variables to ezfind.ini (which seems cleaner than modifying the system-wide schema.xml):

  • DatatypeMap[ezstring]=string
  • DatatypeMapSort[ezstring]=string
  • DatatypeMapFilter[ezstring]=string
  • DatatypeMapFacet[ezstring]=string
  • Default=string

Then I reran updatesearchindexsolr.php -s $siteaccess_name --clean-all

My ezstring attribute still returns truncated facet values.

Sebastiaan van der Vliet

Thursday 07 October 2010 12:16:09 am

Martin, did you also try the option below in schema.xml?

preserveOriginal="1"

H-Works Agency

Thursday 07 October 2010 2:44:21 am

I tried putting this everywhere, but tweaking ezfind.ini or schema.xml seems to have no effect on what Solr or eZ Find returns.

Even deleting or breaking schema.xml doesn't change anything: after running "updatesearchindexsolr.php", all facet results remain the same!

Could someone tell me which schema.xml we have to edit? Here is the list I found:

  • ./java/solr/conf/schema.xml
  • ./java/solr.multicore/eng-GB/conf/schema.xml
  • ./java/solr.multicore/fre-FR/conf/schema.xml
  • ./java/solr.multicore/nor-NO/conf/schema.xml

None of those seem to be used? If I delete all those files, nothing changes.

Sebastiaan van der Vliet

Monday 18 October 2010 7:03:48 am

Two quick checks:

Did you restart solr after editing schema.xml?
Did you delete your previous index first and then commit?
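
For the second check, "delete and commit" refers to Solr's XML update messages: a delete-by-query for `*:*` removes every document, and a `<commit/>` makes the change visible to searches. Below is a minimal Python sketch, assuming Solr's update handler is reachable at http://localhost:8983/solr/update (a guess, adjust to your setup; a multicore install has the core name in the path). The function names are made up for this example. Running updatesearchindexsolr.php with --clean-all, as done elsewhere in this thread, appears to cover the same ground from the eZ Publish side.

```python
import urllib.request

# Solr XML update messages: delete everything, then commit.
DELETE_ALL = b"<delete><query>*:*</query></delete>"
COMMIT = b"<commit/>"


def build_update_request(base_url, payload):
    """Build a POST request for Solr's XML update handler."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/update",
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def clear_index(base_url="http://localhost:8983/solr"):
    """Delete every document, then commit so the deletion becomes visible.

    The default base_url is an assumption, not something from this thread.
    """
    for payload in (DELETE_ALL, COMMIT):
        request = build_update_request(base_url, payload)
        with urllib.request.urlopen(request) as response:
            response.read()


# Usage, with Solr running and the URL adjusted to your install:
# clear_index("http://localhost:8983/solr")
```

A commit is not a restart: schema.xml is only read when Solr starts, so after a schema change you still need the restart, then the clean reindex.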

H-Works Agency

Monday 18 October 2010 8:04:30 am

Thank you. In fact, I hadn't restarted Solr.

What do you mean by deleting the previous index and committing? Does commit mean restarting Solr with the new schema.xml?

When I add my directive: <field name="attr_ville_t" type="long" indexed="true" stored="true" multiValued="true"/>

Solr crashes (curl error 7).

H-Works Agency

Monday 18 October 2010 8:52:27 am

Hello Patrick,

What are those modifications to ezfind.ini supposed to do?

Are they supposed to modify the way facets are returned through DatatypeMapFilter[]?

I really don't get it, as nothing ever changes no matter what I modify in this file.

H-Works Agency

Monday 18 October 2010 9:00:20 am

"

This looks interesting too: on http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory check out the entry for solr.WordDelimiterFilterFactory, which has an option preserveOriginal="1", which causes the original token to be indexed without modifications (in addition to the tokens produced due to other options).

for example:

<fieldtype name="subword" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            />
etc...
"

This is what I get after adding preserveOriginal="1" to schema.xml on line 221 (then restarting Solr, then removing extension/ezfind/java/(...)/data/*, then rerunning updatesolrindex):

"
HTTP ERROR: 500
Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
"

Patrick Kaiser

Monday 18 October 2010 9:10:55 am

If you follow my directions, there should be no need to even touch schema.xml.

If you didn't configure multicore Solr, then your schema.xml is this one: ./java/solr/conf/schema.xml

Before proceeding, replace your messed-up one with the original file. Restart Solr and make sure it runs.

Then make really sure the attribute you want to facet on is of type ezstring (perhaps you are using eztext or something?). You could also try the ezkeyword datatype, which should work "out of the box".

Did this help?


Best regards,

Patrick

H-Works Agency

Monday 18 October 2010 9:31:51 am

Ok Patrick.

Thanks, all this information is a great help for me to finally be able to use eZ Find on production projects.

My attribute is a simple "ezstring" attribute holding city names.

If I just add DatatypeMapFacet[ezstring]=lckeyword to ezfind.ini, it doesn't change anything after rerunning updatesolrindex.

My results are still truncated like this:

  • Paris becomes Pari
  • Rennes becomes Renn
  • ...etc

Do I need to insert DatatypeMapFacet[ezstring]=string?

Patrick Kaiser

Monday 18 October 2010 10:13:02 am

I meant you should try adding a new field of type ezkeyword to your class, in addition to your existing city attribute. Edit a few objects and add content in the new keyword field.

Adjust ezfind.ini:

DatatypeMapFacet[]
DatatypeMapFacet[ezstring]=string
DatatypeMapFacet[ezkeyword]=lckeyword

Clear the cache and check in the admin interface whether the siteaccess you are using for your facet tests picks up the right settings for ezfind.ini (it really seems that your ezfind.ini settings are not being used).

Then rerun updatesearchindexsolr.php -s YOUR_SITEACCESS --clean-all

Then you can try faceting on both fields; both should actually work.


Best regards,

Patrick

H-Works Agency

Tuesday 19 October 2010 10:57:42 am

You were right, I had a loading problem with my ezfind.ini... an extension loading order problem :(

Now everything works with your directives!

Thanks a lot!

Michele Paoli

Friday 01 April 2011 4:08:25 am

Hi everybody,

I have the same problem with an ezobjectrelationlist attribute.
Spaces between words are treated not as normal characters but as separators.
I tried setting up ezfind.ini this way:
DatatypeMap[ezobjectrelationlist]=text
DatatypeMapSort[ezobjectrelationlist]=string
DatatypeMapFacet[ezobjectrelationlist]=string
DatatypeMapFilter[ezobjectrelationlist]=string

I set preserveOriginal="1" on

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">...

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>...

I restarted Solr and reindexed, but facets on the ezobjectrelationlist attribute are still truncated.
Could someone help me?

Bye
