Accented characters are not working in Solr search

Praveen Kumar

Tuesday 16 August 2011 5:00:23 pm

Hi,
This is Praveen. I am using Apache Solr in our project to support search on cities, and I am having a problem with accented characters when searching.
For example:
My city name is 'vrély'.
If I search for 'vr*', it returns the result.
But if I search for 'vrél*', it returns no results.
If I search without the accented character, like 'vre*', it returns results again.
My city field type is "text", and its definition in schema.xml is as follows:
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
            </analyzer>
        </fieldType>
Any suggestions or solutions to resolve my problem would be appreciated.
Thanks in advance.
Regards,
Praveen Kumar

Ivo Lukac

Wednesday 17 August 2011 12:59:15 am

There could be two causes:

- either your index and query analyzers are not the same (e.g. there is a small difference: catenateWords="0" catenateNumbers="0" in the query analyzer versus "1" in the index analyzer), so the tokens differ between indexing and querying, or

- the "é" character is somehow badly encoded when the query is sent to Solr.
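The first point can be checked mechanically. This Python sketch is only an illustration, not part of Solr: it parses an abridged copy of the fieldType from the question (the identical StopFilterFactory and SnowballPorterFilterFactory lines are left out) and lists every filter or attribute that differs between the index and query analyzers. The helper names `chain` and `diff_analyzers` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the fieldType from the question; filters that are
# identical in both analyzers (stop words, Snowball) are omitted.
SCHEMA = """
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
"""


def chain(field_type: ET.Element, kind: str) -> dict[str, dict[str, str]]:
    """Map each filter class in the 'index' or 'query' analyzer to its attributes.

    Assumes each filter class appears at most once per analyzer.
    """
    analyzer = field_type.find(f"analyzer[@type='{kind}']")
    return {
        f.get("class"): {k: v for k, v in f.attrib.items() if k != "class"}
        for f in analyzer.findall("filter")
    }


def diff_analyzers(xml_text: str) -> list[str]:
    """List filters and attributes that differ between index and query time."""
    root = ET.fromstring(xml_text.strip())
    index, query = chain(root, "index"), chain(root, "query")
    report = []
    for cls in sorted(index.keys() | query.keys()):
        if cls not in index:
            report.append(f"{cls}: query only")
        elif cls not in query:
            report.append(f"{cls}: index only")
        else:
            # Compare every attribute present on either side.
            for attr in sorted(index[cls].keys() | query[cls].keys()):
                a, b = index[cls].get(attr), query[cls].get(attr)
                if a != b:
                    report.append(f"{cls} {attr}: index={a} query={b}")
    return report


if __name__ == "__main__":
    for line in diff_analyzers(SCHEMA):
        print(line)
```

For this schema it reports the query-only SynonymFilterFactory and the catenateWords/catenateNumbers mismatch in WordDelimiterFilterFactory; neither of those touches the accent itself, since both analyzers apply LowerCaseFilterFactory and ASCIIFoldingFilterFactory.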

I had a similar problem before when I used Jetty; it didn't handle UTF-8 queries very well, so I switched to Tomcat. Jetty may have resolved those issues in a newer version; I didn't check.
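To see what a badly encoded "é" looks like, here is a minimal Python sketch, assuming a hypothetical field name `city` and Solr's standard `q` parameter. It percent-encodes the query as UTF-8, which is what the client should send, and shows the garbled text a server produces if it decodes those bytes as ISO-8859-1 instead. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def solr_query_string(term: str) -> str:
    """Build a UTF-8 percent-encoded query string for a Solr request.

    urlencode() uses quote_plus() with UTF-8 by default, so 'é' becomes
    %C3%A9. The servlet container must decode it as UTF-8 too.
    """
    return urlencode({"q": term})


def misdecoded(term: str) -> str:
    """Show what the server sees if it decodes UTF-8 bytes as ISO-8859-1."""
    return term.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    print(solr_query_string("city:vrél*"))  # q=city%3Avr%C3%A9l%2A
    print(misdecoded("vrél*"))              # vrÃ©l*
```

If the query shows up on the server as vrÃ©l*, the container is decoding the URL with the wrong charset; on Tomcat this is usually addressed with `URIEncoding="UTF-8"` on the HTTP `Connector` in server.xml.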

Anyway, be aware that "vrély" is always indexed as "vrely" (the ASCIIFoldingFilterFactory removes the accent), which is why you find it with vr* and vre*.
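A minimal Python sketch of why vr* and vre* match while vrél* does not. It assumes the Solr version in use does not run the query analyzer on wildcard and prefix terms (releases of this era generally passed them to the index unanalyzed), so vrél* is compared literally against the folded term. `fold_to_ascii` approximates ASCIIFoldingFilterFactory with Unicode NFKD decomposition and covers far fewer characters than the real filter; stemming is ignored, although the English Snowball stemmer may also change the end of the indexed term. The names `fold_to_ascii`, `prefix_hits` and `INDEXED_TERMS` are hypothetical.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Rough stand-in for ASCIIFoldingFilterFactory.

    NFKD splits 'é' into 'e' plus a combining acute accent; dropping the
    combining marks leaves the base letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Indexed term for "vrély" after lowercasing and folding (stemming ignored).
INDEXED_TERMS = [fold_to_ascii("vrély".lower())]


def prefix_hits(prefix: str, fold_query: bool) -> list[str]:
    """Simulate a prefix query such as vrél*.

    fold_query=False compares the prefix verbatim, mimicking a Solr version
    that does not analyze wildcard terms (assumption). fold_query=True
    lowercases and folds it on the client first.
    """
    if fold_query:
        prefix = fold_to_ascii(prefix.lower())
    return [t for t in INDEXED_TERMS if t.startswith(prefix)]


if __name__ == "__main__":
    for p in ["vr", "vre", "vrél"]:
        print(p, prefix_hits(p, fold_query=False), prefix_hits(p, fold_query=True))
```

Under that assumption, lowercasing and folding the user's prefix on the client before appending `*`, as `fold_query=True` does, is one possible workaround.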

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Philippe VINCENT-ROYOL

Wednesday 17 August 2011 1:24:20 am

Just a question: which version of Solr are you using?

Certified Developer (4.1): http://auth.ez.no/certification/verify/272607
Certified Developer (4.4): http://auth.ez.no/certification/verify/377321

G+ : http://plus.tl/dspe
Twitter : http://twitter.com/dspe
