charset problem


laurent le cadet

Wednesday 12 December 2007 3:02:32 am

Hi,

I'm using eZ Find 1.0.2 with eZ Publish 3.9.3 (ISO-8859-1), and text is not indexed correctly.

For example:
V�rin hydraulique
V�rin hydraulique ... pompes, chaleur, hydraulique, v�rin

This should be "Vérin hydraulique"

Are any additional settings needed?

Regards.

Laurent

laurent le cadet

Thursday 13 December 2007 6:30:19 am

It sounds like the encoding is not correct.
Must we have a utf-8 db?
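
For illustration only, here is a minimal Java sketch of the kind of mismatch suspected above: text stored as ISO-8859-1 bytes but decoded as UTF-8 on the way into the index. The class and method names (MojibakeDemo, misdecode) are hypothetical and are not part of eZ Find or Solr; the sketch only assumes the standard java.nio.charset API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of eZ Find or Solr.
final class MojibakeDemo {

    // Encode the text as ISO-8859-1, then (wrongly) decode those bytes as UTF-8.
    // Bytes that are not valid UTF-8 are replaced by U+FFFD, the "�" character.
    static String misdecode(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape avoids depending on the source file encoding.
        String broken = misdecode("V\u00e9rin hydraulique");
        System.out.println("contains U+FFFD: " + broken.contains("\uFFFD"));
    }
}
```

In ISO-8859-1 the "é" is the single byte 0xE9, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder substitutes the replacement character.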

Kåre Køhler Høvik

Thursday 13 December 2007 7:36:40 am

Hi

UTF-8 should not be required for eZ Find and eZP3. If you have a test environment available, please try to comment out these two lines in extension/ezfind/java/solr/conf/schema.xml:

....
<!--        <filter class="ISOLatin1AccentFilterFactory"/> -->
...
<!--        <filter class="ISOLatin1AccentFilterFactory"/> -->
...

Then restart Solr and reindex the data.

Kåre Høvik

laurent le cadet

Friday 14 December 2007 3:08:13 am

Hi Kåre,

We commented out the lines:

<!-- <filter class="ISOLatin1AccentFilterFactory"/> -->

then restarted Solr and reindexed, but the results are still corrupted:

This text :

Le DMP est con�u pour r�aliser pour le microdosage de tr�s haute pr�cision de tous les produits

Should be :

Le DMP est conçu pour réaliser pour le microdosage de très haute précision de tous les produits

The characters ç, é, è (and, I presume, all special characters) are not encoded correctly.

Stuck at this point.

Any hints?

regards.

Laurent

laurent le cadet

Monday 17 December 2007 4:36:37 am

Hi,

I read this at http://lucene.apache.org/solr/tutorial.html#Requirements:

"SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported"

Is that related to our problem or can we override that?

I have tried almost everything without any results so far.

Best regards

Laurent

Kåre Køhler Høvik

Monday 17 December 2007 4:59:55 am

Hi

Thank you for looking into this.

It looks like you found the problem. The fix is to have eZ Find convert the data to UTF-8 before it is indexed. Please add a bug report about this in the issue tracker, and I'll fix it as soon as I have time.
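
As a rough sketch of what "convert the data to UTF-8 before indexing" means at the byte level, the Java snippet below re-encodes ISO-8859-1 bytes as UTF-8. It is not the eZ Find fix itself (eZ Find is written in PHP); the class and method names (Latin1ToUtf8, toUtf8) are hypothetical, and the sketch assumes the input really is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to illustrate the conversion step.
final class Latin1ToUtf8 {

    // Decode the bytes using their actual encoding (assumed ISO-8859-1),
    // then encode the resulting text as UTF-8.
    static byte[] toUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "conçu" in ISO-8859-1: the "ç" is the single byte 0xE7.
        byte[] latin1 = {0x63, 0x6F, 0x6E, (byte) 0xE7, 0x75};
        byte[] utf8 = toUtf8(latin1);
        // One extra byte is expected, because "ç" takes two bytes in UTF-8.
        System.out.println(latin1.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The key point is that the decoding step must name the real source encoding; converting data that is already UTF-8 this way would corrupt it.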

Best regards
Kåre

Kåre Høvik

laurent le cadet

Monday 17 December 2007 5:07:56 am

Kåre,

I'm going to report the bug.
As you can see, there is additional information on encoding/decoding (java.net), or an alternative using additional code:

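// "request" is assumed to be a servlet request object.
// The encoding must be set before any request parameter is read.
String encoding = request.getCharacterEncoding();
if (null == encoding) {
  // No encoding declared by the client: set your default encoding here
  request.setCharacterEncoding("UTF-8");
} else {
  // Keep the encoding the client declared
  request.setCharacterEncoding(encoding);
}
...
String value = request.getParameter("q");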

I'm digging into the "java.net" solution. As for the other one, I don't know whether it would help us or where to apply the "patch".

Any idea?

Laurent

laurent le cadet

Wednesday 19 December 2007 2:50:18 am

In the end, I converted the DB to UTF-8.
Everything works fine.

(http://ez.no/developer/forum/general/convert_from_iso_8859_1_encoding_to_utf_8/)

Hope this helps.

laurent

John Smith

Tuesday 19 August 2008 10:26:10 am

hi laurent,

I used the script by Kristof Coomans to do the UTF-8 conversion while upgrading from 3.6.1 to 3.8.0. It is posted at

http://ez.no/developer/forum/install_configuration/update_to_3_8_and_codepage_problems

I am now getting this notice

SET NAMES 'utf8' on both the administration interface and the public website.

Are you getting the same?

Please help...
