charset problem


laurent le cadet

Wednesday 12 December 2007 3:02:32 am

Hi,

I'm using eZ Find 1.0.2 with eZ Publish 3.9.3 (ISO-8859-1), and text is not indexed correctly.

For example:
V�rin hydraulique
V�rin hydraulique ... pompes, chaleur, hydraulique, v�rin

This should be "Vérin hydraulique"

Are any additional settings needed?

Regards.

Laurent

laurent le cadet

Thursday 13 December 2007 6:30:19 am

It sounds like the encoding is not correct.
Must we have a utf-8 db?

Kåre Køhler Høvik

Thursday 13 December 2007 7:36:40 am

Hi

UTF-8 should not be required for eZ Find and eZ Publish 3. If you have a test environment available, please try commenting out these two lines in extension/ezfind/java/solr/conf/schema.xml

....
<!--        <filter class="ISOLatin1AccentFilterFactory"/> -->
...
<!--        <filter class="ISOLatin1AccentFilterFactory"/> -->
...

Then restart Solr and reindex the data.
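
For context, ISOLatin1AccentFilterFactory folds accented Latin-1 characters to unaccented ones during analysis; it does not change how incoming bytes are decoded. The Java sketch below approximates that folding with java.text.Normalizer. It is not Lucene's implementation (which also covers cases such as ligatures), and the class and method names are made up for illustration.

```java
import java.text.Normalizer;

final class AccentFoldingSketch {
    // Hypothetical helper: decompose to NFD, then drop the combining marks.
    static String foldAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // \u00e9 is "e with acute accent", written as an escape to stay ASCII-only.
        System.out.println(foldAccents("V\u00e9rin hydraulique")); // prints "Verin hydraulique"
    }
}
```

With the filter disabled, accented terms are indexed as typed, so a remaining replacement character points to a decoding problem rather than to accent folding.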

Kåre Høvik

laurent le cadet

Friday 14 December 2007 3:08:13 am

Hi Kåre,

We commented out the lines:

<!-- <filter class="ISOLatin1AccentFilterFactory"/> -->

then restarted Solr and reindexed, but the results are still corrupted:

This text :

Le DMP est con�u pour r�aliser pour le microdosage de tr�s haute pr�cision de tous les produits

Should be :

Le DMP est conçu pour réaliser pour le microdosage de très haute précision de tous les produits

The characters ç, é, è (and, I presume, all the special characters) are not encoded correctly.

Stuck at this point.

Any hints?

Regards.

Laurent

laurent le cadet

Monday 17 December 2007 4:36:37 am

Hi,

I read this at http://lucene.apache.org/solr/tutorial.html#Requirements :

"SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported"

Is that related to our problem or can we override that?
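
The symptom is at least consistent with that warning. As a minimal sketch (class and method names are hypothetical), the Java snippet below encodes text as ISO-8859-1 and then decodes the same bytes as UTF-8, which is what a UTF-8-only consumer would do with a Latin-1 payload. Each accented byte comes back as the Unicode replacement character U+FFFD, the same pattern as in the corrupted results above.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeSketch {
    // Hypothetical helper: encode as ISO-8859-1, then (wrongly) decode as UTF-8.
    static String misreadLatin1AsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = misreadLatin1AsUtf8("V\u00e9rin hydraulique");
        // The lone byte 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(broken.indexOf('\uFFFD')); // prints 1
    }
}
```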

I have tried almost everything, without any results so far.

Best regards

Laurent

Kåre Køhler Høvik

Monday 17 December 2007 4:59:55 am

Hi

Thank you for looking into this.

It looks like you found the problem. The resolution is for eZ Find to convert the data to UTF-8 before it is indexed. Please add a bug report about this in the issue tracker, and I'll fix it as soon as I have time.
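
The conversion step itself is small. The sketch below shows the idea in Java, to match the other snippet in this thread; eZ Find's indexing code is PHP, so this is only an illustration under that assumption and not the actual fix. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {
    // Hypothetical helper: reinterpret ISO-8859-1 bytes as text, then re-encode as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Verin" with an accented e, as it would be stored in an ISO-8859-1 database.
        byte[] stored = {0x56, (byte) 0xE9, 0x72, 0x69, 0x6E};
        byte[] forSolr = latin1ToUtf8(stored);
        // The single byte 0xE9 becomes the two UTF-8 bytes 0xC3 0xA9.
        System.out.println(forSolr.length); // prints 6
    }
}
```

Every byte value is a valid ISO-8859-1 character, so this direction never fails; the risk lies only in applying it to data that is already UTF-8, which would double-encode it.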

Best regards
Kåre

Kåre Høvik

laurent le cadet

Monday 17 December 2007 5:07:56 am

Kåre,

I'm going to report the bug.
As you can see, there is additional information on encoding/decoding (java.net), or an alternative that needs additional code:

// request is a javax.servlet.http.HttpServletRequest
String encoding = request.getCharacterEncoding();
if (null == encoding) {
  // The client declared no charset: set your default encoding here.
  // This must happen before the first getParameter() call.
  request.setCharacterEncoding("UTF-8");
}
...
String value = request.getParameter("q");

I'm digging into the "java.net" solution. As for the other one, I don't know whether it can help us or where to apply the "patch".
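
The post does not say which java.net classes are meant. Assuming it refers to URLEncoder and URLDecoder, the sketch below shows why the charset has to agree on both sides: the same text percent-encodes differently under UTF-8 and ISO-8859-1. The wrapper class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlCharsetSketch {
    // Hypothetical wrappers that turn the checked exception into an unchecked one.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String query = "V\u00e9rin"; // accented e written as an escape
        System.out.println(encode(query, "UTF-8"));      // prints "V%C3%A9rin"
        System.out.println(encode(query, "ISO-8859-1")); // prints "V%E9rin"
        // Decoding with the charset used for encoding restores the original text.
        System.out.println(decode(encode(query, "UTF-8"), "UTF-8").equals(query)); // prints "true"
    }
}
```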

Any idea?

Laurent

laurent le cadet

Wednesday 19 December 2007 2:50:18 am

In the end, I converted the DB to UTF-8.
Everything works fine now.

(http://ez.no/developer/forum/general/convert_from_iso_8859_1_encoding_to_utf_8/)

Hope this helps.

laurent

John Smith

Tuesday 19 August 2008 10:26:10 am

Hi Laurent,

I used the script by Kristof Coomans to do the UTF-8 conversion while upgrading from 3.6.1 to 3.8.0. It is posted at

http://ez.no/developer/forum/install_configuration/update_to_3_8_and_codepage_problems

I am now getting the notice

SET NAMES 'utf8'

on both the administration interface and the public website.

Are you getting the same?

Please help.
