Multi-languages and encoding

Guillaume D

Wednesday 13 October 2010 7:46:09 am

Hello,

Here is my encoding problem:

Can a single eZ Publish instance work with multiple languages, ranging from Latin-script languages (French, English) to Chinese, Russian and Arabic?

Is a UTF-8 encoded MySQL database with UTF-8 encoded eZ templates enough for that?

Is it possible? Or do we have to choose another encoding, or use different encodings with different databases?

Has anyone already encountered this problem?

 

Here is the project:

- One single eZ Publish instance, containing up to 15 international sites (with different content trees), with 1 to 3 languages per site.
- The back office is only in French and English.

For example, here is a list of sites and their languages:

- French, English
- German, French
- Mandarin
- Korean
- Arabic, English
- Italian, English
- Japanese
- French, English
- Romanian, English, French
- Russian, English, French
- Mandarin
- French, English
- Turkish, English

 

Environment:

- eZ Publish 4.0.2 with already existing custom extensions and templates, UTF-8 encoded
- MySQL 5.0.51a, UTF-8 encoded

Thanks,

Best Regards.

Gaetano Giunta

Wednesday 13 October 2010 10:17:45 am

Using UTF-8 for the database should be fine.

For the rendered pages, only Japanese/Chinese (afaik) may need something other than UTF-8, as font/browser support is still quite incomplete: the UTF-8 code point exists, but it may not be displayed properly.

But if you are careful enough in your config, you can have a correct chain:

- UTF8 data

- UTF8 settings files

- UTF8 or JP templates, depending on the siteaccess

- final page is rendered in the correct character set (based on setting in i18n.ini)
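
The chain above can be illustrated outside eZ Publish with a minimal Python sketch. It is not how eZ Publish does it internally; the siteaccess names and the `SITEACCESS_CHARSET` mapping are hypothetical, and Shift_JIS is only assumed as an example of a Japanese output charset.

```python
# Hypothetical mapping of siteaccess name -> output charset (illustration only).
SITEACCESS_CHARSET = {
    "site_fr": "utf-8",
    "site_jp": "shift_jis",
}


def render_page(stored_bytes: bytes, siteaccess: str) -> bytes:
    """Decode UTF-8 stored content and re-encode it for the siteaccess charset."""
    text = stored_bytes.decode("utf-8")        # data is stored as UTF-8
    charset = SITEACCESS_CHARSET[siteaccess]   # output charset for this siteaccess
    # Characters the target charset cannot represent become &#NNNN; references.
    return text.encode(charset, errors="xmlcharrefreplace")


stored = "日本語のページ".encode("utf-8")
page_jp = render_page(stored, "site_jp")
page_fr = render_page("Café".encode("utf-8"), "site_fr")
```

The point is that the stored data never changes encoding; only the final output is converted, once, to the charset declared for that siteaccess.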

Principal Consultant International Business
Member of the Community Project Board

Gaetano Giunta

Wednesday 13 October 2010 10:18:55 am

PS: please make sure the PHP mbstring extension is enabled when you start doing charset juggling.


Guillaume D

Thursday 14 October 2010 2:52:03 am

Thank you Gaetano for your answer.

I don't understand the part about the "JP templates". Do you mean we will have to use a different encoding for the Japanese templates?

For Japanese, the PreferedEncoding in the locale settings is UTF-8. Do we have to change that?

If we change the encoding in the Japanese locale, or choose a language such as Turkish (whose locale has Preferred=iso-8859-9), won't we get a conflict in the web browser between the following?

- Data (UTF-8)

- Templates (encoded in the new charset XXX)

- Settings in UTF-8

- i18n for the siteaccess (encoded in the new charset XXX)
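
To make the concern concrete, here is a small Python sketch (not eZ Publish code) of what happens when UTF-8 bytes are sent as-is but read as iso-8859-9, compared with bytes that were transcoded first. The sample text is just an assumption for illustration.

```python
text = "Türkçe başlık"  # Turkish sample text

utf8_bytes = text.encode("utf-8")

# Case 1: the bytes are UTF-8 but the page is read as iso-8859-9: garbled text.
wrong = utf8_bytes.decode("iso-8859-9")

# Case 2: the bytes are transcoded to the declared charset first: readable text.
latin5_bytes = text.encode("iso-8859-9")
right = latin5_bytes.decode("iso-8859-9")
```

So a conflict would only appear if some part of the chain skipped the conversion step, not merely because the stored data and the output use different charsets.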

We look forward to checking and testing mbstring.

Eric Sagnes

Thursday 14 October 2010 7:19:05 pm

Hi Guillaume,

From my experience, UTF-8 works without problems for Chinese, Japanese and Korean, and theoretically for any language. I suppose what Gaetano was referring to is the font priority problem in rendering ideograms.
You can see the details at http://en.wikipedia.org/wiki/Unihan , but basically Unicode defines "points" and not "characters", so even if the same ideogram is written differently in Japanese and Chinese, it is one Unicode "point", and the font is responsible for rendering the right character.
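
As a quick illustration in Python, the ideogram 直 (picked here as an assumed example of a character whose shape commonly differs between Japanese and Chinese fonts) has exactly one code point and one UTF-8 byte sequence, whatever the language of the page:

```python
import unicodedata

ideogram = "\u76f4"  # the character 直

code_point = ord(ideogram)              # a single number, shared by all languages
name = unicodedata.name(ideogram)       # Unicode name of that code point
utf8_bytes = ideogram.encode("utf-8")   # same bytes on a Japanese or Chinese page
```

Nothing in those bytes says "Japanese" or "Chinese"; that choice is left to the font the browser picks.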

But as Japanese, Korean and Chinese users usually have their own language's font at the top priority, it is not something you should worry much about.
The only tricky case is when you mix Japanese, Korean and Chinese on the same page. The best practice in such cases is to set the lang attribute directly on the HTML tags, even though the browser might still not render the text with the right font.
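
A minimal Python sketch of the lang attribute idea follows. The helper `tag_language` is hypothetical (in eZ Publish this would be done in the templates), and the language codes are only examples.

```python
from html import escape


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a span that carries an explicit lang attribute."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


# The same code point, marked up once as Japanese and once as Chinese.
fragment = "".join([
    tag_language("直", "ja"),
    tag_language("直", "zh-Hans"),
    tag_language("한국어", "ko"),
])
```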

I personally think that a Unicode charset is always the best choice, as it is the only kind of charset that allows mixing all alphabets and symbols in the same document. (Note that UTF-8 is also ASCII compatible.)
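
Both points can be checked with a short Python sketch; the sample string and the helper `can_encode` are just assumptions for illustration.

```python
mixed = "Bonjour, Привет, مرحبا, 你好, こんにちは, Merhaba"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


utf8_bytes = mixed.encode("utf-8")

# Plain ASCII text produces identical bytes in ASCII and in UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")
```

Here `can_encode(mixed, "utf-8")` succeeds, while a legacy charset such as iso-8859-9 or shift_jis cannot hold the whole string.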

I hope this helps.

Guillaume D

Wednesday 20 October 2010 2:26:39 am

Eric,

Sure, it will help us :-)

We won't mix different languages on the same page, so there is no problem on this point.

Thanks a lot.
