Multi-languages and encoding

Guillaume D

Wednesday 13 October 2010 7:46:09 am

Hello,

Here is my encoding problem:

Can a single eZ Publish instance work with multiple languages, ranging from Latin-script languages (French, English) to Chinese, Russian and Arabic?

Are a UTF-8 encoded MySQL database and UTF-8 encoded eZ templates enough for that?

Is it possible? Do we have to choose another encoding, or use different encodings with different databases?

Has anyone already encountered this problem?

Here is the project:

- 1 single eZ Publish instance, containing up to 15 international sites (with different content trees), with 1 to 3 languages per site.

- The back office is only in French and English.

For example, the list of sites and their languages:

- French, English

- German, French

- Mandarin

- Korean

- Arabic, English

- Italian, English

- Japanese

- French, English

- Romanian, English, French

- Russian, English, French

- Mandarin

- French, English

- Turkish, English

Environment:

- eZ Publish 4.0.2 with existing specific extensions and templates, UTF-8 encoded

- MySQL 5.0.51a, UTF-8 encoded

Thanks,

Best Regards.

Gaetano Giunta

Wednesday 13 October 2010 10:17:45 am

Using UTF-8 for the database should be fine.

For the rendered pages, only Japanese/Chinese may (afaik) need something other than UTF-8, as font/browser support is still quite incomplete: the Unicode code point exists, but it may not be displayed properly.

But if you are careful enough in your config, you can have a correct chain:

- UTF8 data

- UTF8 settings files

- UTF8 or JP templates, depending on the siteaccess

- final page is rendered in the correct character set (based on setting in i18n.ini)
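To illustrate the chain above, here is a minimal Python sketch. It is not eZ Publish code: the siteaccess names and the charset table are hypothetical stand-ins for whatever each siteaccess declares in i18n.ini. It only shows the idea that content stays in UTF-8 until the last step, where it is re-encoded into the charset the page declares.

```python
# Hypothetical mapping from siteaccess name to output charset.
# In eZ Publish this choice would come from each siteaccess's i18n.ini.
OUTPUT_CHARSETS = {
    "site_fr": "utf-8",
    "site_jp": "shift_jis",
    "site_tr": "iso-8859-9",
}


def render_page(stored_bytes, siteaccess):
    """Decode UTF-8 stored content and re-encode it for one siteaccess.

    Returns a (Content-Type header value, encoded page body) tuple.
    """
    charset = OUTPUT_CHARSETS[siteaccess]
    text = stored_bytes.decode("utf-8")  # stored data is always UTF-8
    body = text.encode(charset)          # raises UnicodeEncodeError if not representable
    return "text/html; charset=" + charset, body


stored = "こんにちは".encode("utf-8")  # as it would come out of the database
header, body = render_page(stored, "site_jp")
print(header)
print(body.decode("shift_jis"))
```

Asking the hypothetical "site_tr" siteaccess to render Russian text would raise UnicodeEncodeError, which is the kind of failure a single-language page charset can bring.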

Principal Consultant International Business
Member of the Community Project Board

Gaetano Giunta

Wednesday 13 October 2010 10:18:55 am

PS: please make sure the PHP mbstring extension is enabled when you start doing charset juggling.


Guillaume D

Thursday 14 October 2010 2:52:03 am

Thank you Gaetano for your answer.

I don't understand the part about "JP templates". Are you saying we'll have to use a different encoding for the Japanese templates?

For Japanese, in the locale settings, the PreferedEncoding is UTF-8. Do we have to change that?

If we change the encoding in the Japanese locale, or pick a different language such as Turkish (whose locale has Preferred=iso-8859-9), won't the following combination cause a conflict in the web browser?

- Data (UTF-8)

- Templates (encoded in the new charset XXX)

- Settings in UTF-8

- i18n for the siteaccess (encoding set to the new charset XXX)

We will check and test mbstring.

Eric Sagnes

Thursday 14 October 2010 7:19:05 pm

Hi Guillaume,

From my experience, UTF-8 will work without problems for Chinese, Japanese and Korean, and theoretically for any language. I suppose what Gaetano was referring to is the font priority problem when rendering ideograms.
You can see the details at http://en.wikipedia.org/wiki/Unihan , but basically Unicode defines code points, not glyph shapes, so even if the same ideogram is drawn differently in Japanese and Chinese, it is a single Unicode code point, and the font is responsible for rendering the right glyph.
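To make the code point idea concrete, here is a small Python sketch. The sample character 直 is chosen only for illustration, as an ideograph that is commonly drawn differently in Japanese and Chinese fonts. Whatever language the surrounding text is in, the string holds one code point and produces the same UTF-8 bytes; nothing in the data says which glyph to draw.

```python
import unicodedata

ch = "直"  # illustrative unified ideograph

print(len(ch))               # number of code points in the string
print("U+%04X" % ord(ch))    # the code point itself
print(unicodedata.name(ch))  # its Unicode character name
print(ch.encode("utf-8"))    # the bytes a UTF-8 database would store
```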

But as Japanese, Korean and Chinese users usually have the font for their own language at top priority, it is not something you should worry much about.
The only tricky case is when you mix Japanese, Korean and Chinese on the same page. The best practice in that case is to set the lang attribute directly on the HTML tags, even though the browser might still not render the text with the right font.
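As a rough sketch of what setting the lang attribute looks like, the Python snippet below builds a line of HTML that mixes the three languages. The helper name lang_span and the sample words are made up for the example; in a real site the markup would come from the templates.

```python
from html import escape


def lang_span(text, lang):
    """Wrap text in a <span> that carries a lang attribute (a language tag such as "ja")."""
    return '<span lang="%s">%s</span>' % (escape(lang, quote=True), escape(text))


fragments = [
    ("日本語", "ja"),
    ("中文", "zh-Hans"),
    ("한국어", "ko"),
]
html_line = " / ".join(lang_span(text, lang) for text, lang in fragments)
print(html_line)
```

The lang value gives the browser a hint for font selection on each fragment; whether it is honoured still depends on the browser and the installed fonts.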

I personally think that a Unicode encoding is always the best choice, as it is the only kind of charset that lets you mix all alphabets/symbols in the same document. (Note that UTF-8 is also ASCII compatible.)

I hope this helps.
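Both points can be checked quickly in Python. This is only an illustration with made-up sample strings: plain ASCII text gives identical bytes in ASCII and UTF-8, and a string mixing several scripts can be encoded in UTF-8 but not in the legacy single-language charsets.

```python
def can_encode(text, charset):
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


ascii_text = "eZ Publish"
# Pure ASCII text produces the same bytes in ASCII and in UTF-8.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))

mixed = "Français, Русский, العربية, 中文, 日本語, 한국어, Türkçe"
for charset in ("utf-8", "iso-8859-1", "iso-8859-9", "shift_jis"):
    print(charset, can_encode(mixed, charset))
```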

Guillaume D

Wednesday 20 October 2010 2:26:39 am

Eric,

Sure, it will help us :-)

We won't mix different languages on the same page, so no problem on that point.

Thanks a lot.
