how to actually use dictionaries & ezlupdate et al

*- pike

Tuesday 22 July 2008 2:16:56 pm

ezPublish ships with a nice mechanism for translating text in templates. However, it is largely undocumented. There is one article here: http://ez.no/ezpublish/documentation/configuration/configuration/language_and_charset/creating_a_new_translation

----------------------------
The basics are as follows:
----------------------------

Whenever your templates or php call i18n('foo'), once you look at your website in a browser, ezpublish will try to translate the word 'foo' into the language you are currently viewing. To do this, it uses dictionary files located in /share/translations.

So to set up a multilingual site, you create different siteaccesses in different languages, you start using i18n() in your templates and php, and you put some dictionaries in /share/translations.

To create those dictionaries, you can use ezlupdate from the command line. If anywhere in your code it says

i18n('foo')

and, in the root of the site, you run the command

bin/linux/ezlupdate ger-DE

a file will be updated here:

share/translations/ger-DE/translation.ts

containing

<message>
        <source>foo</source>
        <translation type="unfinished"></translation>
</message>

if you change this to

<message>
        <source>foo</source>
        <translation>bar</translation>
</message>

delete all caches, and look at the german version of your site, you will see

bar

there. Et voila.
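
As a rough illustration of the dictionary format above, here is a minimal Python sketch that lists the messages still marked unfinished. It assumes the <message> elements sit inside <context> elements under a <TS> root; the sample data, the context name and the function name unfinished_sources are made up for illustration and are not part of ezpublish.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a translation.ts file.
SAMPLE_TS = """<TS>
<context>
    <name>/design/mysite/test</name>
    <message>
        <source>foo</source>
        <translation type="unfinished"></translation>
    </message>
    <message>
        <source>hello</source>
        <translation>hallo</translation>
    </message>
</context>
</TS>"""


def unfinished_sources(ts_xml):
    """Return (context, source) pairs whose translation is unfinished or empty."""
    root = ET.fromstring(ts_xml)
    pending = []
    for context in root.iter("context"):
        name = context.findtext("name", default="")
        for message in context.iter("message"):
            translation = message.find("translation")
            is_unfinished = (
                translation is None
                or translation.get("type") == "unfinished"
                or not (translation.text or "").strip()
            )
            if is_unfinished:
                pending.append((name, message.findtext("source", default="")))
    return pending


print(unfinished_sources(SAMPLE_TS))  # [('/design/mysite/test', 'foo')]
```

For a real file, the text could be read from share/translations/ger-DE/translation.ts and passed to the same function.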

----------------------------
ezlupdate, linguist and addicted
----------------------------

To manage the dictionary files in a gui environment, there is a program called Linguist from Trolltech. It is indeed very neat and useful for professional translators too. And I love the way it prints.

You can download ezlupdate and Linguist in one package here:
http://ez.no/download/translations/ezlupdate_and_linguist

(NB, when forwarding this url to your translators, ask them to ignore the README and just double-click the Linguist icon...)

(NB, I've noticed that ezlupdate has some bugs. You may have to play with your command line parameters a bit.)

In addition to editing your dictionaries by hand or using Linguist, you can either use the Addicted extension, which allows you to edit your dictionaries through a web browser, or use xslt to transform them. There are some examples of xslt transformations in the Addicted distro. Addicted and the xslt goodies can be downloaded here:
http://ez.no/developer/contribs/applications/addicted

----------------------------
usage issues
----------------------------

So far so good. Now for the real usage. This mechanism has a lot of power, but you have to decide how to use it. For example:

1)
if you run ezlupdate as described, it will scan /lib/, /kernel/ and all folders in /design/. This will generate one dictionary of ~1Mb containing the whole ezPublish kernel, admin, everything. But all you wanted was the word 'foo'. And you can't really send this big file to a translator either. Or should you ?

2)
After a year, you read "foo" in the dictionary, and obviously that is a typo, since it should be "phou". But in which templates is this word used ? It could be all over your site.

3)
You finally found the template, and you change the word "foo" into "phou". When you look at the german site, it doesn't say "bar" anymore, it says "phou", which is clearly not german.

These may seem minor issues, but if you have a site with >100 templates in x languages, they become annoying.
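
For issue 2, a small script can at least tell you where a word is used. The following is only a sketch: it looks for the template form {"foo"|i18n(...)} and ignores php calls, and the function names find_i18n_uses and load_templates are invented for this example.

```python
import re
from pathlib import Path


def load_templates(root):
    """Read every .tpl file below `root` into a {path: text} mapping."""
    return {str(p): p.read_text(encoding="utf-8") for p in Path(root).rglob("*.tpl")}


def find_i18n_uses(templates, source):
    """Return (path, line number) pairs where `source` is piped into i18n.

    `templates` maps a template path to its text.
    """
    pattern = re.compile(rf"""["']{re.escape(source)}["']\s*\|\s*i18n\(""")
    hits = []
    for path, text in sorted(templates.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno))
    return hits
```

Something like find_i18n_uses(load_templates("design/mysite"), "foo") would then list the templates to check.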

----------------------------
how I do these things
----------------------------

1)
I run ezlupdate as follows, in the root of the site:

bin/linux/ezlupdate -e share ger-DE -d design/mysite 

This is a bit of a hack, but it scans *only* design/mysite and writes the result into share/translations/ger-DE/translation.ts. The -e option puts ezlupdate in 'extension mode', which makes it ignore /lib, /kernel etc. and scan only 'share'. Luckily, there is nothing to be translated in /share. The -d option adds the directory design/mysite to scan. And this contains 1 template with the word foo. So the dictionary file will only contain the word 'foo'.

Now beware. Most ezpublish sites actually do display content from kernel, lib, base etc. If you have a 100% hebrew site, you'll probably have to send your translator the full ~1Mb file ... and explain to him/her what 'Module not found' and 'You have no permission to %1' mean. Unless you think you've caught all errors and exceptions in your own templates. But you're never sure.

For all my extensions, I create dictionaries inside the extension, like this:

bin/linux/ezlupdate -e extension/myextension ger-DE

This will update the file extension/myextension/translations/ger-DE/translation.ts

2)
the i18n method requires you to specify a 'context' for your word, like

i18n('foo','/design/mysite/test')

this will return in the dictionary file as

<context>
    <name>/design/mysite/test</name>
    <message>
        <source>foo</source>
        <translation type="unfinished"></translation>
    </message>
</context>

after long confusion of what policy to use for contexts, i finally decided to simply use the path of the template or php file in which the word appears. so the above example would be inside /design/mysite/test.tpl. this does not make much sense to the translators, but all the words will still be chucked together nicely, and translators will recognize the clusters from the front end. there may be a bit of duplication, but since most of your duplicate code will end up in include files anyway, not much.

And as a big plus, whenever I see the word foo in the dictionary, I know where to find it in the templates or php, and vice versa, quickly and efficiently !
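
The path-as-context convention is easy to check mechanically. Below is a minimal sketch, assuming contexts are written as the template path without the .tpl extension; the helper names expected_context and mismatched_contexts are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Captures the first argument (the context) of a template i18n call.
I18N_CALL = re.compile(r"""\|\s*i18n\(\s*["']([^"']*)["']""")


def expected_context(template_path):
    """Context under the path convention: the template path minus its extension."""
    return str(PurePosixPath(template_path).with_suffix(""))


def mismatched_contexts(template_path, template_text):
    """Return the contexts used in the template that differ from the path-derived one."""
    expected = expected_context(template_path)
    return [ctx for ctx in I18N_CALL.findall(template_text) if ctx != expected]
```

An empty result means every i18n call in that template follows the convention.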

3)
ezp assumes english is the base language of your site; this is hardcoded in the kernel. If you are viewing the english version of your site, no dictionary will be applied. But if you change your english, you have to adjust all your dictionaries. This more or less freezes all your development. Imagine problems with ï or ø, or &rsquo;. Imagine telling your client 'adding a space before the question mark there will take me 30 minutes'. No way.

So I've recently decided to stop using english in my templates. I use *tokens* now: instead of saying

{"foo"|i18n('/design/mysite/test')}

I say

{"test-message"|i18n('/design/mysite/test')}

and in the english dictionary, i write

<context>
    <name>/design/mysite/test</name>
    <message>
        <source>test-message</source>
        <translation>foo</translation>
    </message>
</context>

And something similar in the german dictionary file. I personally like the sight of a 'tokenized' webpage - not containing human texts yet. Once the design is done, I can take it to an english editor or interaction designer, and fine-tune the actual wordings. I can easily see on the front end what hasn't been translated yet (or where the dictionary went wrong), just by looking at the english. Later, I can easily change the english, without breaking the german. This is much like the use of "entities" in XUL. Note, I had to hack the kernel to enable the eng-GB dictionary...

However, if I send this file to the translator, he/she wants some idea of what "test-message" should be. So I add the english text as the comment:

{"test-message"|i18n('/design/mysite/test','foo')}

which ends up in the dictionary as

<context>
    <name>/design/mysite/test</name>
    <message>
        <source>test-message</source>
        <translation type="unfinished"></translation>
        <comment>foo</comment>
    </message>
</context>

The comment shows up nicely in Linguist - especially if you print, btw.
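
Since the comment already carries the english wording, a first draft of the english dictionary could in principle be seeded from it. This is only a sketch of that idea, not something ezlupdate or Linguist does for you: the function name seed_from_comments is made up, and it deliberately leaves type="unfinished" in place so an editor still has to review each entry.

```python
import xml.etree.ElementTree as ET


def seed_from_comments(ts_xml):
    """Copy each <comment> into its empty, unfinished <translation>.

    Returns the updated XML text and the number of seeded messages.
    The type="unfinished" marker is kept so entries still get reviewed.
    """
    root = ET.fromstring(ts_xml)
    seeded = 0
    for message in root.iter("message"):
        translation = message.find("translation")
        comment = message.findtext("comment", default="").strip()
        if (
            translation is not None
            and translation.get("type") == "unfinished"
            and not (translation.text or "").strip()
            and comment
        ):
            translation.text = comment
            seeded += 1
    return ET.tostring(root, encoding="unicode"), seeded
```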

So that's the way I do it. I'm curious what other people do ..

$2c!

---------------
The class eZContentObjectTreeNode does.

Gaetano Giunta

Tuesday 22 July 2008 3:08:29 pm

To enable english translation without hacking the kernel there is another workaround: you can simply declare your site to be using eng-US, and copy the translation files from eng-GB over the american ones. Or, you can define a new locale and copy over the locale definition from eng-GB...

Principal Consultant International Business
Member of the Community Project Board

Stéphane Bullier

Wednesday 23 July 2008 12:44:26 am

Hi pike,

Thank you for your howto.

There is a problem with the link to the doc:

http://ez.no/ezpublish/documentation/configuration/configuration/language_and_charset/creating_a_new_translation

Stéphane

Piotrek Karaś

Wednesday 23 July 2008 2:05:42 am

We're in the middle of a discussion on how to deal with these issues right now. It's great that you share your experiences - so big thanks! I'll try to share some more thoughts when I have time, but right now only one thing:

>after long confusion of what policy to use for contexts, i finally decided to simply use the path of the template or php file in which the word appears. so the above example would be inside /design/mysite/test.tpl. this does not make much sense to the translators, but all the words will still be chucked together nicely, and translators will recognize the clusters from the front end. there may be a bit of duplication, but since most of your duplicate code will end up in include files anyway, not much.
>And as a big plus, whenever I see the word foo in the dictionary, I know where to find it in the templates or php, and vice versa, quickly and efficiently !

I'm not sure I like this approach. Yes, it is tempting to make things easier, but...

Seems to me that the idea behind context is a bit different (although I may be wrong). I stick to the more semantic side of context, so for example if I have an extension with lots of validation messages, I group them under one context. If there's a module that does something specific, it deserves its own context, etc.

With your approach (template paths as contexts), I expect lots of unnecessary (expensive) redundancy as well as problems if the application changes its structure etc, which is actually very likely to happen to any app. Also, the context name may reflect some meaningful information that may not be part of the path.

One more thing: we've divided the extension development process into two separate, consecutive phases. Developers use UTF-8 in the templates and skip the translation layer, using Polish directly. This saves a lot of time, avoids lots of translation mistakes, and frees developers to concentrate on functionality rather than on communication/languages, which they don't have to be good at. We don't add the translation layer (i18n/ezi18n) until the extension is finished or almost finished. That also means planning for contexts can be done at that stage, which results in better context organization, and this can be done by different people!

--
Company: mediaSELF Sp. z o.o., http://www.mediaself.pl
eZ references: http://ez.no/partners/worldwide_partners/mediaself
eZ certified developer: http://ez.no/certification/verify/272585
eZ blog: http://ez.ryba.eu

*- pike

Wednesday 23 July 2008 4:49:39 am

Hi Piotrek

thanks for your reply.

I am very curious what kind of policy you use for defining your contexts, then. I understand the semantic approach - I think it's a good idea - but in practice, it got very messy over time, simply because I couldn't decide/explain what semantic contexts actually exist.

Could you give some real-life examples from your end ?

>With your approach (template paths as contexts), I expect lots of unnecessary (expensive)
>redundancy as well as problems if application changes its structure etc, which is actually
>very likely to happen to any app.

The redundancy is less than I expected, and actually not a big deal. It is obviously exactly as much as you have code redundancy, which should not be much. If the application changes structure, yes, I have to rebuild the dicts ... if it's simply moving a file, search and replace should do.
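
For the simple case of a moved file, the search and replace on the dictionary side could look roughly like this. It is a sketch under the assumption that contexts are plain <name> children of <context>; rename_context is a made-up helper, and the i18n calls in the moved template still have to be updated separately.

```python
import xml.etree.ElementTree as ET


def rename_context(ts_xml, old_name, new_name):
    """Rename every context called `old_name`; return (new XML text, number renamed)."""
    root = ET.fromstring(ts_xml)
    renamed = 0
    for name in root.findall("context/name"):
        if name.text == old_name:
            name.text = new_name
            renamed += 1
    return ET.tostring(root, encoding="unicode"), renamed
```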

>Also, the context name may reflect some meaningful information
>that may not be part of the path.

Also not that often. In fact, the location of the template / php already *is* pretty semantic, if you use a good layout and naming scheme for your files.

*-pike

Piotrek Karaś

Monday 28 July 2008 1:01:31 pm

Hi there,

>I am very curious what kind of policy you use for defining your contexts, then. I understand the semantic approach - I think it's a good idea - but in practice, it got very messy over time, simply because I couldn't decide/explain what semantic contexts actually exist.
>Could you give some real-life examples from your end ?

There isn't a consistent policy, yet. It's rather a case-by-case thing, trying to make it as fine-grained as possible. Here are a few hints:
- if there seems to be a practice established, I try to follow it (for example, context names in datatypes (PHP-side))
- if there are rather complex functional modules or module views - I try to group each one's translations under one context, sometimes differentiating between interface translations and validation messages, for example,
- I try not to put too many translations into one context - if there are more than 100-200 items in one - that's probably where it could be split into two or three smaller ones
- if there is 'common phrase' redundancy - I do not get rid of it (for example, sometimes I use the same simple words like "submit", "edit" etc. several times). That's because you never know where there is going to be a change or in which language things will become unequal semantically. That's where the one-context-per-file approach is weak actually. If you have one template with two phrases that are homographs in English, one of which is a form label, the other a validation message - they don't have to be homographs in other languages, and with one context you will be unable to cope with that ;(

>The redundancy is less than I expected, and actually not a big deal. It is obviously exactly as much as you have code redundancy, which should not be much. If the application changes structure, yes, I have to rebuild the dicts ... if it's simply moving a file, search and replace should do.

I believe that search and replace on a hundred objects may be a pain in the... unless you have some tricks/automation.

>Also not that often. In fact, the location of the template / php already *is* pretty semantic, if you use a good layout and naming scheme for your files.

That's true, but I can think of cases where this would lead to crazy context names ;)

This is an interesting topic. I hope some other people share their thoughts.

*- pike

Friday 01 August 2008 6:24:46 pm

Hi

>There isn't a consistent policy, yet. It's rather case to case thing,

Been there .. and additionally, had to wade through ezpublish's standard / base / admin contexts, and through typos in context names. Some start contexts with a slash, some don't. So in practice, contexts were all over the place. I'm working on dictionaries again right now, and feel quite happy.

Must admit, I'm using xslt to 'autotranslate' any duplications, which I then validate by hand just for checks.
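
The same 'autotranslate' idea can be sketched without xslt. The Python below is not the xslt from the Addicted distro, just an illustration with a made-up function name: it copies a finished translation onto unfinished messages that have the identical source text, and keeps them marked unfinished so they still get validated by hand.

```python
import xml.etree.ElementTree as ET


def autofill_duplicates(ts_xml):
    """Pre-fill unfinished messages from finished ones with the same <source>.

    Returns (new XML text, number of messages filled). Filled entries keep
    type="unfinished" so they can be checked by hand afterwards.
    """
    root = ET.fromstring(ts_xml)

    # First pass: remember the first finished translation seen for each source.
    known = {}
    for message in root.iter("message"):
        translation = message.find("translation")
        if (
            translation is not None
            and translation.get("type") != "unfinished"
            and (translation.text or "").strip()
        ):
            known.setdefault(message.findtext("source", default=""), translation.text)

    # Second pass: fill the empty, unfinished duplicates.
    filled = 0
    for message in root.iter("message"):
        translation = message.find("translation")
        source = message.findtext("source", default="")
        if (
            translation is not None
            and translation.get("type") == "unfinished"
            and not (translation.text or "").strip()
            and source in known
        ):
            translation.text = known[source]
            filled += 1
    return ET.tostring(root, encoding="unicode"), filled
```

The hand check matters: identical source text in two contexts does not guarantee an identical translation.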


> That's
> where one-context-per-file approach is weak actually. If you have one
> template with two phrases that are homographs in English, one of which
> is a form label, the other a validation message - they don't have to be
> homographs in other languages, and with one context you will be
> unable to cope with that ;(

sharp observation. or take the word "Title" (of a project) versus "Title" (of a person) in one template. But as said, I don't use english as the source language, I use tokens. So it would be "input-projecttitle" versus "input-authortitle", in my example. The english value (Title) is in the comment.

I admit, it sounds like a hack. I might as well not use any contexts, and only use tokens, on a small site :-)

>That's true, but I can think of cases where this would lead to crazy context names ;)

If you have crazy paths :-) There is a quite similar very interesting discussion: where do you store your templates ?

*-pike

Piotrek Karaś

Friday 01 August 2008 11:07:04 pm

>But as said, I don't use english as the source language, I use tokens (...) I admit, it sounds like a hack. I might as well not use any contexts, and only use tokens, on a small site :-) (...) If you have crazy paths :-) There is a quite similar very interesting discussion: where do you store your templates ?

We are probably going to use tokens in some cases and possibly work out some internal tokenizing system/tool, but still I like the *.ts files, especially for more generic/shared/repeated tools. Once you have your work pushed through this i18n layer, a *.ts file is much nicer for translation handling than any tokens I've seen, especially if you comment your translations where needed.

As far as any conventions go in eZ, we currently try to follow them and any standards and patterns (or at least what seem to be patterns to us). Besides the usual reasons, we're still learning a lot and that helps. The same goes for templates (or maybe I'm not sure what exactly you're asking).

Cheers,
Piotrek

*- pike

Saturday 02 August 2008 4:14:19 am

Hi

Maybe this discussion is getting too abstract for others :-)
Let me give some example of context structures I considered, just for fun

/design/mysite/mainmenu
/design/mysite/editforms/author
/design/mysite/fullviews/author
/extension/myext/design/mysite/fancypage

or maybe

design/mysite/mainmenu
design/mysite/author/editform
design/mysite/author/fullview
myext/design/mysite/fancypage

or maybe

/mysite/interface/mainmenu
/mysite/classes/author/edit
/mysite/classes/author/view
/mysite/extensions/myext/fancypage

It's trivial, but you need to choose something. I understand you do this afterwards - going through all your templates once more to fix contexts, add comments, and finalize your english before you send it out to translation.

Sounds like a good idea. You can't really do that while writing templates; you need the bird's eye view, it's a separate last step. It also means, while writing templates, you don't even bother - you'll decide later, just throw in some i18n('bla') - am I right ?

>We are probably going to use tokens in some cases and possibly work out some internal
>tokenizing system/tool, but still I like the *.ts files,

Yes, so do I. Maybe I wasn't clear - I'm using 'pseudo' tokens as the source of messages inside ts, and use an english dictionary. You write Polish as the source, and translate it to english in the source later ? Which has the same effect I guess :-)

As for where to store templates, that's another discussion... but if I look at the shape of the contexts in the above examples, I see resemblances.

*-pike

PS. I'm looking at ez.no/de, and I notice the dropdown menu on the search bar is partly english, partly german. So I'm not the only one having a hard time there :-)

*- pike

Monday 01 December 2008 5:26:46 pm

A word of warning here.

When parsing templates, ezpublish looks for the translation of a message in

the context + the message + the comment

Yes, the comment is part of the key to find the translation. If you change your comment in the template, or in the dictionary, your translation will be invalid.

It sounds a bit weird to me, but since it is mentioned on issues.ez.no while discussing other bugs, it is apparently known and not a bug ...
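
To make the consequence concrete, here is a toy model of that lookup in Python. It is not ezpublish's actual code, only an illustration of a dictionary keyed on context + message + comment, with a fallback to the source text when nothing matches; the names are invented.

```python
def translate(dictionary, context, source, comment=""):
    """Look up the exact (context, source, comment) key; fall back to the source text."""
    return dictionary.get((context, source, comment), source)


# Hypothetical dictionary with one translated message.
dictionary = {("/design/mysite/test", "test-message", "foo"): "bar"}

print(translate(dictionary, "/design/mysite/test", "test-message", "foo"))   # bar
print(translate(dictionary, "/design/mysite/test", "test-message", "foo!"))  # test-message
```

Changing only the comment changes the key, so the existing translation is no longer found.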

$2c,
*-pike

Peter Putzer

Tuesday 02 December 2008 2:22:42 am

Ah, but that is good. For example, when you are talking about people, you're bound to have different forms depending on gender in German. With an english string as the translation base, you're screwed without the "comment" as an additional lookup key.

Accessible website starting from eZ publish 3.0 (currently: 4.1.0): http://pluspunkt.at

*- pike

Wednesday 18 March 2009 5:44:40 pm

Hi

I had to think very deep about that one :-) I can't think of an example. I guess you mean something like

      {"Your profile"|i18n('/design/mysite/test')}

could be translated as 'Ihr Profil' or as 'Ihre Profil' - I suppose - but actually, the gender is stuck to 'Profil', so that's not a good example either. Can you give me one example where it is necessary to have

      {"Some text"|i18n('/design/mysite/test','female version')}
      {"Some text"|i18n('/design/mysite/test','male version')}

where 'some text' is the same in english ?

just curious,
*-pike

*- pike

Monday 06 July 2009 4:01:37 pm

>Ah, but that is good. For example, when you are talking about people,
>you're bound to have different forms depending on gender in German.

supposing it would be good, it would be even better if ezlupdate removed the translation from the dictionary if you changed the comment in the template. but it doesn't.

in effect, if you change the comment in the template, and you run ezlupdate, ezpublish says it's not translated, and linguist says it's translated. I don't care much who's wrong or who's right, but they can't both be right here :-)
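
Until that is sorted out, leftover entries can be spotted by comparing the dictionary with the templates. This is a rough sketch, not a replacement for ezlupdate: the regex only understands simple {"text"|i18n('context','comment')} calls without quotes inside the strings, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches "source"|i18n('context') and "source"|i18n('context','comment').
I18N = re.compile(
    r"""["']([^"']+)["']\s*\|\s*i18n\(\s*["']([^"']*)["'](?:\s*,\s*["']([^"']*)["'])?\s*\)"""
)


def template_keys(template_text):
    """Return the set of (context, source, comment) keys used in a template."""
    return {(ctx, src, comment) for src, ctx, comment in I18N.findall(template_text)}


def stale_entries(ts_xml, template_text):
    """Return dictionary keys that no longer match any i18n call in the template."""
    used = template_keys(template_text)
    stale = []
    for context in ET.fromstring(ts_xml).iter("context"):
        ctx = context.findtext("name", default="")
        for message in context.iter("message"):
            key = (
                ctx,
                message.findtext("source", default=""),
                message.findtext("comment", default=""),
            )
            if key not in used:
                stale.append(key)
    return stale
```

An entry whose comment was changed in the template shows up here with its old comment, even though Linguist still lists it as translated.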

*-pike
