Non-ASCII characters in generated URLs

Erlend Halvorsen

Friday 09 November 2007 2:49:04 pm

Hi!

I've just installed eZ Publish 3.10.0, and I'm having some trouble with Norwegian characters. I've created a folder named Tilbehør, and the generated URL is http://somedomain.com/Tilbehør, which the browser then of course translates to /Tilbeh%F8r, which doesn't exist. If, however, I rewrite the URL by hand back to /Tilbehør, it's translated to /Tilbeh%C3%B8r, and that works. It looks like some sort of UTF-8/ISO-8859-1 problem: %F8 is ø in ISO-8859-1, while %C3%B8 is ø in UTF-8. The site itself is encoded in ISO-8859-1. I've been screwed over by PHP and UTF-8 so many times now that I'm not doing that again.
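
The two encodings can be reproduced outside the browser. The following is a minimal Python sketch using only the standard library; it only shows how the same character percent-encodes differently under ISO-8859-1 and UTF-8, and assumes nothing about how eZ Publish resolves the alias internally:

```python
from urllib.parse import quote, unquote

name = "Tilbehør"

# Percent-encode the same path segment under each charset.
latin1_path = quote(name, encoding="iso-8859-1")
utf8_path = quote(name, encoding="utf-8")

print(latin1_path)  # Tilbeh%F8r
print(utf8_path)    # Tilbeh%C3%B8r

# A server that decodes paths as UTF-8 recovers the name only from the
# UTF-8 form; the lone byte 0xF8 is not valid UTF-8.
print(unquote(utf8_path) == name)    # True
print(unquote(latin1_path) == name)  # False
```

This matches the symptom described above: a link emitted on an ISO-8859-1 page produces the %F8 form, which a UTF-8 lookup cannot map back to ø.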

Any ideas how to fix this?

Also, while we're on the topic, is it possible to have eZ Publish generate lowercase URLs?

-Erlend

Erlend Halvorsen

Saturday 10 November 2007 8:14:57 am

Ok, to answer my own question, adding

[URLTranslator]
TransformationGroup=urlalias

to settings/override/site.ini.append.php fixes the problem. Now the generated URL is /Tilbehoer, and /Tilbehør still works - perfect!

-Erlend

Ole Marius Smestad

Wednesday 28 November 2007 1:22:39 am

Hi Erlend,

I'm glad you found a solution that worked for your site. In 3.10.0 and in the 4.0.0 alpha and beta releases, the default URL transformation setting has been urlalias_iri, which, as you've seen, will include Unicode characters in the generated URLs.

For the final release of eZ Publish 4.0 and 3.10.1 we are considering changing the default to a more restrictive setting: either 'urlalias_compat' or 'urlalias'.

Do you, and the rest of the community, have any wishes in this regard?

For reference I am including a snippet from the feature documentation:

1. Only allow a restricted set of characters in the URL: a to z,
   numbers and underscore. (This is the same behaviour as in 3.9 and
   earlier.)

   The identifier for this is *urlalias_compat*

2. Allow more characters in the URL, but still restrict it to the
   ASCII characters (with a few exceptions). Capitalization of words
   is now kept.

   The identifier for this is *urlalias*

3. Similar to #2 but allow all Unicode characters (with a few
   exceptions). This allows the text to be preserved as much as
   possible and is highly recommended for uni- or multi-lingual sites.
   The only changes to the text are the removal of a few characters
   which are special to URLs on the Internet and the trimming of
   multiple whitespaces to only one whitespace.
   It is recommended to use the utf-8 charset for the site when having
   this enabled (*i18n.ini*).

   The identifier for this is *urlalias_iri*

When the desired transformation is chosen it must be configured in
*site.ini* by setting the TransformationGroup setting in the settings
group URLTranslator to contain the identifier of the chosen type,
e.g. if the third type was chosen::

  [URLTranslator]
  TransformationGroup=urlalias_iri

Advanced users might also want to take a look at *transform.ini* to
configure their own transformation group. Tweaking this file and adding
an extension to the transformation allows for full control over the
created URL aliases.

Note: #3 is referred to as IRI [1] (Internationalized Resource
      Identifiers) which is a specialization of URI/URL with Unicode
      support.

[1] http://www.w3.org/International/O-URL-and-ident.html


--
Ole Marius Smestad
Lead Engineer eZ Publish
Member of the Community Project Board

Peter Putzer

Wednesday 28 November 2007 2:16:04 am

I find 'urlalias' to be a reasonable compromise. Due to the way browsers encode Unicode characters, 'urlalias_iri' is not really an option IMHO.

However, I have an additional feature request: Please add an option to automatically generate urlalias_compat forwardings even for new objects. In practice, this makes URLs case-insensitive (but case-preserving) when using 'urlalias'. An example:

Object 'Über uns' (German for 'About Us')

Old urlalias (< 3.10): '/ueber_uns'
urlalias (>= 3.10): '/Ueber-uns'
urlalias_iri: '/Über uns'

If 'Über uns' already existed before running the upgrade script, '/ueber_uns' continues to work fine. However, if I create a new object 'Bla bla', only the new alias '/Bla-bla' is created. Using an old-style URL '/bla_bla' will not work.
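
To make the difference between the three styles concrete, here is a rough Python approximation that reproduces the aliases listed above. It is not eZ Publish's implementation: the function names and the tiny transliteration table are hypothetical, the real rules are defined in transform.ini, and the removal of URL-special characters in the IRI style is left out because this thread does not list them.

```python
import re

# Hypothetical transliteration table covering only the characters that
# appear in this thread; eZ Publish's real rules live in transform.ini.
TRANSLIT = {"Ü": "Ue", "ü": "ue", "Ø": "Oe", "ø": "oe"}


def transliterate(text):
    """Replace the few known non-ASCII letters with ASCII digraphs."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def alias_compat(name):
    """Approximate 'urlalias_compat': lowercase a-z, digits, underscore."""
    ascii_name = transliterate(name).lower()
    return re.sub(r"[^a-z0-9]+", "_", ascii_name).strip("_")


def alias_ascii(name):
    """Approximate 'urlalias': ASCII only, case kept, hyphen separators."""
    return re.sub(r"[^A-Za-z0-9]+", "-", transliterate(name)).strip("-")


def alias_iri(name):
    """Approximate 'urlalias_iri': keep Unicode, collapse whitespace."""
    return re.sub(r"\s+", " ", name).strip()


for title in ("Über uns", "Bla bla", "Tilbehør"):
    print(alias_compat(title), alias_ascii(title), alias_iri(title), sep=" | ")
```

For 'Über uns' this yields 'ueber_uns', 'Ueber-uns' and 'Über uns', matching the three aliases above (without the leading slash).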

This is a problem with URLs that are to be entered from memory (e.g. if you use them in printed media like leaflets). Changing the way URLs are generated should be as transparent as possible for users.
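
The requested behaviour could look roughly like the sketch below. Everything in it is hypothetical (the AliasTable class and both helper functions are illustrations, not eZ Publish APIs), and it handles ASCII names only: each new object gets its case-preserving alias plus an old-style forwarding entry that points to it.

```python
import re


def primary_alias(name):
    # Simplified stand-in for 'urlalias': keep case, hyphen-separate words.
    return "/" + re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")


def compat_alias(name):
    # Simplified stand-in for 'urlalias_compat': lowercase, underscores.
    return "/" + re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


class AliasTable:
    """Toy alias store that also registers an old-style forwarding."""

    def __init__(self):
        self.primary = {}   # primary alias -> object name
        self.forwards = {}  # old-style alias -> primary alias

    def add(self, name):
        alias = primary_alias(name)
        self.primary[alias] = name
        self.forwards[compat_alias(name)] = alias
        return alias

    def resolve(self, path):
        """Return the primary alias for a path, or None if unknown."""
        if path in self.primary:
            return path
        return self.forwards.get(path)


table = AliasTable()
table.add("Bla bla")
print(table.resolve("/Bla-bla"))  # /Bla-bla
print(table.resolve("/bla_bla"))  # /Bla-bla
print(table.resolve("/missing"))  # None
```

Note that this only forwards the exact old-style form; a variant such as '/BLA-BLA' would still need the incoming path to be normalized before lookup.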

Accessible website starting from eZ publish 3.0 (currently: 4.1.0): http://pluspunkt.at

Erlend Halvorsen

Wednesday 28 November 2007 3:25:25 am

I can't really say I cared much for the Unicode URLs, because as soon as they are clicked the characters are converted to percent-encoded %g%a%r%b%a%g%e. This makes them hard to read, type, say, and remember. If one could come up with a solution where the generated URLs contained only ASCII characters, while still supporting UTF-8 characters in the URL (for instance, for use in printed material), that would be the best solution in my opinion.

Update: Re-reading my own response, I see that this is exactly what urlalias does :) Now, if only I could get rid of those capitalized letters..
