Clean URL for Vietnamese pages

Guillaume Marty

Thursday 09 June 2011 9:22:19 am

I saw a topic and a bug related to this issue, but they date back to 2009.

The problem is that clean URLs are not generated for pages written in Vietnamese; they fall back to /content/view/full/-style URLs.

I installed the transformation file attached to the bug report and overrode transform.ini this way:

[Transformation]
Charsets[]=utf-8;vietnamese

[vietnamese]
Files[]=vietnamese.tr
Extensions[]

 

That almost works, but some characters are not caught by the transformation rules and are replaced by a hyphen.

Character: ệ
Rule in transformation file: U+1EC7 = "e"
Result: -
Expected result: e

Any ideas why not all characters are transformed?

Ivo Lukac

Thursday 09 June 2011 10:28:12 am

Hi

Try this custom URL translator: place the file at "urlfilters/ngvietnamesefilter.php" in your extension, with the following content:

<?php
class nGVietnameseFilter extends eZURLAliasFilter
{
static $mappingArray = array('\u00C0' => 'A', '\u1EA2' => 'A', '\u00C3' => 'A', '\u00C1' => 'A', '\u1EA0' => 'A', '\u1EB0' => 'A','\u1EB2' => 'A', '\u1EB4' => 'A', '\u1EAE' => 'A', '\u1EB6' => 'A', '\u1EA6' => 'A', '\u1EA8' => 'A','\u1EAA' => 'A', '\u1EA4' => 'A', '\u1EAC' => 'A', '\u00C8' => 'E', '\u1EBA' => 'E', '\u1EBC' => 'E','\u00C9' => 'E', '\u1EB8' => 'E', '\u1EC0' => 'E', '\u1EC2' => 'E', '\u1EC4' => 'E', '\u1EBE' => 'E','\u1EC6' => 'E', '\u00CC' => 'I', '\u1EC8' => 'I', '\u0128' => 'I', '\u00CD' => 'I', '\u1ECA' => 'I','\u00D2' => 'O', '\u1ECE' => 'O', '\u00D5' => 'O', '\u00D3' => 'O', '\u1ECC' => 'O', '\u1ED2' => 'O','\u1ED4' => 'O', '\u1ED6' => 'O', '\u1ED0' => 'O', '\u1ED8' => 'O', '\u1EDC' => 'O', '\u1EDE' => 'O','\u1EE0' => 'O', '\u1EDA' => 'O', '\u1EE2' => 'O', '\u00D9' => 'U', '\u1EE6' => 'U', '\u0168' => 'U','\u00DA' => 'U', '\u1EE4' => 'U', '\u1EEA' => 'U', '\u1EEC' => 'U', '\u1EEE' => 'U', '\u1EE8' => 'U','\u1EF0' => 'U', '\u1EF2' => 'Y', '\u1EF6' => 'Y', '\u1EF8' => 'Y', '\u00DD' => 'Y', '\u1EF4' => 'Y','\u00E0' => 'a', '\u1EA3' => 'a', '\u00E3' => 'a', '\u00E1' => 'a', '\u1EA1' => 'a', '\u1EB1' => 'a','\u1EB3' => 'a', '\u1EB5' => 'a', '\u1EAF' => 'a', '\u1EB7' => 'a', '\u1EA7' => 'a', '\u1EA9' => 'a','\u1EAB' => 'a', '\u1EA5' => 'a', '\u1EAD' => 'a', '\u00E8' => 'e', '\u1EBB' => 'e', '\u1EBD' => 'e','\u00E9' => 'e', '\u1EB9' => 'e', '\u1EC1' => 'e', '\u1EC3' => 'e', '\u1EC5' => 'e', '\u1EBF' => 'e','\u1EC7' => 'e', '\u00EC' => 'i', '\u1EC9' => 'i', '\u0129' => 'i', '\u00ED' => 'i', '\u1ECB' => 'i','\u00F2' => 'o', '\u1ECF' => 'o', '\u00F5' => 'o', '\u00F3' => 'o', '\u1ECD' => 'o', '\u1ED3' => 'o','\u1ED5' => 'o', '\u1ED7' => 'o', '\u1ED1' => 'o', '\u1ED9' => 'o', '\u1EDD' => 'o', '\u1EDF' => 'o','\u1EE1' => 'o', '\u1EDB' => 'o', '\u1EE3' => 'o', '\u00F9' => 'u', '\u1EE7' => 'u', '\u0169' => 'u','\u00FA' => 'u', '\u1EE5' => 'u', '\u1EEB' => 'u', '\u1EED' => 'u', '\u1EEF' => 'u', '\u1EE9' => 'u','\u1EF1' => 'u', '\u1EF3' => 'y', '\u1EF7' => 'y', 
'\u1EF9' => 'y', '\u00FD' => 'y', '\u1EF5' => 'y','\uFB00' => 'ff', '\uFB01' => 'fi', '\uFB02' => 'fl', '\uFB03' => 'ffi', '\uFB04' => 'ffl', '\uFB05' => 'ft', '\uFB06' => 'st','\u00C2' => 'A', '\u00CA' => 'E', '\u00CE' => 'I', '\u00D4' => 'O', '\u00DB' => 'U','\u00E2' => 'a', '\u00EA' => 'e', '\u00EE' => 'i', '\u00F4' => 'o', '\u00FB' => 'u','\u01A0' => 'O', '\u01A1' => 'o', '\u01AF' => 'U', '\u01B0' => 'u');

    // Converts a UTF-8 string into the '\uXXXX' notation used as keys in $mappingArray.
    static function utf8ToUnicode( $str ) {
        $unicode = array();
        $values = array();
        $lookingFor = 1;
        for ( $i = 0; $i < strlen( $str ); $i++ ) {
            $thisValue = ord( $str[ $i ] );
            if ( $thisValue < ord( 'A' ) ) {
                // Digits pass through; other bytes below 'A' become %hex.
                if ( $thisValue >= ord( '0' ) && $thisValue <= ord( '9' ) ) {
                    $unicode[] = chr( $thisValue );
                } else {
                    $unicode[] = '%' . dechex( $thisValue );
                }
            } else if ( $thisValue < 128 ) {
                // Remaining ASCII is copied unchanged.
                $unicode[] = $str[ $i ];
            } else {
                // Multi-byte sequence: a lead byte below 224 starts a 2-byte sequence,
                // anything else is treated as a 3-byte sequence.
                if ( count( $values ) == 0 ) {
                    $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
                }
                $values[] = $thisValue;
                if ( count( $values ) == $lookingFor ) {
                    $number = ( $lookingFor == 3 )
                        ? ( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 )
                        : ( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
                    $number = dechex( $number );
                    $unicode[] = '\u' . strtoupper( str_pad( $number, 4, '0', STR_PAD_LEFT ) );
                    $values = array();
                    $lookingFor = 1;
                }
            }
        }
        return implode( "", $unicode );
    }

    // Replaces each character that has an entry in $mappingArray; all others are kept as-is.
    function process( $text, &$languageObject, &$caller ) {
        $outputText = '';
        // Split the text into individual UTF-8 characters.
        $textArray = preg_split( '/(?<!^)(?!$)/u', $text );
        foreach ( $textArray as $char ) {
            $unicodeChar = nGVietnameseFilter::utf8ToUnicode( $char );
            $outputText .= array_key_exists( $unicodeChar, nGVietnameseFilter::$mappingArray )
                ? nGVietnameseFilter::$mappingArray[$unicodeChar]
                : $char;
        }
        return $outputText;
    }
}
?>

Add the following lines to your site.ini:

[URLTranslator]
Extensions[]={YOUR EXTENSION NAME}
Filters[]=nGVietnameseFilter
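
To sanity-check the lookup idea outside eZ Publish, a minimal standalone sketch might look like the following. It is not the filter above: transliterateSketch and the five-entry $map are hypothetical names introduced for illustration, the map is only a small excerpt of the full table, and it is keyed by raw UTF-8 bytes instead of '\uXXXX' strings.

```php
<?php
// Standalone sketch, not the eZ Publish filter itself.
// transliterateSketch() and $map are hypothetical, for illustration only.
function transliterateSketch( $text, array $map )
{
    // Split the UTF-8 string into single characters.
    $chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
    if ( $chars === false )
    {
        // Invalid UTF-8: return the input untouched.
        return $text;
    }

    $out = '';
    foreach ( $chars as $char )
    {
        // Mapped characters are replaced, everything else is kept.
        $out .= isset( $map[$char] ) ? $map[$char] : $char;
    }
    return $out;
}

// Small excerpt of the table, written as UTF-8 byte sequences.
$map = array(
    "\xC3\xA0"     => 'a', // U+00E0
    "\xC6\xB0"     => 'u', // U+01B0
    "\xE1\xBA\xBF" => 'e', // U+1EBF
    "\xE1\xBB\x87" => 'e', // U+1EC7
    "\xE1\xBB\x9D" => 'o', // U+1EDD
);

// "Tiếng Việt" written with precomposed characters.
echo transliterateSketch( "Ti\xE1\xBA\xBFng Vi\xE1\xBB\x87t", $map ), "\n"; // prints "Tieng Viet"
```

If this prints the expected result but the URL alias still contains hyphens, the mapping itself is probably not the problem.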

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Guillaume Marty

Tuesday 14 June 2011 5:35:45 am

Thanks for your reply, but it didn't work for me.

First, I tried to do what you described.

Then I regenerated the autoloads array and tried:

[URLTranslator]
FilterClasses[]=nGVietnameseFilter

(Extensions & Filters are deprecated now)

That didn't work either. It looks like the characters are already being transformed incorrectly before the filter runs. I'm still investigating.
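
One way to narrow this down might be to dump the code points of the title as it reaches the filter. This is a diagnostic sketch only: codePointsSketch is a hypothetical helper, and the idea that the text may arrive in decomposed form (a base letter followed by combining marks, which a rule for U+1EC7 would not match) is an assumption, not something confirmed in this thread.

```php
<?php
// Diagnostic sketch: list the Unicode code points of a UTF-8 string.
// codePointsSketch() is a hypothetical helper, not part of eZ Publish.
function codePointsSketch( $text )
{
    $chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
    if ( $chars === false )
    {
        // Invalid UTF-8: nothing to report.
        return array();
    }

    $points = array();
    foreach ( $chars as $char )
    {
        $len = strlen( $char );
        $cp = ord( $char[0] );
        if ( $len > 1 )
        {
            // Keep the payload bits of the lead byte (5, 4 or 3 bits),
            // then append 6 bits from each continuation byte.
            $cp = $cp & ( 0xFF >> ( $len + 1 ) );
            for ( $i = 1; $i < $len; $i++ )
            {
                $cp = ( $cp << 6 ) | ( ord( $char[$i] ) & 0x3F );
            }
        }
        $points[] = sprintf( 'U+%04X', $cp );
    }
    return $points;
}

// Precomposed form of the character from the first post.
echo implode( ' ', codePointsSketch( "\xE1\xBB\x87" ) ), "\n";      // prints "U+1EC7"
// Decomposed form: "e" + combining dot below + combining circumflex.
echo implode( ' ', codePointsSketch( "e\xCC\xA3\xCC\x82" ) ), "\n"; // prints "U+0065 U+0323 U+0302"
```

Both strings render the same way on screen, so comparing the dumped code points with the rules in the transformation file should show whether they actually line up.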

Ivo Lukac

Tuesday 14 June 2011 5:50:21 am

Hi,

Send me your email address via the "Direct contact" form (http://share.ez.no/authorcontact/form/9504) and I'll send you the files. Copying and pasting the code from the post may have corrupted it.

