Author
|
Message
|
Mikhail Chekanov
|
Thursday 18 March 2004 4:00:12 am
Is it possible to add url_alias manager to the administration interface?
The idea is to change url_aliases within the admin interface without manual intervention in the database. This feature would be useful for non-English sites (by default, non-English characters are converted to _ ). Currently, I've added transliteration functionality to the convertToAlias function for my Russian site in kernel/classes/ezurlalias.php:
function convertToAlias( $urlElement, $defaultValue = false )
{
//transliteration of one-character phonemes:
//Russian A -> ASCII a, Russian a -> ASCII a
$urlElement = strtr( $urlElement, "AaCc...;","AaSs..." );
//transliteration of multi-character phonemes
//Norwegian Ø -> OE, ø -> oe; or Russian "ч"=>"ch" ...
$urlElement = strtr( $urlElement,
array( "ø"=>"oe", "Ø"=>"oe", "å"=>"aa", "Å"=>"aa", ...... ) );
....
So, the question to the eZ crew is: perhaps this little hack could be useful enough to be included in the source? Of course, this would have to be done in a smarter way:
1. Add to the locale files an optional [transliteration] section with the appropriate rules: two strings for one-character transliteration and one array for multi-character transliteration.
2. The system checks whether the rules exist in the appropriate locale file and transliterates $urlElement according to these rules.
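A minimal sketch of how such rules might be applied once they have been read from a locale file. The function name applyTransliterationRules and the layout of the $rules array are hypothetical, not part of eZ Publish, and the example assumes a single-byte charset such as ISO-8859-1, since the string form of strtr works byte by byte:

```php
<?php
// Hypothetical helper (not an eZ Publish API): apply transliteration rules
// that would come from an optional [transliteration] section of a locale file.
// Assumes a single-byte charset (here ISO-8859-1).
function applyTransliterationRules( $urlElement, $rules )
{
    // One-character rules: two strings of equal length, mapped byte by byte.
    if ( isset( $rules['from'], $rules['to'] ) )
    {
        $urlElement = strtr( $urlElement, $rules['from'], $rules['to'] );
    }
    // Multi-character rules: an array of "character" => "replacement" pairs.
    if ( !empty( $rules['multi'] ) )
    {
        $urlElement = strtr( $urlElement, $rules['multi'] );
    }
    return $urlElement;
}

// Example rules for Norwegian, written as ISO-8859-1 byte escapes.
$rules = array(
    'from'  => "\xC6\xE6",                                // Æ, æ
    'to'    => "Aa",
    'multi' => array( "\xD8" => "oe", "\xF8" => "oe",     // Ø, ø
                      "\xC5" => "aa", "\xE5" => "aa" ),   // Å, å
);

echo applyTransliterationRules( "bl\xE5b\xE6r", $rules ), "\n";
```

Keeping the one-character and multi-character rules separate mirrors the two strtr calls in the hack above; a locale without rules would simply pass an empty array and get the string back unchanged.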
--
mike
#6595551
|
Gunnstein Lye
|
Thursday 18 March 2004 8:02:45 am
I agree that this is something we need! I have done the same thing as you for the Norwegian characters. It's a good idea to have settings for this in the locale files. Ini files are somewhat slow at the moment, but it should not be much of a problem in this case. I'll do some research.
|
Trond Åge Kvalø
|
Thursday 18 March 2004 1:19:01 pm
Hello Gunnstein! Is this something you are willing to share? I have the exact same problem on a large portal we're making at the moment. The Norwegian characters become _ in the URLs. I could try to write a function like the one above, but I have a very bad feeling about tampering with the kernel, and if you have already invented the wheel...
best regards trondåge
trondåge
|
Jan Borsodi
|
Friday 19 March 2004 12:34:48 am
It's a good idea. However, there are several problems with the implementation.
1. Character sets/encodings
The placement of various characters will vary from charset to charset, so it needs to be integrated with the i18n library to handle this properly. There's also the problem of non-8-bit charsets (e.g. utf8), which use multiple bytes for a character. A simple way of solving this now is to turn the string into a Unicode array using
$codec =& eZTextCodec::instance( false, 'unicode' );
$urlElementArray = $codec->convertString( $urlElement );
Then replace the characters using their Unicode values and convert the result back:
$reverseCodec =& eZTextCodec::instance( 'unicode', false );
$urlElement = $reverseCodec->convertString( $urlElementArray );
However it should be noted that this is not very fast.
2. Unicode
A good implementation should provide conversion for all characters in Unicode. For instance, a site could be running utf8 and have articles in multiple languages. Actually, this type of conversion is similar to lowercase, uppercase and search normalization, all of which should be handled by the i18n system some day.
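As a rough illustration of the per-character idea on a UTF-8 string, here is a sketch that does not use eZTextCodec at all. The function name transliterateUtf8 and the sample map are hypothetical; it splits the string into whole characters with PCRE's /u modifier and assumes PHP 7 or later for the \u{...} escapes:

```php
<?php
// Hypothetical helper, not part of eZ Publish: transliterate a UTF-8 string
// one character (not one byte) at a time.
function transliterateUtf8( $text, $map )
{
    // The /u modifier makes PCRE split on whole UTF-8 characters.
    $chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
    $result = '';
    foreach ( $chars as $char )
    {
        // Characters without a rule are copied through unchanged.
        $result .= isset( $map[$char] ) ? $map[$char] : $char;
    }
    return $result;
}

// Sample map keyed by Unicode code point escapes (illustrative only).
$map = array(
    "\u{00D8}" => "oe", "\u{00F8}" => "oe",   // Ø, ø
    "\u{00C5}" => "aa", "\u{00E5}" => "aa",   // Å, å
    "\u{00C6}" => "ae", "\u{00E6}" => "ae",   // Æ, æ
);

echo transliterateUtf8( "bl\u{00E5}b\u{00E6}r", $map ), "\n";
```

Because the lookup is keyed by whole characters, the same map works for any mix of languages in one string, at the cost of walking the string character by character.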
--
Amos
Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq
|
Gunnstein Lye
|
Friday 19 March 2004 5:04:08 am
Hi Trond, for fixing just the Norwegian characters I suggest you use the solution by Mikhail Chekanov above. It is cleaner than mine. (Note: This will not work if you use UTF-8.) Currently, there is no way around tampering with the kernel.
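A small demonstration of why the two-string form of strtr breaks on UTF-8 (this is plain PHP behaviour, not code from the eZ Publish source; it assumes PHP 7 or later for the \u{...} escapes):

```php
<?php
// The \u{...} escapes produce UTF-8 bytes regardless of the file encoding.
$from = "\u{00C6}\u{00E6}";   // "Ææ" in UTF-8: 4 bytes (C3 86 C3 A6)
$to   = "Aa";                 // 2 bytes

// String form: strtr() maps byte to byte and ignores the surplus bytes of the
// longer argument, so only C3 => "A" and 86 => "a" are translated.
$broken = strtr( "\u{00E6}", $from, $to );

// Array form: whole byte sequences are matched, so UTF-8 characters stay intact.
$ok = strtr( "\u{00E6}", array( "\u{00C6}" => "A", "\u{00E6}" => "a" ) );

echo strlen( $from ), "\n";      // 4
echo bin2hex( $broken ), "\n";   // 41a6
echo $ok, "\n";                  // a
```

In a single-byte charset such as ISO-8859-1 each of these characters is one byte, which is why the two-string form is only safe there.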
|
Trond Åge Kvalø
|
Friday 19 March 2004 9:27:56 am
Ok, just to make sure I don't f**k up too much: all I have to do is to add the following line at the top of the convertToAlias function:
function convertToAlias( $urlElement, $defaultValue = false )
{
//transliteration of one-character phonemes:
//Norwegian Æ -> ASCII A, Norwegian æ -> ASCII a
$urlElement = strtr( $urlElement, "ÆæØøÅå;","AaOoAa" );
Am I correct or is there something I've misunderstood?
best regards trondåge
trondåge
|
Georg Franz
|
Friday 19 March 2004 9:47:04 am
Hi, just another comment on the url_alias conversion: I've talked with two search engine experts. They say that Google likes "-" in URLs more than "_".
Or, to put it another way: in a Google search result, /this-is/a-test-url will be ranked higher than /this_is/a_test_url. Can anybody confirm this?
Kind regards, Emil.
Best wishes,
Georg.
--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004
|
Mikhail Chekanov
|
Monday 22 March 2004 4:00:52 am
Trond Åge Kvalø wrote:
>All I have to do is to add the following line at the top of the convertToAlias function:
function convertToAlias( $urlElement, $defaultValue = false )
{
$urlElement = strtr( $urlElement, "ÆæØøÅå;","AaOoAa" );
...
>Am I correct or is there something I've misunderstood?
In case you want to replace "Æ" with "a", not with "Aa", this is correct, but you need to remove one semicolon:
$urlElement = strtr( $urlElement, "ÆæØøÅå","AaOoAa" );
---
Emil Webber wrote:
>just another comment to the url_alias conversion:
>I've talked with two search engine experts. They say that Google like the "-" in urls more than "_".
>Can anybody confirm this?
Maybe they are right, because Google counts "word1-word2" as two words in its page-ranking formula, whereas "word1_word2" counts as one meaningless word, AFAIK.
---
I think there are 2 possible solutions to deal with i18n of the aliases:
1st: the above-mentioned way through transliteration, but there is a problem with UTF-8. Do you think that some slowdown is critical? This is a one-time operation, isn't it?
2nd: special text field for admin interface to submit/edit url_alias manually.
---
Jan Borsodi wrote:
>A simple way of solving this now is to turn the string into a Unicode array...
Or something more usual, isn't it? At one of my sites I'm using cp1251 within my templates and UTF-8 for the database/site for some historical reasons ;), so I've tested this code:
function convertToAlias( $urlElement, $defaultValue = false )
{
include_once( 'lib/ezi18n/classes/eztextcodec.php' );
$codec =& eZTextCodec::instance( false, 'cp1251' );
$urlElementArray = $codec->convertString( $urlElement );
$urlElementArray = strtr( $urlElementArray, "Aa...", "aa..." );
$urlElementArray = strtr( $urlElementArray, array( "z"=>"zh" ));
$reverseCodec =& eZTextCodec::instance( 'cp1251', false );
$urlElement = $reverseCodec->convertString( $urlElementArray );
...
This works well enough... but it becomes too complicated to be included as self-tuning code, because we would have to detect charsets, not only transliteration strings/arrays.
--
mike
#6595551
|
Trond Åge Kvalø
|
Monday 22 March 2004 5:52:44 am
> >Am I correct or is there something I've misunderstood?
> In case you want to replace "Æ" with "a", not with "Aa",
> this is correct, but you need to remove one semicolon:
> $urlElement = strtr( $urlElement, "ÆæØøÅå","AaOoAa" );
Ok, thanks Mikhail. Just one question, though. The way I've written it now, wouldn't "Æ" be replaced with "A" and "æ" with "a" etc.? I did remove the semicolon as well, but it doesn't seem to work. Any ideas why?
best regards trondåge
trondåge
|
Mikhail Chekanov
|
Tuesday 23 March 2004 2:17:01 am
>Just one question, though. The way I've written it now, wouldn't "Æ" be replaced with "A" and "æ" with "a" etc.?
Yes, exactly.
>I did remove the semicolon as well, but it doesn't seem to work. Any ideas why?
What charset do you use? As you can see above, there is a problem with multi-byte encodings (e.g. UTF-8)...
--
mike
#6595551
|
Trond Åge Kvalø
|
Tuesday 23 March 2004 3:46:51 am
>> Just one question, though. The way I've written it now, wouldn't "Æ" be replaced
>> with "A" and "æ" with "a" etc.?
> Yes, exactly.
Ok, got that one then :-)
>> I did remove the semicolon as well, but it doesn't seem to work. Any ideas why?
> What charset do you use? As you can see above, there is a problem with multi-byte
> encodings (e.g. UTF-8)...
This is the charset in my pagelayout.tpl:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
But on second thought you probably mean the charset in my site.ini.append file, right?
Now let's see... I have
#?ini charset="iso-8859-1"
and in the database settings I have
Charset=iso-8859-1
Any other place I should look?
best regards trondåge
trondåge
|
Mikhail Chekanov
|
Wednesday 24 March 2004 2:35:29 am
> Any other place I should look?
Sorry, I haven't any useful ideas... If you have ISO everywhere, I don't see any errors... :(
--
mike
#6595551
|
Gunnstein Lye
|
Wednesday 24 March 2004 8:01:02 am
Try running update/common/scripts/updateniceurls.php
|
Trond Åge Kvalø
|
Wednesday 24 March 2004 10:09:28 am
> Try running update/common/scripts/updateniceurls.php
Total updates 0/99 ??
(After some tweaking of the code so that it found the includes, and moving the
$argv = $_SERVER['argv']; line to the top so the argv variable in line 124 wasn't undefined.)
And nothing happens with my URLs. I am using nice URLs when I don't see the content/view/full/xyz, right?
trondåge
trondåge
|
Gunnstein Lye
|
Friday 26 March 2004 3:30:45 am
> And nothing happens with my URL's. I am using nice urls when I don't see the
> content/view/full/xyz, right?
Yes. Well, I'm out of suggestions now.
|
Georg Franz
|
Sunday 28 March 2004 12:15:28 pm
Hi Trond, I've altered the kernel/classes/ezurlalias.php:
function convertToAlias( $urlElement, $defaultValue = false )
{
include_once ( 'path/to/gwf_textutils.php' );
$urlElement = gwf_TextUtils::convertToAlias ( $urlElement );
if ( strlen( $urlElement ) == 0 )
{
if ( $defaultValue === false )
$urlElement = '-1';
else
{
$urlElement = $defaultValue;
$urlElement = gwf_TextUtils::convertToAlias ( $urlElement );
}
}
return $urlElement;
}
You need my "text util" class, which can be found at http://ez.no/community/contributions/hacks/gwf_textutils
In gwf_TextUtils::convertToAlias, the main conversion is done with
$specialChars = array ( "à", "á", "â", "ã", "ä", "å", "æ", "è", "é", "ê", "ß", " ", "'", "´", "`",
"ë", "Ç", "í", "ì", "ò", "ó", "ô", "õ", "ö", "ù", "ú", "û", "ü");
$normalChars = array ( "a", "a", "a", "a", "ae", "a", "ae", "e", "e", "e", "ss", "-", "", "", "", "e", "c", "i", "i", "o", "o", "o", "o", "oe", "u", "u", "u", "ue");
So if you have additional characters which should be "converted", you have to add them to both arrays.
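The class itself is not reproduced here, so the following is only a guess at how two parallel arrays like these could be applied, using str_replace. The function name convertToAliasSketch is hypothetical, and the arrays are shortened to a few German characters for illustration:

```php
<?php
// Hypothetical sketch, not the actual gwf_TextUtils code: replace each entry
// of $specialChars with the entry at the same position in $normalChars.
function convertToAliasSketch( $urlElement )
{
    // Shortened lists for illustration; both arrays must have the same order.
    $specialChars = array( "ä", "ö", "ü", "ß", " ", "'" );
    $normalChars  = array( "ae", "oe", "ue", "ss", "-", "" );

    // str_replace() works through the pairs one after another.
    return str_replace( $specialChars, $normalChars, $urlElement );
}

echo convertToAliasSketch( "süße grüße" ), "\n";
```

Since str_replace matches whole byte sequences, this style of replacement does not depend on every character being a single byte, as long as the source file and the input use the same encoding.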
After doing the "hack", you have to run update/common/scripts/updateniceurls.php
Kind regards,
Emil (alias Georg :-)
Best wishes,
Georg.
--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004
|
Gunnstein Lye
|
Wednesday 05 May 2004 8:12:15 am
I have made a locale-based fix for this that should work well with Unicode. http://ez.no/community/contributions/hacks/url_alias_transliteration
|