update to 3.8 and codepage problems

Author Message

Alexandre Cunha

Thursday 04 May 2006 4:06:23 am

Hi,

Have updated a website from exp 3.7.4 to 3.8 and now I have problems (probably) related with codepage.
I have ç, Ã, ó, ê instead of ç, à, ó, ê.

Before update to 3.8, have updated the database from "latin1_swedish_ci" collation to "utf-8" collation on all tables.
The new table (3.7.4 with utf-8) works fine in the 3.7.4 instalation without any problems.

Have problems just after update the database to 3.8

What is wrong ?

axel

http://AlexandreCunha.com

Kristof Coomans

Thursday 04 May 2006 4:55:28 am

It seems like your eZ installation doesn't know the data in the database is UTF8.

Do you have this in site.ini.append:

[DatabaseSettings]
Charset=utf8

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Alexandre Cunha

Thursday 04 May 2006 5:59:27 am

yes, I have utf8 enabled in the site.ini.append.php

And I guess is not
Charset=utf8
but
Charset=utf-8

Both sites (ezp 3.7.4 and 3.8) have this setting enabled.

Works fine n 3.7.4 but not in 3.8

Also have Charset=utf-8 in i18n.ini.append.php

http://AlexandreCunha.com

Kristof Coomans

Thursday 04 May 2006 6:05:58 am

Do you see the message <b>SET NAMES 'utf8'</b> in the debug output?

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Alexandre Cunha

Thursday 04 May 2006 6:15:39 am

Thanks for your help Kristof,

Yes, I see <b>SET NAMES 'utf8'</b> on both sites (ezp 3.7.4 and 3.8).

What means that ?

http://AlexandreCunha.com

Kristof Coomans

Thursday 04 May 2006 6:41:50 am

That means that it's really using UTF8 to read the data from your database. So we aren't any further yet :-)

Do you have the problem with those characters with both simple text line/text field attributes, as well as with ezxmltext attributes?

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Alexandre Cunha

Thursday 04 May 2006 7:42:29 am

 

Do you have the problem with those characters with both simple text line/text field attributes, as well as with ezxmltext attributes?
 

Nice question ....

Seems the problem is only in [XML block] (Message in the weblog class, Intro and Body in the Article class)
Titles (weblog and Article classes) are not affected by this problem.

Your question guides me to the solution ...

Looking to field "data_text" in the table "ezcontentobject_attribute":

some "data_text" begins with:

<?xml version="1.0" encoding="UTF-8"?>"

And this seams the problem because all new content have in the "data_text":

 	<?xml version="1.0" encoding="utf-8"?>

Have replace on sample content "iso-8859-1" by "utf-8" and solves the problem.

The next problem is: How to replace all "iso-8859-1" by "utf-8" ?

http://AlexandreCunha.com

Kristof Coomans

Thursday 04 May 2006 9:10:45 am

Some time ago I've written a command line script which we used to convert all our eZ databases to UTF8. It is made for running on a latin-1 (default) encoded database, so you will need to run it on the initial database. We've used it with success on 5 sites (2 rather big and 3 small ones).

First it goes looking for all attributes who use utf8 encoded XML in the db. It will fetch them and reinsert their content into the db with latin-1 encoding.

Then it uses the SQL command "CONVERT TO CHARACTER SET utf8" on all tables, which will convert all data to UTF8.

As a final step, it again goes through all attributes looking for latin-1 encoded XML (the xml encoding attribute is not right now, since we converted everything to utf8 in the previous step). It will replace the iso-8859-1 string with utf-8 and then reinserts the data. I know the way this step is done now is not 100% safe, because it replaces every instance of "iso-8859-1" with "utf-8". But that was no issue for us.

#!/usr/bin/env php
<?php

include_once( 'lib/ezutils/classes/ezcli.php' );
include_once( 'kernel/classes/ezscript.php' );
include_once( 'lib/ezdb/classes/ezdb.php' );

$cli =& eZCLI::instance();
$script =& eZScript::instance( array( 'description' => ( "Changes your eZ publish database tables to use UTF8" ),
                                      'use-session' => false,
                                      'use-modules' => false,
                                      'use-extensions' => true ) );

$script->startup();

$options = $script->getOptions();

$db =& eZDB::instance();

// first store XML saved as utf-8 back as iso-8859-1
$selectSQL = "SELECT id, version, data_text FROM ezcontentobject_attribute where data_text LIKE '<?xml%utf-8%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    include_once( 'lib/ezxml/classes/ezxml.php' );

    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text'];

        $xml = new eZXML();
        $doc =& $xml->domTree( $value );

        if ( $doc )
        {
            $value = $doc->toString( 'iso-8859-1' );
            $success = $db->query( "UPDATE ezcontentobject_attribute SET data_text='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

            if ( !$success )
            {
                $cli->output( 'unable to update the db.' );
            }
        }
        else
        {
            $cli->output( 'error while creating DOM document' );
        }
    }

    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$selectSQL = "SELECT id, version, data_text5 FROM ezcontentclass_attribute where data_text5 LIKE '<?xml%utf-8%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    include_once( 'lib/ezxml/classes/ezxml.php' );

    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting class attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text5'];

        $xml = new eZXML();
        $doc =& $xml->domTree( $value );

        if ( $doc )
        {
            $value = $doc->toString( 'iso-8859-1' );
            $success = $db->query( "UPDATE ezcontentclass_attribute SET data_text5='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

            if ( !$success )
            {
                $cli->output( 'unable to update the db.' );
            }
        }
        else
        {
            $cli->output( 'error while creating DOM document' );
        }
    }

    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$tables = $db->arrayQuery( 'SHOW tables' );

foreach ( $tables as $table )
{
    $tableName = false;
    
    foreach( array_keys( $table ) as $i )
    {
         $tableName = $table[$i];
         break;  
    }
    
    if ( $tableName )
    {
        $cli->output( 'Changing table: ' . $tableName );
        $db->query( 'ALTER TABLE ' . $db->escapeString( $tableName ) . ' CONVERT TO CHARACTER SET utf8' );
    }
}

$selectSQL = "SELECT id, version, data_text FROM ezcontentobject_attribute where data_text LIKE '<?xml%iso-8859-1%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text'];
        $value = str_replace( 'iso-8859-1', 'utf-8', $value );
        $success = $db->query( "UPDATE ezcontentobject_attribute SET data_text='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

        if ( !$success )
        {
            $cli->output( 'unable to update the db.' );
        }
    }
    
    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$selectSQL = "SELECT id, version, data_text5 FROM ezcontentclass_attribute where data_text5 LIKE '<?xml%iso-8859-1%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting class attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text5'];
        $value = str_replace( 'iso-8859-1', 'utf-8', $value );
        $success = $db->query( "UPDATE ezcontentclass_attribute SET data_text5='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

        if ( !$success )
        {
            $cli->output( 'unable to update the db.' );
        }
    }
    
    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$script->shutdown();

?>

If you want to run something similar on your database as it is now, you can try to use only the last step.

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Alexandre Cunha

Thursday 04 May 2006 12:02:30 pm

Very nice script. Have you sumbit it in the "contibutions" section of this site ?

Because my problem is only a "find and replace" solution, I have do this:

 

    UPDATE ezcontentobject_attribute SET data_text = REPLACE(data_text, 'encoding="iso-8859-1"?>', 'encoding="utf-8"?>')
 

I put this here because can help other users.

Kristof, thank you very mutch for your help.

axel

http://AlexandreCunha.com

Kristof Coomans

Thursday 04 May 2006 11:41:52 pm

No, I didn't post it yet as a contribution. I first wanted to see what you thought about it :-)

Your SQL query is much better (and faster) than my step 3 ;-) Maybe we can change my script a bit so it uses your query instead of mine, and then post it as a contribution?

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Alexandre Cunha

Thursday 11 May 2006 5:16:58 am

Maybe I ill test it in the next 2 or 3 weeks when I update another site from ezp3.7 to ezp3.8

Maybe interest other users.

 

http://AlexandreCunha.com

Tuesday 16 May 2006 4:49:26 am

Hi,

I'm very interested in such a contribution :)

John Smith

Tuesday 19 August 2008 9:22:14 am

hi Kristof,

I am using your script to do the uft-8 conversion while upgrading 3.6.1 to 3.8.0. Scripts seems fine, no problems at all.

But in the debug on both adminstration and the public websites I am geting following message.

SET NAMES 'utf8'

I am running the script with the following sequence.

1. Changing the Charset=utf-8 in override/i18n.ini.append.php
[CharacterSettings]
Charset=utf-8

2. Leaving the Charset empty in override/site.ini.append.php
[DatabaseSettings]
Charset=

3. Then running the utf8 conversion script.

4. Running the other upgrade scripts for upgrading to 3.8.0

Can you please guide me, I am still getting the debug Notice.....

Cheers.

John Smith

Wednesday 20 August 2008 4:01:02 am

Anyone please????

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.