iso-8859-1 to UTF-8 conversion.

laurent le cadet

Tuesday 23 August 2005 2:05:09 am

Hi,

Here's one more message about DB charset conversion, but what I have read so far leaves me with many questions ( http://ez.no/products/ez_publish_cms/documentation/configuration/configuration/language_and_charset/unicode_with_ez_publish ).

I'm using eZp 3.5.2 revision 10972 with PHP 4.3.8 / MySQL 3.23.58, DB internal charset iso-8859-1.

There is currently a lot of content on the site (fre-FR + en-GB) and I want to add Chinese and Japanese.

As I understand it, I need to change the charset to UTF-8.

The previous threads about this referred to MySQL version problems or to many different things to do...

Is there currently any step-by-step documentation for doing this?

Regards.

Laurent

Georg Franz

Tuesday 23 August 2005 6:27:51 am

Hi Laurent,

First of all, you need a newer MySQL (the version should be 4.1.11 or later).
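
The version requirement can be checked from PHP with `version_compare()`. This is only a minimal sketch: it assumes the version string has already been obtained (for example from `SELECT VERSION()`), the helper name `isMysqlNewEnough` is hypothetical, and the 4.1.11 threshold is simply the one mentioned above.

```php
<?php
// Hypothetical helper: returns true when $serverVersion is at least $minimum.
// MySQL version strings may carry a suffix such as "-log", so only the
// leading numeric part is compared.
function isMysqlNewEnough($serverVersion, $minimum = '4.1.11')
{
    if (!preg_match('/^\d+(\.\d+)*/', $serverVersion, $m)) {
        return false; // unrecognised version string
    }
    return version_compare($m[0], $minimum, '>=');
}

var_dump(isMysqlNewEnough('3.23.58'));    // bool(false): the version from the first post
var_dump(isMysqlNewEnough('4.1.11-log')); // bool(true)
```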

Then you need to change the charset of the eZ tables. For that purpose I've adapted a small script that I found on the MySQL forums:

<?php
// put in your username, password
$conn = mysql_connect("localhost", "root", "mypassword");

//change this to false to alter on the fly
$printonly=true; 

$charset="utf8";
$collate="utf8_general_ci";

$altertablecharset=true;
$alterdatabasecharser=true;

// put here your databases ...
$currentDBArray = array();
$currentDBArray[] = "mydb";


function PMA_getDbCollation($db)
{
	$sq='SHOW CREATE DATABASE `'.$db.'`;';
	$res = mysql_query($sq);
	if(!$res) echo "\n\n".$sq."\n".mysql_error()."\n\n"; else
	// fetch a numerically indexed row so that $row[1] (the CREATE DATABASE statement) exists
	if($row = mysql_fetch_row($res))
	{
		$tokenized = explode(' ', $row[1]);
		unset($row, $res, $sql_query);
		for ($i = 1; $i + 3 < count($tokenized); $i++)
		{
			if ($tokenized[$i] == 'DEFAULT' && $tokenized[$i + 1] == 'CHARACTER' && $tokenized[$i + 2] == 'SET')
			{
				if (isset($tokenized[$i + 5]) && $tokenized[$i + 4] == 'COLLATE')
				{
					 return array($tokenized [$i + 3],$tokenized[$i + 5]); // We found the collation!
				}
				else
				{
					return array($tokenized [$i + 3]);
				}
			}
		} 
	}
	return array(''); // no charset found; callers index the result with [0]
}

$rs2 = mysql_query("SHOW DATABASES"); 
if(!$rs2)
	echo "\n\nSHOW DATABASES\n".mysql_error()."\n\n"; // $sq is not set at this point
else
	while ($data2 = mysql_fetch_row($rs2))
	{
		$db=$data2[0];
		$db_cha=PMA_getDbCollation($db);
		if ( in_array ( $db, $currentDBArray ) )
			if ( substr($db_cha[0],0,4)!='utf8' ) // limit to charset
			{
				mysql_select_db($db);
				$rs = mysql_query("SHOW TABLES"); 
				if(!$rs)
					echo "\n\nSHOW TABLES\n".mysql_error()."\n\n"; // $sq would be unset or stale here
				else
					while ($data = mysql_fetch_row($rs))
					{
						if ( substr ( $data[0], 0,2 ) == "ez" )
						{
							$rs1 = mysql_query("show FULL columns from $data[0]");
							
							if(!$rs1)
								echo "\n\nshow FULL columns from $data[0]\n".mysql_error()."\n\n"; // $sq would be unset or stale here
							else
								while ($data1 = mysql_fetch_assoc($rs1))
								{
									// strtok() returns the type name before any "(", e.g. "varchar" for "varchar(255)"
									if(in_array(strtok($data1['Type'], '('),array(
																				//'national char',
																				//'nchar',
																				//'national varchar',
																				//'nvarchar',
																				'char',
																				'varchar',
																				'tinytext',
																				'text',
																				'mediumtext',
																				'longtext',
																				'enum',
																				'set'
																				  ))) 
									 {
										if(substr($data1['Collation'],0,4)!='utf8') // limit to charset
										{
											$sq="ALTER TABLE `$data[0]` CHANGE `".$data1['Field'].'` `'.$data1['Field'].'` '.$data1['Type'].' CHARACTER SET binary '.($data1['Default']==''?'':($data1['Default']=='NULL'?' DEFAULT NULL':' DEFAULT \''.mysql_escape_string($data1['Default']).'\'')).($data1['Null']=='YES'?' NULL ':' NOT NULL').';';
											if(!$printonly&&!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n"; 
											else
											{
												echo ($sq."\n") ; 
												$sq="ALTER TABLE `$data[0]` CHANGE `".$data1['Field'].'` `'.$data1['Field'].'` '.$data1['Type']." CHARACTER SET $charset ".($collate==''?'':"COLLATE $collate").($data1['Default']==''?'':($data1['Default']=='NULL'?' DEFAULT NULL':' DEFAULT \''.mysql_escape_string($data1['Default']).'\'')).($data1['Null']=='YES'?' NULL ':' NOT NULL').($data1['Comment']==''?'':' COMMENT \''.mysql_escape_string($data1['Comment']).'\'').';';
												if(!$printonly&&!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n"; 
												else echo ($sq."\n") ; 
											}
										}
									}
								}
								if($altertablecharset)
								{
									/*
									  $sq='ALTER TABLE `'.$data[0]."` DEFAULT CHARACTER SET binary";
									  echo ($sq."\n") ; 
									  if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
									*/
									$sq='ALTER TABLE `'.$data[0]."` DEFAULT CHARACTER SET $charset ".($collate==''?'':"COLLATE $collate");
									echo ($sq."\n") ; 
									if(!$printonly)
										if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
								}
						}
						else
							echo $data[0] . " not changed.\n";
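						// Note: this block is inside the table loop, so the ALTER DATABASE
						// statement is printed (and run) once per table; redundant but harmless.
						if( $alterdatabasecharser )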
						{
						  /*
						  $sq='ALTER DATABASE `'.$data2[0]."` DEFAULT CHARACTER SET binary";
						  echo ($sq."\n") ; 
						  if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
						  */ 
						  $sq='ALTER DATABASE `'.$data2[0]."` DEFAULT CHARACTER SET $charset ".($collate==''?'':"COLLATE $collate");
						  echo ($sq."\n") ; 
							if(!$printonly)
								if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
						}
					}
				}
			}
?>

Then you need to change the ini settings of eZ Publish.

-> site.ini.append: charset at db entry
-> i18n.ini.append: charset-setting
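
As an illustration only, the two settings might look like the fragments below. The section and key names (`[DatabaseSettings]` in site.ini, `[CharacterSettings]` in i18n.ini) and the file locations are assumptions based on the usual eZ Publish 3.x layout; check them against the default settings/site.ini and settings/i18n.ini of your installation before relying on them.

```ini
# settings/override/site.ini.append (assumed location)
[DatabaseSettings]
Charset=utf-8

# settings/override/i18n.ini.append (assumed location)
[CharacterSettings]
Charset=utf-8
```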

After that, don't forget to clear the eZ Publish cache completely.

HTH.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

laurent le cadet

Tuesday 23 August 2005 8:42:30 am

Hi Georg,

Thanks for your reply.
First step: upgrade MySQL...

After that, is there no risk for the content?
Is the content of each table re-encoded? (Or is there no need?)

About the script, I presume I just have to run it once from the root of the site (for example)?

Regards

Laurent.

Georg Franz

Tuesday 23 August 2005 9:56:46 am

Hi Laurent,

backup - backup - backup ... of course :-))

The script converts the tables first to a binary format and then to utf8, so no data should be lost.
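
Whether nothing is lost depends on which bytes are actually stored. As far as MySQL's documented behaviour goes, changing a column to binary and then to utf8 keeps the stored bytes as they are and only changes how they are labelled, so it is worth checking on a copy of the database whether the content is real ISO-8859-1 or already UTF-8. The PHP sketch below involves no database; the sample string and the use of `iconv()` and `preg_match('//u', ...)` are illustrative choices. It shows the difference between relabelling bytes and re-encoding them:

```php
<?php
// "é" as a single ISO-8859-1 byte (0xE9), as a latin1 database would store it.
$latin1 = "caf\xE9";

// Relabelling: the same bytes are simply declared to be UTF-8.
// preg_match() with the /u modifier fails on invalid UTF-8 input.
$validAsUtf8 = (preg_match('//u', $latin1) === 1);

// Re-encoding: the bytes are actually converted to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

var_dump($validAsUtf8);      // bool(false): 0xE9 alone is not a valid UTF-8 sequence
echo bin2hex($latin1), "\n"; // 636166e9
echo bin2hex($utf8), "\n";   // 636166c3a9
```

If the relabelled bytes are not valid UTF-8, a real re-encoding step (as `iconv()` does here) is needed rather than a pure relabelling.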

The script simply produces SQL strings for the conversion. If you run it the first time with the variable $printonly set to true (at the beginning of the script), only the SQL strings are written to the screen; nothing else happens.

If you really want to do the conversion, set $printonly to false.
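
The print-only behaviour can be sketched in isolation as follows. This is a simplified illustration, not the script above: the function names `buildConversionSql` and `runOrPrint` are hypothetical, the table and column names are just examples, the executor is passed in as a callback so the example runs without a database, and column attributes such as DEFAULT, NULL and COMMENT are left out for brevity.

```php
<?php
// Build the two ALTER statements for one column: first to binary, then to the
// target charset (mirrors the two-step idea of the script, heavily simplified).
function buildConversionSql($table, $field, $type, $charset, $collate = '')
{
    $prefix = "ALTER TABLE `$table` CHANGE `$field` `$field` $type";
    $target = "CHARACTER SET $charset" . ($collate === '' ? '' : " COLLATE $collate");
    return array(
        "$prefix CHARACTER SET binary;",
        "$prefix $target;",
    );
}

// Print every statement; pass it to $execute only when $printOnly is false.
// Returns the number of statements that were handed to $execute.
function runOrPrint(array $statements, $printOnly, $execute)
{
    $executed = 0;
    foreach ($statements as $sql) {
        echo $sql, "\n";
        if (!$printOnly) {
            call_user_func($execute, $sql);
            $executed++;
        }
    }
    return $executed;
}

$sql = buildConversionSql('ezcontentobject_name', 'name', 'varchar(255)', 'utf8', 'utf8_general_ci');

// Dry run: the statements are only printed, the callback is never invoked.
runOrPrint($sql, true, function ($s) { /* a real run would query the database here */ });
```

Reviewing the printed statements before switching to a real run is the whole point of the flag, which is why the default is print-only.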

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004
