HOWTO Convert database from charset latin-1 to utf-8

Author Message

Björn Dieding@xrow.de

Wednesday 21 December 2005 12:21:23 pm

What is the savest way to convert a hole database into another charset?

I did it with just dumping the data.

Sometimes it seems it does not work right.

I had to run this query since there stupid chars injected.

SELECT UNHEX ( 'C2A0' );

UPDATE ezcontentobject_attribute SET data_text = REPLACE ( data_text, UNHEX ( 'C2A0' ), ' ' );

I have no clue where those chars came from. Does anybody has an idea?

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Georg Franz

Wednesday 21 December 2005 2:44:03 pm

Hi Björn,

which version of mysql?

The problem is: Content is saved in utf-8 in some table rows if you are using in latin-1. So if you convert the whole dump from latin-1 to utf-8 the content is messed up.

So I wrote some scripts to do the job "manually" ... i search the links after the post.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

Georg Franz

Wednesday 21 December 2005 2:51:21 pm

Hi,

found the link:
http://ez.no/products/ez_publish_cms/documentation/configuration/configuration/language_and_charset/unicode_with_ez_publish

As I remember you need at least mysql 4.1.11+, because earlier versions have an utf-8 bug.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

Björn Dieding@xrow.de

Wednesday 21 December 2005 2:59:21 pm

My versions are of course up to date... I moved the content from a mysql 3.x to a mysql 5 and back to a 4.1

Too said that your script is lost in the old documentation...

I wonder if the new documentation can properly answer this question ;-)

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Georg Franz

Thursday 22 December 2005 4:59:53 am

Hi Björn,

http://ez.no/community/forum/setup_design/iso_8859_1_to_utf_8_conversion/re_iso_8859_1_to_utf_8_conversion__1

There is a script to change the charset + collation of the whole db at once. But be careful, it only works, if the content of all tables is in the same charset.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

Xavier Dutoit

Thursday 22 December 2005 7:23:47 am

Let me share with you the pain I had with the conversion.

Dear microsoft has a tendency to put its non standard characters into the latin1 (the usual special quotes that polute the word documents). Are they your stupid caracters bt any chance ?

If you put them into latin1 and retrieve them, it works fine, and for the rest of us not having windows, well who cares about minorities ?

However, when I converted them to utf8, they ended up as plain garbage, and to make funny things even worse, the update to a new mysql version made things even messier (the default encoding was latin1, now the default is utf8).

I solved the problem with iconv, and sed to convert all the mischanged caracters into the dump (and an awful lot of time).

In short, start with utf8 from the start if you can. If you can't be really sure you don't have do to that in a hurry, otherwise you might be in a big problem.

Talking about the encoding: remember that you can't have one siteaccess in latin1 and the other in utf8. It has to be installwide.

Good luck.

http://www.sydesy.com

Yngve Bergheim

Tuesday 03 January 2006 3:58:14 am

As Georg Franz said, you also have to search and replace <?xml version="1.0" encoding="UTF-8"?> (or your current encoding) with <?xml version="1.0" encoding="utf-8"?> in an eZ sql dump file and then reimport the dump.

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.

eZ debug

Timing: Jan 18 2025 19:12:11
Script start
Timing: Jan 18 2025 19:12:11
Module start 'layout'
Timing: Jan 18 2025 19:12:11
Module start 'content'
Timing: Jan 18 2025 19:12:12
Module end 'content'
Timing: Jan 18 2025 19:12:12
Script end

Main resources:

Total runtime1.1425 sec
Peak memory usage4,096.0000 KB
Database Queries73

Timing points:

CheckpointStart (sec)Duration (sec)Memory at start (KB)Memory used (KB)
Script start 0.00000.0057 589.1797152.6406
Module start 'layout' 0.00570.0027 741.820339.4766
Module start 'content' 0.00851.1323 781.2969662.2109
Module end 'content' 1.14080.0017 1,443.507820.1250
Script end 1.1425  1,463.6328 

Time accumulators:

 Accumulator Duration (sec) Duration (%) Count Average (sec)
Ini load
Load cache0.00330.2924160.0002
Check MTime0.00130.1110160.0001
Mysql Total
Database connection0.00100.090010.0010
Mysqli_queries1.069893.6323730.0147
Looping result0.00070.0651710.0000
Template Total1.106696.920.5533
Template load0.00240.208620.0012
Template processing1.104296.650220.5521
Template load and register function0.00010.009910.0001
states
state_id_array0.00130.115510.0013
state_identifier_array0.00230.197620.0011
Override
Cache load0.00210.1836450.0000
Sytem overhead
Fetch class attribute can translate value0.00080.066340.0002
Fetch class attribute name0.00130.1101100.0001
XML
Image XML parsing0.00160.141240.0004
class_abstraction
Instantiating content class attribute0.00000.0019130.0000
General
dbfile0.00100.0911280.0000
String conversion0.00000.000740.0000
Note: percentages do not add up to 100% because some accumulators overlap

Templates used to render the page:

UsageRequested templateTemplateTemplate loadedEditOverride
1node/view/full.tplfull/forum_topic.tplextension/sevenx/design/simple/override/templates/full/forum_topic.tplEdit templateOverride template
6content/datatype/view/ezimage.tpl<No override>extension/sevenx/design/simple/templates/content/datatype/view/ezimage.tplEdit templateOverride template
7content/datatype/view/ezxmltext.tpl<No override>extension/community_design/design/suncana/templates/content/datatype/view/ezxmltext.tplEdit templateOverride template
8content/datatype/view/ezxmltags/paragraph.tpl<No override>extension/ezwebin/design/ezwebin/templates/content/datatype/view/ezxmltags/paragraph.tplEdit templateOverride template
1content/datatype/view/ezxmltags/line.tpl<No override>design/standard/templates/content/datatype/view/ezxmltags/line.tplEdit templateOverride template
1print_pagelayout.tpl<No override>extension/community/design/community/templates/print_pagelayout.tplEdit templateOverride template
 Number of times templates used: 24
 Number of unique templates used: 6

Time used to render debug report: 0.0001 secs