Forums / Developer / HOWTO Convert database from charset latin-1 to utf-8

HOWTO Convert database from charset latin-1 to utf-8

Author Message

Björn Dieding@xrow.de

Wednesday 21 December 2005 12:21:23 pm

What is the savest way to convert a hole database into another charset?

I did it with just dumping the data.

Sometimes it seems it does not work right.

I had to run this query since there stupid chars injected.

SELECT UNHEX ( 'C2A0' );

UPDATE ezcontentobject_attribute SET data_text = REPLACE ( data_text, UNHEX ( 'C2A0' ), ' ' );

I have no clue where those chars came from. Does anybody has an idea?

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Georg Franz

Wednesday 21 December 2005 2:44:03 pm

Hi Björn,

which version of mysql?

The problem is: Content is saved in utf-8 in some table rows if you are using in latin-1. So if you convert the whole dump from latin-1 to utf-8 the content is messed up.

So I wrote some scripts to do the job "manually" ... i search the links after the post.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

Georg Franz

Wednesday 21 December 2005 2:51:21 pm

Hi,

found the link:
http://ez.no/products/ez_publish_cms/documentation/configuration/configuration/language_and_charset/unicode_with_ez_publish

As I remember you need at least mysql 4.1.11+, because earlier versions have an utf-8 bug.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

Björn Dieding@xrow.de

Wednesday 21 December 2005 2:59:21 pm

My versions are of course up to date... I moved the content from a mysql 3.x to a mysql 5 and back to a 4.1

Too said that your script is lost in the old documentation...

I wonder if the new documentation can properly answer this question ;-)

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Georg Franz

Thursday 22 December 2005 4:59:53 am

Hi Björn,

http://ez.no/community/forum/setup_design/iso_8859_1_to_utf_8_conversion/re_iso_8859_1_to_utf_8_conversion__1

There is a script to change the charset + collation of the whole db at once. But be careful, it only works, if the content of all tables is in the same charset.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

Xavier Dutoit

Thursday 22 December 2005 7:23:47 am

Let me share with you the pain I had with the conversion.

Dear microsoft has a tendency to put its non standard characters into the latin1 (the usual special quotes that polute the word documents). Are they your stupid caracters bt any chance ?

If you put them into latin1 and retrieve them, it works fine, and for the rest of us not having windows, well who cares about minorities ?

However, when I converted them to utf8, they ended up as plain garbage, and to make funny things even worse, the update to a new mysql version made things even messier (the default encoding was latin1, now the default is utf8).

I solved the problem with iconv, and sed to convert all the mischanged caracters into the dump (and an awful lot of time).

In short, start with utf8 from the start if you can. If you can't be really sure you don't have do to that in a hurry, otherwise you might be in a big problem.

Talking about the encoding: remember that you can't have one siteaccess in latin1 and the other in utf8. It has to be installwide.

Good luck.

http://www.sydesy.com

Yngve Bergheim

Tuesday 03 January 2006 3:58:14 am

As Georg Franz said, you also have to search and replace <?xml version="1.0" encoding="UTF-8"?> (or your current encoding) with <?xml version="1.0" encoding="utf-8"?> in an eZ sql dump file and then reimport the dump.

eZ debug

Timing: Jan 18 2025 19:20:10
Script start
Timing: Jan 18 2025 19:20:10
Module start 'content'
Timing: Jan 18 2025 19:20:10
Module end 'content'
Timing: Jan 18 2025 19:20:10
Script end

Main resources:

Total runtime0.2098 sec
Peak memory usage2,048.0000 KB
Database Queries141

Timing points:

CheckpointStart (sec)Duration (sec)Memory at start (KB)Memory used (KB)
Script start 0.00000.0073 587.8594180.8125
Module start 'content' 0.00730.0054 768.6719101.9297
Module end 'content' 0.01270.1970 870.6016530.2500
Script end 0.2097  1,400.8516 

Time accumulators:

 Accumulator Duration (sec) Duration (%) Count Average (sec)
Ini load
Load cache0.00381.8291200.0002
Check MTime0.00160.7459200.0001
Mysql Total
Database connection0.00100.454310.0010
Mysqli_queries0.152072.45371410.0011
Looping result0.00130.59971390.0000
Template Total0.196793.810.1967
Template load0.00100.463810.0010
Template processing0.195793.293110.1957
Override
Cache load0.00070.336510.0007
Sytem overhead
Fetch class attribute can translate value0.00090.439010.0009
XML
Image XML parsing0.00030.159210.0003
General
dbfile0.00371.7748200.0002
String conversion0.00000.002830.0000
Note: percentages do not add up to 100% because some accumulators overlap

CSS/JS files loaded with "ezjscPacker" during request:

CacheTypePacklevelSourceFiles
CSS0extension/community/design/community/stylesheets/ext/jquery.autocomplete.css
extension/community_design/design/suncana/stylesheets/scrollbars.css
extension/community_design/design/suncana/stylesheets/tabs.css
extension/community_design/design/suncana/stylesheets/roadmap.css
extension/community_design/design/suncana/stylesheets/content.css
extension/community_design/design/suncana/stylesheets/star-rating.css
extension/community_design/design/suncana/stylesheets/syntax_and_custom_tags.css
extension/community_design/design/suncana/stylesheets/buttons.css
extension/community_design/design/suncana/stylesheets/tweetbox.css
extension/community_design/design/suncana/stylesheets/jquery.fancybox-1.3.4.css
extension/bcsmoothgallery/design/standard/stylesheets/magnific-popup.css
extension/sevenx/design/simple/stylesheets/star_rating.css
extension/sevenx/design/simple/stylesheets/libs/fontawesome/css/all.min.css
extension/sevenx/design/simple/stylesheets/main.v02.css
extension/sevenx/design/simple/stylesheets/main.v02.res.css
JS0extension/ezjscore/design/standard/lib/yui/3.17.2/build/yui/yui-min.js
extension/ezjscore/design/standard/javascript/jquery-3.7.0.min.js
extension/community_design/design/suncana/javascript/jquery.ui.core.min.js
extension/community_design/design/suncana/javascript/jquery.ui.widget.min.js
extension/community_design/design/suncana/javascript/jquery.easing.1.3.js
extension/community_design/design/suncana/javascript/jquery.ui.tabs.js
extension/community_design/design/suncana/javascript/jquery.hoverIntent.min.js
extension/community_design/design/suncana/javascript/jquery.popmenu.js
extension/community_design/design/suncana/javascript/jScrollPane.js
extension/community_design/design/suncana/javascript/jquery.mousewheel.js
extension/community_design/design/suncana/javascript/jquery.cycle.all.js
extension/sevenx/design/simple/javascript/jquery.scrollTo.js
extension/community_design/design/suncana/javascript/jquery.cookie.js
extension/community_design/design/suncana/javascript/ezstarrating_jquery.js
extension/community_design/design/suncana/javascript/jquery.initboxes.js
extension/community_design/design/suncana/javascript/app.js
extension/community_design/design/suncana/javascript/twitterwidget.js
extension/community_design/design/suncana/javascript/community.js
extension/community_design/design/suncana/javascript/roadmap.js
extension/community_design/design/suncana/javascript/ez.js
extension/community_design/design/suncana/javascript/ezshareevents.js
extension/sevenx/design/simple/javascript/main.js

Templates used to render the page:

UsageRequested templateTemplateTemplate loadedEditOverride
1pagelayout.tpl<No override>extension/sevenx/design/simple/templates/pagelayout.tplEdit templateOverride template
 Number of times templates used: 1
 Number of unique templates used: 1

Time used to render debug report: 0.0001 secs