Can't save article: Invalid UNICODE character sequence found


Erik Ziesler

Monday 02 May 2005 10:37:07 am

eZ publish can't save an article because of an "Invalid UNICODE character sequence" error. What is causing this problem, and how do I solve it?

eZ publish debug: http://www.infotorg.net/ez_debug.htm

kracker (the)

Monday 02 May 2005 7:36:00 pm

I wonder....

How would one test this breakdown?

How would you break down the content to find the specific character which eZ publish is having qualms with?

For Erik, one approach would be to submit smaller and smaller chunks of the content until you find the "block" of text that contains the "Invalid UNICODE character sequence".
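The chunking idea can be sketched as a bisection. In this Python sketch, `find_failing_chunk` is a hypothetical helper, and `accepts` is a hypothetical stand-in for "try to save this text" (in practice, a manual save attempt or a test insert into the database). For illustration it is replaced with a simple ASCII-only check, which is an assumption, not what PostgreSQL actually does.

```python
from typing import Callable, Optional


def find_failing_chunk(text: str, accepts: Callable[[str], bool]) -> Optional[str]:
    """Narrow text down to a small piece that accepts() rejects, or return None."""
    if accepts(text):
        return None                    # the whole text is accepted: nothing to find
    while len(text) > 1:
        mid = len(text) // 2
        left, right = text[:mid], text[mid:]
        if not accepts(left):
            text = left                # the problem lies entirely in the left half
        elif not accepts(right):
            text = right               # ... or entirely in the right half
        else:
            break                      # the bad sequence straddles the split point
    return text


# Illustration only: pretend the database rejects any non-ASCII character.
sample = "Dette er ingressen. Den største skriftstykket kommer nedenfor."
print(find_failing_chunk(sample, str.isascii))
```

Because an encoding problem can straddle the split point (both halves pass on their own while the whole fails), the loop stops and returns the current chunk instead of discarding it.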

For others trying to reproduce the problem, it's more complicated: here in the USA I don't deal with Unicode very often, so I'm not that familiar with it, and I just don't know how that would work.

But even without knowing, I can at least try to break down the error message.

From the eZ debug information, it looks like PostgreSQL (the database) is throwing the error.

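It throws the error while performing this update query (the Norwegian paragraph reads roughly: "This is the intro. The larger piece of text comes below. Glory be to years-long attempts at developing the perfect CMS."):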

data_text='<?xml version="1.0" encoding="UTF-8"?>
<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/"
xmlns:xhtml="http://ez.no/namespaces/ezpublish3/xhtml/"
xmlns:custom="http://ez.no/namespaces/ezpublish3/custom/">
<paragraph>Dette er ingressen. Den største skriftstykket kommer nedenfor. Ære være årelange forsøk på å utvikle den perfekte CMS-en.</paragraph>
</section>'

Now, this is just a guess, but is this a new problem that appears only with non-English content, or did it suddenly appear with a brand-new configuration?

I don't know what I'm talking about, but I would guess that PostgreSQL may need to be configured to support non-English characters, or something along those lines...

It's not an answer but it's an idea,

//kracker

Can I Kick It? - A Tribe Called Quest

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

kracker (the)

Monday 02 May 2005 7:52:11 pm

A little more digging shows that this error is well known in PostgreSQL circles:

general:
http://www.google.com/search?num=50&hl=en&lr=&safe=off&c2coff=1&q=ERROR%3A+Invalid+UNICODE+character+sequence+found+&btnG=Search

very close: http://www.issociate.de/board/post/135979/Unicode_problem_inserting_records_-_Invalid_UNICODE_character_sequence_found_(0xfc7269).html
http://www.issociate.de/board/post/2862/Pb_with_the_French_accentuated_characters.html
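The hex value in the title of the first issociate.de link (0xfc7269) appears to be the byte sequence PostgreSQL rejected. Decoding such bytes by hand is a quick way to see what was actually sent. The Python sketch below assumes the sender used ISO-8859-1, which is only a guess, and the helper name `reported_bytes_as_text` is made up for illustration.

```python
def reported_bytes_as_text(hex_seq: str, guessed_charset: str = "iso-8859-1") -> str:
    """Decode the hex bytes quoted in an error message using a guessed charset."""
    digits = hex_seq[2:] if hex_seq.lower().startswith("0x") else hex_seq
    return bytes.fromhex(digits).decode(guessed_charset)


raw = "0xfc7269"                       # taken from the first issociate.de thread title
print(reported_bytes_as_text(raw))     # the same bytes read as ISO-8859-1

try:
    bytes.fromhex(raw[2:]).decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Read as ISO-8859-1, the bytes are the ordinary text "üri", while 0xfc can never start a valid UTF-8 sequence. That pattern is consistent with Latin-1 data arriving at a database that expects UTF-8.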

It could simply be a character-encoding mismatch somewhere along the chain: input text -> browser -> server -> eZ publish -> database...
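To make the chain idea concrete: each hand-off works only if the charset the sender encodes with matches the charset the receiver decodes with. The Python sketch below simulates those hand-offs; `run_chain` is a hypothetical helper, and the hop names and charsets are illustrative assumptions, not Erik's actual settings.

```python
from typing import List, Tuple


def run_chain(text: str, chain: List[Tuple[str, str]]) -> str:
    """Pass text along (name, charset) hops. Each hop encodes with its own charset
    and the next hop decodes with its charset. Returns the final text, or a
    description of the first hop that cannot decode the bytes it receives."""
    for (src, src_cs), (dst, dst_cs) in zip(chain, chain[1:]):
        raw = text.encode(src_cs)      # may raise UnicodeEncodeError for unsupported characters
        try:
            text = raw.decode(dst_cs)
        except UnicodeDecodeError as exc:
            return f"{src} -> {dst}: invalid {dst_cs} byte 0x{raw[exc.start]:02x} at offset {exc.start}"
    return text


uniform = [("browser", "utf-8"), ("eZ publish", "utf-8"), ("database", "utf-8")]
mixed = [("browser", "iso-8859-1"), ("eZ publish", "iso-8859-1"), ("database", "utf-8")]

print(run_chain("Rå", uniform))        # every hop agrees: the text survives
print(run_chain("Rå", mixed))          # Latin-1 bytes reach a UTF-8 hop
```

The mismatch in the opposite direction (UTF-8 bytes read as Latin-1) does not raise an error at all; it silently produces garbled text such as "RÃ¥", which is why uniform settings matter even when nothing fails.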

If it's not configuration (and it might not be, since there's no definite evidence yet), then I would point out that you're running PostgreSQL 7.3.4 and not the suggested 7.3.6 (from the second issociate.de link above)...

//kracker

Grouch - Once Upon A Rhyme


kracker (the)

Monday 02 May 2005 8:19:06 pm

Well,

It was fun! I hope you all enjoyed it more than I did ;)

It was a good way to spend my mini-pizza dinner, taking a few swings at an odd little bugger of an issue that has been running in the back of my head for a while.

I think the ideas brought up really take a few solid steps in the right direction.

I think it will take a little more testing to be certain, but once it's confirmed, a plan of action and a resolution should become clear fairly quickly.

//kracker
cheaper than paying for support...

2pac__tupac : there_u_go

2pac__tupac : still ballin


kracker (the)

Tuesday 03 May 2005 7:10:58 pm

Back to the lab again ...

I also wanted to note that Erik's PostgreSQL database is (and has been confirmed to be) encoded as UNICODE, which is PostgreSQL's name for UTF-8 in this version.

//kracker

anticon : family values : making love to your disk drive

anticon : family values : games (molemen feat. sebutone)


Erik Ziesler

Wednesday 04 May 2005 9:30:36 am

I think it might be related to the PostgreSQL 7.3.4 database. I've found that it accepts the sequence 'æøå', but not 'æ ø å'. It won't accept the letters 'æ', 'ø' or 'å' on their own, or the sequences 'Rå' and 'Rø'.
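A hypothesis that fits all of these observations (an assumption, not confirmed anywhere in the thread): the text reaches the database as ISO-8859-1 bytes while the database expects UTF-8, and the server's validation is lenient, checking only how many bytes each lead byte announces and that the follow-up bytes have their high bit set. The Python sketch below models such a check with the hypothetical helper `lenient_utf8_ok` (a model for reasoning, not PostgreSQL's actual code) alongside a strict check that uses Python's own UTF-8 decoder.

```python
def declared_length(lead: int) -> int:
    """Sequence length a lenient checker might infer from a lead byte (hypothetical model)."""
    if lead < 0x80:
        return 1                       # ASCII
    if (lead & 0xE0) == 0xC0:
        return 2                       # 110xxxxx
    if (lead & 0xE0) == 0xE0:
        return 3                       # 111xxxxx: treated as "3 bytes" in this model
    return 1                           # stray 10xxxxxx byte: not checked in this model


def lenient_utf8_ok(data: bytes) -> bool:
    """Hypothetical lenient check: verifies only lengths and the high bit of follow-up bytes."""
    i = 0
    while i < len(data):
        n = declared_length(data[i])
        if i + n > len(data):
            return False               # sequence truncated at the end of the string
        if any((b & 0x80) == 0 for b in data[i + 1:i + n]):
            return False               # a follow-up byte is plain ASCII
        i += n
    return True


def strict_utf8_ok(data: bytes) -> bool:
    """Real UTF-8 validation using Python's decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for sample in ["æøå", "æ ø å", "æ", "ø", "å", "Rå", "Rø"]:
    latin1 = sample.encode("iso-8859-1")
    print(f"{sample!r}: bytes {latin1.hex(' ')}, "
          f"lenient={lenient_utf8_ok(latin1)}, strict={strict_utf8_ok(latin1)}")
```

Under this model, 'æøå' slips through only because its three Latin-1 bytes (e6 f8 e5) happen to look like one three-byte UTF-8 sequence; every other sample is rejected, and a strict decoder rejects all of them. The same letters sent as real UTF-8 pass both checks.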

Erik Ziesler

Wednesday 04 May 2005 4:05:55 pm

I thought it had something to do with PostgreSQL, and it did have something to do with the database, but the real problem was that the character encoding was not uniform. The string in site.ini specifying the character encoding for the database was also empty. When I changed all the character encoding settings (site.ini, template.ini, i18n.ini) to utf-8, I was able to save the article I previously couldn't.

Because utf-8 does not work with the PDF output, I made a new PostgreSQL database with LATIN10, installed the new eZ publish 3.5.2 and changed all encoding settings to iso-8859-15. Unfortunately, I ran into another problem with the new eZ publish installation, which I might post about at a later time.
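Since the fix was making every charset setting agree, a small consistency check over the .ini files could catch an empty or mismatched value early. The Python sketch below uses the standard `configparser` module; the (file, section, key) locations in `CHARSET_SETTINGS` are assumptions to verify against your own eZ publish version, `check_charsets` is a hypothetical helper, and the ini contents are passed in as strings so the example runs without an installation.

```python
import configparser
from typing import Dict, List

# Assumed (file, section, key) locations of the charset settings; verify for your version.
CHARSET_SETTINGS = [
    ("i18n.ini", "CharacterSettings", "Charset"),
    ("site.ini", "DatabaseSettings", "Charset"),
    ("template.ini", "CharsetSettings", "DefaultTemplateCharset"),
]


def check_charsets(ini_texts: Dict[str, str]) -> List[str]:
    """Return problems found: empty or missing charset values, or values that disagree."""
    values: List[str] = []
    problems: List[str] = []
    for filename, section, key in CHARSET_SETTINGS:
        # strict=False tolerates repeated keys (e.g. eZ's Key[]= array syntax);
        # interpolation=None keeps '%' in values from being interpreted.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read_string(ini_texts.get(filename, ""))
        value = parser.get(section, key, fallback="").strip().lower()
        if value:
            values.append(value)
        else:
            problems.append(f"{filename} [{section}] {key} is empty or missing")
    if len(set(values)) > 1:
        problems.append("charsets disagree: " + ", ".join(sorted(set(values))))
    return problems


example = {
    "i18n.ini": "[CharacterSettings]\nCharset=iso-8859-1\n",
    "site.ini": "[DatabaseSettings]\nCharset=\n",
    "template.ini": "[CharsetSettings]\nDefaultTemplateCharset=utf-8\n",
}
for problem in check_charsets(example):
    print(problem)
```

This only reads the base files; a real installation may also override these values in settings/override or siteaccess files, which a fuller check would need to merge.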

Thanks, Kracker, for putting me on the trail, or rather for pointing out two specific, probable causes.

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.
