Can shorten be made to shorten in unicode units rather than bytes?

Author Message

Sean Carney

Friday 06 February 2004 2:30:16 pm

We are really happy with the Shorten function and do not mind that it cuts off words. But we do have a problem: it cuts Unicode characters off in the middle, which creates a garbage character. You can see an example on our page, http://nsd.hopetalk.org

We need to find a way to have shorten cut off based on unicode units.

Marco Zinn

Saturday 07 February 2004 11:46:09 am

I'm not into Unicode, but splitting a Unicode character should not happen.
I suggest that you file a bug report.

Marco
http://www.hyperroad-design.com

Sean Carney

Sunday 22 February 2004 9:37:28 pm

Thank you, Marco. I filed a bug report. It also seems strange that shorten cuts off some bytes even when the number of characters displayed is less than the number of characters specified.

Jan Borsodi

Wednesday 03 March 2004 7:31:59 am

PHP itself does not support Unicode internally. You can get some support with the mbstring extension and by overriding internal text functions, but not all of PHP will support it.

We also use the mbstring extension (if available) to perform conversion when it's needed (instead of all the time). However, our i18n system does not yet support text operations such as extracting a portion of a string. This means that all template operators that modify text will not work correctly on Unicode characters.

The reason for the cutoff is the UTF-8 encoding (which encodes Unicode characters). Each Unicode character is represented in UTF-8 by a sequence of 1 to 4 bytes (the original design allowed up to 6; 1-3 is the most common).
This means that a string of three characters can actually be 4 or more bytes long, and since PHP treats each byte as a character, it will cut off at the wrong place.
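
To make the byte/character mismatch concrete, here is a minimal sketch in Python (not eZ Publish code). The sample string and variable names are assumptions chosen for illustration; it cuts the same UTF-8 string once by bytes and once by characters.

```python
# Illustration only: the sample text is made up, not taken from the site above.
text = "añb"                 # 3 characters; "ñ" needs 2 bytes in UTF-8
raw = text.encode("utf-8")   # the byte string a byte-oriented function sees

print(len(text))  # number of characters
print(len(raw))   # number of bytes

# Byte-based cut: keeps "a" plus only the first half of "ñ".
cut_bytes = raw[:2]
# Decoding the broken tail yields the replacement character U+FFFD,
# which is the kind of garbage character a browser would show.
print(cut_bytes.decode("utf-8", errors="replace"))

# Character-based cut: keeps two whole characters.
cut_chars = text[:2]
print(cut_chars)
```

This also suggests why a shortened string can show fewer characters than the length that was asked for: if the limit is counted in bytes, every multi-byte character uses up more than one unit of it.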

The only way to get support for this is to create all the various text operations that the operators use and place them in the i18n library, then change the operators to use that functionality.
However, this is not a small task, especially considering problems such as case mapping (lowercase, uppercase, etc.).
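
As a rough idea of what one such text operation could look like, here is a sketch in Python of a character-aware shorten that works directly on UTF-8 bytes. The name `shorten_utf8` is hypothetical and this is not the eZ Publish i18n API; it assumes the input is valid UTF-8 and counts code points only, so it does not address case mapping or combining characters.

```python
def shorten_utf8(data: bytes, max_chars: int) -> bytes:
    """Return the first max_chars characters of valid UTF-8 data, as bytes.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    if max_chars <= 0:
        return b""
    count = 0
    for i, byte in enumerate(data):
        # Continuation bytes have the bit pattern 10xxxxxx.
        # Any other byte starts a new character.
        if (byte & 0xC0) != 0x80:
            if count == max_chars:
                # Cut just before the character that would exceed the limit.
                return data[:i]
            count += 1
    # The data holds max_chars characters or fewer: nothing to cut.
    return data


sample = "日本語テキスト".encode("utf-8")  # made-up sample, 3 bytes per character
print(shorten_utf8(sample, 3).decode("utf-8"))
```

Because the cut always lands on the first byte of a character, the result decodes cleanly whatever the byte width of the characters involved.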

--
Amos

Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq

