How to import HTML into an eZXMLObject?

Rainer Krauss

Thursday 25 June 2009 1:39:13 am

Hi,

I'm importing content into my eZ Publish installation.

Some of that content is an article whose body is formatted as XHTML. When I try to import that into an eZXMLObject, the routine Luke describes at
http://serwatka.net/blog/ezxmltext_how_to_store_and_ouput_your_content
does not work properly.
$parser->process works fine, but eZXMLTextType::domString sometimes gives me PHP fatal errors. It works for XHTML tags such as b and i, but not for font, img, p, ...

Do you have experience with this, and can you shed some light on how I may import my data successfully? Is there another way to import the data? Do I need to do further data conversions (e.g. p tags can be imported when I replace them with paragraph tags), or is there something else I have not yet thought of?

Best wishes,
Rainer

Heath

Thursday 25 June 2009 1:50:43 am

We've done this before and published an example,
http://svn.projects.ez.no/bcimportcsv/trunk/extension/bcimportcsv/bin/bccsvjoomlacontenttablehtmlimport.php

The key here is to replace those tags. Our example transforms variable html into valid ezxml (including replacing img and a tags with content object embeds).
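
A minimal sketch of that kind of tag replacement, not the code from the linked script. The function name htmlToEzXmlTags and the $tagMap contents are made up for illustration, and the mapping only covers three tags; attributes are dropped, and img and a tags would still need the embed handling mentioned above. It assumes PHP 5.3 or later for the closure.

```php
<?php
// Hypothetical mapping from XHTML tag names to eZ XML tag names.
// Assumption: extend or adjust it for the tags your content really uses.
$tagMap = array(
    'p' => 'paragraph',
    'b' => 'strong',
    'i' => 'emphasize',
);

/**
 * Rename mapped tags and strip unmapped ones, keeping the text inside them.
 * All attributes are discarded.
 */
function htmlToEzXmlTags($html, array $tagMap)
{
    return preg_replace_callback(
        '#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>#',
        function ($m) use ($tagMap) {
            $name = strtolower($m[2]);
            if (!isset($tagMap[$name])) {
                // Unknown tag such as <font>: drop the tag itself.
                return '';
            }
            // $m[1] is "/" for a closing tag and "" for an opening tag.
            return '<' . $m[1] . $tagMap[$name] . '>';
        },
        $html
    );
}

echo htmlToEzXmlTags(
    '<P class="x">Hello <b>big</b> <font color="red">world</font></P>',
    $tagMap
);
// prints: <paragraph>Hello <strong>big</strong> world</paragraph>
```

The result is only a fragment; it still has to go through the eZ Publish input parser before it can be stored.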

Cheers,
Heath

Brookins Consulting | http://brookinsconsulting.com/
Certified | http://auth.ez.no/certification/verify/380350
Solutions | http://projects.ez.no/users/community/brookins_consulting
eZpedia community documentation project | http://ezpedia.org

André R.

Thursday 25 June 2009 5:36:34 am

Another idea is to use the HTML parser in Online Editor (5.0), since it already supports quite a lot of (X)HTML. But I have not had time to test it, so I don't have any code examples other than the one in ezoe.
see: eZOEXmlInput::validateInput() in http://svn.projects.ez.no/ezoe/trunk/ezoe/ezxmltext/handlers/input/ezoexmlinput.php

It will not handle images though, as those are embed tags in ezxml, so you'll need to first import the image into eZ Publish and add an id to the image tag in the form "eZObject_<object_id>".
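
A rough sketch of that id step, assuming the images have already been imported and you kept a lookup from the original src value to the new object id. The function name tagImagesWithObjectIds, the $imageObjectIds array and the id 123 are hypothetical; the sketch only handles double-quoted src attributes and does not check for an existing id attribute. It assumes PHP 5.3 or later for the closure.

```php
<?php
// Hypothetical lookup: image src in the source HTML => object id of the
// image after it has been imported into eZ Publish (made-up value).
$imageObjectIds = array(
    'images/logo.png' => 123,
);

/**
 * Add id="eZObject_<object_id>" to every <img> whose src is in the lookup.
 * Images without a known object id are returned unchanged.
 */
function tagImagesWithObjectIds($html, array $imageObjectIds)
{
    return preg_replace_callback(
        '#<img\b([^>]*?)\s*/?>#i',
        function ($m) use ($imageObjectIds) {
            // $m[1] holds the attribute string of the matched <img> tag.
            if (!preg_match('#\bsrc\s*=\s*"([^"]*)"#i', $m[1], $src)
                || !isset($imageObjectIds[$src[1]])) {
                return $m[0];
            }
            return '<img id="eZObject_' . $imageObjectIds[$src[1]] . '"' . $m[1] . ' />';
        },
        $html
    );
}

echo tagImagesWithObjectIds(
    '<p><img src="images/logo.png" alt="Logo" /></p>',
    $imageObjectIds
);
// prints: <p><img id="eZObject_123" src="images/logo.png" alt="Logo" /></p>
```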

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom

Rainer Krauss

Thursday 02 July 2009 12:13:41 am

Thank you, Heath and Andre.

Is there an overview of which HTML tags eZ Publish accepts, please?

André R.

Thursday 02 July 2009 12:59:11 am

The normal xml handler is documented here:
http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags
As of 4.1 it also accepts <h[1-6]> in input.

The xml handler in OE will accept the html variants of the tags there, where:
literal -> <pre>
anchor -> <a name="">
embed (image) -> <img id="eZObject_<object_id>" />
In addition, the <u>, <sup> and <sub> tags are mapped to custom tags if enabled.

It is not documented since it was not meant for external imports. So at the moment, enable the 'code' button in ezoe.ini to take a look at what kind of XHTML it uses internally (or use Firebug or a similar point-and-click HTML debugger).


Rainer Krauss

Monday 06 July 2009 2:25:59 am

Thank you André.

Say, is the parser case sensitive? Does it mind if the text to parse contains a paragraph tag starting with <P instead of <p?

Best wishes,
Rainer

Rainer Krauss

Monday 06 July 2009 2:42:07 am

...it's not the parser that's selective about case; rather, XHTML by definition requires all tag names to be lowercase, unlike HTML.

I thus made all HTML tags in the text I parse lowercase using the following PHP function found here: http://www.codingforums.com/archive/index.php/t-108303.html

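// Callback for preg_replace_callback(). It expects the whole matched tag
// in $Matches[1]. If the tag has attributes, everything before the last "="
// is lowercased and the value after it is kept as-is; otherwise the whole
// tag is lowercased.
function lowerCaseHTML($Matches) {
    if (preg_match("/<([^>]+)(\s\w+)=([^>]+)>/i", $Matches[1], $NewMatch)) {
        return "<" . strtolower($NewMatch[1]) . strtolower($NewMatch[2]) . "=" . $NewMatch[3] . ">";
    } else {
        return strtolower($Matches[1]);
    }
}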
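
The call that uses this callback is not shown here. As a self-contained alternative sketch (not the code from the linked page), the following lowercases only the tag names and leaves attribute names, attribute values and text alone. The function name lowercaseTagNames is made up, and the pattern is an assumption that will also touch a literal "<X" inside comments or CDATA sections. It assumes PHP 5.3 or later for the closure.

```php
<?php
/**
 * Lowercase the tag name of every opening and closing tag.
 * Attribute names and values, and the text between tags, are not changed.
 */
function lowercaseTagNames($html)
{
    return preg_replace_callback(
        '#<(/?)([a-zA-Z][a-zA-Z0-9]*)#',
        function ($m) {
            // $m[1] is "/" for a closing tag, $m[2] is the tag name.
            return '<' . $m[1] . strtolower($m[2]);
        },
        $html
    );
}

echo lowercaseTagNames('<P Align="Left">Some <B>BOLD</B> text</P>');
// prints: <p Align="Left">Some <b>BOLD</b> text</p>
```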

André R.

Monday 06 July 2009 2:44:56 am

It's not case sensitive when it comes to tag and attribute names, but it is for tag text content and attribute values (as you would expect :) ).

