How to import HTML into an eZXMLObject?

Rainer Krauss

Thursday 25 June 2009 1:39:13 am

Hi,

I'm importing content into my eZ Publish installation.

Some of that content is an article whose body is formatted as XHTML. When I try to import that into an eZXMLObject, the routine Luke describes at
http://serwatka.net/blog/ezxmltext_how_to_store_and_ouput_your_content
does not work properly.
$parser->process works fine, but eZXMLTextType::domString gives me PHP fatal errors, though only sometimes. It works for XHTML tags such as b and i, but not for font, img, p, ...

Do you have experience with this, and can you shed some light on how I can import my data successfully? Is there another way to import the data? Do I need to do further conversions first (e.g. p tags can be imported when I replace them with paragraph tags)? Or is there something else I have not yet thought of?

Best wishes,
Rainer

Heath

Thursday 25 June 2009 1:50:43 am

We've done this before and published an example:
http://svn.projects.ez.no/bcimportcsv/trunk/extension/bcimportcsv/bin/bccsvjoomlacontenttablehtmlimport.php

The key here is to replace those tags. Our example transforms variable html into valid ezxml (including replacing img and a tags with content object embeds).
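
As a rough illustration of the tag replacement idea, here is a minimal, self-contained sketch. It is not the bcimportcsv code. The function name prepareHtmlForEzxml and the choice of tags are assumptions made for this example: p is renamed to paragraph (the case mentioned above), and font is simply unwrapped on the assumption that it has no ezxml equivalent.

```php
<?php
// Sketch only: the function name and the tag choices are assumptions for illustration.
function prepareHtmlForEzxml(string $html): string
{
    // Tags renamed to an ezxml equivalent (p -> paragraph is the case from this thread).
    $rename = ['p' => 'paragraph'];
    // Tags removed while keeping their inner text (assumed to have no ezxml equivalent).
    $unwrap = ['font'];

    foreach ($rename as $from => $to) {
        // Opening tag, matched case-insensitively; any attributes are dropped.
        $html = preg_replace('/<' . $from . '(\s[^>]*)?>/i', '<' . $to . '>', $html);
        // Closing tag.
        $html = preg_replace('/<\/' . $from . '\s*>/i', '</' . $to . '>', $html);
    }
    foreach ($unwrap as $tag) {
        // Remove both the opening and the closing tag, keep what is between them.
        $html = preg_replace('/<\/?' . $tag . '(\s[^>]*)?>/i', '', $html);
    }
    return $html;
}

echo prepareHtmlForEzxml('<P class="intro"><font color="red">Hello</font> <b>world</b></P>'), "\n";
```

The result still has to go through the eZ Publish parser afterwards; this step only normalises the tags it would otherwise choke on.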

Cheers,
Heath

Brookins Consulting | http://brookinsconsulting.com/
Certified | http://auth.ez.no/certification/verify/380350
Solutions | http://projects.ez.no/users/community/brookins_consulting
eZpedia community documentation project | http://ezpedia.org

André R.

Thursday 25 June 2009 5:36:34 am

Another idea is to use the HTML parser in Online Editor (5.0), since it already supports quite a lot of (X)HTML. I have not had time to test it, though, so I don't have any code examples other than the one in ezoe.
see: eZOEXmlInput::validateInput() in http://svn.projects.ez.no/ezoe/trunk/ezoe/ezxmltext/handlers/input/ezoexmlinput.php

It will not handle images though, as those are embed tags in ezxml. You'll first need to import the image into eZ and then add an id to the image tag in the form "eZObject_<object_id>".

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom

Rainer Krauss

Thursday 02 July 2009 12:13:41 am

Thank you, Heath and Andre.

Is there an overview on which HTML tags eZ Publish would accept, please?

André R.

Thursday 02 July 2009 12:59:11 am

The normal xml handler is documented here:
http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags
As of 4.1 it also accepts <h[1-6]> in input.

The xml handler in OE will accept the html variants of the tags there, where:
literal -> <pre>
anchor -> <a name="">
embed (image) -> <img id="eZObject_<object_id>" />
In addition, the <u>, <sup> and <sub> tags are mapped to custom tags if enabled.
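
To make the image case concrete, here is a small sketch of rewriting img tags into the id form shown above. It assumes you have already imported the images and kept a lookup from the old src value to the new object id. Both that lookup ($objectIdBySrc) and the function name are hypothetical, and nothing in the block calls eZ Publish itself.

```php
<?php
// Sketch only: $objectIdBySrc is a hypothetical lookup built during your own
// image import (old src value => content object id).
function tagImagesWithObjectIds(string $html, array $objectIdBySrc): string
{
    return preg_replace_callback(
        // Match an img tag and capture the value of its double-quoted src attribute.
        '/<img\b[^>]*\bsrc\s*=\s*"([^"]*)"[^>]*>/i',
        function (array $m) use ($objectIdBySrc): string {
            $src = $m[1];
            if (!isset($objectIdBySrc[$src])) {
                // Unknown image: leave the tag untouched so it can be reviewed.
                return $m[0];
            }
            return '<img id="eZObject_' . (int) $objectIdBySrc[$src] . '" />';
        },
        $html
    );
}

echo tagImagesWithObjectIds(
    '<p><img src="/old/logo.png" alt="Logo"></p>',
    ['/old/logo.png' => 123]
), "\n";
```

Images that are missing from the lookup are left as they are, so they are easy to find and fix by hand before the import.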

It is not documented since it was not meant for external imports. So for the moment, enable the 'code' button in ezoe.ini to see what kind of XHTML it uses internally (or use Firebug or a similar point-and-click HTML debugger).


Rainer Krauss

Monday 06 July 2009 2:25:59 am

Thank you André.

Say, is the parser case sensitive? Does it mind if the text to parse contains a paragraph tag starting with <P instead of <p?

Best wishes,
Rainer

Rainer Krauss

Monday 06 July 2009 2:42:07 am

...it's not the parser that is picky about case; XHTML by definition requires all tag names to be lowercase, unlike HTML.

I thus made all HTML tags in the text I parse lowercase using the following PHP function found here: http://www.codingforums.com/archive/index.php/t-108303.html

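// Callback for preg_replace_callback(); expects the whole tag in $Matches[1].
function lowerCaseHTML($Matches) {
    // Tag with attributes: lowercase the tag and attribute names and keep the
    // last attribute value as-is. With several attributes, the values of all
    // but the last one are lowercased too.
    if (preg_match("/<([^>]+)(\s\w+)=([^>]+)>/i", $Matches[1], $NewMatch)) {
        return "<" . strtolower($NewMatch[1]) . strtolower($NewMatch[2]) . "=" . $NewMatch[3] . ">";
    } else {
        // Tag without attributes: lowercase it entirely.
        return strtolower($Matches[1]);
    }
}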
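
For tags that carry more than one attribute, a variant along the following lines may be safer. It is only a sketch and not the function from the linked page: the name lowercaseTagNames is made up for this example, it assumes attribute values contain no ">" character, and it leaves attribute values and text content untouched.

```php
<?php
// Sketch only: lowercases tag names and attribute names, leaves attribute
// values and text content alone. Assumes no ">" inside attribute values.
function lowercaseTagNames(string $html): string
{
    return preg_replace_callback(
        '/<\/?[a-zA-Z][^>]*>/',
        function (array $m): string {
            // Step 1: lowercase the tag name at the start of the tag.
            $tag = preg_replace_callback(
                '/^<\/?[a-zA-Z][a-zA-Z0-9]*/',
                function (array $n): string {
                    return strtolower($n[0]);
                },
                $m[0]
            );
            // Step 2: walk the attributes. Each match consumes the name and,
            // if present, its value, so quoted values are never rescanned.
            return preg_replace_callback(
                '/(\s+[a-zA-Z_:][-\w:.]*)(\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?/',
                function (array $n): string {
                    return strtolower($n[1]) . ($n[2] ?? '');
                },
                $tag
            );
        },
        $html
    );
}

echo lowercaseTagNames('<P CLASS="Intro" TITLE="Big One">Hello <B>World</B></P>'), "\n";
```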

André R.

Monday 06 July 2009 2:44:56 am

It's not case sensitive when it comes to tag and attribute names, but it is for text content and attribute values (as you would expect :) ).

