How to import a lot of HTML pages into eZPublish nodes?

Denis Zatsarinny

Thursday 26 October 2006 4:41:09 am

Hi,

How can I import a lot of HTML pages into eZPublish nodes?

Has anybody done this?

Bye.

Xavier Dutoit

Friday 27 October 2006 1:44:17 am

Never easy.

I'd first download the HTML pages, tidy them, and parse out the content I need to import. Then I'd start with some kind of XML import (have a look at the contribs).
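A minimal sketch of the "parse the content I need" step, assuming modern PHP (7 or later) and only its built-in string functions. The function name `extractImportableHtml` and the tag whitelist are hypothetical, picked for illustration; nothing here comes from eZPublish itself, and real pages will need a whitelist matched to what the target content class accepts.

```php
<?php
// Hypothetical helper: the name and the tag whitelist are assumptions for illustration.
function extractImportableHtml(string $page): string
{
    // Keep only what sits inside <body>...</body>; fall back to the whole page.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $page, $m)) {
        $page = $m[1];
    }

    // Drop script and style blocks together with their contents.
    // preg_replace() returns null on a PCRE error, so keep the input in that case.
    $page = preg_replace('~<(script|style)\b[^>]*>.*?</\1>~is', '', $page) ?? $page;

    // Keep a small set of structural tags; any other tag is removed and its text kept.
    $allowed = '<p><br><h1><h2><h3><ul><ol><li><a><b><i><strong><em><table><tr><td><th>';

    return trim(strip_tags($page, $allowed));
}
```

Note that `strip_tags()` keeps the attributes of the tags it allows, so `class` and `style` attributes still need a separate pass before any XML import.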

That means PHP development either way.

Good luck

X+

http://www.sydesy.com

Denis Zatsarinny

Tuesday 31 October 2006 12:04:37 am

Hi

>Never easy.
>I'd first download the html page, tidy them and parse the content I need to import. Then start
>with some kind of xml import (have a look at the contrib).

OK, I wrote a PHP script that converts Mambo content into eZ structures (Folders and Articles), but the Mambo articles contain very UGLY HTML, and I hit 3 errors while running this script with eZPublish 3.8.5:

1. Fatal error: Call to a member function on a non-object in /srv/www/htdocs/ezportal/kernel/classes/datatypes/ezxmltext/input/ezxmlsimplifiedinputparser.php on line 611

Original:
if ( $parent->nodeName == 'line' && !count( $parent->Children ) )

I replaced it with:
if ( $parent->nodeName == 'line' && !count( $parent->Children ) && is_object( $parent->parentNode ) )

2. Fatal error: Call to a member function on a non-object in /srv/www/htdocs/ezportal/kernel/classes/datatypes/ezxmltext/ezxmlinputparser.php on line 772

Original:
function &processSubtree( &$element, &$lastHandlerResult )
{
    $ret = null;
    $tmp = null;

I replaced it with:
function &processSubtree( &$element, &$lastHandlerResult )
{
    $ret = null;
    $tmp = null;

    if ( !is_object( $element ) )
        return $ret;

But after these patches I got a fatal error: a segmentation fault (ugly).

Can somebody recommend an algorithm, software, etc. that converts HTML into eZXML?

I tried using tidy from http://tidy.sf.net, but got the same result.
If I use tidy options such as --clean true, --word-2000 true, I get empty content.
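For comparison, here is a hedged sketch that runs tidy through PHP's tidy extension with a more conservative option set and falls back to the original markup when tidy returns nothing. It does not explain why --clean and --word-2000 produced empty output; it only avoids those options and guards against an empty result. The function name `tidyToXhtmlFragment` is hypothetical, and the code assumes modern PHP. If the tidy extension is not loaded, the input is returned unchanged.

```php
<?php
// Hypothetical helper; assumes the PHP tidy extension is available.
// Without the extension the input is returned unchanged.
function tidyToXhtmlFragment(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }

    $config = [
        'output-xhtml'   => true, // ask for well-formed XHTML output
        'show-body-only' => true, // emit only the contents of <body>
        'wrap'           => 0,    // do not wrap long lines
    ];

    $clean = tidy_repair_string($html, $config, 'utf8');

    // Keep the original markup if tidy fails or hands back an empty string.
    if ($clean === false || trim($clean) === '') {
        return $html;
    }

    return $clean;
}
```

Comparing the length of the input and the output per article is a cheap way to spot the documents that tidy silently empties.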

 

 

Joe Kepley

Tuesday 31 October 2006 7:36:11 am

This sounds extremely hairy, and if you have a lot of pages with poorly-structured HTML, you're bound to have some things that don't fit into eZXML's content structure.

If I were faced with this, I'd look at importing it directly as HTML into an HTML datatype, and use something like TinyMCE or FCKEditor to provide the WYSIWYG. eZXML has a lot of benefits, but trying to coax HTML into eZXML (when it didn't start out that way) would be like trying to put smoke in a bottle.

Denis Zatsarinny

Tuesday 31 October 2006 10:01:24 am

Hi,

Dear Joe:

Using TinyMCE or FCKeditor is not a solution to this problem.

I was using FCKeditor + Mambo for an intranet site, and now I have this problem.
I found a way to move the Mambo content into eZ structures, but ~20% of the old Mambo articles contain MSO-generated HTML and are larger than 2 MB. The eZXML converter cannot process these documents, even after tidy has processed them, and these documents are very important.
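One thing that may be worth trying before the converter sees such articles is stripping the Office-specific markup, which often makes up most of the size of a Word-generated page. The following is a rough sketch only, assuming modern PHP and plain regular expressions; the function name `stripWordMarkup` and the list of patterns are my assumptions, not an eZPublish or tidy feature, and regexes will miss edge cases (for example a `>` inside an attribute value).

```php
<?php
// Hypothetical helper: strips markup that Word typically adds to exported HTML.
function stripWordMarkup(string $html): string
{
    $patterns = [
        '~<!--\[if[^\]]*\]>.*?<!\[endif\]-->~is', // conditional comments such as <!--[if gte mso 9]>
        '~<!--.*?-->~s',                          // any remaining HTML comments
        '~<(xml|style)\b[^>]*>.*?</\1>~is',       // embedded <xml> and <style> blocks
        '~</?(?:o|v|w|st1):[^>]*>~i',             // namespaced Office tags such as <o:p>
    ];

    foreach ($patterns as $pattern) {
        // preg_replace() returns null on a PCRE error; keep the current text then.
        $html = preg_replace($pattern, '', $html) ?? $html;
    }

    // Remove class, style and lang attributes, but only inside opening tags.
    $html = preg_replace_callback(
        '~<[a-z][^>]*>~i',
        function (array $m): string {
            $attr = '~\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)~i';
            return preg_replace($attr, '', $m[0]) ?? $m[0];
        },
        $html
    ) ?? $html;

    return $html;
}
```

On very large inputs PCRE can hit its backtracking or JIT stack limits. In that case the sketch simply leaves the text as it was for that pattern, so it is worth checking whether the size actually dropped.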

Bye

Denis Zatsarinny

Wednesday 01 November 2006 11:57:31 pm

Hi

Great news: I upgraded from eZPublish 3.8.4 to 3.8.6, and I no longer get any PHP errors like "Fatal error: Call to a member function on a non-object in". But (ugly) I now hit an internal PHP error: *** glibc detected *** double free or corruption (fasttop): 0x1202fd98 ***. This error is a bug in the PHP interpreter itself.

Bye

kracker (the)

Sunday 05 November 2006 3:51:48 pm

@Denis,

Please file a bug report:
http://issues.ez.no/IssueEdit.php?ProjectId=3

You can also help us reproduce your bug more accurately
by sharing your Mambo import script as a contribution.
If you're concerned about it not being 100% complete, you may mark it as unstable.

- Log in
- Go to http://ez.no/community/contribs/import_export
- Click "upload contribution"
- Complete and submit the form
- Post a link to the contribution in this forum thread

//kracker
The GNU/Linux Action Show! Podcast

Member since: 2001.07.13 || http://ezpedia.se7enx.com/
