How to import a lot of HTML pages into eZ Publish nodes?


Denis Zatsarinny

Thursday 26 October 2006 4:41:09 am

Hi,

How can I import a lot of HTML pages into eZ Publish nodes?

Has anybody done this?

Bye.

Xavier Dutoit

Friday 27 October 2006 1:44:17 am

Never easy.

I'd first download the HTML pages, tidy them, and parse out the content I need to import. Then start with some kind of XML import (have a look at the contribs).

That means PHP development anyway.
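
The "parse the content I need" step could look roughly like the sketch below. It is only an illustration: the function name extractPageContent() is hypothetical, it assumes PHP 5.3.6 or later with the DOM extension, and it does not touch any eZ Publish API. It loads one downloaded page with DOMDocument, which tolerates invalid markup, and returns the page title and the markup inside the body.

```php
<?php
// Hypothetical helper (not part of eZ Publish): pulls the title and the
// body markup out of one downloaded HTML page.
function extractPageContent($html)
{
    $result = array('title' => '', 'body' => '');
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings silently; legacy HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    $bodies = $doc->getElementsByTagName('body');
    if ($bodies->length > 0) {
        // Serialize each child of <body> so the wrapper tags are left out.
        foreach ($bodies->item(0)->childNodes as $child) {
            $result['body'] .= $doc->saveHTML($child);
        }
        $result['body'] = trim($result['body']);
    }
    return $result;
}
```

The returned array would then be handed to whatever import script creates the nodes. Note that loadHTML() assumes ISO-8859-1 unless the page declares its charset, so pages in other encodings may need converting first.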

Good luck

X+

http://www.sydesy.com

Denis Zatsarinny

Tuesday 31 October 2006 12:04:37 am

Hi

>Never easy.
>I'd first download the html page, tidy them and parse the content I need to import. Then start
>with some kind of xml import (have a look at the contrib).

OK. I wrote a PHP script that converts Mambo content into eZ structures (Folders and Articles), but the Mambo articles contain very UGLY HTML content, and I found three errors while running this script with eZ Publish 3.8.5:

1. Fatal error: Call to a member function on a non-object in /srv/www/htdocs/ezportal/kernel/classes/datatypes/ezxmltext/input/ezxmlsimplifiedinputparser.php on line 611

Original: if ( $parent->nodeName == 'line' && !count( $parent->Children ) )

I replaced it with: if ( $parent->nodeName == 'line' && !count( $parent->Children ) && is_object( $parent->parentNode ) )

2. Fatal error: Call to a member function on a non-object in /srv/www/htdocs/ezportal/kernel/classes/datatypes/ezxmltext/ezxmlinputparser.php on line 772
Original:
    function &processSubtree( &$element, &$lastHandlerResult )
    {
        $ret = null;
        $tmp = null;

I replaced it with:
    function &processSubtree( &$element, &$lastHandlerResult )
    {
        $ret = null;
        $tmp = null;

        // Bail out early when the parser hands over something that is not a node.
        if ( !is_object( $element ) ) return $ret;

3. After these patches, however, I got a fatal error: a segmentation fault (ugly).

Can somebody recommend an algorithm, a piece of software, or anything else that converts HTML into eZXML?

I tried using tidy from http://tidy.sf.net, but got the same result.
If I use tidy options such as --clean true, --word-2000 true, I get empty content.
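
One possible pre-processing step, independent of tidy, is to reduce each article to a small set of plain tags before handing it to the eZXML input parser. The sketch below is an assumption about what might help, not a tested fix for the errors above: reduceToSimpleMarkup() is a hypothetical name, the tag whitelist is a guess rather than the set eZXML actually accepts, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: reduces arbitrary HTML to a few plain tags.
// The whitelist is an assumption, not the list eZXML actually supports.
function reduceToSimpleMarkup($html)
{
    // Remove script and style blocks together with their content.
    $clean = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);
    if ($clean !== null) {
        $html = $clean; // null would mean a PCRE limit was hit
    }

    // strip_tags() drops HTML comments and every tag not listed here.
    $allowed = '<p><br><b><i><strong><em><ul><ol><li><h1><h2><h3><a>';
    $html = strip_tags($html, $allowed);

    // Drop every attribute except href on links.
    return preg_replace_callback(
        '#<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>#',
        function ($m) {
            $tag = strtolower($m[1]);
            $href = '';
            if ($tag === 'a'
                && preg_match('#\bhref\s*=\s*("[^"]*"|\'[^\']*\')#i', $m[2], $h)) {
                $href = ' href=' . $h[1];
            }
            // Keep the XHTML-style slash on tags such as <br />.
            $selfClosing = substr(rtrim($m[2]), -1) === '/' ? ' /' : '';
            return '<' . $tag . $href . $selfClosing . '>';
        },
        $html
    );
}
```

Formatting that lives only in attributes (colours, alignment, classes) is lost on purpose here; whether that trade-off is acceptable depends on the articles.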

 

 

Joe Kepley

Tuesday 31 October 2006 7:36:11 am

This sounds extremely hairy, and if you have a lot of pages with poorly-structured HTML, you're bound to have some things that don't fit into eZXML's content structure.

If I were faced with this, I'd look at importing it directly as HTML into an HTML datatype, and use something like TinyMCE or FCKEditor to provide the WYSIWYG. eZXML has a lot of benefits, but trying to coax HTML into eZXML (when it didn't start out that way) would be like trying to put smoke in a bottle.

Denis Zatsarinny

Tuesday 31 October 2006 10:01:24 am

Hi,

Dear Joe:

Using TinyMCE or FCKeditor is not a solution to this problem.

I was using FCKeditor with Mambo for an intranet site, and that is how I ended up with this problem.
I found a way to move the Mambo content into eZ structures, but about 20% of the old Mambo articles contain MSO-generated HTML and are larger than 2 MB. The eZXML converter cannot process these documents, even after they have been run through tidy, yet these documents are very important.
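
For the MSO-generated articles, one option is to strip the Office-specific markup with a few regular expressions before any further conversion. This is a sketch under assumptions: stripOfficeMarkup() is a hypothetical name, the patterns cover only the Word export constructs I know to be common (conditional comments, xml data islands, o:/v:/w: tags, Mso classes, inline styles), and it has not been run against the documents discussed here.

```php
<?php
// Hypothetical helper: removes common Microsoft Office export markup.
// The pattern list is illustrative, not exhaustive.
function stripOfficeMarkup($html)
{
    $patterns = array(
        // Conditional comments: <!--[if gte mso 9]> ... <![endif]-->
        '#<!--\[if[^\]]*\]>.*?<!\[endif\]-->#is',
        // Leftover markers such as <![if !supportLists]> and <![endif]>
        '#<!\[(?:if[^\]]*|endif)\]>#i',
        // Embedded <xml> data islands
        '#<xml\b[^>]*>.*?</xml>#is',
        // Namespaced Office tags such as <o:p>, <v:shape>, <w:WordDocument>
        '#</?[ovw]:[a-z0-9]+\b[^>]*>#i',
        // class=MsoNormal and similar, quoted or unquoted
        '#\sclass\s*=\s*("Mso[^"]*"|\'Mso[^\']*\'|Mso[^\s>]*)#i',
        // All quoted inline style attributes (aggressive on purpose)
        '#\sstyle\s*=\s*("[^"]*"|\'[^\']*\')#i',
    );
    foreach ($patterns as $pattern) {
        $result = preg_replace($pattern, '', $html);
        // preg_replace() returns null when a PCRE limit is hit;
        // keep the previous text in that case instead of losing it.
        if ($result !== null) {
            $html = $result;
        }
    }
    return $html;
}
```

On documents of 2 MB or more the lazy `.*?` patterns may run into pcre.backtrack_limit, in which case that pattern is simply skipped; raising the limit or processing the file in chunks would be the next thing to try.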

Bye

Denis Zatsarinny

Wednesday 01 November 2006 11:57:31 pm

Hi

Great news: after upgrading from eZ Publish 3.8.4 to 3.8.6, I no longer get PHP errors like "Fatal error: Call to a member function on a non-object in". The ugly part is that I now hit an internal PHP error: *** glibc detected *** double free or corruption (fasttop): 0x1202fd98 ***. That one is a bug in the PHP interpreter itself.

Bye

kracker (the)

Sunday 05 November 2006 3:51:48 pm

@Denis,

Please file a bug report:
http://issues.ez.no/IssueEdit.php?ProjectId=3

You can also help us reproduce your bug more accurately
by sharing your Mambo import script as a contribution.
If you're concerned about it not being 100% complete, you may mark it as unstable.

- Log in
- Go to http://ez.no/community/contribs/import_export
- Click "upload contribution"
- Complete and submit the form
- Post a link to the contribution in this forum thread

//kracker
The GNU/Linux Action Show! Podcast

Member since: 2001.07.13 || http://ezpedia.se7enx.com/
