Reporting - how to get custom data out of ezpub database?


chris mol

Saturday 01 March 2008 7:49:17 pm

I work for a small company that is considering using ezpub as our enterprise content management tool. It has all the cms functionality we need and more. However, we have some concerns around ezpub's OO database and extracting data for reporting.

My company's business is to book events online. We book about 50,000 events per year, with a user base of 20,000 users that belong to 35,000 organizations that then roll up into about 50 clients. We store a lot of data, probably close to 500,000 records.

We are in talks with a local web shop to customize ezpub as our web scheduling application.

Our daily business depends heavily on the ability of the IT dept to deliver reports including all the data points listed above. We have some concerns that nearly all our custom data points will be stored in what amounts to 4-5 tables in ezpub (ezcontentclass, ezcontentobject, etc.).

Can anyone provide input regarding extracting data from the ezpub db for custom reporting (BI suites like Pentaho, MS, etc.)? Is it even possible? If so, does it take a lot of effort to write ETLs to separate my data from the ezpub object data?

This is a huge issue for us and I would appreciate any input from users who have experience using the ezpub database for custom reporting.

Thanks.

Felix Laate

Sunday 02 March 2008 2:12:55 am

Hi Chris,

So, the ezp database is kind of abstract, BUT there are excellent ways to make a proxy that produces the output you need.

Say, for example, that you plan to use Pentaho. It supports many data sources, among them XML-based ones. You could then quite easily make a "view" with e.g. the layout module that produces the XML you need; it then works pretty much like any feed.
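Here is a minimal sketch (Python, purely for illustration) of how a reporting job could consume such an XML view and flatten it for a BI tool. The feed URL and the element/field names are assumptions, not anything eZ Publish defines for you:

import csv
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical layout-module view returning <booking> elements; adjust to
# whatever XML your own template actually produces.
FEED_URL = "https://example.com/layout/set/xml/reporting/bookings"
FIELDS = ["event_id", "organization", "client", "booked_at"]  # assumed fields

def export_feed_to_csv(url, out_path):
    """Fetch the XML 'view' and flatten it into a CSV the BI suite can load."""
    with urllib.request.urlopen(url) as resp:
        root = ET.parse(resp).getroot()
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for booking in root.iter("booking"):
            writer.writerow({f: booking.findtext(f, default="") for f in FIELDS})

if __name__ == "__main__":
    export_feed_to_csv(FEED_URL, "bookings.csv")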

Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Piotrek Karaś

Sunday 02 March 2008 3:46:30 am

That is all possible, plus more - you can extend eZ Publish to handle data with the eZ API rather than through the presentation layer. Also, the content model is not that difficult once you've learned how eZ Publish handles content, so you can pull directly from the DB in any way that suits you.
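For the direct-DB route, here is a rough sketch (Python with MySQL, just to illustrate the idea) of pulling custom attribute values out of the object tables. The table and column names follow the legacy eZ Publish schema, but the 'booking' class identifier, the connection details and the driver (pymysql) are assumptions:

import pymysql

# One row per (object, attribute); a reporting ETL would pivot these into
# one flat row per booking. Schema details can vary between eZ versions.
QUERY = """
SELECT  o.id              AS object_id,
        o.name            AS object_name,
        ca.identifier     AS attribute,
        oa.data_text, oa.data_int, oa.data_float
FROM    ezcontentobject           o
JOIN    ezcontentclass            c  ON c.id = o.contentclass_id AND c.version = 0
JOIN    ezcontentobject_attribute oa ON oa.contentobject_id = o.id
                                     AND oa.version = o.current_version
JOIN    ezcontentclass_attribute  ca ON ca.id = oa.contentclassattribute_id
WHERE   c.identifier = %s   -- e.g. 'booking' (assumed class identifier)
  AND   o.status = 1        -- published objects only
"""

def fetch_booking_rows(class_identifier="booking"):
    conn = pymysql.connect(host="localhost", user="ez", password="secret",
                           database="ezpublish")
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute(QUERY, (class_identifier,))
            return cur.fetchall()
    finally:
        conn.close()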

One question, though, to Felix and other experienced developers: should those 500,000 records be managed with that model? Would you go for that? Or would you choose some sort of integration or extension with dedicated DB tables and interfaces? This looks like a data management rather than a content management project to me.

What do you think?

--
Company: mediaSELF Sp. z o.o., http://www.mediaself.pl
eZ references: http://ez.no/partners/worldwide_partners/mediaself
eZ certified developer: http://ez.no/certification/verify/272585
eZ blog: http://ez.ryba.eu

Felix Laate

Monday 03 March 2008 12:49:31 pm

Hi again,

>> should those 500,000 records be managed with that model? Would you go for that?
>> Or would you choose some sort of integration or extension with dedicated DB
>> tables and interfaces? This looks like a data management rather than a content
>> management project to me.

Obviously, with that amount of data, a joint solution (ezp CMS + separate database) would be a good one. My suggestion (an XML view of the CMS data) is not that efficient, of course, but on the other hand it's quite easy to set up.

If you want more control and a more efficient solution, then I would opt for a separate extension based on the API.

Anyhow, I think the ezp approach is a good one for projects like this, where you have the classic needs of a CMS combined with the need to provide access to and from just about any kind of database system. You need flexibility most of all, and that's, IMHO, what ezp is all about.
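As a sketch of what that joint solution could look like (Python again, purely illustrative): a periodic job that pivots the attribute rows from the query shown earlier into a dedicated flat reporting table. The report_bookings table, its columns and the fetch_booking_rows() helper are all assumptions:

from collections import defaultdict

def flatten(rows):
    """Pivot (object, attribute) rows into one dict of attributes per object."""
    per_object = defaultdict(dict)
    for r in rows:
        value = r["data_text"] or r["data_int"] or r["data_float"]
        per_object[r["object_id"]][r["attribute"]] = value
    return per_object

def load_reporting_table(per_object, conn):
    """Write the flattened rows into a dedicated table the BI suite queries."""
    with conn.cursor() as cur:
        for object_id, attrs in per_object.items():
            cur.execute(
                "REPLACE INTO report_bookings (object_id, event_date, client) "
                "VALUES (%s, %s, %s)",
                (object_id, attrs.get("event_date"), attrs.get("client")),
            )
    conn.commit()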

Felix

Public Relations Manager
Greater Stavanger
www.greaterstavanger.com

Björn Dieding@xrow.de

Monday 03 March 2008 4:50:25 pm

Hi,

the main problem with 500,000 records stored in the database is the content object tree table. Due to its architecture and design it can't deliver certain fetches very efficiently (e.g. path LIKE 'mytree/%'). A better model has already been developed for the eZ Components.

So your keys to success are:
* Only store the least necessary data in the content object related tables
* Get good hardware for the db, buy a lot of RAM, and tweak your db (maybe your model holds up to the demand just by doing this)
* Get technology from the components as an early adopter.

I say get a good eZ partner and build a proof of concept with them. Since your model isn't complex, proving that it won't break shouldn't be expensive.

For reporting, I would let the reporting tool access the db directly and create its reports.
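One straightforward way to do that (sketched in Python/MySQL; the view name and columns are assumptions, while the eZ table names follow the legacy schema) is to define a database view the reporting tool can point at:

import pymysql

# A flat view over published objects; extend the SELECT with joins to
# ezcontentobject_attribute for the custom data points you report on.
CREATE_VIEW = """
CREATE OR REPLACE VIEW report_objects AS
SELECT  o.id, o.name, c.identifier AS class_identifier,
        FROM_UNIXTIME(o.published) AS published,
        FROM_UNIXTIME(o.modified)  AS modified
FROM    ezcontentobject o
JOIN    ezcontentclass  c ON c.id = o.contentclass_id AND c.version = 0
WHERE   o.status = 1
"""

def create_reporting_view():
    conn = pymysql.connect(host="localhost", user="ez", password="secret",
                           database="ezpublish")
    try:
        with conn.cursor() as cur:
            cur.execute(CREATE_VIEW)
        conn.commit()
    finally:
        conn.close()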

If someone knows better, prove me wrong :-).

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/
