Best way to remove old RSS files


Jack Rackham

Thursday 01 September 2005 3:07:02 am

On my site I have 100+ RSS feeds that are updated once a day by cron.
But removing old RSS files is painful, since you can only view 50 nodes at a time.
So I wondered if there is a way to search for a node type and then display all of those nodes at the same time, so that I could remove them all at once?

Kristof Coomans

Thursday 01 September 2005 6:45:21 am

A cronjob can remove old objects.

Fetch all nodes of a specific content class whose "published" attribute is smaller than the current time minus x days, using <i>eZContentObjectTreeNode::subTree</i>.

Then call <i>eZContentObjectTreeNode::removeSubtrees( $nodeIDArray, false );</i>, where $nodeIDArray is an array of the IDs of the nodes you've fetched.

Done. ;-)
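The selection step described above comes down to comparing each object's publication timestamp with "now minus x days". Below is a minimal sketch of just that comparison, in plain PHP so it runs without eZ Publish. The node records are hypothetical arrays standing in for the eZContentObjectTreeNode objects a subTree fetch would return, and 'rss_item' is a hypothetical class identifier.

```php
<?php
// Sketch: pick the node IDs whose object was published more than $days days ago.
// $nodes holds hypothetical plain arrays, not real eZContentObjectTreeNode objects.
function selectExpiredNodeIDs(array $nodes, string $classIdentifier, int $days, int $now): array
{
    $cutoff = $now - $days * 86400; // "current time minus x days" as a Unix timestamp
    $ids = array();
    foreach ($nodes as $node) {
        if ($node['class_identifier'] === $classIdentifier && $node['published'] < $cutoff) {
            $ids[] = $node['node_id'];
        }
    }
    return $ids;
}

$now = 1125543600; // fixed timestamp so the example is reproducible
$nodes = array(
    array('node_id' => 101, 'class_identifier' => 'rss_item', 'published' => $now - 10 * 86400),
    array('node_id' => 102, 'class_identifier' => 'rss_item', 'published' => $now - 3600),
    array('node_id' => 103, 'class_identifier' => 'article',  'published' => $now - 30 * 86400),
);

$expired = selectExpiredNodeIDs($nodes, 'rss_item', 7, $now);
print_r($expired); // only node 101: an rss_item older than 7 days
```

In a real cronjob, an ID array like $expired is what would be passed to removeSubtrees(); the number of days and the class identifier are placeholders.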

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Jack Rackham

Thursday 01 September 2005 7:17:27 am

Can you explain that in more detail?
Do you mean that I should create a script and run it with cron? Like http://ez.no/content/view/full/52779

Kristof Coomans

Thursday 01 September 2005 7:20:43 am

Yes.

Jack Rackham

Thursday 01 September 2005 8:39:16 am

Do you know the node_id of the trash? Since I am kind of lazy, I am thinking of just using one of the move scripts to move all old RSS files to the trash.
Or do you have a ready-made script that deletes old files?

Kristof Coomans

Thursday 01 September 2005 9:09:15 am

The trash can isn't a node. Actually, all content objects that have the status EZ_CONTENT_OBJECT_STATUS_ARCHIVED are in the trash.

When you move objects to the trash through the interface, the function <i>eZContentObjectTreeNode::removeSubtrees</i> is also used, but with a boolean true instead of false as the second parameter.

Betsy Gamrat

Wednesday 17 January 2007 8:56:42 pm

Here is one solution for getting rid of old RSS data.

It is an adaptation of other code posted in the forums, with thanks to those contributors.

I used a custom class to store the incoming RSS data. This script, which is run by <b>runcronjobs</b>, fetches the RSS nodes in descending order of publication, so the newest nodes come first. An ini setting defines the initial fetch offset, which preserves the first n objects (where n = MaxRSSObjects). All remaining objects are deleted.

Remember to update cronjobs.ini to add this script to the list. It can also be run standalone, for testing. There is an attribute filter (commented out), but in this case I want to keep only the 50 most recent objects, regardless of their actual publication date.

This is feed-independent, so all RSS data will be processed.
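The script depends on two pieces of configuration: the MaxRSSObjects value it reads through eZINI::instance() (that is, from site.ini) and an entry in cronjobs.ini. A minimal sketch of both overrides follows. The override file locations, the value 50 and the script name rsspurge.php (the name used later in this thread) are assumptions, and the PHP comment wrapper that eZ Publish .ini.append.php override files usually carry is left out for brevity.

```ini
# settings/override/site.ini.append.php
[RSSSettings]
# Number of newest RSS objects to keep (read as RSSSettings/MaxRSSObjects)
MaxRSSObjects=50

# settings/override/cronjobs.ini.append.php
[CronjobSettings]
# Add the script (saved as cronjobs/rsspurge.php) to the default cronjob list
Scripts[]=rsspurge.php

# Alternatively, define a separate part, run with: php runcronjobs.php rsspurge
[CronjobPart-rsspurge]
Scripts[]=rsspurge.php
```

Use either the default list or the separate part, not both, or the purge will run twice.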

<?php
// Initialize the shell script

include_once( 'kernel/classes/ezcontentobject.php' );
include_once( 'kernel/classes/ezcontentobjecttreenode.php' );
include_once( 'kernel/classes/ezcontentobjecttreenodeoperations.php' );

include_once( "lib/ezutils/classes/ezextension.php" );
include_once( "lib/ezutils/classes/ezmodule.php" );
include_once( 'lib/ezutils/classes/ezcli.php' );
include_once( 'lib/ezutils/classes/ezini.php' );
include_once( 'kernel/classes/ezscript.php' );

define ('SUBTREE_LIMIT',50);

if (!isset($script))
{
        $script =& eZScript::instance( array( 'debug-message' => true,
                                      'use-session' => true,
                                      'use-modules' => true,
                                      'use-extensions' => true ) );
        $script->startup();
        $script->initialize();
        $standalone=true;
}
else
        $standalone=false;

if (!isset($cli))
{
        $cli =& eZCLI::instance();
        $cli->setUseStyles( true ); // enable colors
}

$user = eZUser::fetchByName('Admin'); // eZ Publish administrator account
$userID = $user->attribute( 'contentobject_id' );
eZUser::setCurrentlyLoggedInUser( $user, $userID );

$today=getdate();
$time=mktime(0,0,0,$today['mon'],$today['mday'],$today['year']);

$cli->output('Checking for RSS objects',true);

$ini =& eZINI::instance();
$max_RSS_objects = (int)$ini->variable( "RSSSettings","MaxRSSObjects" );
$cli->output('Current limit is '.$max_RSS_objects.' RSS objects',true);

get_RSS_nodes($time,$max_RSS_objects);

$cli->output('Done',true);
if ($standalone)
        $script->shutdown();

function get_RSS_nodes ($today,$max_RSS_objects)
{
        global $cli;
        $offset=$max_RSS_objects;
        $done=FALSE;
        do
        {
                $params=array(
                        'ClassFilterType' => 'include',
                        'ClassFilterArray' => array(33), // site-specific ID of the RSS item class
                        'Limit' => SUBTREE_LIMIT,
                        'Depth' => 0,
                        'Offset' => $offset,
                        'SortBy' => array( 'published',false ),
//                        'AttributeFilter' => array('and',array('published','<=',$today)),
                        'status' => EZ_CONTENT_OBJECT_STATUS_PUBLISHED);
                $childNodes =& eZContentObjectTreeNode::subTree($params,2);
                if (count($childNodes) === 0)
                        $done=TRUE;
                else
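                {
                        // Start a fresh list for every batch; otherwise the IDs removed
                        // in earlier passes are sent to removeSubtrees() again
                        $deleteIDArray = array();
                        foreach( $childNodes as $child )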
                        {
                                $deleteIDArray[] = $id = $child->attribute( 'main_node_id' );
                                $cli->output('Deleting: '.$child->attribute('name').' ['.$id.']',true);
                        }
                        eZContentObjectTreeNode::removeSubtrees( $deleteIDArray, false );
                        unset ($childNodes);
                }
        }
        while (!$done);
}
?>

kracker (the)

Thursday 18 January 2007 8:09:23 am

Betsy Gamrat has also graciously created a related page on eZpedia on this topic:
<i>http://ezpedia.org/wiki/en/ez/rss_delete_script_allows_you_to_limit_the_amount_of_rss_data_in_the_system</i>

Cheers,
//kracker

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

Kristof Coomans

Thursday 18 January 2007 9:18:52 am

A small modification would improve the portability of the script between sites: replace the class ID 33 with a class identifier.

Anyway, nice contribution, Betsy!
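Kristof's suggestion amounts to a one-line change in the fetch parameters. The sketch below rebuilds the script's parameter array with a class identifier in place of 33. 'rss_item' is a hypothetical identifier (use the one your RSS import class actually has), and the sketch assumes the class filter in your eZ Publish version accepts identifiers as well as numeric IDs.

```php
<?php
// Hypothetical identifier of the RSS item class; adapt to your site.
define('RSS_ITEM_CLASS', 'rss_item');

// Builds the same fetch parameters as the purge script, but with a class
// identifier instead of the site-specific numeric class ID.
function rssFetchParams(int $offset, int $limit): array
{
    return array(
        'ClassFilterType'  => 'include',
        'ClassFilterArray' => array(RSS_ITEM_CLASS), // identifier, portable between sites
        'Limit'            => $limit,
        'Depth'            => 0,
        'Offset'           => $offset,
        'SortBy'           => array('published', false), // newest first
    );
}

print_r(rssFetchParams(50, 50));
```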

Betsy Gamrat

Monday 22 January 2007 4:35:21 am

Notes:

<b>Important</b>

The script posted above was modified to remove the line <i>$offset+=SUBTREE_LIMIT;</i> from the loop. The loop was originally built to run an update script, but in this case the loop must simply preserve the first n elements retrieved. When elements are deleted, the offset should remain the same, because the remaining elements take the place of the deleted ones until the delete finishes.
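Why the offset must stay fixed can be checked with a small in-memory simulation in plain PHP, with no eZ Publish calls. An array sorted newest first stands in for the fetched nodes, array_slice() plays the role of the Offset/Limit fetch, and removing a batch lets the older items move up, as they do after removeSubtrees(). The function name and the numbers are illustrative only.

```php
<?php
// Simulates the purge loop on an in-memory list sorted newest first.
// $advanceOffset = true reproduces the original $offset += SUBTREE_LIMIT bug.
function simulatePurge(array $items, int $keep, int $batch, bool $advanceOffset): array
{
    $offset = $keep; // skip the $keep newest items
    while (true) {
        $fetched = array_slice($items, $offset, $batch); // like 'Offset'/'Limit'
        if (count($fetched) === 0) {
            break;
        }
        // "Delete" the batch; the older items shift up into the freed positions
        $items = array_values(array_diff($items, $fetched));
        if ($advanceOffset) {
            $offset += $batch;
        }
    }
    return $items;
}

$items = range(20, 1); // 20 items, newest (20) first
$fixed = simulatePurge($items, 5, 3, false);
$buggy = simulatePurge($items, 5, 3, true);
echo count($fixed), ' ', count($buggy), "\n"; // prints "5 11"
```

With the fixed offset only the 5 newest items remain; advancing the offset leaves 11, because each advance skips over items that have already moved into the freed positions.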

Other than that, the script is running very well.

Kristof is right: the class identifier should be text. I didn't worry about including it, because the class name/ID would be implementation-dependent.

kracker (the)

Monday 22 January 2007 4:43:00 am

Well, I think the idea, in general, is to push for something more than yet another offshoot run-once script.

This script could easily accept these specific variables as arguments.
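A sketch of that idea in plain PHP: a small parser for hypothetical --class=... and --max-objects=... arguments that falls back to defaults when they are absent. The option names and default values are assumptions. In a real eZ Publish script, the option handling built into eZScript (getOptions(), as used by the bundled bin/php scripts) would likely be the more natural choice.

```php
<?php
// Parses hypothetical --class=<identifier> and --max-objects=<n> arguments.
function parsePurgeOptions(array $argv): array
{
    $options = array(
        'class'       => 'rss_item', // class identifier (hypothetical default)
        'max-objects' => 50,         // number of newest objects to keep
    );
    foreach ($argv as $arg) {
        // Accept arguments of the form --name=value; ignore everything else
        if (preg_match('/^--(class|max-objects)=(.+)$/', $arg, $m)) {
            $options[$m[1]] = ($m[1] === 'max-objects') ? (int) $m[2] : $m[2];
        }
    }
    return $options;
}

// In a standalone script this would be parsePurgeOptions($argv)
$opts = parsePurgeOptions(array('rsspurge.php', '--class=news_item', '--max-objects=10'));
print_r($opts);
```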

<i>//kracker

The Rentals - Friends of P
</i>

Betsy Gamrat

Tuesday 23 January 2007 5:59:17 am

kracker,

The script could be configured in many different ways. The sole objective of my script was to limit the number of RSS objects/nodes across all the feeds in the system.

My initial post had a bug. The first time it fetches the nodes, the offset is important to 'skip' the first n nodes (where n=MaxRSSObjects). The next fetch should skip only the same nodes, not any additional ones. Increasing the offset in the loop increases the skip count.

The code was originally written to run an update and loop through the fetches in chunks, in a vain effort to avoid running out of memory. That strategy did not succeed, and I used the attribute filter to refine the fetch so only the necessary data was examined. That is working great, and runs nicely every night, moving data between folders based on the attributes.

The delete script should not have memory issues, but if it does, I will adjust it to limit the nodes fetched per RSS feed. The feeds send varying quantities of data, and there are other settings in eZ to limit the amount of data coming in.
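One way to apply the limit per RSS feed instead of across all feeds, sketched in plain PHP: sort the items newest first, count them per parent node (assumed here to be the folder each feed imports into), and mark everything past the first $keep of each feed for deletion. The item arrays and the grouping by 'parent_node_id' are assumptions for illustration, not part of the posted script.

```php
<?php
// Returns the node IDs to delete so that each feed keeps only its $keep newest items.
function nodesToDeletePerFeed(array $items, int $keep): array
{
    // Newest first, as in the original fetch
    usort($items, function ($a, $b) {
        return $b['published'] <=> $a['published'];
    });
    $seen = array();   // parent_node_id => number of items seen so far
    $delete = array();
    foreach ($items as $item) {
        $feed = $item['parent_node_id'];
        $seen[$feed] = ($seen[$feed] ?? 0) + 1;
        if ($seen[$feed] > $keep) {
            $delete[] = $item['node_id'];
        }
    }
    return $delete;
}

$items = array(
    array('node_id' => 1, 'parent_node_id' => 10, 'published' => 100),
    array('node_id' => 2, 'parent_node_id' => 10, 'published' => 300),
    array('node_id' => 3, 'parent_node_id' => 10, 'published' => 200),
    array('node_id' => 4, 'parent_node_id' => 20, 'published' => 50),
    array('node_id' => 5, 'parent_node_id' => 20, 'published' => 400),
);
$delete = nodesToDeletePerFeed($items, 1);
print_r($delete); // one item kept per feed: nodes 2 and 5 survive
```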

If I felt the code was robust enough to contribute, I would. However, to do a good job, the script should be bundled with the class, supporting documentation, and templates. It should also be carefully tested. I don't have time to deliver the full solution, so I'm posting what I have.

:)

Cheers!

Dominik LEE

Wednesday 20 May 2009 4:19:31 am

Hi everyone,

Just wanted to know if anybody has managed to make this script work on eZ Publish 4.1.1.

I get this error:

Using $this when not in object context in /../../ezcontentobjecttreenode.php on line 1965

Thank you

Steven E. Bailey

Wednesday 20 May 2009 6:13:08 am

For that specific error you have to change:

eZContentObjectTreeNode::subTree

to

eZContentObjectTreeNode::subTreeByNodeID

Certified eZPublish developer
http://ez.no/certification/verify/396111

Available for ezpublish troubleshooting, hosting and custom extension development: http://www.leidentech.com

Dominik LEE

Wednesday 20 May 2009 7:48:26 am

Thank you for this very fast answer. The error message is gone, but the objects are not deleted.

Here is the output:

Running cronjobs/rssdelete.php
Checking for RSS objects
Current limit is 10 RSS objects per feed
Forum (my parent_node name)
Done

I guess the problem is somewhere in this piece of code, but I don't understand much of it.

$cli->output('Done',true);
if ($standalone)
        $script->shutdown();

function get_RSS_nodes ($today,$max_RSS_objects,$parentNodeID)
{
        global $cli;
 
        if ($max_RSS_objects === 0) return;
 
        $offset=$max_RSS_objects;
        $done=FALSE;
        do
        {
                $params=array(
                        'ClassFilterType' => 'include',
                        'ClassFilterArray' => array(33),
                        'Limit' => SUBTREE_LIMIT,
                        'Depth' => 0,
                        'Offset' => $offset,
                        'SortBy' => array( 'published',false ),
//                        'AttributeFilter' => array('and',array('published','<=',$today)),
                        'status' => EZ_CONTENT_OBJECT_STATUS_PUBLISHED);
                $childNodes =& eZContentObjectTreeNode::subTreeByNodeID($params,$parentNodeID);
                if (count($childNodes) === 0)
                        $done=TRUE;
                else
                {
                        // Start a fresh list for every batch
                        $deleteIDArray = array();
                        foreach( $childNodes as $child )
                        {
                                $deleteIDArray[] = $id = $child->attribute( 'main_node_id' );
                                $cli->output('Deleting: '.$child->attribute('name').' ['.$id.']',true);
                        }
                        eZContentObjectTreeNode::removeSubtrees( $deleteIDArray, false );
                        unset ($childNodes);
                }
        }
        while (!$done);
} 

Steven E. Bailey

Wednesday 20 May 2009 8:25:26 am

$childNodes =& eZContentObjectTreeNode::subTreeByNodeID($params,$parentNodeID);

should be:

$childNodes = eZContentObjectTreeNode::subTreeByNodeID($params,$parentNodeID);

Then check the count of $childNodes and see if the fetch is actually coming back with anything.

If not, first make sure the $parentNodeID is correct... then the $params...

Dominik LEE

Wednesday 20 May 2009 9:04:45 am

Awesome, it works now.

In fact it wasn't that hard to find. It was just that my 'ClassFilterArray' was wrong.

I found out using

$cli->output(count($childNodes));

which, of course, printed 0.

Thanks a lot.

Lulio Vargas

Saturday 18 July 2009 3:35:12 pm

Steven,

I have been struggling with the rsspurge.php (the one originally posted by Betsy Gamrat) that's supposed to remove older RSS item objects from my site.

(1)- I wonder if there's a location where someone has an update of the script THAT WORKS!

(2)- When I try running the original script manually I get an ugly fatal error:

$ php runcronjobs.php rsspurge
Running cronjob part 'rsspurge'
Running cronjobs/rsspurge.php

Fatal error: eZ Publish did not finish its request
The execution of eZ Publish was abruptly ended, the debug output is present below.
zend_mm_heap corrupted

I would appreciate any help from Steven or any fellow developer who may have a newer (working!) version of the RSS delete script. Thanks

Heath

Saturday 18 July 2009 6:57:00 pm

@ Lulio Vargas

Take a look at the BC Cleanup RSS extension [0], which is compatible with eZ Publish 4.0+.

Cheers,
Heath

[0] <i>http://ez.no/developer/contribs/cronjobs/bc_cleanup_rss</i>
[1] <i>http://ez.no/content/download/275036/2529830/file/bccleanuprss.0.0.17.tar.gz</i>
[2] <i>http://projects.ez.no/bccleanuprss</i>

Brookins Consulting | http://brookinsconsulting.com/
Certified | http://auth.ez.no/certification/verify/380350
Solutions | http://projects.ez.no/users/community/brookins_consulting
eZpedia community documentation project | http://ezpedia.org

Lulio Vargas

Saturday 22 August 2009 7:04:44 pm

Finally, my problem with the runaway accumulation of "RSS Item" content objects on my site is solved. For a couple of weeks I have been testing the excellent BC Cleanup RSS extension, contributed by <a href="http://brookinsconsulting.com">Brookins Consulting</a>.

I'm pleased to report that it works like a charm!

I configured its script, bccleanuprss.php, to run daily through the eZ Publish cronjobs, five minutes before the time set for the main scripts to run. Since the main set of scripts includes rssimport.php, I end up with only the "freshest" RSS feeds every day.

Thank you Heath!
