Best way to remove old RSS files


Jack Rackham

Thursday 01 September 2005 3:07:02 am

On my site I have 100+ RSS feeds that are updated once a day with cron.
But removing old RSS files is painful since you can only view 50 nodes at a time.
So I wondered if there is a way to search for a node type and then display all matching nodes at the same time, so that I could remove them all at once?

Kristof Coomans

Thursday 01 September 2005 6:45:21 am

A cronjob can remove old objects.

Fetch all nodes of a specific content class whose "published" attribute is smaller than the current time minus x days, using <i>eZContentObjectTreeNode::subTree</i>.

Then call <i>eZContentObjectTreeNode::removeSubtrees( $nodeIDArray, false );</i>, where $nodeIDArray is an array of the node IDs of the nodes you fetched.
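
Roughly like this, as a minimal sketch (the class ID 33 and the 30-day retention are only placeholder values, and in a cronjob you may want to log in as a sufficiently privileged user first):

$retentionDays = 30;
$cutoff = time() - $retentionDays * 24 * 3600;

// fetch all nodes of the RSS item class older than the cutoff,
// anywhere below the content root (node 2)
$oldNodes = eZContentObjectTreeNode::subTree(
    array( 'ClassFilterType'  => 'include',
           'ClassFilterArray' => array( 33 ),
           'AttributeFilter'  => array( 'and', array( 'published', '<', $cutoff ) ) ),
    2 );

$nodeIDArray = array();
foreach ( $oldNodes as $node )
    $nodeIDArray[] = $node->attribute( 'node_id' );

// second parameter false = remove permanently instead of moving to the trash
if ( count( $nodeIDArray ) > 0 )
    eZContentObjectTreeNode::removeSubtrees( $nodeIDArray, false );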

Done. ;-)

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Jack Rackham

Thursday 01 September 2005 7:17:27 am

Can you explain that a bit more?
Do you mean that I should create a script and run it from cron? Like http://ez.no/content/view/full/52779

Kristof Coomans

Thursday 01 September 2005 7:20:43 am

Yes.

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Jack Rackham

Thursday 01 September 2005 8:39:16 am

Do you know the node_id of the trash? Since I am kind of lazy, I am thinking of just using one of the moving scripts to move all old RSS files to the trash.
Or do you have a script that deletes old files ready?

Kristof Coomans

Thursday 01 September 2005 9:09:15 am

The trash can isn't a node. Actually, all content objects that have the status EZ_CONTENT_OBJECT_STATUS_ARCHIVED are in the trash.

When you trash files with the interface, the function <i>eZContentObjectTreeNode::removeSubtrees</i> is also used, but the second parameter is a boolean true instead of false.
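
So if you would rather move the old RSS nodes to the trash from a script instead, the same call works (a sketch; $nodeIDArray is the array of fetched node IDs as above):

// true as the second argument archives the objects (moves them to the trash)
// instead of removing them permanently
eZContentObjectTreeNode::removeSubtrees( $nodeIDArray, true );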

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Betsy Gamrat

Wednesday 17 January 2007 8:56:42 pm

Here is one solution to getting rid of old RSS data.

It is an adaptation of other code posted in the forums - with thanks to those contributors.

I used a custom class to store the incoming RSS data. This script, which is run by <b>runcronjobs</b>, fetches the RSS nodes in descending order of publication. Therefore, the newest nodes are first. An ini setting is used to define the initial fetch offset, which preserves the first n objects (where n=MaxRSSObjects). All remaining objects are deleted.

Remember to update cronjobs.ini to add this script to the list (an example configuration follows below). It can be run standalone, too, for testing. There is an attribute filter (commented out in the code), but in this case I want to keep only the 50 most recent objects, regardless of the actual publication date.

This is feed independent - so all RSS data will be processed.
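
For reference, the wiring can look something like the following; the part name and the value 50 are just examples, and MaxRSSObjects is a custom setting that this script reads from site.ini:

# settings/override/cronjobs.ini.append.php
[CronjobPart-rssdelete]
Scripts[]=rssdelete.php

# settings/override/site.ini.append.php
[RSSSettings]
MaxRSSObjects=50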

<?php
// init shell script: pull in the kernel and lib classes the cronjob needs

include_once( 'kernel/classes/ezcontentobject.php' );
include_once( 'kernel/classes/ezcontentobjecttreenode.php' );
include_once( 'kernel/classes/ezcontentobjecttreenodeoperations.php' );

include_once( "lib/ezutils/classes/ezextension.php" );
include_once( "lib/ezutils/classes/ezmodule.php" );
include_once( 'lib/ezutils/classes/ezcli.php' );
include_once( 'lib/ezutils/classes/ezini.php' );
include_once( 'kernel/classes/ezscript.php' );

define ('SUBTREE_LIMIT',50);

if (!isset($script))
{
        $script =& eZScript::instance( array( 'debug-message' => true,
                                      'use-session' => true,
                                      'use-modules' => true,
                                      'use-extensions' => true ) );
        $script->startup();
        $script->initialize();
        $standalone=true;
}
else
        $standalone=false;

if (!isset($cli))
{
        $cli =& eZCLI::instance();
        $cli->setUseStyles( true ); // enable colors
}

$user = eZUser::fetchByName('Admin'); // run as the eZ administrator
$userID = $user->attribute( 'contentobject_id' );
eZUser::setCurrentlyLoggedInUser( $user, $userID );

$today=getdate();
$time=mktime(0,0,0,$today['mon'],$today['mday'],$today['year']);

$cli->output('Checking for RSS objects',true);

$ini =& eZINI::instance();
$max_RSS_objects = (int)$ini->variable( "RSSSettings","MaxRSSObjects" );
$cli->output('Current limit is '.$max_RSS_objects.' RSS objects',true);

get_RSS_nodes($time,$max_RSS_objects);

$cli->output('Done',true);
if ($standalone)
        $script->shutdown();

function get_RSS_nodes ($today,$max_RSS_objects)
{
        global $cli;
        // skip the newest $max_RSS_objects nodes; everything after them gets deleted
        $offset=$max_RSS_objects;
        $done=FALSE;
        do
        {
                $params=array(
                        'ClassFilterType' => 'include',
                        'ClassFilterArray' => array(33),
                        'Limit' => SUBTREE_LIMIT,
                        'Depth' => 0,
                        'Offset' => $offset,
                        'SortBy' => array( 'published',false ), // newest first
//                        'AttributeFilter' => array('and',array('published','<=',$today)),
                        'status' => EZ_CONTENT_OBJECT_STATUS_PUBLISHED);
                // fetch below the content root (node 2)
                $childNodes =& eZContentObjectTreeNode::subTree($params,2);
                if (count($childNodes) === 0)
                        $done=TRUE;
                else
                {
                        $deleteIDArray=array(); // start a fresh batch on every pass
                        foreach( $childNodes as $child )
                        {
                                $deleteIDArray[] = $id = $child->attribute( 'main_node_id' );
                                $cli->output('Deleting: '.$child->attribute('name').' ['.$id.']',true);
                        }
                        eZContentObjectTreeNode::removeSubtrees( $deleteIDArray, false );
                        unset ($childNodes);
                }
        }
        while (!$done);
}
?>

kracker (the)

Thursday 18 January 2007 8:09:23 am

Betsy Gamrat also graciously created a related node on eZpedia on this topic,
<i>http://ezpedia.org/wiki/en/ez/rss_delete_script_allows_you_to_limit_the_amount_of_rss_data_in_the_system</i>

Cheers,
//kracker

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

Kristof Coomans

Thursday 18 January 2007 9:18:52 am

A small modification would improve the portability of the script between sites: replace the class id 33 with a class identifier.
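
For example, assuming the RSS item class uses the identifier 'rss_item' (adjust to whatever the class is actually called on your site):

$params = array(
    'ClassFilterType' => 'include',
    // class identifiers are accepted here as well and are less installation
    // dependent than numeric IDs; eZContentClass::classIDByIdentifier() can
    // also resolve the numeric ID if you prefer to keep an integer
    'ClassFilterArray' => array( 'rss_item' ),
    // ... the rest of the fetch parameters stay as they are
);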

Anyway, nice contribution Betsy!

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Betsy Gamrat

Monday 22 January 2007 4:35:21 am

Notes:

<b>Important</b>

The script posted above was modified to remove the line <i>$offset+=SUBTREE_LIMIT;</i> from the loop. The loop was originally built to run an update script, but in this case it must simply preserve the first n elements retrieved. If any are deleted, the offset should remain the same, because the remaining elements take the place of the deleted ones until the delete finishes.

Other than that, the script is running very well.

Kristof is right - the class identifier should be text. I didn't worry about including it, because the class name/ID is implementation dependent.

kracker (the)

Monday 22 January 2007 4:43:00 am

Well, I think the idea is to push for something more than just another one-off, run-once script, in general.

This script could easily accept these specific variables as arguments.
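
Something along these lines when running the script standalone, for example (a rough sketch; the filename and the 'rss_item' default are made up, and when the script runs through runcronjobs it would still fall back to the ini settings):

// optional command line overrides when running standalone:
//   php rssdelete.php <max_objects> <class_identifier>
$ini =& eZINI::instance();
$max_RSS_objects = isset( $argv[1] ) ? (int)$argv[1]
                                     : (int)$ini->variable( 'RSSSettings', 'MaxRSSObjects' );
$classIdentifier = isset( $argv[2] ) ? $argv[2] : 'rss_item';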

<i>//kracker

The Rentals - Friends of P
</i>

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

Betsy Gamrat

Tuesday 23 January 2007 5:59:17 am

kracker,

The script could be configured in many different ways. The sole objective of my script was to limit the number of RSS objects/nodes across all the feeds in the system.

My initial post had a bug. The first time the script fetches the nodes, the offset is important to 'skip' the first n nodes (where n=MaxRSSObjects). Each later fetch should skip only those same n nodes, not any additional ones, because once a batch is deleted the remaining nodes move up to take its place. For example, with n=50: the first pass keeps nodes 1-50 and deletes the next batch (51-100); after that delete, the old nodes 101-150 sit at positions 51-100, so the next fetch must use offset 50 again. Increasing the offset inside the loop increases the skip count and leaves old nodes behind.

The code was originally written to run an update and loop through the fetches in chunks, in a vain effort to avoid running out of memory. That strategy did not succeed, and I used the attribute filter to refine the fetch so only the necessary data was examined. That is working great, and runs nicely every night, moving data between folders based on the attributes.

The delete script should not have memory issues, but if it does, I will adjust it to limit the nodes fetched per RSS feed. The feeds send varying quantities of data, and there are other settings in eZ to limit the amount of data coming in.

If I felt the code was robust enough to contribute, I would. However, to do a good job, the script should be bundled with the class, supporting documentation, and templates. It should also be carefully tested. I don't have time to deliver the full solution, so I'm posting what I have.

:)

Cheers!

Dominik LEE

Wednesday 20 May 2009 4:19:31 am

Hi everyone,

Just wanted to know if anybody managed to make this script work on eZ Publish 4.1.1.

I get this error :

Using $this when not in object context in /../../ezcontentobjecttreenode.php on line 1965

thank you

Steven E. Bailey

Wednesday 20 May 2009 6:13:08 am

For that specific error you have to change:

eZContentObjectTreeNode::subTree

to

eZContentObjectTreeNode::subTreeByNodeID

Certified eZPublish developer
http://ez.no/certification/verify/396111

Available for ezpublish troubleshooting, hosting and custom extension development: http://www.leidentech.com

Dominik LEE

Wednesday 20 May 2009 7:48:26 am

Thank you for this very fast answer. The error message is gone, but the objects are not deleted.

Here the output :

Running cronjobs/rssdelete.php
Checking for RSS objects
Current limit is 10 RSS objects per feed
Forum (my parent_node name)
Done

I guess the problem is somewhere in this piece of code, but I don't understand much of it.

$cli->output('Done',true);
if ($standalone)
        $script->shutdown();

function get_RSS_nodes ($today,$max_RSS_objects,$parentNodeID)
{
        global $cli;
 
        if ($max_RSS_objects === 0) return;
 
        $offset=$max_RSS_objects;
        $done=FALSE;
        do
        {
                $params=array(
                        'ClassFilterType' => 'include',
                        'ClassFilterArray' => array(33),
                        'Limit' => SUBTREE_LIMIT,
                        'Depth' => 0,
                        'Offset' => $offset,
                        'SortBy' => array( 'published',false ),
//                        'AttributeFilter' => array('and',array('published','<=',$today)),
                        'status' => EZ_CONTENT_OBJECT_STATUS_PUBLISHED);
                $childNodes =& eZContentObjectTreeNode::subTreeByNodeID($params,$parentNodeID);
                if (count($childNodes) === 0)
                        $done=TRUE;
                else
                {
                        foreach( $childNodes as $child )
                        {
                                $deleteIDArray[] = $id = $child->attribute( 'main_node_id' );
                                $cli->output('Deleting: '.$child->attribute('name').' ['.$id.']',true);
                        }
                        eZContentObjectTreeNode::removeSubtrees( $deleteIDArray, false );
                        unset ($childNodes);
                }
        }
        while (!$done);
} 

Steven E. Bailey

Wednesday 20 May 2009 8:25:26 am

$childNodes =& eZContentObjectTreeNode::subTreeByNodeID($params,$parentNodeID);

should be:

$childNodes = eZContentObjectTreeNode::subTreeByNodeID($params,$parentNodeID);

Then check the count of $childNodes and see if the fetch is actually coming back with anything.

If not, first make sure the $parentNodeID is correct... then the $params...

Certified eZPublish developer
http://ez.no/certification/verify/396111

Available for ezpublish troubleshooting, hosting and custom extension development: http://www.leidentech.com

Dominik LEE

Wednesday 20 May 2009 9:04:45 am

Awesome, it works now.

In fact it wasn't that hard to find. It was only that my 'ClassFilterArray' was wrong.

I found out using

$cli->output(count($childNodes));

which of course came back with 0.

Thanks a lot.

Lulio Vargas

Saturday 18 July 2009 3:35:12 pm

Steven,

I have been struggling with rsspurge.php (the script originally posted by Betsy Gamrat) that is supposed to remove older RSS item objects from my site.

(1)- I wonder if there's a location where someone has an update of the script THAT WORKS!

(2)- When I try running the original script manually I get an ugly fatal error:

$ php runcronjobs.php rsspurge
Running cronjob part 'rsspurge'
Running cronjobs/rsspurge.php

Fatal error: eZ Publish did not finish its request
The execution of eZ Publish was abruptly ended, the debug output is present below.
zend_mm_heap corrupted

I would appreciate any help from Steven or any fellow developer that may have available a newer (working!) version of the RSS delete script. Thanks

Heath

Saturday 18 July 2009 6:57:00 pm

@ Lulio Vargas

Take a look at the BC Cleanup RSS extension [0], which is compatible with eZ Publish 4.0+.

Cheers,
Heath

[0] <i>http://ez.no/developer/contribs/cronjobs/bc_cleanup_rss</i>
[1] <i>http://ez.no/content/download/275036/2529830/file/bccleanuprss.0.0.17.tar.gz</i>
[2] <i>http://projects.ez.no/bccleanuprss</i>

Brookins Consulting | http://brookinsconsulting.com/
Certified | http://auth.ez.no/certification/verify/380350
Solutions | http://projects.ez.no/users/community/brookins_consulting
eZpedia community documentation project | http://ezpedia.org

Lulio Vargas

Saturday 22 August 2009 7:04:44 pm

Finally, my problem with runaway accumulation of "RSS Item" content objects on my site is solved. For a couple of weeks I have been testing the excellent BC Cleanup RSS extension, contributed by <a href="http://brookinsconsulting.com">Brookins Consulting</a>.

I'm pleased to report that it works like a charm!

I configured its script, bccleanuprss.php, to run daily via the eZ Publish cronjobs, just five minutes before the time set for the main scripts to run. Since the main set of scripts includes rssimport.php, I end up with only the "freshest" RSS feeds every day.

Thank you Heath!
