fetching random content combined with cache


Marko Žmak

Monday 04 September 2006 4:14:58 am

I was browsing the forum and came upon this post:

http://ez.no/community/forum/setup_design/how_to_fetch_random_content_3_8/

I know it's a rather old topic, but someone might still find this useful. So here's a little discussion about extracting random content (from my experience)...

Extracting content at random the way it's described in the mentioned post can consume a lot of server resources and slow down your site. Why? Here are some reasons:

- when you extract content at random, you cannot use the cache for that content.

- fetching the results and then looping over them can consume a lot of CPU and/or memory. There are two cases:
a) You fetch all the items at once and then pick some of them in a loop. In this case, if the fetch returns a large number of items, there's a large amount of data to be stored, which can consume a lot of memory (and probably cause the system to swap to disk).
b) You fetch the items one by one from your database. In this case every step of the loop performs a fetch, which results in a SELECT query in your database. Running that many queries repeatedly can also consume a lot of memory and CPU.

We should also mention that running a loop, prepending items to an array and extracting the unique elements afterwards consumes a certain amount of resources too.

It could seem that on fast machines with a lot of memory these reasons are of no importance, but when a lot of users browse the pages you may soon find out that the machine is not fast enough for this (as I found out). Especially if you are doing these "random fetches" in several places on your page.

So, what's the solution? The first thing that comes to mind is to use the cache (view cache or cache blocks). But as I already mentioned, that's not possible, because with the cache you don't get random results but the same result every time.

So what I did is create a template that extracts elements in "almost random" order. In that template I get a random index from the tree_count the same way that Marc described in the mentioned post, and then take the following elements in the order they have in the array (the step in the loop is 1; I haven't YET figured out a simple way to extract the elements with a randomly chosen step).
This by itself is not an improvement, but the trick is that it can be combined with cache blocks. So I created a template cache block with a key constructed from:

- a custom prefix
- id of the parent node
- the index of the start element (generated on random)
- the count of elements from the fetch

And then I put the actual fetch inside the cache block, so only the fetch for tree_count is executed outside the block. The effect is that the data fetching and the loop are executed only once for each combination of these parameters, and every subsequent time the result comes from the cache (which is pure HTML with no PHP processing or database querying).
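
To make the idea concrete, here is a minimal sketch in Python (not eZ Publish template code) that models the cache block as a dictionary. The names BlockCache, make_key and random_block, the "_" separator and the sample values are assumptions for illustration only; the point is just that the expensive part runs at most once per distinct key.

```python
import random


def make_key(prefix, parent_node_id, index, nodes_count):
    # Same four key parts as in the list above; the "_" separator is an assumption.
    return f"{prefix}_{parent_node_id}_{index}_{nodes_count}"


class BlockCache:
    """Toy stand-in for a template cache block: renders once per key."""

    def __init__(self):
        self.store = {}
        self.renders = 0

    def get(self, key, render):
        if key not in self.store:
            self.renders += 1  # the expensive fetch and loop would happen here
            self.store[key] = render()
        return self.store[key]


def random_block(cache, nodes, max_nodes, prefix="box", parent_node_id=2, rng=random):
    count = len(nodes)  # stands in for the cheap count fetch done on every request
    if count == 0:
        return ""
    index = rng.randrange(count)  # random start index, chosen outside the cache
    key = make_key(prefix, parent_node_id, index, count)

    def render():
        rotated = nodes[index:] + nodes[:index]  # start at index, wrap around
        return ", ".join(rotated[:max_nodes])

    return cache.get(key, render)


if __name__ == "__main__":
    cache = BlockCache()
    nodes = ["a", "b", "c", "d", "e"]
    for _ in range(1000):
        random_block(cache, nodes, max_nodes=3)
    print(cache.renders)  # never more than len(nodes)
```

With 5 nodes under the parent there are at most 5 distinct keys, so no matter how many requests arrive, the render step runs at most 5 times until the node count changes.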

One possible problem with this solution is that it generates a lot of cache blocks on the disk, so it consumes a lot of disk space and it also takes a little longer to clear the cache when needed. But if you have enough disk space, that's not a problem. In my experience I didn't find any other problem with it.

This approach is working great for me. The execution of this "random fetch" is now 10 times faster on my site and the resource consumption is greatly reduced. Hope this discussion helps someone, and if anyone is interested in the details of this template, just contact me here on the forum.

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Softriva .com

Wednesday 18 April 2007 8:41:45 am

Dear Marko

I am interested. Would you please provide me with more information?

Jorge estévez

Wednesday 24 December 2008 3:02:06 pm

Hello,

Please explain more. I am having problems when the cache is turned on, please see http://ez.no/developer/forum/developer/shuffle_and_cache_please_help

Can you post some code so I can get the whole idea?

Thanks

Diseño Web Cuba
Web Design Cuba
www.elfosdesign.com

Stéphane Couzinier

Friday 26 December 2008 2:55:15 am

Hi

To retrieve random content you can use a specific operator, see: http://projects.ez.no/la_fetch_random

To avoid cache problems, you must disable the content cache for the template:
in the full view override add {set-block scope=root variable=cache_ttl}0{/set-block}, and to reduce CPU/memory usage you should add some cache blocks in the template.

http://www.kouz-cooking.fr

Marko Žmak

Friday 26 December 2008 5:36:49 am

Hi Jorge, I'll give you an example. The trick is to define the keys for the cache-block in a smart way, so that the cache-block depends on all the parameters that influence the results that should be displayed.

First, here's the simpler example that uses a step of 1:

{*
Parameters given to this template are:

$search_parent_node_id - parent node for fetching
$class_filter_type - class filter type for fetching
$class_filter_array - class filter array for fetching
$attribute_filter - attribute filter for fetching
$main_node_only - whether to fetch only main nodes
$sort_by - sort criteria for fetching
$fetch_function - the function for fetching ('list', 'tree')

$max_nodes - the number of nodes to extract (at the end $max_nodes contains the number of nodes that were actually displayed)
$list_item_view - the view used for displaying fetched items (e.g. 'full', 'line'...)
$keys_prefix - this is added as a prefix to the keys parameter of the cache block so that we can differentiate cache blocks when this template is used in several parts of the page

*}

{* First we define the hash used for fetch operator *}
{def $fetch_hash=hash(
					'parent_node_id', $search_parent_node_id,
					'class_filter_type', $class_filter_type,
					'class_filter_array', $class_filter_array,
					'attribute_filter', $attribute_filter,
					'main_node_only', $main_node_only
				)
}

{* Now we get the number of nodes that fetch will return *}
{def $nodes_count=fetch(content, concat($fetch_function,'_count'), $fetch_hash)}

{if gt($nodes_count,0)}

	{* If $max_nodes is 0 or lower, we set it to $nodes_count so that all the nodes are fetched *}
	{if le($max_nodes,0)}
		{set $max_nodes=$nodes_count}
	{/if}
	
	{* We get the index of the first element randomly and define the step of 1 (we could also make $step random, but that's more complicated) *}
	{def $index=rand(0, dec($nodes_count)) $step=1}

	{* We fetch the parent node so that we can use its url_alias in subtree_expiry of the cache-block *}
	{def $parent_node=fetch(content, node, hash('node_id', $search_parent_node_id))}

	{* And here's the cache-block. The keys parameter is constructed from:
		$keys_prefix
		$search_parent_node_id
		$nodes_count
		$index
		$step
		
		(the $step is not really needed in the keys parameter because it is always 1, but if we create $step randomly, then it will be needed)
	 *}
	{cache-block
	keys=concat($keys_prefix,$search_parent_node_id,'_',$nodes_count,'_',$index,'_',$step) subtree_expiry=$parent_node.url_alias
	 expiry=0 ignore_content_expiry}

		{* The piece of code that is most intensive comes within this cache block. That is:
			- fetching the content
			- going through the list and displaying nodes
		*}

		{* We add the sort_by to the fetch hash *}
		{set $fetch_hash=$fetch_hash|merge( hash('sort_by', $sort_by) )}

		{* Now we fetch the content *}
		{def $nodes=fetch(content, $fetch_function, $fetch_hash)}

		{* We adjust $max_nodes in case the number of fetched nodes is lower than $max_nodes *}
		{set $max_nodes=min($nodes_count,$max_nodes)}
				
		{* We go through the fetched items and display them *}
		{for 1 to $max_nodes as $count}
			{node_view_gui content_node=$nodes[$index] view=$list_item_view}
			{set $index=mod(sum($index,$step),$nodes_count)}
		{/for}
		
	{/cache-block}

{else}
	{set $max_nodes=0}
{/if}
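
For anyone who wants to check the index arithmetic outside eZ Publish, the loop above can be modelled in a few lines of Python. This is only a sketch: pick_window and its arguments are made-up names, and a plain list stands in for the fetched nodes.

```python
def pick_window(nodes, start, max_nodes, step=1):
    """Return up to max_nodes items, beginning at start and wrapping around."""
    count = len(nodes)
    if count == 0:
        return []
    if max_nodes <= 0:
        max_nodes = count  # 0 or lower means "show everything"
    shown = min(max_nodes, count)

    picked = []
    index = start
    for _ in range(shown):
        picked.append(nodes[index])
        index = (index + step) % count  # same wrap-around as mod(sum($index,$step),$nodes_count)
    return picked


if __name__ == "__main__":
    print(pick_window(list("abcde"), start=3, max_nodes=3))  # ['d', 'e', 'a']
    print(pick_window(list("abcde"), start=1, max_nodes=0))  # ['b', 'c', 'd', 'e', 'a']
```

The output depends only on the start index and the node count (plus the fixed template parameters), which is why those two values are enough to tell the cached variants apart.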

Now, if we want to use a $step that is not 1 but chosen randomly, we should replace this part of the code:

	{* We get the index of the first element randomly and define the step of 1 (we could also make $step random, but that's more complicated) *}
	{def $index=rand(0, dec($nodes_count)) $step=1}

with this code:

{* We get the index of the first element randomly *}
{def $index=rand(0, dec($nodes_count))}

{*
Now we get the $step randomly, but with some adjustments.
The idea is to set $step to a random value and then repeat this process until
the remainder of the division ($nodes_count / $step) is not zero
(because a $step that divides $nodes_count makes the loop come back to the
starting node early, so some nodes could be displayed more than once).
Note that this check is not complete: nodes can still repeat when $step and
$nodes_count share a common factor (e.g. 6 and 4).
*}


{* The $max_iterations parameter. If we don't get a $step that has a
non-zero remainder after $max_iterations, we give up and set $step to 1 *}
{def $max_iterations=10}


{* We set the initial random value for $step (at least 1, so that we never divide by zero) *}
{def $step=rand(1, max(1, dec($nodes_count)))}


{* We loop until:
- mod($nodes_count, $step) is not 0
- or the loop has been executed $max_iterations times
*}
{def $i=0}
{while and(lt($i,$max_iterations),eq(mod($nodes_count,$step),0))}
	{* We generate a new random $step *}
	{set $step=rand(1, max(1, dec($nodes_count)))}
	{set $i=inc($i)}
{/while}


{* If we didn't find a $step that has a non-zero remainder after the loop, we set the $step to 1 *}
{if eq(mod($nodes_count,$step),0)}
	{set $step=1}
{/if}
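
One caveat on the random step, sketched in Python rather than template code: a non-zero remainder only rules out steps that divide the node count. Nodes can still repeat when the step and the count share a common factor, for example 6 nodes with a step of 4. A stricter test is that the greatest common divisor of the two is 1. The helper names visited and pick_step are hypothetical, and the gcd check is a swapped-in technique, not what the template above does.

```python
import math
import random


def visited(count, start, step):
    """Indices reached by `count` hops of size `step`, wrapping at `count`."""
    return [(start + i * step) % count for i in range(count)]


def pick_step(count, rng=random, max_iterations=10):
    """Try random steps; accept the first one coprime with count, else fall back to 1."""
    for _ in range(max_iterations):
        step = rng.randint(1, max(1, count - 1))
        if math.gcd(count, step) == 1:
            return step
    return 1


if __name__ == "__main__":
    print(visited(6, 0, 4))  # [0, 4, 2, 0, 4, 2]: 6 % 4 is not 0, yet nodes repeat
    print(visited(6, 0, 5))  # [0, 5, 4, 3, 2, 1]: gcd(6, 5) == 1, every node once
```

Whether the stricter check matters depends on how many nodes are shown: with a step of 4 over 6 nodes, repeats only appear once more than 3 nodes are displayed.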

This is it. It works well for me. If you have any further questions, feel free to ask.


Marko Žmak

Friday 26 December 2008 5:39:50 am

P.S. And of course, as Stéphane suggested, you should set:

{set-block scope=root variable=cache_ttl}0{/set-block}

before this if you're using this in some template that is cached (e.g. node view override).


Jorge estévez

Friday 26 December 2008 8:00:55 am

Thanks, I will try and come back later.

