Advanced development with eZ Find - part 3: Leveraging the Solr syntax

Introduction

The previous (second) post of this series described how to index additional fields in Solr so that they can be used through eZ Find's native syntax, of the form 'mycontentclass/mycontentattribute/mycontentsubattribute'. This eZ Find-specific syntax is very convenient, but it is not the only option: it can be mixed with Solr-specific syntax, such as raw field names ('attr_myfield_type') or logical operators (AND, NOT, etc.).

- Yes, this is bad practice. An “interface” syntax is not meant to be worked around, and doing so can hinder the evolution of the lower layers, namely Solr.

- Yes, this can make development easier in some cases, or even be a life-saver in some complex situations.

This post walks through concrete examples of how and when Solr's syntax can be leveraged. The examples are deliberately simplified for educational purposes.

Prerequisites and target audience

This tutorial assumes you know how to set up eZ Find. The online documentation describes the required steps in detail: http://ez.no/doc/extensions/ez_find/2_2 .

You should also have read and understood the first and second parts of this tutorial.

Step 1: How to sort on an attribute shared by several content classes

The issue

This is one of eZ Publish's long-standing problems:

Two distinct content classes are created, for whatever reason: "Post" and "Article".
Identical attributes are added to both classes because they are useful in both, for instance a "Date" attribute.

Result:
It is impossible to get both "Post" and "Article" objects in a single fetch result, sorted by descending date (unless a daunting template operator is developed for this specific purpose). Developers generally work around the issue, or rather relocate it, by using a single, more generic content class (merging content classes has its own drawbacks).

The solution, using eZ Find

The first post in this series details the naming conventions of Solr fields. A welcome side-effect of this convention (related to Solr's dynamic fields concept) is that the content class identifier does not appear in the field name. Attributes with the same identifier in different classes therefore share the same Solr field, and this homonymy can be leveraged in searches, filters or sorts, depending on the use case.
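A minimal Python sketch can make this homonymy concrete. It assumes, as described in the first post, that attributes are indexed under `attr_<attribute identifier>_<type suffix>` (here `_dt` for dates). The `solr_field_name` helper and the sample documents are hypothetical, and the real sort is of course performed by Solr, not in Python.

```python
from datetime import date


def solr_field_name(attribute_identifier: str, type_suffix: str) -> str:
    """Build a field name following the attr_<identifier>_<suffix> convention.

    The content class identifier is deliberately not a parameter: homonymous
    attributes of different classes end up in the same Solr field.
    """
    return f"attr_{attribute_identifier}_{type_suffix}"


date_field = solr_field_name("date", "dt")  # "attr_date_dt" for Post and Article alike

# Hypothetical indexed documents coming from two different content classes.
documents = [
    {"id": 1, "class": "post", date_field: date(2010, 3, 1)},
    {"id": 2, "class": "article", date_field: date(2010, 5, 12)},
    {"id": 3, "class": "post", date_field: date(2010, 4, 20)},
]

# A single sort on the shared field orders posts and articles together, newest first.
newest_first = sorted(documents, key=lambda doc: doc[date_field], reverse=True)
ordered_ids = [doc["id"] for doc in newest_first]
print(date_field, ordered_ids)
```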

eZ Publish template code example, filtering on the “Post” content class only:

{* Fetch 10 "Post" objects (content class ID 24) up to 3 levels below node 2, newest first *}
{def $search_result = fetch( 'content', 'list', hash( 'parent_node_id', 2,
    'class_filter_type',  'include',
    'class_filter_array', array(24),
    'sort_by', array( array( 'attribute', false(), 'post/date' ) ),
    'limit', 10,
    'depth', 3
))}

Equivalent eZ Find template code, solving our cross-content-class sort for both “Post” and “Article”:

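{* Fetch 10 "Post" and "Article" objects together, sorted on the shared Solr field attr_date_dt *}
{def $search=fetch( ezfind, search,
     hash( 'query', '',
           'class_id', array('post', 'article'),
           'limit', 10,
           'sort_by', hash('attr_date_dt', 'desc')
))}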
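To picture what such a fetch roughly amounts to on the Solr side, the sketch below builds a query string with standard Solr request parameters (`q`, `fq`, `sort`, `rows`). It is an approximation, not the exact request eZ Find sends: in particular, `class_identifier_s` is a hypothetical placeholder for whatever field eZ Find actually uses to store the class identifier, so check the Solr logs for the real request.

```python
from urllib.parse import parse_qs, urlencode


def build_cross_class_query(class_identifiers, sort_field, rows=10,
                            class_field="class_identifier_s"):
    """Return a Solr query string sorting several classes on one shared field.

    class_field is a hypothetical placeholder: the real field eZ Find uses to
    store the class identifier may be named differently.
    """
    class_clause = " OR ".join(class_identifiers)
    params = [
        ("q", "*:*"),                                # match everything...
        ("fq", f"{class_field}:({class_clause})"),   # ...restricted to the given classes
        ("sort", f"{sort_field} desc"),              # newest first on the shared field
        ("rows", str(rows)),                         # equivalent of 'limit'
    ]
    return urlencode(params)


query_string = build_cross_class_query(["post", "article"], "attr_date_dt")
print(query_string)
print(parse_qs(query_string))
```

Because the class restriction lives in `fq` and the ordering in `sort`, adding a third class only changes the filter, never the sort field.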

Note:
A desirable evolution of eZ Find would be support for a '//date' style syntax, making optional the content class filter that eZ Find currently adds automatically to the query sent to Solr.
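The proposal can be illustrated with a hypothetical translation function: a 'class/attribute' path yields both a Solr field and a class filter, as eZ Find does today, while a '//attribute' path would yield the field alone. `translate_attribute_path` is invented for this illustration and does not exist in eZ Find; it assumes the `attr_<identifier>_<suffix>` naming and ignores sub-attributes.

```python
def translate_attribute_path(path: str, type_suffix: str):
    """Map an eZ Find-style attribute path to (solr_field, class_filter).

    Hypothetical behaviour: 'class/attribute' adds a class filter (current
    behaviour), '//attribute' adds none (proposed behaviour).
    Sub-attributes ('class/attribute/subattribute') are not handled here.
    """
    if path.startswith("//"):
        attribute_identifier = path[2:]
        class_identifier = None  # proposed: search across all content classes
    else:
        class_identifier, attribute_identifier = path.split("/", 1)  # current: class filter added
    return f"attr_{attribute_identifier}_{type_suffix}", class_identifier


print(translate_attribute_path("post/date", "dt"))
print(translate_attribute_path("//date", "dt"))
```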

Step 2: How to work with keywords

Unlike dates in the previous example, keywords in eZ Publish are stored in an external table, ezkeyword_attribute_link (an additional storage location on top of the standard content storage, supporting extended logic), which makes it possible to link a given keyword to various pieces of content of various content classes. However, the per-keyword fetch is not as well equipped as, for instance, a standard content/list fetch in terms of available filters (class_filter_type, class_filter_array, extended_attribute_filter, etc.). This limitation is understandable, since a cross-content-class fetch restricts what can be done when filtering on class-specific attributes.

Following the same idea as for date sorting, eZ Find can handle all the necessary operations around keywords. Here are some examples:

'filter', array('attr_tags_lk:"ez publish"', 'NOT attr_title_t:"RSS"')

Result:
Only returns results tagged with the "eZ Publish" keyword in any case, e.g. "eZ Publish" or "ez publish" (note the _lk suffix, meaning lowercase), and whose title does not contain "RSS".
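The following simplified, in-memory Python model shows how the two clauses of this filter combine. It is only an approximation of Solr's behaviour: the `words` tokenizer stands in for the real analyzer configured in schema.xml, and the documents are hypothetical.

```python
import re


def words(text):
    """Lowercased word tokens: a rough stand-in for the analyzer of a text (_t) field."""
    return re.findall(r"\w+", text.lower())


def matches_filters(doc):
    # attr_tags_lk:"ez publish" -> keywords are compared in lowercase
    tagged = "ez publish" in {tag.lower() for tag in doc["tags"]}
    # NOT attr_title_t:"RSS" -> the title is matched case-insensitively, word by word
    mentions_rss = "rss" in words(doc["title"])
    return tagged and not mentions_rss


# Hypothetical documents, whatever their content class.
docs = [
    {"id": 1, "title": "Setting up eZ Find", "tags": ["eZ Publish"]},
    {"id": 2, "title": "Custom RSS feeds", "tags": ["ez publish"]},
    {"id": 3, "title": "Mootools tips", "tags": ["Mootools"]},
    {"id": 4, "title": "Rss export tricks", "tags": ["EZ PUBLISH"]},
]

matching_ids = [doc["id"] for doc in docs if matches_filters(doc)]
print(matching_ids)
```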

'filter', array('attr_tags_lk:"ez publish"', 'attr_tags_lk:"mootools"')

Result:
Only returns results tagged with both the "eZ Publish" keyword and the "Mootools" keyword, in any case (e.g. "ez publish" and "mootools").
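In Solr, each filter query restricts the result set on its own, and the final result is the intersection of all of them; Solr can also cache each filter independently. Whether eZ Find sends each array entry as a separate `fq` or joins them with AND, the outcome in this example is the same intersection, which the sketch below models with Python sets over hypothetical documents.

```python
# Hypothetical documents mapped to their keywords, in mixed case.
tagged_docs = {
    1: {"eZ Publish", "Mootools"},
    2: {"ez publish"},
    3: {"mootools", "jQuery"},
    4: {"EZ Publish", "mootools"},
}


def ids_with_keyword(keyword):
    """Ids of documents carrying keyword, compared in lowercase as on a _lk field."""
    wanted = keyword.lower()
    return {doc_id for doc_id, tags in tagged_docs.items()
            if wanted in {tag.lower() for tag in tags}}


# Each filter is evaluated independently, then the resulting sets are intersected.
filters = ["ez publish", "mootools"]
result_ids = sorted(set.intersection(*(ids_with_keyword(k) for k in filters)))
print(result_ids)
```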

Step 3: How to create complex search filters

Here are a few illustrations of what can be achieved with the vast set of Lucene operators. The available operators depend on the deployed Solr version (Solr 1.4, shipped with eZ Find 2.2 at the time of writing).

'filter', array('NOT ( attr_title_t:(ez+find) OR attr_intro_t:(ez+find) )')

Result:
Only returns results that do NOT contain the 'ez find' expression, in any case (e.g. 'eZ Find'), in either the 'title' or the 'intro' attribute. Note the use of the text (_t) type for these attributes, which brings case-insensitivity, unlike the string (_s) type.
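A small Python model of this filter shows that the leading NOT excludes a document as soon as either attribute contains the expression. The documents are hypothetical, a rough word tokenizer stands in for the text (_t) field analyzer, and treating `(ez+find)` as a phrase match is an assumption made for the illustration.

```python
import re


def contains_phrase(text, phrase):
    """Case-insensitive phrase match on word tokens (rough model of a text (_t) field)."""
    tokens = re.findall(r"\w+", text.lower())
    target = phrase.lower().split()
    return any(tokens[i:i + len(target)] == target
               for i in range(len(tokens) - len(target) + 1))


def passes_filter(doc):
    # NOT ( attr_title_t:(ez+find) OR attr_intro_t:(ez+find) ), modelled as a phrase match
    return not (contains_phrase(doc["title"], "ez find")
                or contains_phrase(doc["intro"], "ez find"))


# Hypothetical documents.
docs = [
    {"id": 1, "title": "eZ Find tips", "intro": "Search tuning"},
    {"id": 2, "title": "Caching", "intro": "Nothing about EZ FIND here"},
    {"id": 3, "title": "Image handling", "intro": "Resizing aliases"},
]

kept_ids = [doc["id"] for doc in docs if passes_filter(doc)]
print(kept_ids)
```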

'filter', array('attr_title_s:[A TO G] AND ezf_df_text:google~0.7')

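Result:
Only returns results whose 'title' starts with an uppercase A, B, C, D, E or F (titles starting with G are excluded: the string field is compared lexicographically and case-sensitively, so a title such as 'Google' sorts after the 'G' bound), and whose content approximately contains the term 'google' (it may also match variants such as Google or iGoogle).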
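The range part can be checked with plain string comparison. The sketch below assumes that a range query on an untokenized string (_s) field compares terms lexicographically and case-sensitively, with square brackets including both bounds; Python orders plain ASCII titles the same way. The titles are hypothetical.

```python
def in_string_range(value, lower, upper):
    """Model of [lower TO upper] on an untokenized string (_s) field:
    case-sensitive lexicographic comparison, both bounds included."""
    return lower <= value <= upper


titles = ["Apache Solr", "Faceting", "G", "Google tricks", "eZ Find", "Zeta Components"]
in_range = [title for title in titles if in_string_range(title, "A", "G")]
print(in_range)
```

'Google tricks' falls outside the range because it sorts after 'G', while a title consisting only of the letter 'G' would still match the inclusive bound. 'eZ Find' is excluded because lowercase letters sort after all uppercase ones.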

Note: the 0.7 similarity threshold can be adjusted to better suit a given situation.
Another note: the 'ezf_df_text' field is filled at index time by copying the content of all of the document's 'string', 'text' or 'keyword' fields. One could also use the 'ezf_sp_words' field if the spellcheck feature is enabled. See the schema.xml file, and the definitions of these “copyField” fields, for more details.
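The sketch below puts both notes together: it fills ezf_df_text through copyField-style rules, then evaluates a fuzzy term against it. Both parts are assumptions made for illustration. The copyField patterns are hypothetical rather than those of eZ Find's schema.xml, and the similarity formula is assumed to be the one used by FuzzyQuery in Lucene 2.9 (the version behind Solr 1.4) with no prefix: 1 minus the edit distance divided by the length of the shorter term, keeping terms strictly above the threshold.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical copyField rules (source pattern, destination); eZ Find's real ones may differ.
COPY_FIELDS = [("attr_*_s", "ezf_df_text"), ("attr_*_t", "ezf_df_text"), ("attr_*_lk", "ezf_df_text")]


def apply_copy_fields(document):
    """Return a copy of the document with copyField destinations filled in (index time)."""
    result = dict(document)
    for field, value in document.items():
        for pattern, destination in COPY_FIELDS:
            if fnmatchcase(field, pattern):
                result.setdefault(destination, []).append(value)
    return result


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def fuzzy_similarity(query_term, indexed_term):
    """Assumed FuzzyQuery similarity without prefix: 1 - distance / shorter length, floored at 0."""
    shortest = min(len(query_term), len(indexed_term))
    if shortest == 0:
        return 0.0
    return max(0.0, 1.0 - levenshtein(query_term, indexed_term) / shortest)


def fuzzy_match(document, term, min_similarity=0.7):
    """Rough model of ezf_df_text:term~min_similarity on a lowercased, tokenized field."""
    copied = apply_copy_fields(document).get("ezf_df_text", [])
    tokens = re.findall(r"\w+", " ".join(copied).lower())
    return any(fuzzy_similarity(term, token) > min_similarity for token in tokens)


doc = {"attr_title_t": "Widgets for iGoogle", "attr_date_dt": "2010-05-12T00:00:00Z"}
print(apply_copy_fields(doc)["ezf_df_text"], fuzzy_match(doc, "google"))
```

Raising `min_similarity` towards 1.0 narrows the matches to near-exact spellings; lowering it lets more distant variants through, at the cost of noise.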

Conclusion

This last post shows how eZ Find helps work around and/or extend legacy eZ Publish fetches, for instance with cross-content-class queries or by relying on Apache Solr's native filters (Lucene syntax).

eZ Find is one of the major breakthroughs of eZ Publish, a first step towards the next generation of CMSes, namely:

An advanced indexing and querying system. Solr's integration in eZ Find is close to exhaustive, placing eZ Publish a step ahead of its Open Source competitors.
A dynamic storage system: currently handled through an obsolete SQL / filesystem layer, storage should evolve towards a dynamic system such as MongoDB or CouchDB. This is probably eZ Systems' next challenge.
Both an exhaustive, well-performing API and a framework: a key project for eZ Publish, which would deserve a faster template engine and, most importantly, a thorough low-level workflow layer offering hooks in many key places across all available operations. What is the future of Zeta Components in this regard? Should a widespread framework (Zend) be used instead?

We can also raise the subject of convergence between professional CMSes and DMSes (Document Management Systems, like Alfresco). The two worlds are growing closer functionally as they work towards the three points mentioned above (indexing, storage, API).

These are all questions and challenges eZ Systems will have to address in the coming months or years, relying on a major asset: eZ Find is already working, widely used, extensively field-tested, extensible and highly competitive for complex, professional deployments.

I would like to thank Nicolas Pastorino for translating this tutorial into English, and Paul Borgermans for his availability.

Resources

eZ Find 2.2 official documentation: http://ez.no/doc/extensions/ez_find/2_2
eZ Find source code: here
Apache Solr Wiki: http://wiki.apache.org/solr/
Tutorial on eZPedia.org, "How to create a template operator": http://ezpedia.org/ez/template_operators
Keyword fetching documentation: here
Lucene query parser syntax: http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
eZ Find's Solr schema definition file: here
Solr Copy Fields documentation: http://wiki.apache.org/solr/SchemaXml#Copy_Fields

This tutorial is available for offline reading:
Gilles Guirand - Advanced development with eZ Find - part 3 - Leveraging Solr's syntax - PDF Version

About the author: Gilles Guirand

Gilles Guirand is a certified eZ Publish Developer. He is widely acknowledged by the community to be one of the national experts on highly technical and complex eZ Publish issues. With over 12 years of experience in designing complex web architectures, he has been the driving force behind some of the most ambitious eZ Publish projects: Web Site Generators, High Availability, Widgets, SOA, eZ Find, SSO, Web Accessibility and IT systems integrations.

License

This work is licensed under the Creative Commons – Share Alike license ( http://creativecommons.org/licenses/by-sa/3.0 ).
