Advanced development with eZ Find - part 3: Leveraging the Solr syntax

Introduction

The previous (second) post of this series described how to index additional fields in Solr in order to use them through eZ Find's native syntax, of the form 'mycontentclass/mycontentattribute/mycontentsubattribute'. This eZ Find-specific syntax is very convenient, but not exclusive: it is possible to mix it with Solr-specific syntax, such as raw field names ('attr_myfield_type') or logical operators (AND, NOT, etc.).

"

- Yes, this is bad practice. An “interface” syntax is not meant to be worked around: doing so potentially compromises the ability of the lower layer, namely Solr, to evolve.

- Yes, this can make development easier in some cases, or even be a life-saver in some complex situations.

"

This post walks through concrete examples of how and when Solr's syntax can be leveraged. The examples are deliberately simplified, for educational purposes.

 

Prerequisites and target audience

This tutorial assumes you know how to set up eZ Find. The online documentation describes the required steps in detail: http://ez.no/doc/extensions/ez_find/2_2.

You should also have read and understood the first and second parts of this tutorial:

 

Step 1: How to sort on an attribute present in several content classes

The issue

This is one of eZ Publish's long-standing problems:

  • Two distinct content classes are created, for some reason: "Post" and "Article"
  • Identical attributes are added to both classes because they are useful in both, a "Date" attribute for instance.

Result:
It is impossible to have both "Post" and "Article" objects in a single fetch result, sorted by decreasing date (unless a daunting custom template operator is developed for this specific purpose). Developers generally work around the issue, or rather relocate it, by using a single, more generic content class instead (merging content classes has its own drawbacks).

 

The solution, using eZ Find

The first post in this series details the naming conventions of Solr fields. One positive side-effect of this convention (related to Solr's dynamic fields concept) is that the content class identifier does not appear in the field name: an attribute with the same identifier and type in two classes maps to the same Solr field. This homonymy can be leveraged in searches, filters or sorts, depending on the use case.
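The naming convention can be illustrated with a minimal Python sketch. It is not eZ Find's actual implementation: the helper `solr_field_name` and the datatype labels in `SUFFIXES` are hypothetical, and only the suffixes used in this post (_dt, _t, _s, _lk) are listed.

```python
# Hypothetical mapping from a datatype label to the Solr dynamic-field suffix.
# Only the suffixes that appear in this post are listed.
SUFFIXES = {
    "date": "dt",
    "text": "t",
    "string": "s",
    "lowercase keyword": "lk",
}


def solr_field_name(class_identifier: str, attribute_identifier: str, datatype: str) -> str:
    """Build an attr_<attribute>_<suffix> field name (illustrative sketch).

    The class identifier is accepted but deliberately ignored: it is not part
    of the field name, which is what makes cross-class sorts possible.
    """
    del class_identifier  # not part of the Solr field name
    return f"attr_{attribute_identifier}_{SUFFIXES[datatype]}"


if __name__ == "__main__":
    print(solr_field_name("post", "date", "date"))     # attr_date_dt
    print(solr_field_name("article", "date", "date"))  # attr_date_dt
```

Both "post/date" and "article/date" land in the same `attr_date_dt` field, so a single sort on that field covers both classes.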

eZ Publish template code example, filtering on the “Post” content class only:

{* 24 is the ID of the "Post" content class in this example *}
{def $search_result = fetch( 'content', 'list', hash( 'parent_node_id', 2,
    'class_filter_type',  'include',
    'class_filter_array', array(24),
    'sort_by', array( array( 'attribute', false(), 'post/date' ) ),
    'limit', 10,
    'depth', 3
))}
 

Equivalent eZ Find template code, solving our cross-content-class sort for both “Post” and “Article”:

{* Search both classes and sort on the shared Solr field attr_date_dt *}
{def $search=fetch( 'ezfind', 'search',
     hash( 'query', '',
           'class_id', array( 'post', 'article' ),
           'limit', 10,
           'sort_by', hash( 'attr_date_dt', 'desc' )
))}
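To make the effect concrete, here is a small in-memory Python simulation of what the eZ Find fetch above asks Solr to do: keep documents of either class, then sort them all on the single shared `attr_date_dt` field. The documents, the `class_identifier` key and the `cross_class_sort` helper are made up for illustration; Solr performs this work server-side.

```python
from datetime import date

# Hypothetical indexed documents: both classes share the attr_date_dt field.
DOCUMENTS = [
    {"id": 1, "class_identifier": "post", "attr_date_dt": date(2010, 3, 1)},
    {"id": 2, "class_identifier": "article", "attr_date_dt": date(2010, 5, 12)},
    {"id": 3, "class_identifier": "folder", "attr_date_dt": date(2010, 6, 30)},
    {"id": 4, "class_identifier": "post", "attr_date_dt": date(2010, 4, 20)},
]


def cross_class_sort(docs, classes, field, limit=10):
    """Keep documents of the given classes, sorted by `field`, newest first."""
    selected = [d for d in docs if d["class_identifier"] in classes]
    selected.sort(key=lambda d: d[field], reverse=True)
    return selected[:limit]


if __name__ == "__main__":
    result = cross_class_sort(DOCUMENTS, {"post", "article"}, "attr_date_dt")
    print([d["id"] for d in result])  # [2, 4, 1]
```

Posts and articles come out interleaved by date, and the "folder" document is dropped by the class filter.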
 

Note:
A desirable evolution of eZ Find would be to support a '//date'-style syntax, making optional the content class filter that is currently added automatically to the query sent to Solr.

 

Step 2: How to work with keywords

Unlike dates in the previous example, keywords in eZ Publish are stored in an external table, ezkeyword_attribute_link (an additional storage location on top of the standard content storage, supporting extended logic), which makes it possible to link a given keyword to many pieces of content of various content classes. However, the per-keyword fetch is not as well equipped as, for instance, a standard content/list fetch in terms of available filters (class_filter_type, class_filter_array, extended_attribute_filter, etc.). This limitation is understandable: a fetch spanning several content classes cannot offer the same freedom to filter on class-specific attributes.

Following the same approach as for sorting by date, eZ Find can be leveraged to perform all the necessary operations on keywords. Here are some examples:

'filter', array('attr_tags_lk:"ez publish"', 'NOT attr_title_t:"RSS"')

Result:
Only returns results tagged with the "eZ Publish" keyword in any letter case, e.g. "eZ Publish" or "ez publish" (mind the _lk suffix, meaning lowercase), and whose title does not contain "RSS".
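The behaviour of these two filters can be approximated in Python. The sketch assumes that a `_lk` field compares the whole keyword after lowercasing both sides, and approximates the 'text' field by splitting the title into lowercased words; real Solr analysis chains are richer. The documents and helper names are hypothetical.

```python
import re

# Hypothetical documents: "tags" mimics attr_tags_lk, "title" mimics attr_title_t.
DOCS = [
    {"id": 1, "tags": ["eZ Publish", "Solr"], "title": "Indexing with eZ Find"},
    {"id": 2, "tags": ["ez publish"], "title": "RSS import tips"},
    {"id": 3, "tags": ["Mootools"], "title": "Widgets"},
]


def keyword_matches(tags, value):
    """Assumed _lk semantics: exact match on the whole lowercased keyword."""
    return value.lower() in {t.lower() for t in tags}


def text_contains(text, word):
    """Rough 'text' field semantics: case-insensitive match on whole words."""
    return word.lower() in re.findall(r"\w+", text.lower())


def apply_filters(docs):
    """Equivalent of attr_tags_lk:"ez publish" AND NOT attr_title_t:"RSS"."""
    return [
        d["id"]
        for d in docs
        if keyword_matches(d["tags"], "ez publish")
        and not text_contains(d["title"], "RSS")
    ]


if __name__ == "__main__":
    print(apply_filters(DOCS))  # [1]
```

Document 2 carries the keyword too, but its title contains "RSS", so the NOT clause removes it.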

'filter', array('attr_tags_lk:"ez publish"', 'attr_tags_lk:"mootools"')

Result:
Only returns results tagged with both the "eZ Publish" keyword and the "Mootools" keyword, in any letter case (e.g. "eZ Publish" or "ez publish", and "Mootools" or "mootools").
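A sketch of how the two entries combine, assuming (as the result above describes) that every entry of the 'filter' array must match. Each filter is evaluated as a set of matching document ids, and the sets are intersected. The data and helper names are hypothetical.

```python
# Hypothetical documents keyed by id; values mimic the attr_tags_lk field.
TAGS = {
    1: ["eZ Publish", "Mootools"],
    2: ["ez publish"],
    3: ["mootools", "jQuery"],
    4: ["EZ PUBLISH", "MooTools"],
}


def matching_ids(tags_by_id, keyword):
    """Ids whose keywords match `keyword`, ignoring letter case (_lk)."""
    wanted = keyword.lower()
    return {
        doc_id
        for doc_id, tags in tags_by_id.items()
        if wanted in (t.lower() for t in tags)
    }


def all_filters(tags_by_id, keywords):
    """Intersect the result sets: a document must satisfy every filter."""
    result = set(tags_by_id)
    for keyword in keywords:
        result &= matching_ids(tags_by_id, keyword)
    return result


if __name__ == "__main__":
    print(sorted(all_filters(TAGS, ["ez publish", "mootools"])))  # [1, 4]
```

Documents 2 and 3 each match only one of the two filters, so they are excluded.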

 

Step 3: How to create complex search filters

Here are a few illustrations of what can be achieved with the vast set of Lucene operators. The set of available operators depends on the deployed Solr version (Solr 1.4, shipped with eZ Find 2.2 at the time of writing).

'filter', array('NOT ( attr_title_t:(ez+find) OR attr_intro_t:(ez+find) )') 

Result:
Only returns results which do not contain the 'ez find' (or 'eZ Find') expression in either the 'title' or the 'intro' attribute. Note the use of the 'text' (_t) fields, which make the match case-insensitive, unlike the 'string' (_s) type.
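The boolean structure of this filter can be checked with a short Python sketch: by De Morgan's law, NOT (A OR B) keeps exactly the documents that match neither clause. It treats the filter as matching the 'ez find' expression, as described above; the phrase test is a crude stand-in for Solr's text analysis (lowercasing plus whitespace split), and the documents are hypothetical.

```python
# Hypothetical documents with a title and an intro field.
DOCS = [
    {"id": 1, "title": "Getting started with eZ Find", "intro": "Setup notes"},
    {"id": 2, "title": "Caching", "intro": "Why EZ FIND helps"},
    {"id": 3, "title": "Templates", "intro": "Override rules"},
]


def has_phrase(text, phrase):
    """Case-insensitive phrase test on whitespace-separated words."""
    words, target = text.lower().split(), phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def not_title_or_intro(doc, phrase="ez find"):
    """NOT (title:phrase OR intro:phrase)."""
    return not (has_phrase(doc["title"], phrase) or has_phrase(doc["intro"], phrase))


def de_morgan_form(doc, phrase="ez find"):
    """Equivalent form: NOT title:phrase AND NOT intro:phrase."""
    return not has_phrase(doc["title"], phrase) and not has_phrase(doc["intro"], phrase)


if __name__ == "__main__":
    print([d["id"] for d in DOCS if not_title_or_intro(d)])  # [3]
```

Both formulations keep only document 3, the one mentioning eZ Find in neither field.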

 
'filter', array('attr_title_s:[A TO G] AND ezf_df_text:google~0.7')

Result:
Only returns results whose 'title' starts with A, B, C, D, E or F (G excluded; note that 'string' fields are case-sensitive, so this range only matches uppercase initials), and whose content approximately contains the word 'google' (i.e. it may also match Google, iGoogle, etc.).

  • Note: the '0.7' minimum-similarity ratio can be adjusted to better suit a given situation.
  • Also note: the 'ezf_df_text' field is built dynamically by copying the content of all of the document's 'string', 'text' or 'keyword' fields. The 'ezf_sp_words' field could also be used if the spellcheck feature is enabled. See the schema.xml file and the definition of these “copyField” fields for more details.
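The three mechanisms behind this filter (an inclusive range on a 'string' field, a catch-all field filled by copying other fields, and fuzzy matching) can be sketched in Python. This is an approximation under stated assumptions: `in_range` uses plain code-point comparison, `catch_all_text` only mimics a copyField into 'ezf_df_text', and `fuzzy_similarity` assumes the FuzzyQuery score of the Lucene 2.9 line used by Solr 1.4 with no prefix, 1 - distance / min(lengths). All names and data are hypothetical.

```python
# Hypothetical document: field suffixes follow this post (_s, _t, _lk, _dt).
DOC = {
    "attr_title_s": "Building widgets",
    "attr_intro_t": "Embedding iGoogle gadgets",
    "attr_tags_lk": "widgets",
    "attr_date_dt": "2010-05-12T00:00:00Z",
}


def in_range(value, lower, upper):
    """Inclusive [lower TO upper] on a string field: case-sensitive
    code-point comparison, so "Google" sorts after "G" and is excluded."""
    return lower <= value <= upper


def catch_all_text(doc, suffixes=("_s", "_t", "_lk")):
    """Mimic a copyField: concatenate string, text and keyword fields."""
    return " ".join(v for k, v in doc.items() if k.endswith(suffixes))


def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


def fuzzy_similarity(term, candidate):
    """Assumed Lucene 2.9 FuzzyQuery score with no prefix."""
    return 1 - levenshtein(term, candidate) / min(len(term), len(candidate))


def fuzzy_match(text, term, min_similarity=0.7):
    """True if any lowercased word of `text` is similar enough to `term`."""
    return any(fuzzy_similarity(term, w) >= min_similarity
               for w in text.lower().split())


if __name__ == "__main__":
    print(in_range(DOC["attr_title_s"], "A", "G"))        # True
    print(in_range("Google", "A", "G"))                   # False
    print(fuzzy_match(catch_all_text(DOC), "google"))     # True
```

"igoogle" is one edit away from "google", giving a similarity of about 0.83, above the 0.7 threshold; raising the ratio makes the match stricter.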
 

Conclusion

This last post showed how eZ Find helps work around and/or extend legacy eZ Publish fetches, for instance by using a cross-content-class query, or by relying on Apache Solr's native filters (Lucene syntax).

eZ Find is one of eZ Publish's major breakthroughs, and a first step towards the next generation of CMSes, characterized by:

  • An advanced indexing and querying system. The current level of Solr integration in eZ Find is close to exhaustive, placing eZ Publish a step ahead of its Open Source competitors.
  • A dynamic storage system: currently handled through an obsolete SQL / filesystem layer, storage should evolve towards a dynamic system such as MongoDB or CouchDB. This is probably eZ Systems' next challenge.
  • An exhaustive, well-performing API and framework: a key project for eZ Publish, which deserves a better-performing template engine and, most importantly, a thorough low-level workflow layer providing hooks at many key places in all available operations. What is the future of Zeta Components in this regard? Should a widespread framework (Zend) be used instead?

We can also raise the subject of convergence between professional CMSes and DMSes (Document Management Systems, such as Alfresco). Both worlds are growing closer functionally as they work towards the three points mentioned above (indexing, storage, API).

These are all questions and challenges eZ Systems will have to address in the coming months or years, and it can rely on a major asset: eZ Find is already functional, widely used, extensively field-tested, extensible and highly competitive for complex, professional deployments.

I would like to thank Nicolas Pastorino for translating this tutorial into English, and Paul Borgermans for his availability.

Resources

 

This tutorial is available for offline reading :
Gilles Guirand - Advanced development with eZ Find - part 3 - Leveraging Solr's syntax - PDF Version

 

 

About the author : Gilles Guirand

Gilles Guirand is a certified eZ Publish Developer. He is widely acknowledged by the community as one of the national experts on highly technical and complex eZ Publish issues. With over 12 years' experience in designing complex web architectures, he has been the driving force behind some of the most ambitious eZ Publish projects: Web Site Generators, High Availability, Widgets, SOA, eZ Find, SSO, Web Accessibility and IT systems integrations.

License

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 license (http://creativecommons.org/licenses/by-sa/3.0).

