eZFind - Adding extra information in SolR

Maxime Thomas

Sunday 09 January 2011 8:27:07 am

Hi eZ People,

I will try to explain what I want to do as clearly as possible.

I've made an extension that handles extra data in separate tables in the eZPublish database.

I would like to find a simple and coherent way to index the data and to be able to do some queries in SolR.

There are two scenarios for indexing:

1 - I find a way to use the eZFind cronjob to index my data.

2 - I write my own indexer using ezcSearch (which is already in eZPublish).

For the query part, I also have two options:

1 - I use eZFind to get the data and benefit from the cool features already developed.

2 - I write my own queries and don't have the cool features (or I have to build them myself).

As far as I understand how the whole thing works, I've thought about some points:

1 - The data schema set in the SolR instance delivered with eZFind does not fit my data. That's expected, because eZFind returns node IDs and nothing else.

2 - The best strategy is to write my own indexer and use eZFind to get my data back. This way, I can control how data is indexed (useful for performance and data update questions) and I let eZFind do the query job.

So the question is: can I enhance the current schema to fit my needs without interfering with eZFind?

Is this the best thing to do ?

Have you guys already done that ?

Thank you for any kind of help !

Maxime Thomas
maxime.thomas@wascou.org | www.wascou.org | http://twitter.com/wascou

Company Blog : http://www.wascou.org/eng/Company/Blog
Technical Blog : http://share.ez.no/blogs/maxime-thomas

gilles guirand

Sunday 09 January 2011 8:59:46 am

Hi maxime,

  • Do you want to share eZ results and extra data results ?
  • Do you display the full view of your extra data (the link destination) inside eZ, for example in a module?

Option 1: if you want to use eZFind (but with poor performance)

  • Create a custom datatype
  • Create a class, with 1 attribute / using your custom datatype
  • Create a custom PHP Class to map your datatype and eZFind
  • Create 1 content object / node for each external data
  • enjoy !

Option 2: without eZ Find, but with better performance

  • Index your data with PHP / Solr (you could copy / paste some parts of the ezfind code), or use ezc if you want
  • Create a custom Fetch, like eZFind does (you could copy / paste some parts of the ezfind code, and replace the ezcontent queries with your extra data queries)
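
To make "index your data with Solr" more concrete, here is a minimal sketch of the XML update message that Solr accepts on a core's update handler. It is written in Python only for illustration (an eZ Publish extension would do the same in PHP), and the field names `id` and `title_t` are assumptions that would have to exist in your own schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(rows):
    """Build a Solr <add> update message from a list of dicts.

    Each dict is one document; keys are Solr field names (hypothetical
    here) and values are converted to strings.
    """
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# One row taken from an external table (made-up values).
rows = [{"id": "extra-1", "title_t": "First external record"}]
message = build_add_message(rows)
print(message)
```

The resulting string would then be sent with an HTTP POST to the update handler of the target core, followed by a commit, so that the documents become searchable.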

--
Gilles Guirand
eZ Community Board Member
http://twitter.com/gandbox
http://www.gandbox.fr

Maxime Thomas

Sunday 09 January 2011 1:33:56 pm

The first option is not a real option because it means duplicating data, and this is exactly what I want to avoid.

As mentioned in the SolR documentation, the best option is to define a new "core" and index my information separately.

Apparently, eZFind 2.2 also comes with a multicore configuration, with one core for each language. I'm going to dig in to see whether I can use one of these cores for my own data.

Another good point: the ezcSearch component simplifies my job.

gilles guirand

Sunday 09 January 2011 2:48:05 pm

No, you don't duplicate data. Your attribute (using your custom datatype) just has to store a foreign key to your external tables, and nothing else. You can't use eZ Find without this link.

If you don't want to create ezcontentobjects, you could try option 2:

  • Index your data with PHP / Solr (you could copy / paste some parts of the ezfind code), or use ezc if you want
  • Create a custom Fetch, like eZFind does (you could copy / paste some parts of the ezfind code, and replace the ezcontent queries with your extra data queries)

Maxime Thomas

Monday 10 January 2011 5:12:24 am

Again on the first solution: if I make a custom datatype that stores what I want to index, I duplicate data (it means that if I update my external data, I have to publish the content again on the eZ side). It's definitely not a good approach for what I want to do. Another bad point is that I create new nodes in eZPublish that are not meant to be shown on the website.

I've followed the second track and I've managed to index external data in another core, separate from the standard ezfind cores, without hacking the whole thing.

It's pretty cool, but if you need to index heterogeneous and linked data, you need to specify a shared schema for this core.

I was a bit worried about the ability of ezcSearch to handle multiple cores, but it's possible (in a simple way, without sharding) and it fits my needs.

Thank you anyway for making suggestions, it feeds the discussion.
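
For readers who have not used a multicore Solr setup: each core gets its own URL path under the Solr base URL, so targeting a separate core mostly comes down to using a different URL. The sketch below is in Python for illustration only; the base URL, the core name `extradata` and the field `title_t` are assumptions, not values taken from eZ Find or ezcSearch.

```python
from urllib.parse import urlencode


def core_url(base_url, core, handler, params=None):
    """Return the URL of a request handler inside one Solr core."""
    url = f"{base_url.rstrip('/')}/{core}/{handler}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical base URL and core name.
BASE = "http://localhost:8983/solr"
select_url = core_url(BASE, "extradata", "select", {"q": "title_t:solr"})
update_url = core_url(BASE, "extradata", "update")
print(select_url)
print(update_url)
```

Since every core has its own schema, queries and updates sent to the `extradata` URLs are validated against that core's schema only.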

Paul Borgermans

Monday 10 January 2011 1:43:04 pm

Hello Maxime

A few bug fixes/enhancements and docs are keeping you from the desired outcome.

Unfortunately, I am very busy at the moment, among other things with adding more of those capabilities to eZ Find for searching native eZ Publish and "foreign" data at the same time in a flexible way.

Once this is finished (in about 10 days or so), I'll post this enhanced version of eZ Find for general consumption.

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Maxime Thomas

Monday 10 January 2011 3:22:16 pm

Hi Paul,

I'm trying to make this flexible as my extension is an addon to eZPublish.

By the way, I get some errors when enabling all the cores (the eZ language cores plus mine): the search tries to apply the eZ language core schema and not mine.

I get this error: "unknown field 'ezcsearch_type_s'", and this field is of course not in my schema.

Any idea so I can move on?

EDIT :

Actually it comes from the ezcSearch component which adds a field called "ezcsearch_type" during the query. Very mysterious.

I will try to get some answers on the mailing list.

Bertrand Dunogier

Tuesday 11 January 2011 1:31:18 am

Maxime,

the ezcsearch_type field is automatically added by ezcSearch (you can easily see it by grepping for _type in ezc/Search). In the solr handler, the index() method indeed adds an attribute named ezcsearch_type_s (handlers/solr.php:873 on my copy).

I suggest you just add this field as a string one in your schema.xml for the core you index on, and see what's in there.
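
As an illustration of that suggestion, the sketch below embeds a possible schema.xml fragment and checks which required fields it does not declare. It is Python for illustration only; the schema name, the `id` and `title_t` fields and the `string` field type are assumptions (the type must be defined in the `<types>` section of the real schema), and the check ignores `dynamicField` patterns such as `*_s`, which would also match this field if your schema declares one.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment for the custom core.
SCHEMA_FRAGMENT = """
<schema name="extradata" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="ezcsearch_type_s" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def declared_fields(schema_xml):
    """Return the names of all explicitly declared <field> elements."""
    root = ET.fromstring(schema_xml.strip())
    return {field.get("name") for field in root.iter("field")}


def missing_fields(schema_xml, required):
    """Return the required field names the schema does not declare."""
    return sorted(set(required) - declared_fields(schema_xml))


missing = missing_fields(SCHEMA_FRAGMENT, ["id", "ezcsearch_type_s", "title_t"])
print(missing)
```

A field reported as missing here is one that would trigger an "unknown field" error if a document or query referenced it.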

Bertrand Dunogier
eZ Systems Engineering, Lyon
http://twitter.com/bdunogier
http://gplus.to/BertrandDunogier

Maxime Thomas

Sunday 06 March 2011 4:04:22 pm

I finally found the answer to my question.

The ezcsearch_type_s field is the class type you want your results to be instantiated as.

For example, if I'm indexing Articles, the ezcsearch_type_s field is set to that type and stored with the data inside my index.

Then, when searching for some text in my Article, the ezcsearch_type_s field is automatically added to the query (as Bertrand said).
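
The general idea of such a type field can be sketched as follows. This is not the ezcSearch implementation, only a minimal Python illustration of the pattern: the `Article` class, the `title_t` field, the registry and the query syntax are all assumptions made for the example.

```python
from dataclasses import dataclass

TYPE_FIELD = "ezcsearch_type_s"


@dataclass
class Article:
    id: str
    title: str


# Maps the stored type name back to the class used for results.
REGISTRY = {"Article": Article}


def to_index_document(obj):
    """Flatten an object into index fields and record its type."""
    doc = {"id": obj.id, "title_t": obj.title}
    doc[TYPE_FIELD] = type(obj).__name__
    return doc


def build_query(cls, text):
    """Restrict a text search to documents of one type."""
    return f'title_t:"{text}" AND {TYPE_FIELD}:{cls.__name__}'


def hydrate(doc):
    """Instantiate a result as the class named in the type field."""
    cls = REGISTRY[doc[TYPE_FIELD]]
    return cls(id=doc["id"], title=doc["title_t"])


doc = to_index_document(Article("a1", "Hello"))
query = build_query(Article, "Hello")
print(doc)
print(query)
print(hydrate(doc))
```

This also shows why the field must exist in the schema of every core that is queried this way: the type filter is part of each query, so a core without the field rejects it.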
