PDF Indexing

Betsy Gamrat

Friday 29 December 2006 7:42:48 pm

Hi,

I followed the directions on this page: http://ez.no/ezpublish/documentation/configuration/optimization/speeding_up_acrobat_pdf_document_indexing_ and was able to index PDFs with no trouble.

I wanted to install ezpdftotext and pdftotext in /usr/local/bin so that every site on the server could use them, but I can't get it to work.

I checked the server error logs and the eZ logs, but neither was helpful.

When I ran the same commands from plain PHP using passthru(), everything worked.
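For anyone who wants to repeat that check from a shell instead of PHP, here is a minimal sketch. The function name `try_extract` is hypothetical (it is not part of eZ Publish); it just runs a tool on one file and reports whether the tool exited cleanly and printed any text, which is roughly what a passthru() test confirms.

```shell
#!/bin/sh
# Hypothetical helper (not part of eZ Publish): run an extraction tool,
# e.g. /usr/local/bin/ezpdftotext, on one file and report the result.
try_extract() {
    tool=$1
    file=$2
    # Capture stdout only; stderr is discarded to keep the report short.
    if out=$("$tool" "$file" 2>/dev/null); then
        if [ -n "$out" ]; then
            echo "ok"
        else
            echo "no text"
            return 1
        fi
    else
        # $? here is the exit status of the tool inside $(...).
        status=$?
        echo "failed (exit $status)"
        return 1
    fi
}

# Example (assumes a readable sample.pdf in the current directory):
# try_extract /usr/local/bin/ezpdftotext sample.pdf
```

Note that this runs with your own account's permissions and PATH, which may not match what the web server process sees.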

Any ideas?

Thank you in advance,

Betsy

kracker (the)

Friday 29 December 2006 11:37:25 pm

Betsy,

That documentation entry looks rather dated, despite the recent comments on it.

Have you read this article?
<i>http://ez.no/layout/set/printarticle/community/articles/indexing_multiple_binary_file_types</i>

I did and was then sent down this path.
<i>http://ez.no/community/forum/developer/binary_file_search_index_creation_debugging_3_7_4
http://ezpedia.org/wiki/en/ez/references
http://ezpedia.org/wiki/en/ez/solution_building_php_cli_for_ez_publish_command_line_scripts
http://ezpedia.org/wiki/en/ez/debugging
http://ezpedia.org/wiki/en/ez/tips_for_working_with_ez_publish_cli_scripts
</i>

eZ search is simple to set up, yet you can run into PHP reference problems (segfaults) without a custom-patched php-cli binary.

<i>I got by with a little help from my friends.</i>

Cheers,
<i>//kracker

eminem : don't call me marshall</i>

Member since: 2001.07.13 || http://ezpedia.se7enx.com/

Betsy Gamrat

Saturday 30 December 2006 7:03:24 am

Kracker,

Thank you, I have a lot of information to review.

My real question is: why does <b>ezpdftotext</b> run from the site's local directory but not from <i>/usr/local/bin</i>? Indexing works well as described in the rather dated post, except that I can't make it available to the rest of the server.

Since I got <b>ezpdftotext</b> to run under PHP and could execute the scripts from all of these locations, I was wondering whether eZ has a security setting that prevents executing scripts outside the local directory. I did check the server settings, and since the code ran fine outside of eZ, I am assuming they are correct.

My goal is to construct a robust infrastructure that will allow extremely efficient deployment of eZ sites. :)

Paul Borgermans

Saturday 30 December 2006 7:24:08 am

Hi Betsy

I do exactly this (put it in /usr/local/bin to share it among different web sites), so it really looks like a permission problem (or maybe a path problem). If the debug output doesn't tell you enough, I would add some hard-coded debug statements and raise the log level (error_reporting in php.ini).
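Following the permission-or-path idea, a quick first check is whether the file the ini setting points to exists and is executable. Minimal sketch; the function name `check_extractor` is hypothetical, and the result only reflects the user who runs it. To see what the web server sees, you would run it as that user (for example via `sudo -u`, with the account name depending on your system).

```shell
#!/bin/sh
# Hypothetical diagnostic (not an eZ Publish tool): report whether a
# text-extraction tool exists and is executable for the current user.
check_extractor() {
    tool=$1
    if [ ! -e "$tool" ]; then
        echo "missing: $tool"
        return 1
    fi
    if [ ! -x "$tool" ]; then
        echo "not executable: $tool"
        return 1
    fi
    echo "ok: $tool"
}

# Example: check both the wrapper and the binary it calls.
# check_extractor /usr/local/bin/ezpdftotext
# check_extractor /usr/local/bin/pdftotext
```

Both files matter: a wrapper script can be executable and still fail if the binary it calls is not.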

Good luck identifying the problem
Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Betsy Gamrat

Saturday 30 December 2006 11:04:11 am

Hi,

I tried one more time before calling for help, and it worked. <b>:)</b>

After all is said and done, these are the key components:

A <b>custom class</b> (I called mine 'File - Indexed') with a file attribute whose 'Is Searchable' flag is set to true.

<b>/usr/local/bin</b> contains these files:

-rwxr-xr-x 3 root root 62 Dec 30 12:51 <b>ezpdftotext*</b>
-rwxr-xr-x 3 root root 1135987 Dec 24 08:34 <b>pdftotext*</b>

<b>ezpdftotext</b>

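#!/bin/sh
# ezpdftotext: wrapper that is called with the path of the PDF as $1.
# pdftotext sends the extracted text to stdout when the output file is "-".
# "$1" is quoted so that paths containing spaces are passed intact.
<b>/usr/local/bin/pdftotext</b> "$1" -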

<b>override/binaryfile.ini.append.php</b>

<?php /* #?ini charset="utf8"?

[PDFHandlerSettings]
TextExtractionTool=<b>/usr/local/bin/ezpdftotext</b>

*/ ?>

I think the full path in the ini file is probably unnecessary, because /usr/local/bin is in the PATH anyway.
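Whether the bare name is enough depends on the PATH of the process that launches the tool, and a web server's PATH is not necessarily the same as a login shell's. A minimal sketch for checking how a name resolves under a given PATH; the function name `resolve_in` is hypothetical, and the PATH values in the example are assumptions to replace with the one your web server actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print where command name $2 resolves when PATH is $1.
# Runs in a subshell so the caller's PATH is left unchanged.
resolve_in() {
    (
        PATH=$1
        command -v "$2"
    )
}

# Example: compare a PATH with and without /usr/local/bin.
# resolve_in /usr/local/bin:/usr/bin:/bin ezpdftotext
# resolve_in /usr/bin:/bin ezpdftotext   # empty if it only lives in /usr/local/bin
```

Keeping the absolute path in TextExtractionTool avoids depending on that PATH at all.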

I uploaded a PDF file, and it worked.

Thanks for the support - it really helped.

Betsy
