PDF files not indexed

Jeroen Sangers

Thursday 29 June 2006 1:54:44 am

I am trying to include the contents of PDF files in the search index, but cannot get it to work.

I installed pstotext on my server and tested it with a PDF file. I followed the steps laid out in http://ez.no/products/ez_publish/documentation/configuration/configuration/search_engine/configuring_binary_file_indexing and uploaded a PDF file to my site. However, when I search for words from that file, no results show up.

Is there any way I can turn on logging/auditing to see what is happening when I upload a PDF file?

Siniša Šehović

Thursday 29 June 2006 11:20:51 pm

Hi Jeroen

I have the same problem on eZ 3.8.2.

Can anyone help us here? :-)

Best regards,
S.

---
If at first you don't succeed, look in the trash for the instructions.

Jeroen Sangers

Friday 30 June 2006 8:17:04 am

I still can't get it to work. I have tried moving pstotext to various locations on my server, switched to pdftotext, and specified the full path to pstotext in my binaryfile.ini.append.php, but I always receive the same error:

Plugin for application/pdf was not found
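
One way to look for this message is sketched below in shell. It assumes eZ Publish writes its messages to *.log files under var/log/ in the installation root; that location and the helper name find_plugin_errors are assumptions, not something stated in this thread.

```shell
#!/bin/sh
# find_plugin_errors is a hypothetical helper, not part of eZ Publish.
# It prints every line containing "Plugin for" from the *.log files in a
# directory, prefixed with the file name and line number.
find_plugin_errors() {
    log_dir=$1
    # /dev/null is passed as an extra file so grep always prints the file name,
    # even when only one log file matches the glob.
    grep -n -F "Plugin for" "$log_dir"/*.log /dev/null 2>/dev/null
}

# Assumed log location, relative to the eZ Publish root.
if [ -d var/log ]; then
    find_plugin_errors var/log || echo "no plugin errors logged in var/log"
fi
```

Running it right after uploading a PDF should show whether the "Plugin for application/pdf was not found" message is tied to that upload.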

Does anybody have a clue on how I can solve this?

Siniša Šehović

Saturday 01 July 2006 4:18:09 am

Hi Jeroen

What happens if you try to execute pstotext from the Linux shell?

Do you get any errors?
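
A minimal sketch of that check, assuming a POSIX shell. The helper name check_tool is made up for illustration; it only reports whether each extraction tool mentioned in this thread can be found on the PATH.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command can be
# found on the PATH of the current shell.
check_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "found: $path"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check both tools mentioned in this thread; "|| true" keeps going after a miss.
for tool in pstotext pdftotext; do
    check_tool "$tool" || true
done
```

Keep in mind that the web server may run with a different PATH than your login shell, so a tool found here is not necessarily found by eZ Publish.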

Did you try this approach?
http://ez.no/community/forum/setup_design/indexing_binary_files_excel_and_powerpoint

S.


Jeroen Sangers

Monday 03 July 2006 1:33:13 am

I managed to solve it this weekend. There were two problems, and every configuration I tried hit at least one of them until I found the right combination!

The first problem is a mistake in the documentation. http://ez.no/products/ez_publish/documentation/configuration/configuration/search_engine/configuring_binary_file_indexing shows the following setting:

[HandlerSettings]
MetaDataExtractor[application/pdf]=pdf

I copied that setting to my binaryfile.ini file, which broke PDF parsing. Of course, I should have left it at the default value:

[HandlerSettings]
MetaDataExtractor[application/pdf]=ezpdf
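
To see which value is actually set, a small shell sketch like the following can help. The helper name pdf_handler and the two file locations (settings/binaryfile.ini and settings/override/binaryfile.ini.append.php, relative to the eZ Publish root) are assumptions; adjust them to wherever your settings live.

```shell
#!/bin/sh
# pdf_handler is a hypothetical helper: it prints the value of the last
# MetaDataExtractor[application/pdf] line found in the given ini file.
pdf_handler() {
    sed -n 's|^MetaDataExtractor\[application/pdf]=||p' "$1" | tail -n 1
}

# Assumed file locations, relative to the eZ Publish root.
for ini in settings/binaryfile.ini settings/override/binaryfile.ini.append.php; do
    if [ -f "$ini" ]; then
        echo "$ini: $(pdf_handler "$ini")"
    fi
done
```

If any of the files reports pdf instead of ezpdf, that is the setting described above.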

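The second problem was related to pdftotext. I found out that the command used by eZ Publish (pdftotext example.pdf) does not write anything to standard output; by default pdftotext saves the text to a file next to the PDF instead. To get this to work, I had to modify kernel/classes/datatypes/ezbinaryfile/plugins/ezpdfparser.php so that a trailing "-" sends the text to standard output: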

passthru( "$textExtractionTool $fileName -" );
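
The same call can be tried outside eZ Publish. The sketch below is an illustration in shell, not the eZ Publish code itself: the helper name extract_text and the sample file example.pdf are assumptions. It runs the tool with "-" as the output file name, which pdftotext interprets as standard output.

```shell
#!/bin/sh
# extract_text is a hypothetical helper mirroring the patched passthru() call:
# it runs the tool with the file name followed by "-" so the extracted text
# goes to standard output.
extract_text() {
    tool=$1
    file=$2
    "$tool" "$file" -
}

# Only run the demonstration when pdftotext and a sample file are present.
if command -v pdftotext >/dev/null 2>&1 && [ -f example.pdf ]; then
    extract_text pdftotext example.pdf | head -n 5
fi
```

If this prints the first lines of the document, the tool itself is working and the remaining problem lies in the eZ Publish configuration.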

Siniša Šehović

Tuesday 04 July 2006 2:57:18 am

Hi Jeroen

Thanks for the tip!

Now I can index my PDFs.

Best regards,
S.

