eZ Find: problem with special characters in PDF files


Romain Bremaud

Tuesday 07 December 2010 1:31:21 am

Hello everybody,

I use the following code to index my PDF files:

http://share.ez.no/learn/ez-publish/indexing-multiple-binary-file-types/%28page%29/3

This script uses the xpdf library: http://www.foolabs.com/xpdf/download.html

The problem is that when I run the following command: php updatesearchindexsolr.php -s <admin siteaccess> the PDFs are indexed, but the special characters disappear and are replaced by whitespace.

But if I run the same conversion directly from the command line: pdftotext example.pdf example.txt it works.

I have not managed to identify why it doesn't work...
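One possible explanation, offered as an assumption and not a confirmed diagnosis: xpdf's pdftotext documents Latin1 as its default output encoding, while a Solr-based index typically expects UTF-8. Accented characters written as Latin1 bytes are not valid UTF-8, so they can be dropped or replaced along the way. The sketch below shows how the output encoding could be requested explicitly with the -enc option and how the byte mismatch looks. The helper names (build_pdftotext_command, is_valid_utf8, extract_text) are hypothetical and are not part of the indexing script linked above.

```python
import subprocess


def build_pdftotext_command(pdf_path, encoding="UTF-8"):
    """Build a pdftotext command line that writes the text to stdout."""
    # "-enc" selects the output encoding; "-" as the output file means stdout.
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def is_valid_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def extract_text(pdf_path):
    """Run pdftotext and return the text (assumes pdftotext is on the PATH)."""
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")


# Illustration of the suspected mismatch: the same word as Latin-1 and UTF-8 bytes.
latin1_bytes = "café".encode("latin-1")  # "é" becomes the single byte 0xE9
utf8_bytes = "café".encode("utf-8")      # "é" becomes the two bytes 0xC3 0xA9
```

If is_valid_utf8 returns False for the bytes the indexing script receives from pdftotext, the encoding is a likely suspect, and passing -enc UTF-8 in the script's command line would be worth trying.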

Thanks in advance.

Romain Bremaud
Les clefs du net

Ivo Lukac

Tuesday 07 December 2010 1:52:55 am

Hi Romain,

I would recommend using http://projects.ez.no/eztika, as in my experience it deals with special characters and non-Latin alphabets much better than xpdf.

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Romain Bremaud

Tuesday 07 December 2010 6:05:28 am

Thanks for your help. It works with eztika :)

It was hard to configure because I work in a Windows environment, but now it works ^^

Thanks

Romain Bremaud
Les clefs du net

