Using IFilters for indexing binary files

Jonathan Cutting

Thursday 20 January 2005 1:08:34 pm

Perhaps someone has already covered this, but I can find no mention of it in the documentation. For those of you working on Windows, a convenient way to index binary files is the IFilter mechanism used by Microsoft Indexing Service.

The Microsoft Platform SDK has an executable in the bin directory called FiltDump.exe. It takes the name of a file as an argument and uses the registered IFilter, if any, to print the file's text content to stdout.

For example, the command

filtdump -b test.doc

will dump the contents of test.doc to stdout using the IFilter registered for .doc files. The -b switch turns off error messages and other extraneous information. Note that Indexing Service must be installed, but it need not be running, for this to work.
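As a rough illustration of how a calling program might capture that output, here is a minimal sketch in Python (used purely for illustration; eZ Publish itself is PHP). The function name extract_text and the UTF-8 decoding are assumptions. Only the `filtdump -b <file>` invocation comes from the description above.

```python
import subprocess
from typing import Sequence


def extract_text(path: str, tool: Sequence[str] = ("filtdump", "-b")) -> str:
    """Run an extraction tool on `path` and return what it writes to stdout."""
    # The file name is appended as the last argument, mirroring
    # `filtdump -b test.doc`.
    result = subprocess.run(
        [*tool, path],
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: the output encoding is not stated in the post, so undecodable
    # bytes are replaced rather than raising an error.
    return result.stdout.decode("utf-8", errors="replace")
```

If the executable cannot be found on the search path, subprocess.run raises FileNotFoundError, which at least distinguishes "tool not found" from "tool ran but produced nothing".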

IFilters for HTML, Word, Excel, Visio, PowerPoint, and plain text are available from Microsoft. An IFilter for PDF is available from Adobe. Others, including StarOffice/OpenOffice, DWG, etc., are available commercially.

Now, I've tried to implement this in eZ Publish 3.5.0 (Windows installer version), but without success. I've overridden binaryfile.ini, cleared all caches, rebuilt the search index manually with the --clean option, and marked the binary file attributes as searchable in the classes of interest. Still no luck.

My binaryfile.ini overrides:

[HandlerSettings]
MetaDataExtractor[application/pdf]=IFilter
MetaDataExtractor[application/msword]=IFilter

[IFilterHandlerSettings]
TextExtractionTool=filtdump -b

I've tried placing filtdump.exe in a number of different locations, including the eZ Publish root and a directory on the system search path. I have no evidence that it's being executed at all. I've also tried making it run a batch script instead:

@ECHO OFF
ECHO %1
filtdump -b %1

Still no luck.
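One way to get evidence of whether the tool is ever invoked is a wrapper that writes each call to a log file before handing off to the real tool, since anything echoed to stdout is not visible when the caller is a web server process. The following is only a sketch in Python of that idea. The function name run_logged and the log file name are hypothetical, and nothing here is taken from eZ Publish itself.

```python
import datetime
import subprocess
from typing import Sequence


def run_logged(args: Sequence[str], tool: Sequence[str], log_path: str) -> int:
    """Append a line to `log_path` recording the call, then run `tool` with `args`."""
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = datetime.datetime.now().isoformat()
        log.write(f"{stamp} called with {list(args)}\n")
    # stdout is inherited, so the tool's text still reaches whatever
    # process called the wrapper.
    return subprocess.run([*tool, *args]).returncode


# Hypothetical usage, with an absolute log path chosen for the real setup:
# run_logged(["test.doc"], ("filtdump", "-b"), r"C:\temp\filtdump_calls.log")
```

If the log file stays empty after reindexing, the wrapper was never called, which points at the configuration rather than at the location of filtdump.exe.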

Can someone please help me understand what needs to be done to make this work? Where should filtdump.exe be located? Do Apache or PHP need to be configured any differently? Again, I'm using the basic Windows installer for 3.5.0 - nothing special.

Jonathan
