Indexing ALL MS Office files?

Author Message

Jean-Yves Zinsou

Friday 13 March 2009 1:48:51 am

Hello dear eZ community,
I have been searching around and was not able to find a clear answer to my question.

Has anyone succeeded in indexing all MS Office files, including docx files? Or can someone point me in the right direction?

Note: the eZ Publish installation will be hosted on my client's server, which runs Windows :-(

Do Androids Dream of Electric Sheep?
I dream of eZpubliSheep....
------------------------------------------------------------------------
http://www.alma.fr

Paul Borgermans

Friday 13 March 2009 8:34:32 am

You may want to have a look at http://projects.ez.no/eztika

There are currently some problems with CJK documents, though.

hth
Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Jean-Yves Zinsou

Friday 13 March 2009 8:47:13 am

Thanks a lot, Paul.
What does CJK mean?

Paul Borgermans

Friday 13 March 2009 9:53:20 am

CJK = Chinese, Japanese, Korean. There are some known issues with these font sets, and probably with all Asian languages (I have only tested CJK so far).

For indexing CJK PDFs, it is best to use xpdf through a wrapper script (or .bat file) that you configure in binaryfile.ini, with the following content:

<path to>pdftotext -enc "UTF-8" $1 -
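
For illustration only, here is a minimal Python sketch of the same wrapper idea. It assumes that pdftotext (from xpdf) is installed and reachable on the PATH; the function names are hypothetical, and how eZ Publish invokes the wrapper is not shown here. Note that in a Windows .bat file the first argument is written %1 rather than $1.

```python
import subprocess
import sys


def build_pdftotext_command(pdf_path, pdftotext="pdftotext"):
    """Return the argument list for the pdftotext call quoted above.

    The default executable name is an assumption; pass a full path if
    pdftotext is not on the PATH.
    """
    # "-enc UTF-8" selects the output encoding; the trailing "-" makes
    # pdftotext write the extracted text to stdout instead of a file.
    return [pdftotext, "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path, pdftotext="pdftotext"):
    """Run pdftotext on pdf_path and return the extracted text as a str."""
    result = subprocess.run(
        build_pdftotext_command(pdf_path, pdftotext),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


# When called as "python pdf2text.py file.pdf", print the text as UTF-8 bytes
# so that the console encoding on Windows does not mangle CJK characters.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.buffer.write(extract_text(sys.argv[1]).encode("utf-8"))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with file names that contain spaces.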

hth
Paul

Jean-Yves Zinsou

Friday 13 March 2009 10:25:11 am

Thanks a lot, Paul,

You made my day!! ;-)

