Monday 01 August 2011 9:07:18 am
eZPublish does have this feature, and you should be seeing your PDFs indexed, with a number of caveats. When a PDF is saved (or when you update your search index), it is run through the tool defined in your binaryfile.ini:

[PDFHandlerSettings]
TextExtractionTool=pstotext

If you don't have this tool installed on your machine, your PDFs won't be indexed. If you search these forums for TextExtractionTool or pdftotext, you'll find a couple of other possible tools, for example: http://share.ez.no/forums/extensions/ez-find/solr-indexing-error

If you do have the tool you've configured and your PDFs still aren't being indexed, then the PDFs probably don't contain real text: the content is actually an image (or a series of images) stored in the PDF container. In that case you won't be able to index them with pdftotext or a similar text extractor. A good test is to run your tool on the command line against a file that isn't being indexed and see what actually comes out. If nothing comes out, you'll have to use some other tool to extract the text, such as eztika (I've never used it) or an OCR tool like tesseract.
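If you want to script the command-line test above (say, to check a whole folder of PDFs), here's a rough sketch. It assumes pdftotext from poppler/xpdf, called as `pdftotext file.pdf -` (the "-" sends the text to stdout); if you've configured pstotext or some other tool in binaryfile.ini, you'd need to adjust the arguments. The helper names are just made up for illustration, not anything from eZPublish. The idea is simply that whitespace-only output usually means the PDF is scanned images.

```python
import shutil
import subprocess
import sys


def extract_text(pdf_path: str, tool: str = "pdftotext") -> str:
    """Run a pdftotext-style tool on pdf_path and return what it prints.

    Assumes the tool accepts `<tool> <pdf> -`, where "-" means "write to
    stdout" (true for poppler/xpdf pdftotext; other tools may differ).
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not installed or not on PATH")
    result = subprocess.run(
        [tool, pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # don't crash on odd bytes in the output
        check=True,        # raise if the tool itself reports an error
    )
    return result.stdout


def has_extractable_text(extracted: str) -> bool:
    """True if the tool produced anything besides whitespace.

    pdftotext separates pages with form feeds; str.strip() removes those too.
    """
    return extracted.strip() != ""


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        try:
            text = extract_text(path)
        except FileNotFoundError as err:
            sys.exit(f"No extraction tool: {err}")
        if has_extractable_text(text):
            print(f"{path}: text found, should be indexable")
        else:
            print(f"{path}: no text, probably images; try OCR (e.g. tesseract)")
```

Run it as `python check_pdfs.py file1.pdf file2.pdf`. Any file reported as "no text" is the kind that needs OCR rather than a plain text extractor.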
Certified eZPublish developer
http://ez.no/certification/verify/396111
Available for eZPublish troubleshooting, hosting and custom extension development: http://www.leidentech.com