Monday 03 July 2006 1:33:13 am
I managed to solve it this weekend. There were two problems, and in every configuration I tried, at least one of them showed up, until I hit the right combination! The first problem is a mistake in the documentation. http://ez.no/products/ez_publish/documentation/configuration/configuration/search_engine/configuring_binary_file_indexing gives the following setting:
[HandlerSettings]
MetaDataExtractor[application/pdf]=pdf
I copied that setting into my binaryfile.ini file, which effectively broke PDF parsing. I should have left it at the default value:
[HandlerSettings]
MetaDataExtractor[application/pdf]=ezpdf
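The second problem was related to pdftotext. I found that the command used by eZ Publish (pdftotext example.pdf) does not print anything to standard output: without a second argument, pdftotext writes the text to example.txt instead. To get this to work, I had to modify kernel/classes/datatypes/ezbinaryfile/plugins/ezpdfparser.php so that the command ends with a dash, which tells pdftotext to send the text to stdout: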
passthru( "$textExtractionTool $fileName -" );
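For anyone who wants to check the pdftotext behaviour outside eZ Publish first, something like the following can be run from a shell. This is only a rough sketch: it assumes pdftotext is on your PATH, and the names TEXT_TOOL and extract_text are made up for illustration (they are not part of eZ Publish).

```shell
#!/bin/sh
# TEXT_TOOL is a hypothetical variable for this sketch; it defaults to pdftotext.
TEXT_TOOL="${TEXT_TOOL:-pdftotext}"

# extract_text is a hypothetical helper that mirrors the modified passthru() call.
extract_text() {
    # The trailing "-" asks pdftotext to write the text to stdout
    # instead of creating a .txt file next to the PDF.
    "$TEXT_TOOL" "$1" -
}

# Usage (assumes example.pdf exists in the current directory):
#   extract_text example.pdf | head
```

If this prints the text of the PDF, the same command should also work from ezpdfparser.php.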