Author
|
Message
|
kracker (the)
|
Monday 23 October 2006 10:41:12 am
I have seen an error attempting to follow the article: <i>http://ez.no/layout/set/printarticle/community/articles/indexing_multiple_binary_file_types</i> <b>Settings: Binary File : binaryfile.ini.append.php</b> # cat settings/override/binaryfile.ini.append.php
# Here you can add handlers for new datatypes.
[HandlerSettings]
MetaDataExtractor[text/plain]=plaintext
MetaDataExtractor[application/pdf]=binaryfile
MetaDataExtractor[application/msword]=binaryfile
MetaDataExtractor[application/vnd.ms-excel]=binaryfile
MetaDataExtractor[application/vnd.ms-powerpoint]=binaryfile
# The full path to your log file (used for debugging/testing)
[BinaryFileHandlerSettings]
LogFile=var/log/index.log
Using eZ publish 3.7.4 Any suggestion or insight into following search index update <i>fatal</i> errors?
//kracker <i>tv : theme : scrubs</i> <b>Attempt #1</b> # /usr/local/php4.4.4/bin/php -d memory_limit=456M -C update/common/scripts/updatesearchindex.php -s example_com-dev --logfiles --verbose --sql --db-user=db --db-password=db --db-database=example_com_dev_376
Using siteaccess example_com-dev for search index update
Starting object re-indexing
Number of objects to index: 5467
...................................................................... 1.280409731114%
...................................................................... 2.5608194622279%
...................................................................... 3.8412291933419%
...................................................................... 5.1216389244558%
...................................................................... 6.4020486555698%
...................................................................... 7.6824583866837%
...................................................................... 8.9628681177977%
...................................................................... 10.243277848912%
...................................................................... 11.523687580026%
...................................................................... 12.80409731114%
...................................................................... 14.084507042254%
...................................................................... 15.364916773367%
...................................................................... 16.645326504481%
...................................................................... 17.925736235595%
...................................................................... 19.206145966709%
...................................................................... 20.486555697823%
...................................................................... 21.766965428937%
...................................................................... 23.047375160051%
...................................................................... 24.327784891165%
...................................................................... 25.608194622279%
...................................................................... 26.888604353393%
...................................................................... 28.169014084507%
...................................................................... 29.449423815621%
...................................................................... 30.729833546735%
...................................................................... 32.010243277849%
...................................................................... 33.290653008963%
...................................................................... 34.571062740077%
...................................................................... 35.851472471191%
...................................................................... 37.131882202305%
...................................................................... 38.412291933419%
...................................................................... 39.692701664533%
...................................................................... 40.973111395647%
...................................................................... 42.253521126761%
...................................................................... 43.533930857875%
...................................................................... 44.814340588988%
...................................................................... 46.094750320102%
...................................................................... 47.375160051216%
...................................................................... 48.65556978233%
...................................................................... 49.935979513444%
...................................................................... 51.216389244558%
...................................................................... 52.496798975672%
...................................................................... 53.777208706786%
...................................................................... 55.0576184379%
...................................................................... 56.338028169014%
...................................................................... 57.618437900128%
...................................................................... 58.898847631242%
...................................................................... 60.179257362356%
...................................................................... 61.45966709347%
...................................................................... 62.740076824584%
...................................................................... 64.020486555698%
...................................................................... 65.300896286812%
...................................................................... 66.581306017926%
...................................................................... 67.86171574904%
...................................................................... 69.142125480154%
...................................................................... 70.422535211268%
...................................................................... 71.702944942382%
...................................................................... 72.983354673496%
...................................................................... 74.263764404609%
...................................................................... 75.544174135723%
...................................................................... 76.824583866837%
...................................................................... 78.104993597951%
...................................................................... 79.385403329065%
...................................................................... 80.665813060179%
...................................................................... 81.946222791293%
...................................................................... 83.226632522407%
...................................................................... 84.507042253521%
...................................................................... 85.787451984635%
...................................................................... 87.067861715749%
...............PHP Fatal error: Call to a member function on a non-object in /web/dev/example.com/ezpublish-3.7.6/kernel/search/plugins/ezsearchengine/ezsearchengine.php on line 73
Fatal error: Call to a member function on a non-object in /web/dev/example.com/ezpublish-3.7.6/kernel/search/plugins/ezsearchengine/ezsearchengine.php on line 73
Fatal error: eZ publish did not finish its request
The execution of eZ publish was abruptly ended, the debug output is present below.
# date
Mon Oct 23 10:05:02 CDT 2006
<b>Attempt #2</b> # /usr/local/php4.4.4/bin/php -d memory_limit=456M -C update/common/scripts/updatesearchindex.php -s example_com-dev --logfiles --verbose --sql --db-user=db --db-password=db --db-database=example_com_dev_376
Using siteaccess example_com-dev for search index update
Starting object re-indexing
Number of objects to index: 5467
...................................................................... 1.280409731114%
...................................................................... 2.5608194622279%
...................................................................... 3.8412291933419%
...................................................................... 5.1216389244558%
...................................................................... 6.4020486555698%
...................................................................... 7.6824583866837%
...................................................................... 8.9628681177977%
...................................................................... 10.243277848912%
...................................................................... 11.523687580026%
...................................................................... 12.80409731114%
...................................................................... 14.084507042254%
...................................................................... 15.364916773367%
...................................................................... 16.645326504481%
...................................................................... 17.925736235595%
...................................................................... 19.206145966709%
...................................................................... 20.486555697823%
...................................................................... 21.766965428937%
...................................................................... 23.047375160051%
...................................................................... 24.327784891165%
...................................................................... 25.608194622279%
...................................................................... 26.888604353393%
...................................................................... 28.169014084507%
...................................................................... 29.449423815621%
...................................................................... 30.729833546735%
...................................................................... 32.010243277849%
...................................................................... 33.290653008963%
...................................................................... 34.571062740077%
...................................................................... 35.851472471191%
...................................................................... 37.131882202305%
...................................................................... 38.412291933419%
...................................................................... 39.692701664533%
...................................................................... 40.973111395647%
...................................................................... 42.253521126761%
...................................................................... 43.533930857875%
...................................................................... 44.814340588988%
...................................................................... 46.094750320102%
...................................................................... 47.375160051216%
...................................................................... 48.65556978233%
...................................................................... 49.935979513444%
.....................................................................Segmentation fault
# date
Mon Oct 23 09:05:02 CDT 2006
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
kracker (the)
|
Monday 23 October 2006 10:41:51 pm
Might anyone have a suggestion on where I might need to change setting to enable further debug information to deduce the cause of the error? The end result .. crash (of unclear cause) seems very similar to the one described in the article. <i>"In addition, we found that both pstotext and xpdf caused the same problem with very large PDF files - the SQL INSERT statement got too large (this is related to the eZ publish indexer, not the parsing tools) and would crash the indexer, resulting in no further content being indexed. Addressing that, in addition to handling multiple binary types in one file, led us to write our own custom parsing plugin."</i>
//kracker <i>tv : robot.chicken</i>
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
kracker (the)
|
Tuesday 24 October 2006 10:06:42 am
<b>Questions</b> 1. <i>The eZ publish 'var/' size question</i> - Does the process segfault because because of the 3500 MB of PDF, XLS, PPT, DOC files in this installation's 'var/' directory? 2. <i>The eZ publish 'version' question</i> - Does the process segfault because I'm trying to use 3.7.4 instead the very latest 3.8.x ? 3. <i>The eZ publish 'var/' file character contents question</i> - Does the process segfault because of the character content?
//kracker <i>KMK - Koast II Koast - I've Had It</i>
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
Paul Borgermans
|
Tuesday 24 October 2006 2:19:51 pm
Hi Cracker Now you have a a nut to crack ;-)
(referring to your last post)
1: no
2: no 3: maybe, but unlikely Try to adapt the updatesearch index script as I've done for the lucene plugin: this will give you a clue where the crash happened (in terms of object to index): Dowload the file http://pubsvn.ez.no/community/trunk/extension/lucene/trunk/bin/php/updatesearchindexlucene.php Look for the line
print ("Indexing " . $innerNode->attribute( 'name' ) . " with node id " . $innerNode->attribute( 'node_id' ) . " and url_alias " . $innerNode->attribute( 'url_alias' ) . $endl );
And add this before the eZSearch::addObject( $object ); line of your update script or a copy of it). I actually identified even corrupt objects/nodes in particular sites with this update script when using the lucene plugin (which calls every node placement and attribute of objects: it is also a unique debugging tool for corrupted databases indeed :-)) hth Paul
eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans
|
kracker (the)
|
Wednesday 25 October 2006 8:45:48 pm
<i>@Paul</i> Heh (nice quote), as always I thank you for the very helpful direction. <i>@Everyone</i> After taking a closer look into the eZ publish error log, <i>var/log/error.log</i> I found the following error which gave a description and the final query. <i>"Query error: MySQL server has gone away. Query: INSERT INTO ......."</i>
<i>http://www.phpmyvisites.us/faq/message-error-mysql-should-change-mysql-configuration-51.html</i>
<i>http://www.oooforum.org/forum/viewtopic.phtml?p=180203</i>
<i>http://dev.mysql.com/doc/refman/4.1/en/gone-away.html</i> <i>http://dev.mysql.com/doc/refman/4.1/en/server-system-variables.html</i> Yet seemingly even after changing a popular mysql configuration setting the problem persists. I added the setting, max_allowed_packet = 750M.
Cheers, //kracker <i>http://www.imdb.com/title/tt0105977/</i>
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
Felipe Jaramillo
|
Wednesday 01 November 2006 7:13:10 pm
Hi kracker, What a coincidence, just today Ive been doing some tests and it seems i'm having the same problem. Im trying to upload a trimmed text version of the Php manual which is ~750K to a test installation of eZ 3.7.4.
1. If i try to upload without DelayedIndexing, i get a blank page and nothing gets indexed.
2. After enabling delayed indexing, the file uploads correctly. 3. If i run 'php runcronjobs.php' it goes to the new object, but stays there forever.
Running cronjobs/unpublish.php
Running cronjobs/rssimport.php
Running cronjobs/indexcontent.php
Starting processing pending search engine modifications
Indexing object ID #94
I tried using the longer 'update/common/scripts/updatesearchindex.php' and i get:
Using siteaccess shop_admin for search index update
Starting object re-indexing
Number of objects to index: 34
..................Segmentation fault
Yet, when I check, it DOES manage to index at least some part of the file (i can search succsessfully for words and phrases that appear even at the end of the document) I'm sure we'll find the solution somehow. By the way, the article on indexing_multiple_binary_file_types is very good. Regards, Felipe
I tried changing the mysql conf: set-variable=max_allowed_packet=150M But it seems to make no difference.
Felipe Jaramillo
eZ Certified Extension Developer
http://www.aplyca.com | Bogotá, Colombia
|
kracker (the)
|
Wednesday 01 November 2006 10:40:18 pm
<i>@Felipe</i> Hey Felipe! Thanks for testing and reporting back your findings on this subject. I really do appreciate it. I'm certain others have had these kinds of problems in the past and certainly in the future. Now's the time to weigh in and talk about it. I've narrowed the problem a bit between eZ -> DB (MySQL) and most of my test results were similar to the ones you report. I am working now on taking a different, new look at this very subject again here in the near future. I will have to report back my findings and end-game solution. The answers are here right now, looking at us clearly in the eye, waiting to be seen ... so naturally I'm looking towards the solution in front of me.
<i>Cheers, //kracker /me says on irc://freenode.net #ezpublish, `the documentation defines the solution ....`</i> KMK - In God We Trust (feat. Rude Boy)</i>
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
Felipe Jaramillo
|
Thursday 02 November 2006 7:46:20 am
Hi kracker, After further tests I can confirm the same behaviour in 3.7.5 on Windows XP. I don't really see the answer that close. Seems that we'd have to look into the indexing code more carefully. I did manage to get the html file indexing working using a modified copy of the word indexer and using "lynx --dump" as the command to execute. Regards, Felipe
Felipe Jaramillo
eZ Certified Extension Developer
http://www.aplyca.com | Bogotá, Colombia
|
kracker (the)
|
Monday 06 November 2006 4:30:22 pm
<i>It's a bug ... from a certain point of view ...</i> I'll update again once I get just a little closer.
I found today, randomly browsing an issue entry which read to me surprisingly similar. <i>http://issues.ez.no/IssueView.php?Id=9300&activeItem=2</i> I need to test this further first to see what if any results it yields. A question passed my way today. Would I rather have a slower more reliable search (avoid bug via code) or fix the real reason for the bug (in php). What would you chose to solve?
//kracker
<i>intristicly - phone calls from the past unannounced ...</i> <i>ps. damn, references are money makers!</i>
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
kracker (the)
|
Monday 06 November 2006 4:47:22 pm
Well... The results were more or less the same, I expected it to die before 90% and it last failed around 83%, this time it completed 85%. The following is the history of the attempt with <i>xdebug</i> enabled.
# /web/ini/bin/build_ez_search_index_for_example_com.sh
Using siteaccess example-dev for search index update
Starting object re-indexing
{eZSearchEngine: Cleaning up search data}
Number of objects to index: 5465
...................................................................... 1.2808783165599%
...................................................................... 2.5617566331199%
...................................................................... 3.8426349496798%
<snip />
...................................................................... 80.695333943275%
...................................................................... 81.976212259835%
...................................................................... 83.257090576395%
...................................................................... 84.537968892955%
...................................................................... 85.818847209515%
............................
Notice: Uninitialized string offset: 0 in /web/dev/example_com/ezpublish-3.8.5/lib/ezxml/classes/ezxml.php on line 178
Call Stack:
0.0009 55024 1. {main}() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:0
1256.9776 12000000 2. ezsearch::addobject() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:165
1256.9778 12000048 3. ezsearchengine->addobject() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezsearch.php:87
1256.9960 12025624 4. ezcontentobjectattribute->metadata() /web/dev/example_com/ezpublish-3.8.5/kernel/search/plugins/ezsearchengine/ezsearchengine.php:85
1256.9961 12025720 5. ezxmltexttype->metadata() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezcontentobjectattribute.php:1134
1256.9963 12026520 6. ezxml->domtree() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/datatypes/ezxmltext/ezxmltexttype.php:344
Notice: Uninitialized string offset: -1 in /web/dev/example_com/ezpublish-3.8.5/lib/ezxml/classes/ezxml.php on line 256
Call Stack:
0.0009 55024 1. {main}() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:0
1256.9776 12000000 2. ezsearch::addobject() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:165
1256.9778 12000048 3. ezsearchengine->addobject() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezsearch.php:87
1256.9960 12025624 4. ezcontentobjectattribute->metadata() /web/dev/example_com/ezpublish-3.8.5/kernel/search/plugins/ezsearchengine/ezsearchengine.php:85
1256.9961 12025720 5. ezxmltexttype->metadata() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezcontentobjectattribute.php:1134
1256.9963 12026520 6. ezxml->domtree() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/datatypes/ezxmltext/ezxmltexttype.php:344
Notice: Uninitialized string offset: -1 in /web/dev/example_com/ezpublish-3.8.5/lib/ezxml/classes/ezxml.php on line 342
Call Stack:
0.0009 55024 1. {main}() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:0
1256.9776 12000000 2. ezsearch::addobject() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:165
1256.9778 12000048 3. ezsearchengine->addobject() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezsearch.php:87
1256.9960 12025624 4. ezcontentobjectattribute->metadata() /web/dev/example_com/ezpublish-3.8.5/kernel/search/plugins/ezsearchengine/ezsearchengine.php:85
1256.9961 12025720 5. ezxmltexttype->metadata() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezcontentobjectattribute.php:1134
1256.9963 12026520 6. ezxml->domtree() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/datatypes/ezxmltext/ezxmltexttype.php:344
.......................................... 87.099725526075%
.............
Fatal error: Call to a member function on a non-object in /web/dev/example_com/ezpublish-3.8.5/kernel/search/plugins/ezsearchengine/ezsearchengine.php on line 74
Call Stack:
0.0009 55024 1. {main}() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:0
1266.4872 12292752 2. ezsearch::addobject() /web/dev/example_com/ezpublish-3.8.5/update/common/scripts/updatesearchindex.php:165
1266.4873 12292800 3. ezsearchengine->addobject() /web/dev/example_com/ezpublish-3.8.5/kernel/classes/ezsearch.php:87
Fatal error: eZ publish did not finish its request
The execution of eZ publish was abruptly ended, the debug output is present below.
//kracker
<i>if you don't use xdebug... then you don't know ...</i>
Member since: 2001.07.13 || http://ezpedia.se7enx.com/
|
Felipe Jaramillo
|
Tuesday 07 November 2006 2:51:11 pm
Ive looked into the search engine indexing code and found two things which may cause the problem: \kernel\search\plugins\ezsearchengine
LINE 105: $wordArray = split( " ", $text );
This splits the text all at once. In my experience this takes a long, long time. I guess maybe it would be better to do it in different parts. The place where it breaks is when words are indexed in 1000 word groups. i think it goes up to 127.000 words before breaking:
LINE 129:
$db =& eZDB::instance();
$db->begin();
for( $arrayCount = 0; $arrayCount < $wordCount; $arrayCount += 1000 )
{
$placement = $this->indexWords( $contentObject, array_slice( $indexArray, $arrayCount, 1000 ), $wordIDArray, $placement );
}
$db->commit();
I did try putting the db->begin() and db->commit() inside the for loop, to favour several transactions instead of a huge one, but it didn't help. On windows, PHP crashes after reaching the 127.000th word. I upload the complete PHP manual in html which is about 12 MB and then do indexing through the shell. It's interesting to see that the text gets stripped off the html tags:
Line 94:
$text = eZSearchEngine::normalizeText( strip_tags( $metaDataPart['text'] ), true );
So this way, you don't need to do a 'lynx --dump' to index html files as they will get stripped of html tags anyway. Therefore to index html, one could use the plaintext plugin:
settings/override/binaryfile.ini.append.php
[HandlerSettings]
MetaDataExtractor[text/html]=plaintext
Hope this helps a bit. Regards, Felipe
Felipe Jaramillo
eZ Certified Extension Developer
http://www.aplyca.com | Bogotá, Colombia
|
Kristian Hole
|
Tuesday 07 November 2006 3:59:07 pm
Felipe: I did a test with the PHP-manual like you did. I get a segmentation fault in PHP as you did. This is caused by a reference count overflow in PHP. You can see the description here: http://ilia.ws/archives/5-Top-10-ways-to-crash-PHP.html I have been provided with a patch for PHP which tells me exactly which variable that is causing the reference count overflow and results in PHP segfaulting. I am currently looking into getting help from PHP-core-developers to interpret the result, and possibly making a modification to the script that avoids the problem.
Kristian
http://ez.no/ez_publish/documenta...tricks/show_which_templates_are_used
http://ez.no/doc/ez_publish/techn...te_operators/miscellaneous/attribute
|
Felipe Jaramillo
|
Thursday 09 November 2006 3:08:02 pm
Hi Kristian, Thanks for your input. I look forward to the results of your research as this is an essential feature in eZ. Felipe
Felipe Jaramillo
eZ Certified Extension Developer
http://www.aplyca.com | Bogotá, Colombia
|
Kristian Hole
|
Wednesday 15 November 2006 8:33:46 am
Update on the subject: http://lists.ez.no/pipermail/sdk-public/2006-November/002463.html
Kristian
http://ez.no/ez_publish/documenta...tricks/show_which_templates_are_used
http://ez.no/doc/ez_publish/techn...te_operators/miscellaneous/attribute
|
Brookins Consulting
|
Saturday 14 April 2007 9:27:08 am
Greetings Felipe! Brookins Consulting has recently released a new eZ Publish extension which provides scripts which will successfully index your eZ publish content objects without errors via an alternative replacement for the stock eZ publish search index update shell scripts.
Project, <i>http://projects.ez.no/updatesearchindex</i>
Release News, <i>http://projects.ez.no/updatesearchindex/news/updatesearchindex_extension_0_0_1_released</i> Download extension, <i>http://projects.ez.no/content/download/757/2989/version/2/file/updatesearchindex-0_0_1.tar.gz</i> Feel free to ask further questions about this solution within the project forum, <i>http://projects.ez.no/updatesearchindex/forum/general/updatesearchindex_forum#msg451</i>
eZ Partner | North American Experience
http://brookinsconsulting.com/experience
|