Index Table 500 error even with post limit iteration set to 1

This topic has 1 reply, 2 voices, and was last updated 7 years, 7 months ago by Ernest Marcinko.

Viewing 2 posts - 1 through 2 (of 2 total)

Author

Posts
October 17, 2018 at 11:00 pm #19511

wteweb
Participant

When I try to create an index table, it fails partway through (seemingly at a random place each time) with a 500 Server Error. It’s mostly indexing PDF files. The total keyword count is already huge (236k) when it gets close to the end, even before the index is complete. Sometimes I can resume indexing and it’ll add 3-5, then error out again. The PDFs can get pretty big, up to 30 MB or so, and there are almost 400 of them. There are also a ton of Office files; I just haven’t added ZipArchive to the server yet because I can’t even get it through the PDFs, and I figured it would be worse if it had to chew through 3 times as many files.

I tried changing the post limit per iteration to 1. I also set the WordPress memory limit to 640 MB and raised the max execution time in my php.ini after seeing an error in the error log that mentioned it (no errors have appeared in the error log since). I tried turning off all other plugins, as well as changing the theme. I also cut the pool sizes almost in half to see whether the number of keywords was the issue (this actually made it error out sooner).

October 18, 2018 at 11:38 am #19525

Ernest Marcinko
Keymaster

Hi,

The limit is automatically changed to 1 when indexing file contents. I believe the issue is the size of the PDF files: the server probably cannot handle parsing them and simply fails, returning a 500 error. For fair use, we recommend a maximum of 30-60 pages of text per file, but this is highly server dependent. The number of files is not an issue, as the limit is always 1 per iteration.
A higher execution time usually helps, but unfortunately an upper time limit is nowadays often hard-coded into the server and cannot be increased (especially with php-fpm and apache/php, as I found out recently during testing).

There is still one thing you should try: change the PDF parser method from ‘Auto’ to ‘PDF2Txt’, and if that does not work, try ‘Smalot parser’ as well: https://i.imgur.com/dwkhQdK.png
Usually, the PDF2Txt method is used as the secondary parser, but it seems faster than the other one (and much less accurate as well). With a specific parser selected, no secondary parsing is used, which saves some resources and bypasses any possible compatibility issues. There is a small chance that indexing will go through with either of the parsers selected.
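
Since file size is the likely culprit, it may help to see which PDFs are the biggest before retrying the index. Below is a minimal sketch in Python, not part of the plugin, that lists PDFs at or above a size threshold, largest first. The function name `find_large_pdfs` and the 10 MB default threshold are assumptions chosen for illustration, and the uploads path in the usage note is only the usual WordPress location, which may differ on your server.

```python
from pathlib import Path


def find_large_pdfs(root, min_bytes=10 * 1024 * 1024):
    """Return (size_in_bytes, path) pairs for PDFs of at least min_bytes, largest first.

    The 10 MB default is an arbitrary illustrative threshold, not a plugin limit.
    """
    hits = []
    for path in Path(root).rglob("*"):
        # Match the extension case-insensitively (.pdf, .PDF, ...).
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((size, path))
    # Sort by size only, so the largest files come first.
    hits.sort(key=lambda item: item[0], reverse=True)
    return hits
```

For example, calling `find_large_pdfs("wp-content/uploads")` from the WordPress root and printing each pair would show which files to exclude or split first, so you can test whether the index completes without them.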