- October 17, 2018 at 11:00 pm #19511 Participant
When I try to create an index table, it fails partway through (seemingly at a random place each time) with a 500 Server Error. It’s mostly indexing PDF files. The total keyword count is already huge by the time it gets close to the end (236k), even though indexing hasn’t finished. Sometimes I can resume indexing and it’ll add 3-5, then error out again. The PDFs can get pretty big, up to 30 MB or so, and there are almost 400 of them. There are also a ton of Office files; I just haven’t added ZipArchive to the server yet because I can’t even get it through the PDFs, and I figured it would be worse if it had to chew through three times as many files.
I tried changing post iteration to 1. I also set the WordPress memory limit to 640 MB and raised the max execution time in my php.ini after seeing an error in the error log that mentioned it (no errors have appeared in the error log since). I tried turning off all other plugins, as well as changing the theme. I also cut the pool sizes to almost half to see whether the number of keywords was the issue (this actually made it error out sooner).
- October 18, 2018 at 11:38 am #19525 Keymaster
The limit is automatically set to 1 when indexing file contents. I believe the issue is the size of the large PDF files: the server probably cannot handle parsing them and simply fails, returning a 500 error. For fair use, we recommend a maximum of 30-60 pages of text per file, but this is highly server dependent. The number of files is not an issue, as the limit is always 1 per iteration.
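To check which files are most likely to trip the parser, a small script along these lines could list the biggest PDFs first. This is only a rough sketch: it uses file size as a stand-in for page count (which would need a PDF library to read), and the function name `find_large_pdfs`, the 10 MB default threshold, and the `wp-content/uploads` path in the usage note are assumptions for illustration, not anything the plugin provides.

```python
from pathlib import Path


def find_large_pdfs(root, max_bytes=10 * 1024 * 1024):
    """Return (path, size_in_bytes) for each PDF under root larger than max_bytes.

    Results are sorted largest first. The 10 MB default is an arbitrary
    starting point (an assumption), not a limit documented by the plugin.
    """
    hits = []
    for path in Path(root).rglob("*"):
        # Match the extension case-insensitively so .PDF files are included.
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > max_bytes:
                hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

Calling `find_large_pdfs("wp-content/uploads")` from the WordPress root (assuming that is where the files live) returns the candidates to split, compress, or exclude before re-running the index.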
A higher execution time usually helps, but unfortunately a maximum time limit is nowadays often hard-coded into the server and cannot be increased (especially with php-fpm and apache/php, as I found out recently while testing some things).
There is still one thing you should try: change the PDF parser method from ‘Auto’ to ‘PDF2Txt’, and if that does not work, try ‘Smalot parser’ as well: https://i.imgur.com/dwkhQdK.png
Usually, the PDF2Txt method is used as the secondary parser, but it seems faster than the other one (and much less accurate as well). With a parser selected explicitly, no secondary parsing is used, which saves some resources and bypasses any possible compatibility issues. There is a small chance that it may go through with either of the parsers selected.
Best,