Hi,
Thank you for the details; they helped a lot.
I downloaded that PDF file to run some tests on it. Some of the content is indeed accessible, but there are also lots of invalid or incomplete characters, and those caused serious problems for both parsers.
It took a few hours, but I managed to make it work. It now extracts around 3000 words from that document. I still think some words may be missing, but it should be much better now.
I will make sure to include this improvement in the upcoming release.