
Reply To: Integrating Apache Tika's text extraction with ASP's Index


#34895
nickchomey18
Participant

Glad to hear that what I’ve done makes sense and that you’re interested in it!

I’m very inexperienced with development, so I doubt I can help much with making it a one-click config. Still, it would be great if you could figure it out so that the average user doesn’t need to use SSH or ask their host to set it up. Perhaps the relevant libraries could be bundled directly in the plugin somehow? I suspect not, however.

Or, at the very least, it would be useful if the plugin backend let you set the path to the jar file, as well as which file extensions and MIME types are to be processed, rather than having to edit the plugin code directly.
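To make that idea concrete, here is a minimal sketch of what such settings could look like. It is written in Python purely for illustration (the plugin itself is PHP), and every name and default in it, including `TikaSettings`, the jar path, and the listed extensions and MIME types, is a hypothetical placeholder rather than anything taken from the plugin or the snippet.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class TikaSettings:
    """Hypothetical settings that a plugin backend page could expose."""

    # Placeholder path; the real location depends on where Tika was downloaded.
    jar_path: str = "/opt/tika/tika-app.jar"
    # Placeholder lists; an admin would edit these instead of the plugin code.
    extensions: set = field(default_factory=lambda: {"pdf", "docx", "png"})
    mime_types: set = field(
        default_factory=lambda: {"application/pdf", "image/png"}
    )

    def should_process(self, filename: str, mime_type: str) -> bool:
        """Return True if the extension or the MIME type is on the allow list."""
        ext = PurePosixPath(filename).suffix.lstrip(".").lower()
        return ext in self.extensions or mime_type.lower() in self.mime_types
```

Accepting a file when either the extension or the MIME type matches is just one possible choice; requiring both to match would be stricter.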

You can also set up Tika as a server, but that was far beyond my capabilities and seemingly unnecessary compared to just running the single jar file upon request.
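For anyone wondering what "running the single jar file upon request" amounts to, the sketch below shows the general shape in Python. It is not the actual snippet: the function names and paths are hypothetical, and it assumes `java` is on the PATH and that the jar is the Tika app jar, whose `--text` option prints the extracted plain text to standard output.

```python
import subprocess


def build_tika_command(jar_path: str, file_path: str) -> list:
    """Build the command line for a one-off Tika text extraction."""
    # --text asks the Tika app for plain-text output on stdout.
    return ["java", "-jar", jar_path, "--text", file_path]


def extract_text(jar_path: str, file_path: str, timeout: int = 120) -> str:
    """Run Tika once for a single file and return the extracted text."""
    result = subprocess.run(
        build_tika_command(jar_path, file_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        timeout=timeout,      # OCR on large files can be slow
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Each call starts a fresh Java process, which is simpler than keeping a Tika server running but pays the JVM startup cost for every file.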

Anyway, for anyone even slightly technically inclined, or whose host is amenable to installing packages on the server, the SSH work is no more than a few commands: installing Java, Tesseract and the Tesseract training data, creating a folder for Tika, and using wget to download Tika. The plugin/snippet then automatically processes any files that are uploaded (so long as the file type is included in its code). This includes files uploaded as part of the BuddyPress/BuddyBoss activity feed and other media collections.
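Once those pieces are installed, a quick check like the one below can confirm that everything is in place before any uploads are processed. This is only an illustrative Python sketch: `missing_prerequisites` is a hypothetical helper, and it assumes the executables are named `java` and `tesseract` and are reachable on the PATH.

```python
import os
import shutil


def missing_prerequisites(jar_path, which=shutil.which, exists=os.path.isfile):
    """Return a list of whatever is missing: executables, then the jar path."""
    # shutil.which returns None when an executable is not found on the PATH.
    missing = [tool for tool in ("java", "tesseract") if which(tool) is None]
    if not exists(jar_path):
        missing.append(jar_path)
    return missing
```

An empty list means Java, Tesseract and the Tika jar were all found; the `which` and `exists` parameters exist only so the check can be exercised without touching the real system.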

One final thing to note: if using OpenLiteSpeed (and perhaps LiteSpeed Enterprise), running Tika from WordPress creates memory-related errors. After considerable debugging with my host/control panel (RunCloud), as well as some input from LiteSpeed, the solution was to change a couple of parameters in the LiteSpeed web app config. https://forum.openlitespeed.org/threads/ols-is-preventing-wordpress-from-properly-communicating-with-apache-tika.5008/#post-11724

Please let me know if you have any questions about how I’ve configured it!