This website uses cookies to personalize your experience. By using this website you agree to our cookie policy.

Reply To: Some issues with Index table search of custom fields

Home Forums Product Support Forums Ajax Search Pro for WordPress Support Some issues with Index table search of custom fields Reply To: Some issues with Index table search of custom fields

#35320
nickchomey18nickchomey18
Participant

Thanks, but there aren’t any keyword exclusions. I have stop words enabled, but I assume that excludes them from the index table to begin with and I’ve confirmed that the words that I’m searching for are in the table (searching for the full word returns a result, after all). (Also, the words that I’m having trouble with are not stop words).

I have a suspicion that the issue has something to do with the fact that there are duplicate similar entries – one of the words in question is “concepts”. As seen in the attached screenshot, there are entries for “concept” and “concepts” for a few different posts. I have no trouble for the posts that have the “concept” term. It is just the two (1179, 1180) that only have “concepts” that won’t return partial matches. But, I’ve looked at various other situations and the results are inconsistent.

Anyway, because of this issue as well as some others that I’ve noticed (e.g. a lot of junk terms in the table with punctuation), I started looking into how you create tokens. I made some modifications that reduced the junk terms (e.g. $str = str_replace( array( “-\n”, “-\r”), “”, $str ); to create one word from when a PDF has a hyphen at the end of a line).

It should be noted that Relevanssi does a CONSIDERABLY better job at tokenizing content than ASP. Their free version is open-source, so perhaps you could look into incorporating some of their code into ASP. Not sure how open-source licensing affects a commercial product…

But I also realized there are plenty of excellent open-source tools out there dedicated to this specific task (and much more), so why try to recreate (a very complex) wheel. So, I am currently looking into integrating a python-based NLP package to generate tokens, stems or even lemmas, as well as store other data that I might make use of in other ways. That would eliminate the “concept/s” issue, reduce table size considerably, and surely improve overall performance.

As with Tika support, I suspect that it would be difficult to include something like this into the ASP plugin, as it requires server-level packages. But it is another thing for you to consider.

So, given all of this, perhaps it is best to put this issue on pause until I’ve made these changes. I’ll report back then.