Reply To: Some posts not shown in results which have special characters


#9007

Ernest Marcinko
Keymaster

Hi!

There are multiple reasons why the long dash “–” character is not giving you results in this case.

1. The search instance is using the index table engine, which treats special characters surrounded by spaces as word boundaries. For example, “xyz – abc” is indexed as “xyz” and “abc”, even if dash character removal is disabled (“xyz-abc”, however, would be indexed as one word). The reason is that if the phrase were indexed as a whole, only its beginning would match, so a search for “abc” would find nothing.
The tokenization is considerably more complicated than that, so I would rather not go into more detail.
Changing the algorithm would lead to word recognition problems. I'm not sure it is feasible to recognize connected words and separate words at the same time (when they are also separated by spaces).

2. The long dash “–” character is stored as a short dash in the WordPress database, but it is converted back to a long dash when displayed. One possible solution is to add a filter that replaces long dashes with short dashes in the search phrase before it is processed. Use it in the functions.php file in the current theme directory:

This should work, but I'm not 100% sure, because different database collations might store the long dash differently.
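
To illustrate the two points above, here is a minimal, language-neutral sketch in Python. It is not the plugin's code: the function names `tokenize` and `normalize_dashes` and the `DASHES` character list are assumptions made for this example, and the real tokenizer is more involved. It only shows the idea that a dash standing alone between spaces is dropped as a word boundary, and that a long dash in the search phrase can be replaced with a short one before the lookup.

```python
# Assumption: the dash-like characters treated as boundaries when they
# stand alone between spaces (hyphen-minus, en dash, em dash).
DASHES = "-\u2013\u2014"


def normalize_dashes(phrase):
    """Replace en and em dashes with a plain short dash (hyphen-minus)."""
    return phrase.replace("\u2013", "-").replace("\u2014", "-")


def tokenize(text):
    """Split on whitespace and drop tokens that consist only of dashes.

    A dash inside a word ("xyz-abc") is kept, so the word stays whole.
    """
    return [word for word in text.split() if word.strip(DASHES)]


if __name__ == "__main__":
    print(tokenize("xyz \u2013 abc"))          # dash between spaces is dropped
    print(tokenize("xyz-abc"))                 # connected word stays whole
    print(normalize_dashes("xyz \u2013 abc"))  # long dash becomes short dash
```

In the plugin itself the replacement step would be written in PHP and attached to the search phrase filter in functions.php; the sketch only shows the string handling involved.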

The reason exact matches work is that the phrase is not split into words but matched as a single term. For example, with exact matches disabled, “xyz -” is two words for the database: “xyz” and “-”. However, nothing will match “-”, because it is not in the index table: the tokenization algorithm treated it as a word boundary.

Another possibility is to switch back to the regular engine, which will match something for “xyz -” even if exact matches are disabled, but still as two words, so the results will probably be less relevant.

Best,
Ernest Marcinko
