Some posts not shown in results which have special characters


This topic contains 1 reply, has 2 voices, and was last updated by Ernest Marcinko 7 years, 10 months ago.

  • #8976
    kognitoweb
    Participant

    Hi,

    I've noticed a bug: when characters like “–” are typed into the search field, the matching post doesn't show up in the results.
    For example, “Islam – Kultur – Politik”, as shown in the attachment.

    The problem first occurred with posts containing “.” or “,”. When I switched on “Exact matches only”, these posts showed up in the search results. However, this still didn't help with “–”. Is this a bug, or can you find out what the problem is? I would rather not activate “Exact matches only” because I want to use the indexed search, so it would be great if the search worked that way too.
    You can check the titles in the “Publikationen” section as examples; there we have a lot of post titles with different characters:
    http://www.kulturrat.de/publikationen/

    Thanks!

    #9007
    Ernest Marcinko
    Keymaster

    Hi!

    There are multiple reasons why the long dash “–” character is not giving you results in this case.

    1. The search instance is using the index table engine, which treats special characters that are surrounded by spaces as word boundaries. For example, “xyz – abc” is indexed as “xyz” and “abc” even if dash character removal is disabled (“xyz-abc”, however, would be one word). The reason is that if the whole string were indexed as a single term, only its beginning would match, so a search for “abc” would find nothing.
    The tokenization is considerably more complicated than that, but I'd rather not go into more detail.
    Changing the algorithm would lead to word recognition problems. I'm not sure it is feasible to recognize connected words and separate words at the same time (if they are separated with spaces as well).
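
    As a rough illustration of the word-boundary behaviour described in point 1, here is a minimal Python sketch. It is not the plugin's tokenizer (which is PHP and, as noted, more complicated); the `tokenize` function and its punctuation rule are assumptions made up for this example.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split on whitespace and drop chunks that consist only of punctuation.

    Hypothetical rule for illustration: a special character standing alone
    between spaces acts as a word boundary and is not indexed, while one
    embedded in a word (e.g. "xyz-abc") stays part of that word.
    """
    words = []
    for chunk in text.split():
        # Skip chunks made up entirely of non-word characters, such as "–".
        if re.fullmatch(r"[\W_]+", chunk):
            continue
        words.append(chunk.lower())
    return words

print(tokenize("xyz – abc"))  # ['xyz', 'abc']
print(tokenize("xyz-abc"))    # ['xyz-abc']
```

    With a rule like this, a title such as “Islam – Kultur – Politik” ends up in the index as three separate words, and the dash itself is never stored.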

    2. The long dash “–” character is stored as a short dash in the WordPress database, but it is converted back to a long dash when displayed. One possible solution is to add a filter on the search phrase that replaces long dashes with short dashes before the phrase is processed. The filter code goes in the functions.php file in the current theme directory.

    This should work, but I’m not 100% sure, because different database collations might store the long dash differently.
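
    The replacement step itself is simple. The sketch below shows only the string logic, in Python, for illustration; the real fix would be a PHP filter in functions.php, and the plugin hook it attaches to is not shown here. The name `normalize_dashes` is hypothetical, and mapping the em dash as well as the en dash is an assumption that goes beyond the single “–” character discussed above.

```python
# Map the en dash (U+2013) and, as an assumption, the em dash (U+2014)
# to the plain ASCII hyphen-minus that is stored in the database.
DASH_MAP = str.maketrans({"\u2013": "-", "\u2014": "-"})

def normalize_dashes(phrase: str) -> str:
    """Return the search phrase with long dashes replaced by short ones."""
    return phrase.translate(DASH_MAP)

print(normalize_dashes("Islam – Kultur – Politik"))  # Islam - Kultur - Politik
```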

    Exact matches work because the phrase is not separated into words, but matched as a single term. For example, if exact matches are disabled, “xyz -” is two words for the database: “xyz” and “-”. However, nothing will match “-”, because it is not in the index table; the tokenization algorithm treated it as a word boundary.
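
    The difference between the two modes can be sketched as follows. This is a toy Python model, not the plugin's query logic: the `index` set, the sample `title`, and the assumption that every word of the phrase must be found in the index are all made up for illustration.

```python
# Hypothetical index built from the title "xyz - abc": the standalone dash
# was treated as a word boundary, so only the two words were stored.
title = "xyz - abc"
index = {"xyz", "abc"}

def indexed_search(phrase: str) -> bool:
    """Assumed AND logic: every word of the phrase must be in the index."""
    return all(word in index for word in phrase.lower().split())

def exact_search(phrase: str) -> bool:
    """The whole phrase is matched as one term against the stored title."""
    return phrase.lower() in title.lower()

print(indexed_search("xyz -"))  # False: "-" was never indexed
print(exact_search("xyz -"))    # True: "xyz -" is a substring of the title
```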

    Another possibility is to switch back to the regular engine, which will match something for “xyz -” even if exact matches are disabled, but still as two words, so the results will probably be less relevant.

    Best,
    Ernest Marcinko




The topic ‘Some posts not shown in results which have special characters’ is closed to new replies.