How to strip HTML tags

This topic contains 3 replies, has 2 voices, and was last updated by Ernest Marcinko 7 years, 1 month ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • #12250
    Federico
    Participant

    Hello! I noticed a little problem while tinkering with my site.
    In the documentation it says “By default every HTML tag except the ones declared in the field above are going to be removed.”

    My content is full of HTML tags, such as

    "a href", "/a", "sup", "/sup"

    etc.
    If, for example, I search for “sup”, I get all the results with the text “sup” in the searched fields, but also those containing the HTML tag “<sup>”, which is not the intended behaviour.
    Is it possible to exclude those tags from the search without causing a commotion? I tried the “key exceptions” section, but to no avail. I don’t know whether this code you gave me to strip apostrophes is interfering:

    add_filter('asp_search_phrase_before_cleaning', 'asp_replace_characters', 10, 1);
    add_filter('asp_query_args', 'asp_replace_characters', 10, 1);
    function asp_replace_characters( $s ) {
      $characters = "‘’“”'\""; // Type characters one after another
      $replace_with = ' ';     // Replace them with this (space by default)
      if ( is_array($s) ) {
        if ( isset($s['s']) && !$s['_ajax_search'] )
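          // Split per character (not per byte) so the multibyte quotes stay intact
          $s['s'] = str_replace(preg_split('//u', $characters, -1, PREG_SPLIT_NO_EMPTY), $replace_with, $s['s']);
      } else {
        $s = str_replace(preg_split('//u', $characters, -1, PREG_SPLIT_NO_EMPTY), $replace_with, $s);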
      }
      return $s; 
    }

    It’s not really a bother, actually, but I can’t seem to understand why it’s happening.

    Thank you,

    Federico

    • This topic was modified 7 years, 1 month ago by Federico.
    • This topic was modified 7 years, 1 month ago by Ernest Marcinko.
    #12271
    Ernest Marcinko
    Keymaster

    Hi Federico,

    That option refers to the output of the search results.

    At the database level the content field is treated as “raw” text, so the query has no way of telling whether it is comparing the input to HTML or non-HTML content. This and many other factors make searching in WordPress very hard, generally speaking.
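
    To illustrate the point, here is a minimal sketch in plain PHP. It is not the plugin's actual query, and the sample content is made up. A substring match on the raw field finds "sup" inside the tag itself, while the same match on tag-stripped text does not.

```php
<?php
// Hypothetical post content as stored in the database (raw HTML).
$content = 'E = mc<sup>2</sup>, see the <a href="/notes">notes</a>.';
$phrase  = 'sup';

// Roughly what a LIKE '%sup%' comparison sees: the raw string, tags included.
$raw_match = stripos($content, $phrase) !== false;

// The same comparison after removing tags with PHP's built-in strip_tags().
$text_match = stripos(strip_tags($content), $phrase) !== false;

var_dump($raw_match);  // bool(true): "sup" is found inside "<sup>"
var_dump($text_match); // bool(false): the visible text has no "sup"
```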

    However, there is a secondary engine built into the plugin, which requires a bit of configuration. It creates a separate index table and pre-processes all the selected content: it tries to recognize words and removes all other unnecessary content (HTML tags, special characters, etc.). It then stores these words in that table separately, with relevance values calculated from occurrences and similar factors. It’s not perfect, of course, but it removes most of the unwanted content effectively. Please check the following sections of the documentation for more information on how to use it:
    Index table engine introduction
    Configuring and Generating the index table
    Enabling the index table engine
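
    As a rough idea of what such pre-processing can look like, here is a sketch in plain PHP. It is an assumption for illustration only, not the plugin's real implementation: build_word_index() and its $stop_words parameter are hypothetical names, and the occurrence count stands in for the real relevance calculation.

```php
<?php
// Hypothetical helper: turn raw HTML content into word => occurrence count.
function build_word_index(string $html, array $stop_words = []): array {
    // Remove tags, then decode entities such as &amp; into plain characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);

    // Split on anything that is not a letter or digit (Unicode-aware).
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words, then count occurrences as a crude relevance value.
    $words  = array_diff($words, $stop_words);
    $counts = array_count_values($words);
    arsort($counts);
    return $counts;
}

// The <sup> and <p> tags never reach the index; only visible words are counted.
$index = build_word_index('<p>What is <sup>up</sup>? Sup, sup!</p>', ['is']);
print_r($index);
```

    A search for "sup" against an index like this only matches the word in the visible text, and words listed as stop words are skipped entirely.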

    Best,
    Ernest Marcinko

    If you like my products, don't forget to rate them on codecanyon :)


    #12273
    Federico
    Participant

    The index engine and the stop words section solved my problem.
    Thank you very much!

    Federico

    #12281
    Ernest Marcinko
    Keymaster
    You cannot access this content.

    Best,
    Ernest Marcinko


The topic ‘How to strip HTML tags’ is closed to new replies.