Ernest Marcinko


Before any field is split into keywords, it goes through a fairly long pre-processing method, which removes suspicious characters and deals with shortcodes as well as HTML tags. In fact, two separate methods are applied to the content to strip HTML tags while preserving as much relevant text as possible. This should prevent any HTML-related content from appearing in the database.

Problems may still arise, however, when the content contains unclosed or unconventional tags, which are hard to deal with reliably. Changing the options will very likely not help in this case.
However, I’ve added a filter just before the content is tokenized, so additional functions can be applied before the indexing process. You could try placing this function in the functions.php file in your active theme directory (copy from line 3):

Make sure to re-create the index after this code is applied. The code forcefully strips all HTML tags before tokenization, using PHP's built-in strip_tags() function.
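To illustrate why stripping tags before tokenization keeps markup out of the index, here is a minimal, language-neutral sketch. It is not the plugin's code and not the filter mentioned above. It assumes a simple word tokenizer (the tokenize() helper is hypothetical) and swaps in Python's standard html.parser module as a rough stand-in for PHP's strip_tags().

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, discarding the tags."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Rough stand-in for PHP's strip_tags(): keep text, drop markup."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    # Join with spaces so text from adjacent elements does not merge into one word
    return " ".join(parser.parts)


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lower-cased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def index_keywords(field: str) -> list[str]:
    """Strip markup first, then tokenize the remaining plain text."""
    return tokenize(strip_tags(field))


if __name__ == "__main__":
    print(index_keywords("<p>Tom &amp; <b>Jerry</b></p>"))  # ['tom', 'jerry']
```

One design difference to note: PHP's strip_tags() simply removes the tags, so "<b>Hello</b><i>World</i>" becomes "HelloWorld", while this sketch joins text nodes with a space and yields two separate keywords.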

IMPORTANT: Before making any direct changes to your site files, please make sure you have a backup of everything. To edit the files, I highly recommend using an (S)FTP client instead of the built-in WordPress file editor, as even a tiny mistake there can lead to a site error.

