Reply To: Search in content file

#25536
Ernest Marcinko
Keymaster

Hi,

Okay, I have checked the index table and tried to debug the extracted contents. Something seems to be wrong with either the PDF encryption or the parser, I am not sure which. I tried multiple scripts to extract the contents, but none of them worked, so it might be some sort of PDF encoding issue.

Anyway, I noticed that most of the text is present, but there are double spaces here and there between the words, as well as some random characters.

There might be a way to work around that with some custom code, but I am not sure. Try adding this custom code to the functions.php file in your theme/child theme directory. Before editing, please make sure to have a full site back-up, just in case!

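add_filter("asp_indexing_string_pre_process", "asp_fix_indexing_string_pre_process", 10, 1);
function asp_fix_indexing_string_pre_process($s) {
    // Only touch strings that contain many double spaces
    if ( substr_count($s, "  ") > 10 ) {
        // Mark the double spaces with a temporary placeholder
        $s = str_replace('  ', '||||', $s);
        // Remove the remaining single spaces
        $s = str_replace(' ', '', $s);
        // Turn the placeholders back into single spaces
        $s = str_replace('||||', ' ', $s);
    }
    return $s;
}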
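
If you want to see what the replacement does before re-creating the index, here is a minimal stand-alone sketch that can be run outside of WordPress. The function name demo_collapse_spaced_text, the $threshold parameter and the sample sentence are only made up for this illustration, they are not part of the plugin. It also assumes the extracted text looks like letters separated by single spaces and words separated by double spaces, which may not match your PDF exactly.

```php
<?php
// Hypothetical helper that mirrors the filter logic above, for testing only.
function demo_collapse_spaced_text($s, $threshold = 10) {
    // Leave strings with only a few double spaces unchanged
    if ( substr_count($s, '  ') > $threshold ) {
        $s = str_replace('  ', '||||', $s); // mark double spaces
        $s = str_replace(' ', '', $s);      // drop single spaces
        $s = str_replace('||||', ' ', $s);  // restore one space per marker
    }
    return $s;
}

// Build an assumed sample: "t h e  q u i c k  ..." (12 words, 11 double spaces)
$words = array('the', 'quick', 'brown', 'fox', 'jumps', 'over',
               'the', 'lazy', 'dog', 'and', 'runs', 'away');
$spaced = implode('  ', array_map(function ($w) {
    return implode(' ', str_split($w));
}, $words));

echo demo_collapse_spaced_text($spaced), PHP_EOL;
```

Keep in mind that a string with 10 or fewer double spaces is returned unchanged, and that any text already containing "||||" would be affected by the placeholder as well.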

Once the code is added, please try to re-create the index table. There is a small chance that some complete words will then be indexed from the PDF files.