
Reply To: Search not working completely

#28364
Ernest Marcinko
Keymaster

Hi Shivam,

Thank you very much for the details.

I may have found the problem. Some of the PDF files contain double spaces after each word and random single spaces within the words, which causes the high number of keywords, most of them gibberish. I’m not sure whether this comes from the extraction script, the PDFs themselves, or something else. I have tested a potential solution on our local servers by uploading the problematic files, and got much better results.

1. Try adding this custom code to the functions.php file in your theme/child theme directory. Before editing, please make sure you have a full site backup, just in case!

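add_filter('asp_indexing_string_pre_process', 'asp_custom_double_char_detection', 10, 1);
function asp_custom_double_char_detection($str) {
	// Only act when the text has many double spaces, a sign of a broken extraction
	if ( substr_count($str, '  ') > 100 ) {
		// Protect the real word gaps (double spaces) with a placeholder
		$str = str_replace('  ', '||||', $str);
		// Remove the stray single spaces inside the words
		$str = str_replace(' ', '', $str);
		// Turn the placeholders back into single spaces
		$str = str_replace('||||', ' ', $str);
	}
	return $str;
}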

This code tries to detect a high number of double spaces and, if it finds them, corrects the text.

2. Make sure to re-create the index so that the code takes effect. The keyword count should drop significantly, and the remaining keywords should be much more relevant.

3. Optional, but I strongly recommend this keyword logic for your case: https://i.imgur.com/r47o5Ib.png
Because there are so many keywords to match results against, this should improve accuracy greatly.
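If you want to try the whitespace fix from step 1 outside WordPress before re-indexing, here is a minimal standalone sketch of the same idea. It is not part of the plugin: the function name fix_spaced_out_text and its $threshold parameter are hypothetical. Instead of a "||||" placeholder, it splits the text on runs of two or more spaces, so a literal "||||" inside a document cannot be mistaken for a word gap. It assumes PHP 7.4 or newer (for the arrow function).

```php
<?php
// Hypothetical standalone helper (not part of Ajax Search Pro) that repairs
// "spaced-out" text such as "H e l l o  w o r l d" from a broken PDF extraction.
function fix_spaced_out_text(string $str, int $threshold = 100): string {
    // Count runs of two or more spaces; in broken text these are the real word gaps.
    $gaps = preg_match_all('/ {2,}/', $str);
    if ($gaps === false || $gaps <= $threshold) {
        return $str; // looks like normal text, leave it untouched
    }
    // Split on the word gaps, drop the stray single spaces inside each word, rejoin.
    $words = preg_split('/ {2,}/', $str);
    $words = array_map(fn(string $w): string => str_replace(' ', '', $w), $words);
    return implode(' ', $words);
}

// Example with a low threshold so the short sample triggers the repair.
echo fix_spaced_out_text("H e l l o  w o r l d", 0), PHP_EOL; // Hello world
```

Because the trigger counts double-space runs rather than all spaces, ordinary text with single spaces is returned unchanged. Since the second parameter has a default, the helper could also be registered with add_filter('asp_indexing_string_pre_process', 'fix_spaced_out_text', 10, 1) in place of the step 1 function, followed by re-creating the index.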