Reply To: Ignoring diacritical marks in non-Latin scripts during index table generation

#35686
Ernest Marcinko
Keymaster

Hi,

Thank you very much for the details and the list. I am sorry for the late response.

Normally, the database engine is responsible for ignoring vocalization and accent marks when matching. This issue turned out to be much more interesting than I thought. Initially I wrote a script to handle the ligatures and such, but upon inserting the data into the database, only about half of it was stored: either the words with the “punctuation” and accent marks, or the ones without them, whichever came first.
At first I thought that the database simply did not differentiate between the original and the unvocalized versions. That was not true, yet it still inserted only one version and treated both as the same word.
Interestingly, searches did not treat the two versions as a single word.

Long story short, I figured out why: it was related to specific indexes and how the database treats them. So the only possible solution is to remove all vocalization marks and store the keywords that way. Then, when the user does a search, do the same to the input keyword, and voilà, everything matches as it should (in most cases).
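To illustrate the idea of normalizing both sides, here is a minimal standalone sketch. The function name `strip_marks_sketch` and the sample word are hypothetical and not part of the plugin. It only shows that a pointed and an unpointed spelling compare equal once the combining marks are removed.

```php
<?php
// Hypothetical helper for illustration only; not part of Ajax Search Pro.
function strip_marks_sketch( string $text ): string {
	// \p{Mn} matches non-spacing combining marks, which includes Hebrew vowel points.
	$stripped = preg_replace('/\p{Mn}/u', '', $text);
	// preg_replace() returns null on invalid UTF-8; fall back to the input in that case.
	return trim( $stripped ?? $text );
}

// "shalom" written with vowel points, using code point escapes (PHP 7+).
$indexed_word = "\u{05E9}\u{05B8}\u{05C1}\u{05DC}\u{05D5}\u{05B9}\u{05DD}";
// The same word as it is usually typed, without points.
$search_input = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

// Byte-for-byte comparison of the raw strings.
$raw_match = ( $indexed_word === $search_input );
// Comparison after the same normalization is applied to both sides.
$normalized_match = ( strip_marks_sketch($indexed_word) === strip_marks_sketch($search_input) );

var_dump($raw_match, $normalized_match);
```

The raw strings differ, while the normalized ones are identical, which is why the stripping has to happen both at indexing time and at search time.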

Try adding this code to the functions.php file in your theme/child theme directory – make sure to have a full server backup first for safety. For more details you can check the safe coding guidelines.

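// Normalize the keywords before they are stored in the index table.
add_filter('asp_indexing_keywords', 'diacritic_asp_indexing_keywords', 10, 1);
function diacritic_asp_indexing_keywords($keywords) {
	$new_kw_arr = array();
	foreach ( $keywords as $keyword => $arr ) {
		$new_kw = hebrew_unvocalize($keyword);
		// Skip keywords that consisted only of marks and whitespace.
		if ( $new_kw !== '' ) {
			if ( !isset($new_kw_arr[$new_kw]) ) {
				// First variant of this word: store it with a count of 1.
				$new_kw_arr[$new_kw] = array($new_kw, 1);
			} else {
				// Another variant collapsed into the same word: bump the count.
				$new_kw_arr[$new_kw][1]++;
			}
		}
	}
	return $new_kw_arr;
}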

// Apply the same normalization to the keyword the user searches for.
add_filter('asp_keyword_after_postproc', 'hebrew_unvocalize', 10, 1);
function hebrew_unvocalize( $str ) {
	$hebrew_common_ligatures = array(
		'ײַ' => 'ײ',
		'ﬠ' => 'ע',
		'ﬡ' => 'א',
		'ﬢ' => 'ד',
		'ﬣ' => 'ה',
		'ﬤ' => 'כ',
		'ﬥ' => 'ל',
		'ﬦ' => 'ם',
		'ﬧ' => 'ר',
		'ﬨ' => 'ת',
		'שׁ' => 'ש',
		'שׂ' => 'ש',
		'שּׁ' => 'ש',
		'שּׂ' => 'ש',
		'אַ' => 'א',
		'אָ' => 'א',
		'אּ' => 'א',
		'בּ' => 'ב',
		'גּ' => 'ג',
		'דּ' => 'ד',
		'הּ' => 'ה',
		'וּ' => 'ו',
		'זּ' => 'ז',
		'טּ' => 'ט',
		'יּ' => 'י',
		'ךּ' => 'ך',
		'כּ' => 'כ',
		'לּ' => 'ל',
		'מּ' => 'מ',
		'נּ' => 'נ',
		'סּ' => 'ס',
		'ףּ' => 'ף',
		'פּ' => 'פ',
		'צּ' => 'צ',
		'קּ' => 'ק',
		'רּ' => 'ר',
		'שּ' => 'ש',
		'תּ' => 'ת',
		'וֹ' => 'ו',
		'בֿ' => 'ב',
		'כֿ' => 'כ',
		'פֿ' => 'פ',
		'ﭏ' => 'אל'
	);
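	// Strip every non-spacing combining mark (vowel points, dagesh, cantillation).
	// The cast guards against preg_replace() returning null on invalid UTF-8.
	$new_kw = trim( (string) preg_replace('/\p{Mn}/u', '', $str) );
	// Replace the remaining presentation forms and ligatures with their base letters.
	$new_kw = str_replace( array_keys($hebrew_common_ligatures), array_values($hebrew_common_ligatures), $new_kw );
	return trim($new_kw);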
}
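For reference, here is a small standalone sketch of what the indexing filter does to a keyword list. It runs outside WordPress and is not meant to be added to functions.php. The names `normalize_sketch` and `merge_keywords_sketch` are hypothetical, the stand-in normalizer only strips combining marks, and the input shape (keyword => array(keyword, count)) is an assumption based on the snippet above.

```php
<?php
// Stand-in for hebrew_unvocalize() that only strips combining marks (assumption for brevity).
function normalize_sketch( string $keyword ): string {
	return trim( (string) preg_replace('/\p{Mn}/u', '', $keyword) );
}

// Same merging logic as the indexing filter, minus the WordPress hook.
// Assumed input shape: keyword => array(keyword, count).
function merge_keywords_sketch( array $keywords ): array {
	$merged = array();
	foreach ( $keywords as $keyword => $unused ) {
		$normalized = normalize_sketch( (string) $keyword );
		if ( $normalized === '' ) {
			continue; // nothing left after stripping the marks
		}
		if ( !isset($merged[$normalized]) ) {
			$merged[$normalized] = array($normalized, 1);
		} else {
			$merged[$normalized][1]++;
		}
	}
	return $merged;
}

$pointed = "\u{05E9}\u{05B8}\u{05C1}\u{05DC}\u{05D5}\u{05B9}\u{05DD}"; // "shalom" with vowel points
$plain   = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";                         // "shalom" without points
$other   = "\u{05E1}\u{05E4}\u{05E8}";                                 // unrelated word, no points

$merged = merge_keywords_sketch( array(
	$pointed => array($pointed, 1),
	$plain   => array($plain, 1),
	$other   => array($other, 1),
) );

// Print only the counts of the merged entries.
print_r( array_map( function ( $entry ) { return $entry[1]; }, array_values($merged) ) );
```

The two spellings of the first word collapse into a single entry with a count of 2, so only one row per word reaches the index table. Note that, as in the snippet above, each variant contributes 1 to the count regardless of the count it arrived with.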

After adding the code, please re-create the index table (click “Delete index” then “Create new index” buttons).

If all goes well, this should do the trick. Please let me know if this helps at least a tiny bit, as I would then include this code in the next live release.