Reply To: Some issues with Index table search of custom fields

#35366
nickchomey18
Participant

By junk, I mean thousands of strings that contain punctuation. Roughly 2,000 of 10,000 rows were junk, and none of them appeared in the Relevanssi index table.

Colons (“:”) are not removed from the end of words. That is in addition to the cases covered by the modifications I shared above, including dashes and hyphens. There are also a lot of entries with [brackets] or (parentheses) that I haven’t dealt with yet. Those will never be matched by the search, given that it only matches the start or end of a word.
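
To illustrate why bracketed entries can never match, here is a minimal sketch. It assumes the lookup behaves the way I described (the term has to match the start or the end of the stored token). The function name matches_start_or_end() is made up for this example and is not part of ASP or Relevanssi.

```php
<?php
// Hypothetical helper, for illustration only: returns true when $term matches
// the start or the end of $token, which is the assumed lookup behaviour.
function matches_start_or_end( string $token, string $term ): bool {
	$len = strlen( $term );
	if ( 0 === $len || $len > strlen( $token ) ) {
		return false;
	}
	return 0 === strncmp( $token, $term, $len )
		|| substr( $token, -$len ) === $term;
}

// Tokens as they end up in the index when punctuation is left in place.
foreach ( array( '(widget)', '[widget]' ) as $token ) {
	var_export( matches_start_or_end( $token, 'widget' ) ); // false both times
	echo "\n";
}

// The same token once the punctuation has been stripped.
var_export( matches_start_or_end( 'widget', 'widget' ) ); // true
echo "\n";
```

The leading bracket breaks the prefix match and the trailing bracket breaks the suffix match, so those rows only take up space in the index.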

Again, Relevanssi’s TAIKASANA method works perfectly for all of this, so I strongly recommend you check out their code, which is much more effective. I’ve included it here:

function relevanssi_remove_punct( $a ) {
	if ( ! is_string( $a ) ) {
		// In case something sends a non-string here.
		return '';
	}

	$a = preg_replace( '/<(\d|\s)/', '\1', $a );
	$a = html_entity_decode( $a, ENT_QUOTES );
	$a = relevanssi_strip_all_tags( $a );

	$punct_options = get_option( 'relevanssi_punctuation' );

	$hyphen_replacement = ' ';
	$endash_replacement = ' ';
	$emdash_replacement = ' ';
	if ( isset( $punct_options['hyphens'] ) && 'remove' === $punct_options['hyphens'] ) {
		$hyphen_replacement = '';
		$endash_replacement = '';
		$emdash_replacement = '';
	}
	if ( isset( $punct_options['hyphens'] ) && 'keep' === $punct_options['hyphens'] ) {
		$hyphen_replacement = 'HYPHENTAIKASANA';
		$endash_replacement = 'ENDASHTAIKASANA';
		$emdash_replacement = 'EMDASHTAIKASANA';
	}

	$quote_replacement = ' ';
	if ( isset( $punct_options['quotes'] ) && 'remove' === $punct_options['quotes'] ) {
		$quote_replacement = '';
	}

	$ampersand_replacement = ' ';
	if ( isset( $punct_options['ampersands'] ) && 'remove' === $punct_options['ampersands'] ) {
		$ampersand_replacement = '';
	}
	if ( isset( $punct_options['ampersands'] ) && 'keep' === $punct_options['ampersands'] ) {
		$ampersand_replacement = 'AMPERSANDTAIKASANA';
	}

	$decimal_replacement = ' ';
	if ( isset( $punct_options['decimals'] ) && 'remove' === $punct_options['decimals'] ) {
		$decimal_replacement = '';
	}
	if ( isset( $punct_options['decimals'] ) && 'keep' === $punct_options['decimals'] ) {
		$decimal_replacement = 'DESIMAALITAIKASANA';
	}

	// Not defined in the snippet as posted; a space is assumed here, matching the other defaults.
	$at_replacement = ' ';

	$replacement_array = array(
		'ß'                     => 'ss',
		'ı'                     => 'i',
		'₂'                     => '2',
		'·'                     => '',
		'…'                     => '',
		'€'                     => '',
		'®'                     => '',
		'©'                     => '',
		'™'                     => '',
		'&shy;'                 => '',
		"\xC2\xAD"              => '',
		'&nbsp;'                => ' ',
		chr( 194 ) . chr( 160 ) => ' ',
		'×'                     => ' ',
		'&#8217;'               => $quote_replacement,
		"'"                     => $quote_replacement,
		'’'                     => $quote_replacement,
		'‘'                     => $quote_replacement,
		'”'                     => $quote_replacement,
		'“'                     => $quote_replacement,
		'„'                     => $quote_replacement,
		'´'                     => $quote_replacement,
		'″'                     => $quote_replacement,
		//'-'                     => $hyphen_replacement,
		'–'                     => $endash_replacement,
		'—'                     => $emdash_replacement,
		'&#038;'                => $ampersand_replacement,
		'&amp;'                 => $ampersand_replacement,
		'&'                     => $ampersand_replacement,
		'@'                     => $at_replacement,
	);

	/**
	 * Filters the punctuation replacement array.
	 *
	 * This filter can be used to alter the way some of the most common punctuation
	 * is handled by Relevanssi.
	 *
	 * @param array $replacement_array The array of punctuation and the replacements.
	 */
	$replacement_array = apply_filters( 'relevanssi_punctuation_filter', $replacement_array );

	$a = preg_replace( '/\.(\d)/', $decimal_replacement . '\1', $a );

	// Replace end-of-line hyphenation with nothing, to create a full word.
	// "-\r\n" comes first so Windows line endings do not leave a stray "\n" behind.
	$a = str_replace( array( "-\r\n", "-\n", "-\r" ), '', $a );
	
	$a = str_replace( "\r", ' ', $a );
	$a = str_replace( "\n", ' ', $a );
	$a = str_replace( "\t", ' ', $a );

	$a = stripslashes( $a );

	$a = str_replace( array_keys( $replacement_array ), array_values( $replacement_array ), $a );
	/**
	 * Filters the default punctuation replacement value.
	 *
	 * By default Relevanssi replaces unspecified punctuation with spaces. This
	 * filter can be used to change that behaviour.
	 *
	 * @param string $replacement The replacement value, default ' '.
	 */
	$a = preg_replace( '/[[:punct:]]+/u', apply_filters( 'relevanssi_default_punctuation_replacement', ' ' ), $a );
	$a = preg_replace( '/[[:space:]]+/', ' ', $a );

	$a = str_replace( 'AMPERSANDTAIKASANA', '&', $a );
	$a = str_replace( 'HYPHENTAIKASANA', '-', $a );
	$a = str_replace( 'ENDASHTAIKASANA', '–', $a );
	$a = str_replace( 'EMDASHTAIKASANA', '—', $a );
	$a = str_replace( 'DESIMAALITAIKASANA', '.', $a );

	$a = trim( $a );

	return $a;
}
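
The core idea in the function above is easier to see in isolation, so here is a stripped-down sketch of the placeholder trick: swap the characters you want to keep for letters-only tokens, strip all remaining punctuation, then swap the tokens back. This is not Relevanssi's code. The name strip_punct_keeping() and the KEEP...PLACEHOLDER tokens are made up for this example, and it needs no WordPress functions.

```php
<?php
// Simplified sketch of the placeholder ("taikasana") technique.
// $keep lists the punctuation characters that should survive (at most 26 here).
function strip_punct_keeping( string $text, array $keep ): string {
	// 1. Map each protected character to a letters-only placeholder, so the
	//    [[:punct:]] pass below cannot touch it.
	$placeholders = array();
	foreach ( array_values( $keep ) as $i => $char ) {
		$placeholders[ $char ] = 'KEEP' . chr( 65 + $i ) . 'PLACEHOLDER';
	}
	$text = str_replace( array_keys( $placeholders ), array_values( $placeholders ), $text );

	// 2. Replace every remaining run of punctuation with a space, then
	//    collapse runs of whitespace.
	$text = preg_replace( '/[[:punct:]]+/u', ' ', $text );
	$text = preg_replace( '/[[:space:]]+/', ' ', $text );

	// 3. Put the protected characters back.
	$text = str_replace( array_values( $placeholders ), array_keys( $placeholders ), $text );

	return trim( $text );
}

echo strip_punct_keeping( 'state-of-the-art (R&D): [v2]!', array( '-', '&' ) ), "\n";
// state-of-the-art R&D v2
```

As with the original, input that already contains one of the placeholder strings would be mangled, which is presumably why Relevanssi uses such unlikely words.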

But, just to clarify, I’m much more impressed with ASP than Relevanssi (which is why I’m here and using it). It’s just a few things like this that could use improvement. But, again, I will be incorporating an NLP package to augment the power considerably with lemmas, since lemmas > stems > tokens > nothing.

  • This reply was modified 4 years, 7 months ago by nickchomey18.