By junk, I mean thousands of strings that contain punctuation. Something like 2,000 of 10,000 rows were junk, and none of them appeared in the Relevanssi index table.
Colons (":") are not removed from the end of words, and neither are the other characters covered by the modifications I shared above, including dashes and hyphens. There are also a lot of terms wrapped in [brackets] or (parentheses) that I haven’t dealt with yet. Those will never be matched, given that the search only matches the start or end of a word.
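To illustrate why bracketed terms never match, here is a minimal sketch. It assumes the search behaves like a plain prefix-or-suffix comparison against the indexed term. Both `matches_start_or_end()` and `strip_punct()` are hypothetical helpers written for this post, not functions from ASP or Relevanssi.

```php
<?php
// Hypothetical model of a search that only matches the start or end of an indexed term.
function matches_start_or_end( string $term, string $query ): bool {
    $len = strlen( $query );
    if ( 0 === $len ) {
        return false;
    }
    // True when $term begins with $query or ends with $query.
    return 0 === strncmp( $term, $query, $len ) || substr( $term, -$len ) === $query;
}

// Hypothetical cleanup: replace each run of ASCII punctuation with a space, then trim.
function strip_punct( string $term ): string {
    return trim( preg_replace( '/[[:punct:]]+/', ' ', $term ) );
}

$raw = '[widget]';
var_dump( matches_start_or_end( $raw, 'widget' ) );                // bool(false)
var_dump( matches_start_or_end( strip_punct( $raw ), 'widget' ) ); // bool(true)
```

The raw term starts with "[" and ends with "]", so neither comparison can succeed until the punctuation is stripped at index time.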
Again, Relevanssi’s TAIKASANA method works perfectly for all of this, so I strongly recommend you check out their code – it is much more effective. I’ve included it here:
function relevanssi_remove_punct( $a ) {
    if ( ! is_string( $a ) ) {
        // In case something sends a non-string here.
        return '';
    }

    // Drop a "<" that is followed by a digit or whitespace so it is not treated as a tag.
    $a = preg_replace( '/<(\d|\s)/', '\1', $a );
    $a = html_entity_decode( $a, ENT_QUOTES );
    $a = relevanssi_strip_all_tags( $a );

    $punct_options = get_option( 'relevanssi_punctuation' );

    // Each class of punctuation is replaced with a space by default, removed
    // outright, or swapped for a placeholder that is restored at the end.
    $hyphen_replacement = ' ';
    $endash_replacement = ' ';
    $emdash_replacement = ' ';
    if ( isset( $punct_options['hyphens'] ) && 'remove' === $punct_options['hyphens'] ) {
        $hyphen_replacement = '';
        $endash_replacement = '';
        $emdash_replacement = '';
    }
    if ( isset( $punct_options['hyphens'] ) && 'keep' === $punct_options['hyphens'] ) {
        $hyphen_replacement = 'HYPHENTAIKASANA';
        $endash_replacement = 'ENDASHTAIKASANA';
        $emdash_replacement = 'EMDASHTAIKASANA';
    }

    $quote_replacement = ' ';
    if ( isset( $punct_options['quotes'] ) && 'remove' === $punct_options['quotes'] ) {
        $quote_replacement = '';
    }

    $ampersand_replacement = ' ';
    if ( isset( $punct_options['ampersands'] ) && 'remove' === $punct_options['ampersands'] ) {
        $ampersand_replacement = '';
    }
    if ( isset( $punct_options['ampersands'] ) && 'keep' === $punct_options['ampersands'] ) {
        $ampersand_replacement = 'AMPERSANDTAIKASANA';
    }

    $decimal_replacement = ' ';
    if ( isset( $punct_options['decimals'] ) && 'remove' === $punct_options['decimals'] ) {
        $decimal_replacement = '';
    }
    if ( isset( $punct_options['decimals'] ) && 'keep' === $punct_options['decimals'] ) {
        $decimal_replacement = 'DESIMAALITAIKASANA';
    }

    // $at_replacement is used below but is not defined in the snippet as posted.
    // Assumption: it defaults to a space, like the other replacements.
    $at_replacement = ' ';

    $replacement_array = array(
        'ß'                     => 'ss',
        'ı'                     => 'i',
        '₂'                     => '2',
        '·'                     => '',
        '…'                     => '',
        '€'                     => '',
        '®'                     => '',
        '©'                     => '',
        '™'                     => '',
        // Soft hyphen and non-breaking space, as entities and as raw UTF-8 bytes.
        '&shy;'                 => '',
        "\xC2\xAD"              => '',
        '&nbsp;'                => ' ',
        chr( 194 ) . chr( 160 ) => ' ',
        '×'                     => ' ',
        '&#8217;'               => $quote_replacement,
        "'"                     => $quote_replacement,
        '’'                     => $quote_replacement,
        '‘'                     => $quote_replacement,
        '”'                     => $quote_replacement,
        '“'                     => $quote_replacement,
        '„'                     => $quote_replacement,
        '´'                     => $quote_replacement,
        '″'                     => $quote_replacement,
        // '-'                  => $hyphen_replacement,
        '–'                     => $endash_replacement,
        '—'                     => $emdash_replacement,
        '&#038;'                => $ampersand_replacement,
        '&amp;'                 => $ampersand_replacement,
        '&'                     => $ampersand_replacement,
        '@'                     => $at_replacement,
    );

    /**
     * Filters the punctuation replacement array.
     *
     * This filter can be used to alter the way some of the most common punctuation
     * is handled by Relevanssi.
     *
     * @param array $replacement_array The array of punctuation and the replacements.
     */
    $replacement_array = apply_filters( 'relevanssi_punctuation_filter', $replacement_array );

    // Handle a decimal point that is directly followed by a digit.
    $a = preg_replace( '/\.(\d)/', $decimal_replacement . '\1', $a );

    // Replace end-of-line hyphenation with nothing, to create a full word.
    $a = str_replace( array( "-\n", "-\r" ), '', $a );

    $a = str_replace( "\r", ' ', $a );
    $a = str_replace( "\n", ' ', $a );
    $a = str_replace( "\t", ' ', $a );

    $a = stripslashes( $a );

    $a = str_replace( array_keys( $replacement_array ), array_values( $replacement_array ), $a );

    /**
     * Filters the default punctuation replacement value.
     *
     * By default Relevanssi replaces unspecified punctuation with spaces. This
     * filter can be used to change that behaviour.
     *
     * @param string $replacement The replacement value, default ' '.
     */
    $a = preg_replace( '/[[:punct:]]+/u', apply_filters( 'relevanssi_default_punctuation_replacement', ' ' ), $a );
    $a = preg_replace( '/[[:space:]]+/', ' ', $a );

    // Restore the characters that were protected by placeholders.
    $a = str_replace( 'AMPERSANDTAIKASANA', '&', $a );
    $a = str_replace( 'HYPHENTAIKASANA', '-', $a );
    $a = str_replace( 'ENDASHTAIKASANA', '–', $a );
    $a = str_replace( 'EMDASHTAIKASANA', '—', $a );
    $a = str_replace( 'DESIMAALITAIKASANA', '.', $a );

    $a = trim( $a );

    return $a;
}
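Stripped of the WordPress calls, the placeholder trick above comes down to three steps: swap the characters you want to keep for letter-only tokens, strip all remaining punctuation, then swap the tokens back. Here is a rough standalone sketch of that idea. The function `strip_punct_keeping()` and the `...PLACEHOLDER` tokens are made up for illustration and are not part of Relevanssi or ASP.

```php
<?php
// Hypothetical helper, not part of Relevanssi: keep chosen characters while
// stripping all other punctuation, by swapping them for letter-only placeholders.
function strip_punct_keeping( string $text, array $keep ): string {
    // $keep maps a character to a placeholder made only of letters, so the
    // punctuation regex below cannot touch it.
    $text = str_replace( array_keys( $keep ), array_values( $keep ), $text );
    // Replace every run of ASCII punctuation with a space.
    $text = preg_replace( '/[[:punct:]]+/', ' ', $text );
    // Collapse runs of whitespace into single spaces.
    $text = preg_replace( '/\s+/', ' ', $text );
    // Put the protected characters back.
    $text = str_replace( array_values( $keep ), array_keys( $keep ), $text );
    return trim( $text );
}

$keep = array(
    '-' => 'HYPHENPLACEHOLDER',
    '&' => 'AMPERSANDPLACEHOLDER',
);
echo strip_punct_keeping( 'R&D: (state-of-the-art) results!', $keep ), "\n";
// prints: R&D state-of-the-art results
```

One caveat with this approach: if the text itself already contains a placeholder string, it gets turned into the protected character, so the tokens need to be something that would never appear in real content.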
But, just to clarify, I’m much more impressed with ASP than Relevanssi (which is why I’m here and using it). It’s just a few things like this that could use improvement. But, again, I will be incorporating an NLP package to augment the power considerably with lemmas, since lemmas > stems > tokens > nothing.
This reply was modified 4 years, 7 months ago by nickchomey18.