
Forum Replies Created

Viewing 15 posts - 1 through 15 (of 38 total)
    in reply to: Some issues with Index table search of custom fields #35401
    nickchomey18
    Participant

    Great to hear! It definitely must be difficult to balance having a clean, easy-to-use interface with lots of powerful features.

    I’m happy to do my own custom work where it falls outside of what is reasonable for you to include for the average user, especially with regard to things that need packages installed on the server, like Tika and spaCy. But it’s good that you’ll be adding support for Tika, since it is a pretty “core” feature that is supported by various competitors. Adding this support through an add-on system seems appropriate to reduce bloat in the core plugin for those who don’t need the features; it is what many other plugins do (SearchWP is a good example). The add-on system should also open the door for further integrations/extensions by other plugins/users.

    I’m glad that I could be helpful and I look forward to seeing what you come up with!

    in reply to: Some issues with Index table search of custom fields #35385
    nickchomey18
    Participant

    I understand completely – it is very hard to create a tool that works for all use-cases, so you’ve chosen certain ones over others. I do think options for choosing how to handle each type of punctuation would be useful, and Relevanssi’s code seems like a good starting point. Someone who wants to index “AB:123-44_42;567” could do so, and I could remove all of that. One of the many reasons I chose ASP is for its enormous amount of customizability, and this would be another example of that.

    However, the relevant question for you is “would many other ASP users sufficiently appreciate this to make it worth your time?” I don’t know.

    But don’t take me into consideration for this decision. Regardless of how you proceed, I plan to replace ASP’s tokenization with a Python NLP package (spaCy). Since it is a world-class NLP tool, I assume it does an excellent job with tokenization, but more importantly, it produces lemmas that should reduce my table size and improve search performance considerably. It also creates all sorts of other interesting and useful data: identification of parts of speech, named entities, and more. Moreover, I’m also just curious to tinker with it all.

    Unlike with Apache Tika, I really doubt this is something that would ever be worth you working to integrate with ASP, but I figured I’d mention it.

    The extensive hooks in ASP make all of this relatively easy to do, and I’m sure you won’t mind adding some others should they be necessary.

    in reply to: Some issues with Index table search of custom fields #35366
    nickchomey18
    Participant

    When I say junk, I mean thousands of strings that contain punctuation. It was something like 2000/10000 rows that were junk – none of which were in the Relevanssi index table.

    Colons (“: ”) are not removed from the end of words, and neither are the characters covered by the other modifications I shared earlier, including dashes and hyphens. There are also a lot of entries with [brackets] and (parentheses) that I haven’t dealt with yet; the search will never match those, given that it only matches the start or end of a word.
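
    To illustrate why such entries are dead weight, here is a minimal sketch. It assumes the index compares a keyword only against the start or the end of each stored term; the helper name is hypothetical and this is not ASP’s actual code.

```php
<?php
// Hypothetical helper, not ASP's actual code: a term matches only when the
// keyword sits at the very start or the very end of the stored token.
function matches_start_or_end( string $term, string $keyword ): bool {
    $len = strlen( $keyword );
    if ( 0 === $len || $len > strlen( $term ) ) {
        return false;
    }
    return 0 === strncasecmp( $term, $keyword, $len )
        || 0 === strcasecmp( substr( $term, -$len ), $keyword );
}

$clean = matches_start_or_end( 'brackets', 'brack' );   // true: prefix match
$junk  = matches_start_or_end( '[brackets]', 'brack' ); // false: '[' blocks the prefix
```

    Under that assumption, a token stored as “[brackets]” can only be found by typing the bracket itself, so stripping the punctuation before indexing is what makes the row useful.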

    Again, Relevanssi’s TAIKASANA method works perfectly for all of this, so I strongly recommend you check out their code – it is much more effective. I’ve included it here:

    function relevanssi_remove_punct( $a ) {
    	if ( ! is_string( $a ) ) {
    		// In case something sends a non-string here.
    		return '';
    	}
    
    	$a = preg_replace( '/<(\d|\s)/', '\1', $a );
    	$a = html_entity_decode( $a, ENT_QUOTES );
    	$a = relevanssi_strip_all_tags( $a );
    
    	$punct_options = get_option( 'relevanssi_punctuation' );
    
    	$hyphen_replacement = ' ';
    	$endash_replacement = ' ';
    	$emdash_replacement = ' ';
    	if ( isset( $punct_options['hyphens'] ) && 'remove' === $punct_options['hyphens'] ) {
    		$hyphen_replacement = '';
    		$endash_replacement = '';
    		$emdash_replacement = '';
    	}
    	if ( isset( $punct_options['hyphens'] ) && 'keep' === $punct_options['hyphens'] ) {
    		$hyphen_replacement = 'HYPHENTAIKASANA';
    		$endash_replacement = 'ENDASHTAIKASANA';
    		$emdash_replacement = 'EMDASHTAIKASANA';
    	}
    
    	$quote_replacement = ' ';
    	if ( isset( $punct_options['quotes'] ) && 'remove' === $punct_options['quotes'] ) {
    		$quote_replacement = '';
    	}
    
    	$ampersand_replacement = ' ';
    	if ( isset( $punct_options['ampersands'] ) && 'remove' === $punct_options['ampersands'] ) {
    		$ampersand_replacement = '';
    	}
    	if ( isset( $punct_options['ampersands'] ) && 'keep' === $punct_options['ampersands'] ) {
    		$ampersand_replacement = 'AMPERSANDTAIKASANA';
    	}
    
    	$decimal_replacement = ' ';
    	if ( isset( $punct_options['decimals'] ) && 'remove' === $punct_options['decimals'] ) {
    		$decimal_replacement = '';
    	}
    	if ( isset( $punct_options['decimals'] ) && 'keep' === $punct_options['decimals'] ) {
    		$decimal_replacement = 'DESIMAALITAIKASANA';
    	}
    
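    	// $at_replacement is not defined in this excerpt; a space is assumed as the default.
    	$at_replacement = ' ';

    	$replacement_array = array(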
    		'ß'                     => 'ss',
    		'ı'                     => 'i',
    		'₂'                     => '2',
    		'·'                     => '',
    		'…'                     => '',
    		'€'                     => '',
    		'®'                     => '',
    		'©'                     => '',
    		'™'                     => '',
    		'&shy;'                 => '',
    		"\xC2\xAD"              => '',
    		'&nbsp;'                => ' ',
    		chr( 194 ) . chr( 160 ) => ' ',
    		'×'                     => ' ',
    		'&#8217;'               => $quote_replacement,
    		"'"                     => $quote_replacement,
    		'’'                     => $quote_replacement,
    		'‘'                     => $quote_replacement,
    		'”'                     => $quote_replacement,
    		'“'                     => $quote_replacement,
    		'„'                     => $quote_replacement,
    		'´'                     => $quote_replacement,
    		'″'                     => $quote_replacement,
    		//'-'                     => $hyphen_replacement,
    		'–'                     => $endash_replacement,
    		'—'                     => $emdash_replacement,
    		'&#038;'                => $ampersand_replacement,
    		'&amp;'                 => $ampersand_replacement,
    		'&'                     => $ampersand_replacement,
    		'@'                     => $at_replacement,
    	);
    
    	/**
    	 * Filters the punctuation replacement array.
    	 *
    	 * This filter can be used to alter the way some of the most common punctuation
    	 * is handled by Relevanssi.
    	 *
    	 * @param array $replacement_array The array of punctuation and the replacements.
    	 */
    	$replacement_array = apply_filters( 'relevanssi_punctuation_filter', $replacement_array );
    
    	$a = preg_replace( '/\.(\d)/', $decimal_replacement . '\1', $a );
    
    	// Replace end-of-line hyphenation with nothing, to create a full word.
    	// "-\r\n" comes first so Windows line endings are removed whole.
    	$a = str_replace( array( "-\r\n", "-\n", "-\r" ), '', $a );
    	
    	$a = str_replace( "\r", ' ', $a );
    	$a = str_replace( "\n", ' ', $a );
    	$a = str_replace( "\t", ' ', $a );
    
    	$a = stripslashes( $a );
    
    	$a = str_replace( array_keys( $replacement_array ), array_values( $replacement_array ), $a );
    	/**
    	 * Filters the default punctuation replacement value.
    	 *
    	 * By default Relevanssi replaces unspecified punctuation with spaces. This
    	 * filter can be used to change that behaviour.
    	 *
    	 * @param string $replacement The replacement value, default ' '.
    	 */
    	$a = preg_replace( '/[[:punct:]]+/u', apply_filters( 'relevanssi_default_punctuation_replacement', ' ' ), $a );
    	$a = preg_replace( '/[[:space:]]+/', ' ', $a );
    
    	$a = str_replace( 'AMPERSANDTAIKASANA', '&', $a );
    	$a = str_replace( 'HYPHENTAIKASANA', '-', $a );
    	$a = str_replace( 'ENDASHTAIKASANA', '–', $a );
    	$a = str_replace( 'EMDASHTAIKASANA', '—', $a );
    	$a = str_replace( 'DESIMAALITAIKASANA', '.', $a );
    
    	$a = trim( $a );
    
    	return $a;
    }

    But, just to clarify, I’m much more impressed with ASP than Relevanssi (which is why I’m here and using it). It’s just a few things like this that could use improvement. But, again, I will be incorporating an NLP package to augment the power considerably with lemmas, since lemmas > stems > tokens > nothing.

    • This reply was modified 4 years, 7 months ago by nickchomey18.

    in reply to: Some issues with Index table search of custom fields #35359
    nickchomey18
    Participant

    Yes, I more or less understand how your index works and it is extremely similar to Relevanssi. I have no problem with the methodology. It’s just a mystery that certain words are not working for partial matching. I’ll reassess when I’ve made the changes that I am looking to incorporate.

    My comments on the tokenization were just an aside to explain what I’m doing, but here are a few modifications that I made before starting to explore using an NLP package. They made a major difference in reducing junk entries – many thousands of entries full of punctuation were removed (ones which mostly do not exist in the Relevanssi index).

    // Replace end-of-line hyphenation with nothing, to create a full word. Needs to be before the line for removing line breaks.
    $str = str_replace( array( "-\n", "-\r" ), "", $str );

    // Replace hyphens, and em and en dashes with space.
    $str = str_replace( array( "—", "–", "-" ), " ", $str );

    These were added to the array for various symbols:
    ": ",
    "__",
    // quote replacement
    "’",
    "‘",
    "’",
    "‘",
    "”",
    "“",
    "„",
    "´",
    "″",

    I particularly like the user customizability options of Relevanssi for which things to replace and with what (see attached screenshot). And their TAIKASANA magic method for preserving decimals, ampersands and hyphens/dashes while removing all other punctuation with

    $a = preg_replace( '/[[:punct:]]+/u', apply_filters( 'relevanssi_default_punctuation_replacement', ' ' ), $a );

    is quite clever.
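
    A stripped-down sketch of that placeholder idea, for anyone skimming: the function name and placeholder strings below are invented for illustration (Relevanssi’s own use the TAIKASANA suffix), and it only covers decimals, ampersands and plain hyphens.

```php
<?php
// Simplified illustration of the placeholder technique; the function name and
// the placeholder strings are made up for this sketch.
function strip_punct_keep_some( string $text ): string {
    // 1. Protect the characters that should survive with letter-only placeholders.
    $text = preg_replace( '/(\d)\.(\d)/', '$1DECIMALPLACEHOLDER$2', $text );
    $text = str_replace(
        array( '&', '-' ),
        array( 'AMPPLACEHOLDER', 'HYPHENPLACEHOLDER' ),
        $text
    );

    // 2. Replace every remaining run of punctuation with a space.
    $text = preg_replace( '/[[:punct:]]+/u', ' ', $text );
    $text = preg_replace( '/\s+/', ' ', $text );

    // 3. Swap the placeholders back for the original characters.
    $text = str_replace(
        array( 'DECIMALPLACEHOLDER', 'AMPPLACEHOLDER', 'HYPHENPLACEHOLDER' ),
        array( '.', '&', '-' ),
        $text
    );

    return trim( $text );
}

$cleaned = strip_punct_keep_some( 'R&D (v2.5): state-of-the-art!' );
// $cleaned is "R&D v2.5 state-of-the-art"
```

    The placeholders contain only letters, so the blanket punctuation pass in step 2 cannot touch them, which is the whole trick.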

    Anyway, this is all beside the point of this ticket. If you test and incorporate them, I would be very surprised if they created any negative side-effects for anyone, especially if combined with user-selectable options in the backend. I hope this has been helpful.

    I will update the ticket when I have made my changes with the NLP tool.

    in reply to: Some issues with Index table search of custom fields #35320
    nickchomey18
    Participant

    Thanks, but there aren’t any keyword exclusions. I have stop words enabled, but I assume that excludes them from the index table to begin with and I’ve confirmed that the words that I’m searching for are in the table (searching for the full word returns a result, after all). (Also, the words that I’m having trouble with are not stop words).

    I have a suspicion that the issue has something to do with the fact that there are duplicate similar entries. One of the words in question is “concepts”. As seen in the attached screenshot, there are entries for “concept” and “concepts” for a few different posts. I have no trouble with the posts that have the “concept” term. It is just the two (1179, 1180) that only have “concepts” that won’t return partial matches. But I’ve looked at various other situations and the results are inconsistent.

    Anyway, because of this issue as well as some others that I’ve noticed (e.g. a lot of junk terms in the table with punctuation), I started looking into how you create tokens. I made some modifications that reduced the junk terms, e.g. $str = str_replace( array( "-\n", "-\r" ), "", $str ); which rejoins a word that a PDF has split with a hyphen at the end of a line.
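
    As a self-contained sketch of that hyphenation fix (the function name is hypothetical, and the extra "-\r\n" entry is my assumption about covering Windows line endings as well):

```php
<?php
// Sketch: rejoin words that a PDF extractor split with a hyphen at a line end.
// "-\r\n" is listed first so Windows line endings are consumed whole.
function join_hyphenated_line_breaks( string $text ): string {
    return str_replace( array( "-\r\n", "-\n", "-\r" ), '', $text );
}

$joined = join_hyphenated_line_breaks( "tokeni-\nzation and lemma-\r\ntization" );
// $joined is "tokenization and lemmatization"
```

    One trade-off to be aware of: a genuine compound that happens to break at a line end (“well-” / “known”) gets merged into one word too.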

    It should be noted that Relevanssi does a CONSIDERABLY better job at tokenizing content than ASP. Their free version is open-source, so perhaps you could look into incorporating some of their code into ASP. Not sure how open-source licensing affects a commercial product…

    But I also realized there are plenty of excellent open-source tools out there dedicated to this specific task (and much more), so why try to recreate a (very complex) wheel? So, I am currently looking into integrating a Python-based NLP package to generate tokens, stems or even lemmas, as well as store other data that I might make use of in other ways. That would eliminate the “concept/s” issue, reduce table size considerably, and surely improve overall performance.
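
    A toy illustration of why lemmas should shrink the table. The lookup table here is hand-written and hypothetical; in a real setup the lemmas would come from the NLP tool, not from an array like this.

```php
<?php
// Toy illustration only: the lemma lookup table and function name are
// hypothetical stand-ins for what an NLP tool would provide.
function count_index_terms( array $tokens, array $lemma_map = array() ): int {
    $terms = array();
    foreach ( $tokens as $token ) {
        $token = strtolower( $token );
        // Fall back to the token itself when no lemma is known.
        $terms[ $lemma_map[ $token ] ?? $token ] = true;
    }
    return count( $terms );
}

$tokens    = array( 'concept', 'concepts', 'Concepts', 'searching', 'searched', 'search' );
$lemma_map = array(
    'concepts'  => 'concept',
    'searching' => 'search',
    'searched'  => 'search',
);

$raw_count   = count_index_terms( $tokens );             // 5 distinct terms
$lemma_count = count_index_terms( $tokens, $lemma_map ); // 2 distinct terms
```

    With lemmas, “concept” and “concepts” land on the same row, so a search for either form finds both posts.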

    As with Tika support, I suspect that it would be difficult to include something like this into the ASP plugin, as it requires server-level packages. But it is another thing for you to consider.

    So, given all of this, perhaps it is best to put this issue on pause until I’ve made these changes. I’ll report back then.

    in reply to: Some issues with Index table search of custom fields #35290
    nickchomey18
    Participant

    I have investigated a bit more and it does seem to be doing partial matching for custom fields. But the problem is actually much more mysterious – it is only certain words that aren’t doing partial matches, and it doesn’t matter if the content is stored in wp_posts.post_content or wp_postmeta.meta_value. Nor does it matter (nor should it matter) which attachment it is associated with. I really can’t find a pattern.

    Are there certain characters or suffixes that are excluded or treated differently?

    in reply to: Some issues with Index table search of custom fields #35261
    nickchomey18
    Participant

    Thank you. Yes, I’m aware it doesn’t do mid-word matches. This is not working for strings that start a word. There’s maybe 5000 rows in the table right now – very small. (by the way, how many can it reasonably handle? Or is it simply limited by your server processing power?)

    Any suggestions on what configuration I could check? Again, it returns partial matches (start or end of a word) when the match is in the post title or content, but not an associated custom field.

    Thanks for the explanation about custom field selection. I understand.

    Sorry, re: the highlighting – I’m wondering how to display the custom field content excerpt in the live search results. Is it a matter of adding code to the template files?

    nickchomey18
    Participant

    I have found some bugs in this mechanism.

    1. The attachment_query arguments don’t work with an index table search. Line 37 of ajax-search-pro\includes\classes\search\class-asp-search-attachments.php replaces the rest of the function with class-asp-search-indextable.php, which only checks for $args['cpt_query'].

    I did the following, but perhaps you will have a better solution for the next update.

    
    // Attachment searches read their extra SQL parts from $args['attachment_query'].
    if ( $words == "attachment" ) {
        if ( isset($args['attachment_query']) && is_array($args['attachment_query']) ) {
            $this->query = str_replace(
                array('{args_fields}', '{args_join}', '{args_where}', '{args_orderby}'),
                array($args['attachment_query']['fields'], $args['attachment_query']['join'], $args['attachment_query']['where'], $args['attachment_query']['orderby']),
                $this->query
            );
        } else {
            // No custom parts supplied: blank out the placeholders.
            $this->query = str_replace(
                array('{args_fields}', '{args_join}', '{args_where}', '{args_orderby}'),
                '',
                $this->query
            );
        }
        if ( isset($args['attachment_query'], $args['attachment_query']['groupby']) && $args['attachment_query']['groupby'] != '' ) {
            $this->query = str_replace('{args_groupby}', $args['attachment_query']['groupby'], $this->query);
        } else {
            $this->query = str_replace('{args_groupby}', "$wpdb->posts.ID", $this->query);
        }
    } else {
        // All other searches keep using $args['cpt_query'], with post IDs mapped to the index table column.
        if ( isset($args['cpt_query']) && is_array($args['cpt_query']) ) {
            $_mod_q = $args['cpt_query'];
            foreach ($_mod_q as $qk => $qv) {
                $_mod_q[$qk] = str_replace($wpdb->posts . '.ID', 'asp_index.doc', $_mod_q[$qk]);
            }
            $this->query = str_replace(
                array('{args_fields}', '{args_join}', '{args_where}', '{args_orderby}'),
                array($_mod_q['fields'], $_mod_q['join'], $_mod_q['where'], $_mod_q['orderby']),
                $this->query
            );
        } else {
            $this->query = str_replace(
                array('{args_fields}', '{args_join}', '{args_where}', '{args_orderby}'),
                '',
                $this->query
            );
        }
    }

    2. I get the following error (excerpted):

    WordPress database error: [You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'DISTINCT asp_index.doc as id, asp_index.blogid as blogid, 'pagepost' a' at line 4]
    SELECT pm.meta_value as activityid, DISTINCT asp_index.doc as id, asp_index.blogid as blogid, 'pagepost' as

    The problem is with the improved title query str_replace: it adds DISTINCT to the query, but the {args_fields} and $add_select hooks are inserted before that, so the custom field ends up ahead of DISTINCT, which is invalid SQL.

    I fixed it by moving these to the bottom of the SELECT section, before FROM, but I don’t know if this is the appropriate place to do it. I had to move the comma in the snippet you provided above to be at the front rather than trailing.
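
    To make the ordering issue concrete, here is a minimal sketch of the arrangement I ended up with. The builder function is hypothetical and is not ASP’s code; only the column names are taken from the error above.

```php
<?php
// Hypothetical query builder, not ASP's code: DISTINCT has to follow SELECT
// directly, so extra fields are appended after the core columns, each with a
// leading comma.
function build_select( array $core_fields, array $extra_fields, bool $distinct ): string {
    $sql = 'SELECT ' . ( $distinct ? 'DISTINCT ' : '' ) . implode( ', ', $core_fields );
    foreach ( $extra_fields as $field ) {
        $sql .= ', ' . $field;
    }
    return $sql;
}

$select = build_select(
    array( 'asp_index.doc as id', 'asp_index.blogid as blogid' ),
    array( 'pm.meta_value as activityid' ),
    true
);
// $select is:
// SELECT DISTINCT asp_index.doc as id, asp_index.blogid as blogid, pm.meta_value as activityid
```

    Appending with a leading comma also means an empty list of extra fields leaves no dangling comma before FROM.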

    nickchomey18
    Participant

    Thanks very much for the clarification – that makes complete sense. Anyway, I think I have what I need for now to figure this all out. I should be leaving you alone for a little while 😉

    in reply to: Hamburger responsive menu button on this site doesn't work #35155
    nickchomey18
    Participant

    Thanks. But what I was really hoping for was that the menu button could be fixed or some other responsive menu implementation used – as far as I can tell, there is no way to see or use the buttons for My Tickets, My Profile, etc… while on mobile (or a shrunken desktop window).

    nickchomey18
    Participant

    Very slick! That’s much easier than using numerous hooks, functions etc… I’m sure that I can figure out what I need from this. Thanks very much!

    Having said that, I might not even use this method anymore. Given that ASP only uses the Index Table for posts in wp_posts, ASP’s BuddyPress search is effectively the same as BuddyBoss’s: $wpdb->get_results. They have a tremendous amount of redundancy in their code (Query Monitor tells me that they have something like 100 duplicate queries…), but it is all set up to do the filtering for friend, group and post privacy; media such as documents, photos and videos; and also has the results formatted for their HTML generation.

    So, at least for now, I’ll probably just use ASP for Index Table post types, pass the results to BB by creating a global variable with the 'asp_results' filter, format those results as needed for BB, and then perform the other searches with BB’s search. I’ll have to make some modifications for bbPress Topic and Reply results, which are in the Index Table, but it will be far less work than re-creating all of the other stuff.

    But perhaps later (both out of curiosity and to streamline everything) I will use the asp_query_args that you’ve provided to move everything over to ASP.

    Thanks again for your help. Hopefully you’ll soon be able to document these args for the sake of others.

    P.s. Again, if you’re curious about integrating more of the BuddyBoss stuff (respect privacy, documents etc…) into ASP, the code is all open-source here: https://github.com/buddyboss/buddyboss-platform/tree/release/src/bp-search.

    And, for example, here is the BuddyPress activities code, and if you look at lines 69-118, 139-142, you’ll see the code for filtering for group, friend, and post privacy. On closer inspection, it is not particularly complicated. It could (and should) be made into a function in the main class so that it isn’t called so many times. https://github.com/buddyboss/buddyboss-platform/blob/release/src/bp-search/classes/class-bp-search-activities.php

    Anyway, I am quite confident that many BuddyBoss users would buy ASP if this was integrated, to say nothing of BuddyPress users who don’t have any search functionality. Because, as it stands, the ASP BuddyPress search is not particularly useful without being able to adequately respect privacy.

    nickchomey18
    Participant

    Hah, I didn’t even consider using a filter as an action hook and just returning an unmodified value! So, using 'asp_results' to generate a new global variable works just fine! That seems to be the solution to this whole thread: there’s no need to copy the hook to the override code (unless you think that is necessary anyway). I should be able to tinker away on my own on the BuddyBoss side of the code to get it to work now.
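
    For anyone following along, a rough sketch of the filter-as-action trick. The global name my_asp_results is made up, and I am assuming the callback receives the results array as its first argument (the real filter may pass more). The small shim at the top only exists so the snippet runs outside WordPress; inside WordPress the real add_filter() and apply_filters() are used.

```php
<?php
// Minimal stand-ins so the sketch runs outside WordPress; inside WordPress the
// real add_filter()/apply_filters() are already defined and this is skipped.
if ( ! function_exists( 'add_filter' ) ) {
    $GLOBALS['demo_filters'] = array();
    function add_filter( $tag, $callback ) {
        $GLOBALS['demo_filters'][ $tag ][] = $callback;
        return true;
    }
    function apply_filters( $tag, $value ) {
        foreach ( $GLOBALS['demo_filters'][ $tag ] ?? array() as $callback ) {
            $value = $callback( $value );
        }
        return $value;
    }
}

// Use the filter like an action: stash a copy of the results in a global
// (hypothetical name) and hand them back unchanged.
add_filter( 'asp_results', function ( $results ) {
    $GLOBALS['my_asp_results'] = $results;
    return $results;
} );

// Simulated call for the demo; in practice ASP applies the filter itself.
$returned = apply_filters( 'asp_results', array( array( 'id' => 1179 ), array( 'id' => 1180 ) ) );
```

    The BuddyBoss side can then read $GLOBALS['my_asp_results'] later in the same request instead of running its own query.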

    As for the comment that these threads are beyond a regular support query – I hope you can clarify what you mean by that. I have no intention to overstep my bounds and take advantage of you. I have seen many tickets where you go as far as writing custom code for people, and all I’ve been essentially asking for in this thread is to know which hook I can use. Or, for the other thread, to add a few more hooks, which seems to me to be a very minor task/request which would allow me and anyone to have a lot more flexibility in helping ourselves.

    Anyway, for now I will add in my own SQL query hooks and hope that you’ll be able to incorporate them officially in your next release, at which point I can modify my code to align with your hooks.

    nickchomey18
    Participant

    I dug into the code a bit more and it appears that ASP just passes the results to WP_Query’s posts variable, which is then used by the Loop. But it appears to be cleared/inaccessible by the time the BuddyBoss code is initiated based on the URL parameters mentioned above.

    So, I have made a couple of different modifications directly in the ASP code that allow me to access the search results elsewhere. I added a global variable to the end of wp-content\plugins\ajax-search-pro\includes\classes\search\class-asp-query.php that is equal to the returned array, $results. This can then be retrieved from within the BB search code. I also put this array in $GLOBALS['variable'], which seems to offer the same ability to be retrieved.

    This is obviously not a sustainable solution, given that it won’t persist with updates to ASP. So, it would be ideal if I could create this global variable via a do_action hook.

    In class-asp-search.php, there is a hook, do_action('asp_after_search', $s, $results, $id); but this appears to be accessible only by the live search. Or, at the very least, it is not accessible by the class-asp-searchoverride.php that is being used in this case.

    So, I am hoping you could perhaps do one of two things:

    1. Move the 'asp_after_search' hook to a place (class-asp-query?) where it would be accessible to all search methods. After all, this is where 'asp_before_search' and 8 other after_contenttype_results hooks are located.

    2. Create a new hook either at the end of class-asp-query.php or within class-asp-searchoverride.php.

    I could then use one of these hooks to create a global variable of my own, which I could then use from my modified BuddyBoss search function.

    While you are at it, it would be greatly appreciated if you could add the SQL Query filters requested in this ticket: https://wp-dreams.com/forums/topic/can-you-please-add-additional-query_add_select-from-join-where-filters/

    If you have thoughts on the above or any alternative solutions for this problem, I would love to hear them.

    Thanks very much!

    nickchomey18
    Participant

    I don’t know anything about BuddyPress search since I use BuddyBoss, but as I detailed above

    I have confirmed that if I use “?s={phrase}&bp_search=1&view=content” as ASP’s Custom Redirect URL, it will initiate the [BuddyBoss search functions and html template generation]

    So, all I need to know is in what manner ASP passes its results to the search.php page, so that I can grab them from within the BuddyBoss Search functions and bypass the BuddyBoss search query. Can you please help me with this?

    nickchomey18
    Participant

    Actually, I foresee various issues with this.

    If I initiate the new ASP_Query from within the BB code, that would require me to leave the search form as the BB search form, rather than change it to an ASP search form with a shortcode. This means that I would lose the ajax live search functionality, asp settings etc…

    So, I was hoping to replace the BB search form with an ASP shortcode, which would allow me to leverage all of ASP’s front-end functionality, and then pass the results to the BB search component, stripping out BB’s query and leaving the rest of the HTML generation intact.

    At best, it seems that if I use the code you’ve provided while also using an ASP search form, ASP will generate two (or perhaps even three) queries, won’t it? One for the ajax live search, perhaps another one when pressing enter/clicking the search button, and certainly one more when a new ASP_Query is initiated in the BB do_search function (presumably without the use of any front-end filter settings).

    Am I wrong about any of this? It seems like it would be ideal if the existing ASP search could use an ASP action or filter to pass its arguments/results to one of the 3 functions I attached in the previous comment. Or, perhaps the results are stored in one of the PHP superglobal variables (e.g. $_POST), from where they can be grabbed by BB’s do_action?

    Perhaps another way to look at this is more generally – how does ASP typically pass results to our custom search.php html template file? Shouldn’t I be able to use that mechanism?

    • This reply was modified 4 years, 7 months ago by nickchomey18.