Fix search issue in PDFs
This topic contains 12 replies, has 2 voices, and was last updated by Ernest Marcinko 1 year, 3 months ago.
- November 17, 2021 at 10:10 am #35601
The plugin is not searching inside PDFs, even though I have enabled all the PDF options inside the plugin.
Eg: Search “packing sector”.
On the https://www.staging9.canadaid.ca/traceability/newsletters/ page, there is a link to the “Abattoir Insights” PDF, which contains “packing sector”, but it is not showing up in the search results.
Also, it doesn’t search in iframes despite that option being turned on. Eg: https://www.staging9.canadaid.ca/who-we-are/
November 18, 2021 at 1:34 pm #35622
Hi,
Thank you for the details, it helped a lot.
The issue was that the attachment search was turned off. I turned it on under the General Options -> Attachments search panel: https://i.imgur.com/w4bfxq7.png
I am seeing the PDF file in the results now.
The iframe extraction is a highly experimental feature and, as stated, it may not work in all cases. On that page, there is an iframe with a custom flipbook script, embedded with a PDF reader of some sort. I’m afraid there is no way of indexing that, as it is also embedded in the iframe. That feature works well with HTML text content and can extract most of it, but complex data such as embeds cannot be fetched.
Best,
Ernest Marcinko
If you like my products, don't forget to rate them on codecanyon :)
November 23, 2021 at 7:55 am #35675
You cannot access this content.
November 23, 2021 at 5:25 pm #35689
You are right, I think I found the problem.
By default, the plugin stops the search process as soon as it finds a sufficient number of results, to conserve performance. There seems to be an issue with this when multiple sources are selected, specifically when the attachment and post type sources are selected at the same time. I can replicate this on our test servers as well.
The quickest and most effective solution for now was to increase the number of results a bit, to 40 at a time. That should enlarge the potential results pool and improve the matches a lot: https://i.imgur.com/tvEqmtE.png
It is an effective workaround for now.
I will make sure to resolve this completely in the upcoming release, which should be out within a week.
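To picture why a larger result limit can help, here is a minimal Python sketch of a search that stops as soon as it has enough results. It is not the plugin's code, and the thread does not say what exactly goes wrong inside the plugin; the `search` function, the source names, and the sample documents are all hypothetical. It only shows how a shared limit combined with an early stop can leave a later source unscanned.

```python
def search(sources, keyword, limit):
    """Collect matches source by source, stopping once `limit` is reached."""
    results = []
    for name, documents in sources:
        for title, text in documents:
            if keyword in text.lower():
                results.append((name, title))
                if len(results) >= limit:
                    # Early stop: any remaining sources are never scanned.
                    return results
    return results


# Hypothetical data: 12 matching posts and 1 matching PDF attachment.
sources = [
    ("post", [(f"Post {i}", "news about the packing sector") for i in range(12)]),
    ("attachment", [("Abattoir Insights.pdf", "update on the packing sector")]),
]

small = search(sources, "packing sector", limit=10)
large = search(sources, "packing sector", limit=40)

print({source for source, _ in small})  # only posts fit under the small limit
print(len(large))                       # the larger limit also reaches the attachment
```

With the small limit the posts fill the whole result set, so the attachment is never reached. Raising the limit gives the second source room to contribute, which is the effect the workaround above relies on.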
Best,
Ernest Marcinko
November 27, 2021 at 5:35 am #35774
You cannot access this content.
December 2, 2021 at 7:00 pm #35849
You cannot access this content.
December 3, 2021 at 9:38 am #35850
Can you please update to 4.21.6? We have addressed an issue related to this. After the update, the plugin should return the results individually from both the post types and the attachments when using the index table engine.
Best,
Ernest Marcinko
December 4, 2021 at 10:13 am #35863
You cannot access this content.
December 4, 2021 at 2:52 pm #35869
You cannot access this content.
Best,
Ernest Marcinko
December 7, 2021 at 8:48 am #35891
You cannot access this content.
December 7, 2021 at 11:20 am #35901
On my end it is the 3rd result, here: https://i.imgur.com/z5K31Ii.png
The two preceding results contain the “sector” keyword twice, so they rank as more relevant.
Best,
Ernest Marcinko
December 7, 2021 at 7:56 pm #35914
You cannot access this content.
December 8, 2021 at 10:26 am #35920
Q1: The index table engine cannot do exact matching, such as matching the words in order, because the text is extracted from the file contents into a table where the occurrences are also counted. The full text is therefore not stored, only each keyword with additional field information. Exact matching on files is not possible, as the information needs to be extracted to the index table first. Searching the file contents directly would be extremely slow.
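The point about Q1 can be illustrated with a small Python sketch of a keyword index that stores only occurrence counts. This is an assumption about the general technique, not the plugin's actual table layout; `build_index`, `matches_all_keywords`, and the two sample documents are hypothetical.

```python
import re
from collections import Counter


def build_index(documents):
    """Map each document id to a Counter of keyword -> occurrence count."""
    index = {}
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        index[doc_id] = Counter(words)  # word order is discarded here
    return index


def matches_all_keywords(index, doc_id, phrase):
    """True if every word of the phrase occurs at least once in the document."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return all(index[doc_id][word] > 0 for word in words)


# Hypothetical documents: only the first contains the exact phrase.
documents = {
    "a.pdf": "The packing sector grew this year.",
    "b.pdf": "The sector changed its packing rules.",
}
index = build_index(documents)

for doc_id in documents:
    print(doc_id, matches_all_keywords(index, doc_id, "packing sector"))
```

Both documents match, although only `a.pdf` contains the words next to each other. Once the text is reduced to per-keyword counts, the index has no information left to tell the two cases apart.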
Q2: The search “calendar of events”: the “Events” result is a post type result. Currently, the plugin is configured to return Media files and Post types as results. A separate query needs to be executed for each result group.
Currently, the attachments are displayed first, then the post type results. So the first 10 results are the matching attachments, then come the 40 post type results (when they match): https://i.imgur.com/MHPWCGA.png
You can change the mixed results ordering here if you want to see the post type results first and the attachment results after.
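The grouped ordering can be pictured as a simple concatenation of separately ranked groups. The Python sketch below is only an illustration under that assumption: `merge_groups` and the result names are hypothetical, while the counts (10 attachment results and 4 higher-ranked post type results ahead of “Events”) are the ones given in this reply.

```python
def merge_groups(groups, order):
    """Concatenate result groups in the given order; each group keeps its own ranking."""
    merged = []
    for name in order:
        merged.extend(groups.get(name, []))
    return merged


groups = {
    # 10 attachment (media file) results.
    "attachment": [f"file-{i}.pdf" for i in range(1, 11)],
    # 4 post type results that rank above "Events", then "Events" itself.
    "post": ["Result A", "Result B", "Result C", "Result D", "Events"],
}

attachments_first = merge_groups(groups, ["attachment", "post"])
posts_first = merge_groups(groups, ["post", "attachment"])

print(attachments_first.index("Events") + 1)  # position with attachments first
print(posts_first.index("Events") + 1)        # position with post types first
```

Swapping the group order moves “Events” up by the size of the attachment group without changing its rank inside the post type group.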
“EVENTS” is #15 on the list because 10 media file results precede it, followed by 4 more relevant matches in which the “of” keyword occurs in both the titles and the contents.
Based on your queries, I made the following changes to get the best possible matches from each group. I turned on the stop-words on the index table and increased the minimum character count for words to 3: https://i.imgur.com/Nj9K83f.png
This should improve the matches greatly, as many unrelated common words get filtered out. Around 25 000 common words were removed from the index.
Then, for very strict matches, I recommend using either only the “AND” logic or the “AND with exact keyword matches” logic: https://i.imgur.com/IHoknQp.png
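The stop-word and minimum-length filtering mentioned above could look roughly like the Python sketch below. The stop-word list here is a tiny illustrative sample, not the plugin's list, and `index_keywords` is a hypothetical helper.

```python
import re

# Tiny illustrative stop-word list; the plugin's real list is not shown in the thread.
STOP_WORDS = {"of", "the", "and", "a", "in", "to"}
MIN_WORD_LENGTH = 3  # minimum character count mentioned in the reply


def index_keywords(text, stop_words=STOP_WORDS, min_length=MIN_WORD_LENGTH):
    """Return the words that would still be indexed after filtering."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in stop_words]


keywords = index_keywords("Calendar of Events")
print(keywords)
```

With “of” filtered out, results can no longer rank higher merely because they repeat that word, which is the effect described for the “calendar of events” search.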
The secondary “OR” logic currently activated fills up the results with fuzzy matches from any of the keyphrases. I don’t think you need that, as most of the matches it yields are ones you may not want to see.
Best,
Ernest Marcinko
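As a rough illustration of the difference between the “AND” and “OR” logic discussed in the reply above, here is a minimal Python sketch. It assumes simple whole-word matching on titles; the `match` function and the sample titles are hypothetical and do not reflect how the plugin implements its keyword logic.

```python
def match(text, phrase, logic):
    """Match a phrase against text using whole words and AND or OR logic."""
    words = set(text.lower().split())
    keywords = phrase.lower().split()
    if logic == "AND":
        return all(keyword in words for keyword in keywords)
    return any(keyword in words for keyword in keywords)


# Hypothetical result titles.
titles = ["Calendar of Events", "Events", "Tag calendar order form"]

and_matches = [t for t in titles if match(t, "calendar events", "AND")]
or_matches = [t for t in titles if match(t, "calendar events", "OR")]

print(and_matches)
print(or_matches)
```

The “AND” logic keeps only the title that contains every keyword, while “OR” also lets in titles that share a single word with the query.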