Show extract from pdf content
- This topic has 22 replies, 2 voices, and was last updated 2 years, 9 months ago by
Romain Soulcié.
June 7, 2023 at 9:02 am #42987
Romain Soulcié
Participant
Hello,
I am using the ASP plugin to search inside PDF media files. Thanks for this powerful plugin!
I have been through the options but I still have questions; maybe I missed something.
When a PDF media file appears in the search results, I’d like more context if possible, such as:
– a content extract showing the sentence in which the keyword appears
– and/or the page number of the PDF file where the keyword first appears.
Is there a setting that would show such information?
One last question: is there a way to open the PDF file directly when a PDF result is clicked, rather than showing the media page?
Thanks a lot,
June 7, 2023 at 9:10 am #42991
Romain Soulcié
Participant
Hello again,
In fact, I’d like my search to work just like the demo:
https://ajaxsearchpro.com/file-search/
What do I need to do to show the exact extract where the keyword appears?
June 7, 2023 at 12:55 pm #43011
Ernest Marcinko
Keymaster
Hi,
Thank you for your kind words!
It is possible to display context if the media service parser is used. It extracts the text content as accurately as possible (using an external Apache Tika server) and stores it locally with the media file fields. Unfortunately, the local parser can’t do that; PHP is generally very limited at extracting data from PDF files. Local parsers mostly get the text, but it is very unstructured and not usable for accurate display.
The plugin will automatically use the extracted text as the media description field and will try to display an excerpt as close to the matching context as possible. The Free subscription has no limit on the number of files; only the file sizes are limited to 10MB.
“Is there a way to open the PDF file directly when a PDF result is clicked, rather than showing the media page?”
Yes, via this option.
June 8, 2023 at 4:55 pm #43037
Romain Soulcié
Participant
Thanks a lot, Mr. Marcinko,
- PDFs now open directly when I click on a result.
- I subscribed to a free licence for the media service parser in order to test it and show it to the other members! Several of our media PDFs are over 10MB, so I hope to convince them that we need a standard licence.
How do I build the index now that I have a media parser licence?
I clicked “Create new index” but it seems like it is going to take time… Will it automatically use the media parser?
Thanks again,
June 12, 2023 at 8:52 am #43043
Ernest Marcinko
Keymaster
Yes, it will use the media service automatically if you have activated the license on the index table panel.
After the indexing is finished, you will see how many files were successfully indexed by the media service.
July 7, 2023 at 10:16 pm #44640
Romain Soulcié
Participant
Hi,
I now use the media service parser pro. It makes the search so powerful in big PDF files, this is impressive!
I can see that the highlighting and the “show extract” work for some results, but not for others (attached screenshot).
Is there something I can do to improve this?
Is there a way to display, next to each result, the number of times the searched keyword appears in its content?
Thank you,
July 10, 2023 at 12:42 pm #44647
Ernest Marcinko
Keymaster
Thank you!
“I can see that the highlighting and the “show extract” work for some results, but not for others (attached screenshot). Is there something I can do to improve this?”
Probably, to some extent. If it shows for some results and not for others, then most likely the highlighter is not reaching the match in longer texts. Try increasing this option to 99999
That should increase the overall context for the matches.
“Is there a way to display, next to each result, the number of times the searched keyword appears in its content?”
I’m afraid not, that information is not possible to obtain.
July 21, 2023 at 5:51 pm #44776
Romain Soulcié
Participant
Hi Mr. Marcinko,
Thanks a lot, it works better: more results now show the right extract, with the keyword highlighted.
Some results still do not; I guess some of my PDFs are really too long.
About the number of occurrences: thank you for your answer. It was mainly curiosity, the search is great without it.
July 24, 2023 at 6:35 pm #44789
Ernest Marcinko
Keymaster
You are very welcome!
If you want, you can still try increasing that option by an order of magnitude, to 999999, to see if that changes anything.
July 24, 2023 at 10:23 pm #44796
Romain Soulcié
Participant
Is there a precise limit, or can I try any huge number?
July 25, 2023 at 10:07 am #44798
Ernest Marcinko
Keymaster
There is no programmatic limit. The PHP process itself limits it though, as it needs quite a bit of memory for the string operations. You can enter as big a number as you want; PHP should take care of it if the limit is too high.
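To illustrate why a larger value can help, here is a minimal Python sketch of a context extractor that only scans the first part of a text. It is not the plugin's code (the plugin is written in PHP, and its internals are not shown in this thread); the function name `extract_context` and the parameters `scan_limit` and `radius` are hypothetical and chosen only for this example.

```python
import re

def extract_context(text, keyword, scan_limit=1000, radius=40):
    """Return a short excerpt around the first match of `keyword`.

    Only the first `scan_limit` characters are searched; this is a
    hypothetical stand-in for the option discussed in the thread.
    If no match is found in that window, fall back to the start of the text.
    """
    window = text[:scan_limit]
    match = re.search(re.escape(keyword), window, flags=re.IGNORECASE)
    if match is None:
        # The match lies beyond the scanned window (or does not exist).
        return window[: 2 * radius].strip()
    start = max(0, match.start() - radius)
    end = min(len(window), match.end() + radius)
    # Mark the keyword with brackets as a plain-text "highlight".
    excerpt = (
        window[start:match.start()]
        + "[" + match.group(0) + "]"
        + window[match.end():end]
    )
    return excerpt.strip()

# A long document where the keyword only appears far from the beginning.
long_text = "Introduction. " * 200 + "The keyword appears only here, deep in the file."

short_scan = extract_context(long_text, "keyword", scan_limit=1000)
deep_scan = extract_context(long_text, "keyword", scan_limit=99999)
```

With the small `scan_limit`, the match is never reached and the excerpt is just the beginning of the text; with the larger one, the excerpt surrounds the highlighted keyword. The larger window also means a longer string is held and searched in memory, which matches the memory remark above.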
July 29, 2023 at 5:12 pm #44864
Romain Soulcié
Participant
Hi again,
I have an issue with a PDF file which is under the 10MB limit.
It is part of my media files and it is correctly indexed (I can see the label “Parsed content by Ajax Search Pro Media Parser service”), but this file is never returned in search results, even when I search for terms that do appear in the “Content (not editable)” field.
The index seems complete (attached screenshot: when I press “continue existing index” I get the answer “Success”).
This is the file:
https://societedesetudesdulot.org/wp-content/uploads/2023/01/Bulletin-de-la-SEL-T133-2012.pdf
What can I do to include this PDF in the pool? I only found this one, but now I wonder if there are other invisible media files!
Thanks for your help,
July 31, 2023 at 11:58 am #44878
Ernest Marcinko
Keymaster
You cannot access this content.
July 31, 2023 at 2:16 pm #44885
Romain Soulcié
Participant
You cannot access this content.
August 2, 2023 at 11:31 am #44901
Ernest Marcinko
Keymaster
You cannot access this content.