Show extract from pdf content


This topic contains 22 replies, has 2 voices, and was last updated by RomSoul 6 months, 2 weeks ago.

Viewing 15 posts - 1 through 15 (of 23 total)
    #42987
    RomSoul
    Participant

    Hello,

    I am using the ASP plugin to search inside PDF media files.
    Thanks for this powerful plugin!

    I have been through the options but I still have questions; maybe I missed something.

    When a PDF media file appears in search results, I’d like more context if possible, such as:
    – a content extract showing the sentence in which the keyword appears
    – and/or the page number of the PDF file where the keyword first appears.

    Is there a setting that would show such info?
    One last question: is there a way to open the PDF file directly when a PDF result is clicked, rather than showing the media page?

    Thanks a lot,

    #42991
    RomSoul
    Participant

    Hello again,

    In fact, I’d like my search to work just like the demo:
    https://ajaxsearchpro.com/file-search/

    What do I need in order to show the exact extract where the keyword appears?

    #43011
    Ernest Marcinko
    Keymaster

    Hi,

    Thank you for your kind words!

    It is possible to display context if the media service parser is used. It extracts the text content as accurately as possible (using an external Apache Tika server) and stores it locally with the media file fields. The local parser can’t do that, unfortunately; PHP is generally very limited for extracting data from PDF files. PHP parsers mostly get the text, but it is very unstructured and not usable for accurate display.
    The plugin will automatically use the extracted text as the media description field and will try to display an excerpt as close to the matching context as possible. The Free subscription has no limit on the number of files; only the file sizes are limited to 10MB.

    “Is there a way to open the PDF file directly when a PDF result is clicked, rather than showing the media page?”
    Yes, via this option.

    Best,
    Ernest Marcinko

    If you like my products, don't forget to rate them on codecanyon :)


    #43037
    RomSoul
    Participant

    Thanks a lot, Mr. Marcinko,

    • PDFs now open directly when I click on a result.
    • I signed up for a free licence for the media service parser in order to test it and show the other members! Several of our PDF media files are over 10MB, so I hope to convince them that we need a standard licence.

    How do I build the index now that I have a media parser licence?
    I clicked “Create new index” but it seems like it is going to take time… Will it automatically use the media parser?

    Thanks again,

    #43043
    Ernest Marcinko
    Keymaster

    Yes, it will use the media service automatically if you have activated the license on the index table panel.
    After indexing completes, you will see how many files were successfully indexed by the media service.

    Best,
    Ernest Marcinko


    #44640
    RomSoul
    Participant

    Hi,

    I now use the media service parser pro. It makes the search so powerful in big PDF files, this is impressive!
    I can see that the highlighting and the “show extract” work for some results, but not for others (see the attached screenshot).
    Is there something I can do to improve this?

    Is there a way to display, next to each result, the number of times the searched keyword appears in that content?

    Thank you,

    #44647
    Ernest Marcinko
    Keymaster

    Thank you!

    “I can see that the highlighting and the “show extract” work for some results, but not for others (attached screenshot). Is there something I can do to enhance it?”

    Probably, to some extent. If it shows for some results and not for others, then most likely the highlighter is not reaching the match in longer texts. Try increasing this option to 99999.
    That should increase the overall context for the matches.
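
    To illustrate why a larger value can help, here is a minimal Python sketch of an excerpt builder that only scans a limited prefix of the text. It is not the plugin's code (the plugin is written in PHP), and the names `excerpt_with_highlight`, `scan_limit`, and `radius` are hypothetical; the sketch only assumes that the highlighter searches a bounded number of characters, as described above.

```python
import re


def excerpt_with_highlight(text, keyword, scan_limit=1000, radius=40):
    """Return a short highlighted excerpt around the first match, or None.

    Hypothetical illustration only: `scan_limit` stands in for the option
    discussed above, `radius` is the number of context characters kept on
    each side of the match.
    """
    # Only the first `scan_limit` characters are searched for the keyword.
    window = text[:scan_limit]
    match = re.search(re.escape(keyword), window, flags=re.IGNORECASE)
    if match is None:
        # The keyword may still exist later in the text, beyond the limit.
        return None
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    before = text[start:match.start()]
    after = text[match.end():end]
    return f"...{before}<b>{match.group(0)}</b>{after}..."


# A long document where the keyword first appears after 5000 characters.
long_text = "x" * 5000 + " the keyword appears here " + "y" * 5000

print(excerpt_with_highlight(long_text, "keyword", scan_limit=1000))   # no excerpt
print(excerpt_with_highlight(long_text, "keyword", scan_limit=99999))  # excerpt found
```

    With the small limit the match is never reached, so no excerpt is produced even though the document contains the keyword; with the larger limit the excerpt and highlight appear. The cost is that larger limits mean more string processing per result.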

    “Is there a way to display next to each result the number of times the searched keyword appears in this content?”
    I’m afraid not; that information is not possible to obtain.

    Best,
    Ernest Marcinko


    #44776
    RomSoul
    Participant

    Hi Mr. Marcinko,
    Thanks a lot, it works better: more results now show the right extract, with the keyword highlighted.
    Some results still do not; I guess some of my PDFs are really too long.

    About the number of occurrences: thank you for your answer. It was mainly curiosity; the search is great without it.

    #44789
    Ernest Marcinko
    Keymaster

    You are very welcome!

    If you want, you can still try increasing that option by an order of magnitude, to 999999, to see if that changes anything.

    Best,
    Ernest Marcinko


    #44796
    RomSoul
    Participant

    Is there a precise limit, or can I try any huge number?

    #44798
    Ernest Marcinko
    Keymaster

    There is no programmatic limit. The PHP process itself limits it, though, since the string operations need quite a bit of memory. You can enter as large a value as you want; PHP should take care of it if the value is too high.

    Best,
    Ernest Marcinko


    #44864
    RomSoul
    Participant

    Hi again,
    I have an issue with a PDF file which is under the 10MB limit.
    It is part of my media files and it is correctly indexed (I can see the label “Parsed content by Ajax Search Pro Media Parser service”),
    but this file is never returned in search results, even when I search for terms that do appear in the “Content (not editable)” field.

    The index seems complete (see the attached screenshot: when I press “continue existing index” I get the answer “Success”).

    This is the file:
    https://societedesetudesdulot.org/wp-content/uploads/2023/01/Bulletin-de-la-SEL-T133-2012.pdf

    What can I do to include this PDF in the pool? I have only found this one so far, but now I wonder if there are other invisible media files!

    Thanks for your help,

    #44878
    Ernest Marcinko
    Keymaster

    You cannot access this content.

    Best,
    Ernest Marcinko


    #44885
    RomSoul
    Participant
    You cannot access this content.
    #44901
    Ernest Marcinko
    Keymaster

    You cannot access this content.

    Best,
    Ernest Marcinko

