Software That Searches Inside PDF Files

Topic Author
Leesbro63
Posts: 6085
Joined: Mon Nov 08, 2010 4:36 pm

Software That Searches Inside PDF Files

Post by Leesbro63 » Wed Feb 27, 2019 11:13 am

I am curious if there is software that "searches" inside PDF files. In particular, I'm involved with a religious organization that owns a 100-year-old cemetery. We have files on each deceased, many with multiple documents for multiple generations. Someone's grandfather might have bought plots 75 years ago and there might be information about many family members buried in those plots over the years, many with different last names. Once we scan each folder, we'd like to be able to "search" for "John Doe" and find out which files have any documents that mention him. Any suggestions are appreciated.

scoreboard
Posts: 21
Joined: Tue Dec 12, 2017 8:19 am

Re: Software That Searches Inside PDF Files

Post by scoreboard » Wed Feb 27, 2019 11:18 am

I do this all the time on my mac computer. Doesn't require any special software at all. Just open the PDF and the "Search" is across the top. Super easy. I believe a PC would require special software, but I don't know what that is.

mrc
Posts: 1399
Joined: Sun Jan 10, 2016 6:39 am

Re: Software That Searches Inside PDF Files

Post by mrc » Wed Feb 27, 2019 11:18 am

The Mac OSX Finder searches for strings within PDF files. But not if the files are scanned, and are essentially images.
Macs are for those who don’t want to know why their computer works | Linux is for those who do | DOS is for those who want to know why their computer doesn’t work | Windows is for those who don’t

ResearchMed
Posts: 9455
Joined: Fri Dec 26, 2008 11:25 pm

Re: Software That Searches Inside PDF Files

Post by ResearchMed » Wed Feb 27, 2019 11:22 am

mrc wrote:
Wed Feb 27, 2019 11:18 am
The Mac OSX Finder searches for strings within PDF files. But not if the files are scanned, and are essentially images.
If the documents are scanned, and not created as PDFs from original print documents, then sometimes (not always), depending on the quality of the original and of the scanning, one can use OCR (optical character recognition) software to *try* to "read" the scanned document.
The worse the quality, the less likely there is to be any useful search outcome, however.

There may be "latest and greatest" SW that is better; my experiences are not extremely recent.

RM
This signature is a placebo. You are in the control group.

RickBoglehead
Posts: 5608
Joined: Wed Feb 14, 2018 9:10 am
Location: In a house

Re: Software That Searches Inside PDF Files

Post by RickBoglehead » Wed Feb 27, 2019 11:24 am

scoreboard wrote:
Wed Feb 27, 2019 11:18 am
I do this all the time on my mac computer. Doesn't require any special software at all. Just open the PDF and the "Search" is across the top. Super easy. I believe a PC would require special software, but I don't know what that is.
That isn't what the OP is asking. He doesn't want to open a document and search for a name. He wants to search a group of PDFs and search for a name, without opening them, so he can find a record that has "John Doe" in it.
Avid user of forums on variety of interests-financial, home brewing, F-150, PHEV, home repair, etc. Enjoy learning & passing on knowledge. It's PRINCIPAL, not PRINCIPLE. I ADVISE you to seek ADVICE.

Topic Author
Leesbro63
Posts: 6085
Joined: Mon Nov 08, 2010 4:36 pm

Re: Software That Searches Inside PDF Files

Post by Leesbro63 » Wed Feb 27, 2019 11:30 am

RickBoglehead wrote:
Wed Feb 27, 2019 11:24 am
scoreboard wrote:
Wed Feb 27, 2019 11:18 am
I do this all the time on my mac computer. Doesn't require any special software at all. Just open the PDF and the "Search" is across the top. Super easy. I believe a PC would require special software, but I don't know what that is.
That isn't what the OP is asking. He doesn't want to open a document and search for a name. He wants to search a group of PDFs and search for a name, without opening them, so he can find a record that has "John Doe" in it.
Yes, exactly, thank you.

scout80
Posts: 29
Joined: Wed Feb 27, 2019 11:20 am

Re: Software That Searches Inside PDF Files

Post by scout80 » Wed Feb 27, 2019 11:32 am

If searching inside a single PDF, you can open it, hit "Ctrl-F" (on a PC), enter the search term, and click the arrow to search down the document. If you want to search multiple PDFs on a PC, you have to install an iFilter, which is basically a plugin for Windows indexing that allows it to read certain file types. Here is a discussion of the topic: https://superuser.com/questions/402673/ ... ows-search - I would just get the Adobe one that is linked in the first response. You just have to know whether your system is 32- or 64-bit.

The one caveat to searching inside PDFs, regardless of platform (Mac or PC) and whether you're searching one document inside Acrobat or many through the computer interface, is that the PDF needs to be text-based and not image-based. Depending on the scanning method and the time taken by the person scanning, it could be either. I imagine that old documents would be difficult to OCR, which may mean that they end up image-based.
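
One rough way to tell which kind of PDF a scan produced is to try extracting text from it and see whether anything comes back. The sketch below is only an illustration: it assumes Poppler's `pdftotext` command-line tool is installed, and the function names and the 20-character threshold are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def has_text_layer(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: call a PDF text-based if extraction yields enough
    non-whitespace characters. The threshold is an arbitrary assumption."""
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def extract_text(pdf_path: Path) -> str:
    """Run pdftotext on one file; the '-' argument sends the text to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def classify_folder(folder: Path) -> dict[str, bool]:
    """Map each PDF file name in the folder to True (text found) or False."""
    return {
        pdf.name: has_text_layer(extract_text(pdf))
        for pdf in sorted(folder.glob("*.pdf"))
    }
```

Files that come back `False` are probably image-only and would need OCR before any search tool can find names in them.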

prudent
Moderator
Posts: 6977
Joined: Fri May 20, 2011 2:50 pm

Re: Software That Searches Inside PDF Files

Post by prudent » Wed Feb 27, 2019 11:33 am

If I'm understanding correctly, you intend to scan paper documents and wish to search their contents.

In order to do that you will need to put the scans through what is called "OCR" - optical character recognition. That will convert "pictures" of characters into actual text characters (at that point the text can be searched). That can be a daunting effort. There is software that will perform OCR on scanned images but it is far from perfect (and if the text is handwritten, smudged, blurry, etc., forget it).

My gut tells me what might be a better solution would be to apply "tags" to the scanned images. That process is where someone views a scanned image and labels that image with some attributes (name, cemetery location, whatever). Those tags are then searchable even though the entire content of the scanned image is not. The generic name for that type of software is Document Management System.
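
To make the tagging idea concrete, here is a minimal sketch of a tag index in Python. It is not how any particular Document Management System works; the class name, the tag keys ("name", "plot") and the file names are all invented for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Maps (tag key, tag value) pairs to the scanned files labeled with them."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def add(self, filename: str, key: str, value: str) -> None:
        """Label one scanned file with one attribute, e.g. name='John Doe'."""
        self._files_by_tag[(key, value.casefold())].add(filename)

    def find(self, key: str, value: str) -> list[str]:
        """Return the files carrying this tag, ignoring upper/lower case."""
        return sorted(self._files_by_tag.get((key, value.casefold()), set()))


if __name__ == "__main__":
    index = TagIndex()
    index.add("folder_0147_page_03.pdf", "name", "John Doe")
    index.add("folder_0147_page_03.pdf", "plot", "B-12")
    index.add("folder_0212_page_01.pdf", "name", "john doe")
    print(index.find("name", "John Doe"))
```

The point of the design is that a person types the tags once while viewing each scan, so the search works even when the scan itself is handwritten or too smudged for OCR.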

billthecat
Posts: 444
Joined: Tue Jan 24, 2017 2:50 pm

Re: Software That Searches Inside PDF Files

Post by billthecat » Wed Feb 27, 2019 11:39 am

Leesbro63 wrote:
Wed Feb 27, 2019 11:13 am
I am curious if there is software that "searches" inside PDF files. In particular, I'm involved with a religious organization that owns a 100-year-old cemetery. We have files on each deceased, many with multiple documents for multiple generations. Someone's grandfather might have bought plots 75 years ago and there might be information about many family members buried in those plots over the years, many with different last names. Once we scan each folder, we'd like to be able to "search" for "John Doe" and find out which files have any documents that mention him. Any suggestions are appreciated.
First, you need to OCR the PDFs, since it sounds like they are scanned as images and not already OCR'd. OCR will take the images in the PDF and translate them into searchable text (the PDF looks the same; OCR just adds an invisible layer with the text). You don't say if you are on a Mac, Windows, or what, so here's a search for how to OCR. https://duckduckgo.com/?q=how+to+ocr&t=osx&ia=web There are many, many tools to OCR PDFs.

Second, searching is easy. On the Mac, you could just search in a window in Finder. Again, you don't say what system you're on, but on a Mac see https://support.apple.com/en-us/HT201732. Presumably Windows is similar.
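
As one possible version of the OCR step, the sketch below batch-converts scanned images into PDFs with an invisible text layer using the Tesseract command-line tool. The assumptions here are mine, not from the post: Tesseract is installed, the scans were saved as `.tif` images rather than PDFs, and the folder and function names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Build the Tesseract call for one image. The trailing 'pdf' asks for a
    searchable PDF; Tesseract appends '.pdf' to the output base name itself."""
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_folder(scan_dir: Path, out_dir: Path) -> None:
    """OCR every .tif in scan_dir, writing one searchable PDF per image."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(scan_dir.glob("*.tif")):
        subprocess.run(tesseract_command(image, out_dir), check=True)


if __name__ == "__main__":
    ocr_folder(Path("scans"), Path("searchable"))
```

Once the output folder is full of searchable PDFs, the Finder or Windows search described above can be pointed at it.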
We cannot direct the winds but we can adjust our sails.

Ice-9
Posts: 1454
Joined: Wed Oct 15, 2008 12:40 pm
Location: Rockville, MD

Re: Software That Searches Inside PDF Files

Post by Ice-9 » Wed Feb 27, 2019 11:42 am

You can do this in Adobe Acrobat Reader: https://www.online-tech-tips.com/comput ... s-at-once/

Also, I believe what was mentioned earlier about Macs will work. Not just opening individual PDF files, but using the Mac OS Finder to search content.

If the PDFs are scanned rather than actual text, uploading to a OneDrive account will automatically OCR the scan (may take a couple days before that's available after upload) and you can use the search function to find all documents containing the search term.
Last edited by Ice-9 on Wed Feb 27, 2019 11:44 am, edited 1 time in total.

JohnFiscal
Posts: 757
Joined: Mon Jan 06, 2014 4:28 pm
Location: Florida

Re: Software That Searches Inside PDF Files

Post by JohnFiscal » Wed Feb 27, 2019 11:44 am

OCR was popular 20 years ago when there were more written/typed documents around; now there's less emphasis on it.

My employer (before I retired) had a service that went hand-in-hand with one of our software products. Industrial companies would have old records of this kind (handwritten or typed) that were still in use; these could be anywhere from "new" to 30 or more years old. We would scan their forms with our dedicated scanner and OCR software. One guy spent pretty much full time on this. He would have to "hand tune" the results from the OCR files. It was a lot of work to get these old forms into an electronic version. Maybe there are services that would do this for the OP.

As part of that project it might be worth considering creating a database of all the names, dates, relationships, plots, etc.
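
For what such a database might look like, here is a small sketch using SQLite, which ships with Python. The table and column names are only a guess at what a cemetery would want to record, and any rows you put in are up to you; nothing here comes from a real system.

```python
import sqlite3

# Hypothetical schema: one row per plot, one row per burial in that plot.
SCHEMA = """
CREATE TABLE IF NOT EXISTS plot (
    plot_id   TEXT PRIMARY KEY,
    purchaser TEXT,
    purchased TEXT
);
CREATE TABLE IF NOT EXISTS burial (
    burial_id             INTEGER PRIMARY KEY,
    full_name             TEXT NOT NULL,
    plot_id               TEXT NOT NULL REFERENCES plot(plot_id),
    burial_date           TEXT,
    relation_to_purchaser TEXT,
    scan_file             TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def files_mentioning(conn: sqlite3.Connection, name: str) -> list[str]:
    """Return the scanned files linked to burials whose name contains `name`.
    SQLite's LIKE ignores case for ASCII letters by default."""
    rows = conn.execute(
        "SELECT DISTINCT scan_file FROM burial "
        "WHERE full_name LIKE ? ORDER BY scan_file",
        (f"%{name}%",),
    )
    return [row[0] for row in rows]
```

Because each burial row points at both a plot and a scanned file, a search for "John Doe" finds his folder even when the plot was bought by a relative with a different last name.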

keithintx
Posts: 12
Joined: Mon Jan 04, 2016 1:35 pm

Re: Software That Searches Inside PDF Files

Post by keithintx » Wed Feb 27, 2019 11:50 am

You might look into X1 search to see if they include this functionality. I use it for emails.

Angst
Posts: 2099
Joined: Sat Jun 09, 2007 11:31 am

Re: Software That Searches Inside PDF Files

Post by Angst » Wed Feb 27, 2019 11:54 am

Leesbro63 wrote:
Wed Feb 27, 2019 11:30 am
RickBoglehead wrote:
Wed Feb 27, 2019 11:24 am
scoreboard wrote:
Wed Feb 27, 2019 11:18 am
I do this all the time on my mac computer. Doesn't require any special software at all. Just open the PDF and the "Search" is across the top. Super easy. I believe a PC would require special software, but I don't know what that is.
That isn't what the OP is asking. He doesn't want to open a document and search for a name. He wants to search a group of PDFs and search for a name, without opening them, so he can find a record that has "John Doe" in it.
Yes, exactly, thank you.
The scanned vs real text question has not clearly been answered to me. Anyone using Windows or a Mac can search through a folder full of pdf files for strings of text. It's very easy, so I assume you're talking about scanned documents, not pdfs that have literal text in them. Is this correct?
Last edited by Angst on Wed Feb 27, 2019 12:01 pm, edited 1 time in total.

Ged
Posts: 3839
Joined: Mon May 13, 2013 1:48 pm
Location: Roke

Re: Software That Searches Inside PDF Files

Post by Ged » Wed Feb 27, 2019 11:55 am

You can use Google Drive to search for text in images.

https://www.quora.com/Does-any-image-se ... e-an-image

RickBoglehead
Posts: 5608
Joined: Wed Feb 14, 2018 9:10 am
Location: In a house

Re: Software That Searches Inside PDF Files

Post by RickBoglehead » Wed Feb 27, 2019 12:19 pm

Angst wrote:
Wed Feb 27, 2019 11:54 am
The scanned vs real text question has not clearly been answered to me. Anyone using Windows or a Mac can search through a folder full of pdf files for strings of text. It's very easy, so I assume you're talking about scanned documents, not pdfs that have literal text in them. Is this correct?
I don't think you are correct. The text inside PDFs is not indexed by Windows. I just did a search for terms and not a single result is in a PDF.

This link explains how the OP can search inside a specific group of PDFs IF the words are OCR or original text.

https://www.online-tech-tips.com/comput ... s-at-once/

billthecat
Posts: 444
Joined: Tue Jan 24, 2017 2:50 pm

Re: Software That Searches Inside PDF Files

Post by billthecat » Wed Feb 27, 2019 12:22 pm

RickBoglehead wrote:
Wed Feb 27, 2019 12:19 pm
Angst wrote:
Wed Feb 27, 2019 11:54 am
The scanned vs real text question has not clearly been answered to me. Anyone using Windows or a Mac can search through a folder full of pdf files for strings of text. It's very easy, so I assume you're talking about scanned documents, not pdfs that have literal text in them. Is this correct?
I don't think you are correct. The text inside PDFs is not indexed by Windows. I just did a search for terms and not a single result is in a PDF.

This link explains how the OP can search inside a specific group of PDFs IF the words are OCR or original text.

https://www.online-tech-tips.com/comput ... s-at-once/
It is indexed by Windows if your settings are correct. https://www.howtogeek.com/99406/how-to- ... ws-search/

Of course, it must have been OCR'd first.

Angst
Posts: 2099
Joined: Sat Jun 09, 2007 11:31 am

Re: Software That Searches Inside PDF Files

Post by Angst » Wed Feb 27, 2019 12:24 pm

RickBoglehead wrote:
Wed Feb 27, 2019 12:19 pm
Angst wrote:
Wed Feb 27, 2019 11:54 am
The scanned vs real text question has not clearly been answered to me. Anyone using Windows or a Mac can search through a folder full of pdf files for strings of text. It's very easy, so I assume you're talking about scanned documents, not pdfs that have literal text in them. Is this correct?
I don't think you are correct. The text inside PDFs is not indexed by Windows. I just did a search for terms and not a single result is in a PDF.

This link explains how the OP can search inside a specific group of PDFs IF the words are OCR or original text.

https://www.online-tech-tips.com/comput ... s-at-once/
I don't know what to say - it works fine for me. Perhaps, limit yourself to just one unique search item? E.g., I have a folder full of credit card pdf statements going back years. I search the folder for "aaa" and immediately I get a list of pdfs (2 for every year) containing my bi-annual insurance premium payment items.

RickBoglehead
Posts: 5608
Joined: Wed Feb 14, 2018 9:10 am
Location: In a house

Re: Software That Searches Inside PDF Files

Post by RickBoglehead » Wed Feb 27, 2019 12:29 pm

billthecat wrote:
Wed Feb 27, 2019 12:22 pm
RickBoglehead wrote:
Wed Feb 27, 2019 12:19 pm
Angst wrote:
Wed Feb 27, 2019 11:54 am
The scanned vs real text question has not clearly been answered to me. Anyone using Windows or a Mac can search through a folder full of pdf files for strings of text. It's very easy, so I assume you're talking about scanned documents, not pdfs that have literal text in them. Is this correct?
I don't think you are correct. The text inside PDFs is not indexed by Windows. I just did a search for terms and not a single result is in a PDF.

This link explains how the OP can search inside a specific group of PDFs IF the words are OCR or original text.

https://www.online-tech-tips.com/comput ... s-at-once/
It is indexed by Windows if your settings are correct. https://www.howtogeek.com/99406/how-to- ... ws-search/

Of course, it must have been OCR'd first.
Actually, I discovered that Windows 10 64-bit has an error that needs to be fixed to do that, which I just did, and can now search PDF files with Windows 10. https://supportdownloads.adobe.com/deta ... ftpID=5542

rkhusky
Posts: 7838
Joined: Thu Aug 18, 2011 8:09 pm

Re: Software That Searches Inside PDF Files

Post by rkhusky » Wed Feb 27, 2019 12:36 pm

Note also that the creator of the pdf can choose to block searching if the appropriate security setting is selected.

ResearchMed
Posts: 9455
Joined: Fri Dec 26, 2008 11:25 pm

Re: Software That Searches Inside PDF Files

Post by ResearchMed » Wed Feb 27, 2019 12:39 pm

rkhusky wrote:
Wed Feb 27, 2019 12:36 pm
Note also that the creator of the pdf can choose to block searching if the appropriate security setting is selected.
I didn't know this.

But IF this was selected, couldn't someone just scan it and have OCR-type SW do the searching?
If it's a clear pdf, that should work fairly well.

RM

rich126
Posts: 1041
Joined: Thu Mar 01, 2018 4:56 pm

Re: Software That Searches Inside PDF Files

Post by rich126 » Wed Feb 27, 2019 1:09 pm

PDF is a special format that isn't easily searchable.

I've never used this tool so buyer/user beware.
https://www.lucion.com/kb-cant-search-p ... ments.html

Maybe that would help? There appears to be a free trial version.

Good luck.

Epsilon Delta
Posts: 8090
Joined: Thu Apr 28, 2011 7:00 pm

Re: Software That Searches Inside PDF Files

Post by Epsilon Delta » Wed Feb 27, 2019 2:51 pm

There is a wide variety of open source tools to manipulate PDF files. pdfgrep, pdftotext, pdfsandwich and Tesseract OCR come to mind. If that is not helpful, these are not the tools you are looking for.
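
As a rough idea of how one of these could answer the original question, the sketch below wraps pdfgrep to list every PDF under a folder that mentions a phrase. It assumes pdfgrep is installed, that the PDFs already contain text (OCR'd if scanned), and that pdfgrep follows grep's exit-code convention (1 means "no match"); the option names are as I understand them from the pdfgrep manual, so check `pdfgrep --help` on your version. The function names are invented for this example.

```python
import subprocess
from pathlib import Path


def pdfgrep_command(phrase: str, folder: Path) -> list[str]:
    """Build a pdfgrep call that prints only the names of matching files.
    Note that pdfgrep treats the phrase as a regular expression."""
    return [
        "pdfgrep",
        "--recursive",           # descend into subfolders
        "--ignore-case",         # 'john doe' also matches 'John Doe'
        "--files-with-matches",  # print file names, not matching lines
        phrase,
        str(folder),
    ]


def parse_matches(stdout: str) -> list[str]:
    """Turn pdfgrep's one-file-per-line output into a sorted list."""
    return sorted(line for line in stdout.splitlines() if line.strip())


def search_folder(phrase: str, folder: Path) -> list[str]:
    """Return the PDFs under `folder` that mention `phrase`."""
    result = subprocess.run(
        pdfgrep_command(phrase, folder), capture_output=True, text=True
    )
    # Assumed grep-style exit codes: 0 = matches, 1 = none, other = error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return parse_matches(result.stdout)
```

A call such as `search_folder("John Doe", Path("scans"))` would then return the list of files to pull, without opening any of them by hand.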
