
OCR scanned PDF files – a must in the researcher’s toolkit!!

Sometimes when downloading sources from databases, particularly older articles and documents, I find that they have been scanned and uploaded as images. Because the PDF file has been created from a series of images of pages, the ability to highlight or search through the document is reduced or removed entirely.

OCR (Optical Character Recognition) software can convert these ‘pictures’ of documents into PDF files whose text can then be highlighted and searched.

Adobe Acrobat Pro has been awesome, but it’s expensive. There are other tools, such as ABBYY FineReader, and a number of free upload-and-convert-to-text-or-Word-doc style sites. Evernote and OneNote can read PDFs, so if you need to search for terms they can locate them within the PDF. However, they do not perform OCR on the document itself, and your ability to highlight text is still limited. For me, Acrobat Pro has been my go-to text recognition software for PDFs, largely because it leaves the PDF intact and I can then import the PDF into Sente. It’s a key step in my process.
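For readers who are comfortable with a little scripting, the same "add a text layer but leave the PDF intact" step can be sketched with a free tool. The example below is only an illustration, not part of my own process: it assumes the open-source OCRmyPDF command-line tool is installed, and the function names and file names are made up for the example.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF command that adds a searchable text layer to a scanned PDF."""
    return [
        "ocrmypdf",
        "--skip-text",   # leave pages that already contain text untouched
        "-l", language,  # Tesseract language code for the document
        str(src),        # the scanned, image-only PDF
        str(dst),        # the new PDF with a text layer added
    ]


def ocr_pdf(src: Path, dst: Path, language: str = "eng") -> bool:
    """Run OCR if the ocrmypdf tool is installed; return True on success."""
    if shutil.which("ocrmypdf") is None:
        return False  # tool not installed, so nothing was done
    result = subprocess.run(build_ocr_command(src, dst, language))
    return result.returncode == 0
```

Calling `ocr_pdf(Path("scanned-article.pdf"), Path("scanned-article-ocr.pdf"))` (hypothetical file names) would write a new copy and leave the original scan as it was, so the result can still be imported into a reference manager as an ordinary PDF.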

Why is this so important?

Personally, I have three main reasons: […]