
How to Extract Text from Scanned PDFs with Free OCR: Step by Step

June 14, 2026 · 11 min read

If you’ve ever tried to search, copy, or edit text in a scanned PDF and found that nothing happens when you select it, you’ve run into the most common limitation of scanned documents: they’re images, not text. This guide explains how OCR (Optical Character Recognition) solves that problem, when you need it, and how to convert any scanned PDF into a searchable, selectable document for free.

Why a Scanned PDF Isn’t Really "Text"

When you scan a paper document — or take a photo of it with your phone and save it as a PDF — what you get is a picture of the page, wrapped inside a PDF container. To your computer, that page is no different from a photo of a sunset: a grid of pixels with no concept of letters, words, or paragraphs.

This creates real problems. You can’t press Ctrl+F to find a clause in a scanned contract. You can’t copy a paragraph to paste into an email. Screen readers used by visually impaired users can’t read the content aloud. And if you need to translate the document or run it through any text-processing tool, you’re stuck retyping everything by hand.

OCR (Optical Character Recognition) solves this by analyzing the shapes on each page, recognizing them as letters and numbers, and reconstructing the actual text — turning a "picture of words" into real, searchable, selectable words.

How OCR Works (In Plain Terms)

Modern OCR engines go through a few stages for each page of your document:

1. Pre-processing

The image is cleaned up — straightened if it was scanned at an angle, contrast adjusted, and noise removed so text stands out clearly from the background.
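
To make the clean-up step concrete, here is a minimal sketch of one common pre-processing operation: binarization, which turns a grayscale page into pure ink versus background. The function name and the midpoint threshold are assumptions chosen for illustration; production OCR engines typically use more robust adaptive methods.

```python
def binarize(gray, threshold=None):
    """Convert a grayscale page (rows of 0-255 ints) to 1 = ink, 0 = background."""
    pixels = [p for row in gray for p in row]
    if threshold is None:
        # Assumption: a simple global threshold halfway between the
        # darkest and lightest pixel on the page.
        threshold = (min(pixels) + max(pixels)) / 2
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# A tiny 3x4 "page": light paper (values near 250) with a few dark ink pixels.
page = [
    [250, 248, 245, 251],
    [249,  30,  42, 247],
    [252,  35, 240, 250],
]
binary = binarize(page)
print(binary)
```

After this step, faint paper texture and slight grey variations no longer matter: every pixel is either ink or not.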

2. Layout Analysis

The engine identifies which regions of the page contain text, images, or tables, and determines the reading order — important for multi-column documents.
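
One simple way to find text regions is a horizontal projection: any row of pixels that contains ink belongs to a line of text, and blank rows separate one line from the next. The sketch below is a toy illustration of that idea on an already binarized page, not the method any particular OCR engine uses; `find_text_lines` is a hypothetical helper.

```python
def find_text_lines(binary):
    """Return (start_row, end_row) pairs, inclusive, for runs of rows containing ink."""
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # a blank row ends the current line
            start = None
    if start is not None:                  # text runs to the bottom edge
        lines.append((start, len(binary) - 1))
    return lines


# 1 = ink, 0 = background. Rows 1-2 form one line, row 4 another.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
lines = find_text_lines(page)
print(lines)
```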

3. Character Recognition

Each text region is broken into individual characters and words, which are matched against trained models to determine the most likely letters and numbers.
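
As a rough intuition for "matching against a model", the sketch below compares a noisy 3x3 glyph with a handful of stored templates and picks the closest one. This is deliberately simplified: modern engines rely on trained statistical or neural models rather than fixed templates, and the `TEMPLATES` table and function names here are made up for illustration.

```python
# Hypothetical 3x3 bitmaps for three letters ("1" = ink).
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(glyph):
    """Return (best_label, distance) for the template closest to the glyph."""
    scored = ((label, hamming(glyph, tpl)) for label, tpl in TEMPLATES.items())
    return min(scored, key=lambda pair: pair[1])


# A "T" with one stray ink pixel in the bottom-right corner.
noisy_t = ["111", "010", "011"]
label, distance = recognize(noisy_t)
print(label, distance)
```

The stray pixel does not change the outcome because "T" is still the nearest match, which is the same reason OCR tolerates a moderate amount of scan noise.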

4. Text Layer Output

The recognized text is placed into an invisible layer positioned exactly over the original image, so the page still looks like a scan but now contains real, selectable text underneath.
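
Positioning the text layer means converting each word's pixel box on the scanned image into PDF page coordinates. PDF pages are measured in points (72 per inch) with the origin at the bottom-left corner, while images are usually indexed from the top-left. The sketch below shows that conversion; the function name and the sample box are assumptions for illustration.

```python
def pixel_box_to_pdf_points(box, dpi, page_height_px):
    """Convert an image-space box to PDF space.

    box: (left, top, right, bottom) in pixels, origin at the top-left.
    Returns (x, y, width, height) in points, origin at the bottom-left.
    """
    left, top, right, bottom = box
    x = left * 72.0 / dpi
    # Flip the vertical axis: PDF y is measured upward from the page bottom.
    y = (page_height_px - bottom) * 72.0 / dpi
    width = (right - left) * 72.0 / dpi
    height = (bottom - top) * 72.0 / dpi
    return (x, y, width, height)


# A US Letter page scanned at 300 DPI is 2550 x 3300 pixels.
# Assume the OCR step found a word at this pixel box near the top-left.
rect = pixel_box_to_pdf_points((300, 300, 900, 360), dpi=300, page_height_px=3300)
print(rect)
```

The word sits one inch (72 points) from the left edge and close to the top of the 792-point-tall page, so its invisible text lands directly over the scanned letters.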

Step-by-Step: Make Your Scanned PDF Searchable

  1. Open our OCR tool and upload your scanned PDF. This can be a document you scanned, downloaded, or photographed with your phone and converted to PDF.
  2. The tool analyzes each page and runs OCR locally in your browser, identifying text regions and recognizing characters.
  3. Review the recognized text. For most printed documents (invoices, reports, books, forms), recognition accuracy is very high; for low-quality scans or unusual fonts, spot-check a few pages.
  4. Download the new searchable PDF. It looks identical to the original but now has a text layer behind every page, so Ctrl+F and copy-paste work as expected.
  5. If you need the raw text separately (for example, to paste into a document or translation tool), use the extracted text output alongside the searchable PDF.
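
For the review in step 3, it helps to know which pages to check first. Many OCR engines report a confidence score per recognized word; whether a given tool exposes those scores is an assumption here, as are the `(page, text, confidence)` tuple layout and the threshold of 80. Under those assumptions, a short script can rank pages by how many doubtful words they contain:

```python
from collections import Counter


def pages_to_review(words, min_confidence=80, max_pages=3):
    """Return page numbers with the most low-confidence words, worst first.

    words: iterable of (page_number, text, confidence) with confidence 0-100.
    """
    low = Counter(page for page, _text, conf in words if conf < min_confidence)
    return [page for page, _count in low.most_common(max_pages)]


# Made-up OCR output: page 2 has two doubtful words, page 3 has one.
words = [
    (1, "Invoice", 96), (1, "Total", 94),
    (2, "rn0dern", 41), (2, "c1ause", 55), (2, "the", 91),
    (3, "S1gnature", 62),
]
review = pages_to_review(words)
print(review)
```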

You can also use our Scanned PDF to Searchable PDF tool, which is built specifically for converting batches of scanned pages into a single searchable document.

Common Situations Where OCR Saves the Day

Old Scanned Records

Digitize years of paper archives (contracts, invoices, certificates) into a searchable digital library.

Photographed Receipts & Invoices

Turn phone photos of receipts into text, then pull out line items, totals, and dates for expense tracking.
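
Once a receipt is plain text, pulling out fields is ordinary pattern matching. The sketch below is a minimal example using Python's `re` module; the two patterns and the sample receipt are assumptions, and real receipts vary enough that the patterns would need tuning for each format.

```python
import re

# Assumed formats: dates like 14/06/2026 or 2026-06-14, and a line that
# starts with the word "total" followed by an amount.
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})\b")
TOTAL_RE = re.compile(
    r"^[ \t]*total\b[^\d\n]*(\d+(?:[.,]\d{2})?)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_receipt(text):
    """Extract the first date and the total amount from OCR'd receipt text."""
    date = DATE_RE.search(text)
    total = TOTAL_RE.search(text)
    return {
        "date": date.group(1) if date else None,
        "total": total.group(1).replace(",", ".") if total else None,
    }


sample = """CORNER CAFE
2026-06-14  12:41
Coffee        3.50
Sandwich      7.25
Subtotal     10.75
TOTAL       $11.61
"""
result = parse_receipt(sample)
print(result)
```

Anchoring the total pattern to the start of a line keeps it from matching the "Subtotal" row by mistake.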

Research Papers & Books

Make scanned academic papers or out-of-print books searchable and quotable for research.

Government & Legal Forms

Search through scanned filings, applications, or court documents for specific clauses or names.

Foreign Language Documents

Extract text from scanned foreign-language documents so it can be pasted into a translation tool.

Accessibility

Add a text layer so screen readers can read scanned documents aloud for visually impaired users.

Tips for Getting the Best OCR Results

Scan at a Higher Resolution

OCR accuracy improves significantly with resolution. If you’re scanning fresh documents, use at least 300 DPI — low-resolution scans (especially under 150 DPI) can cause characters to blur together and reduce accuracy.
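
If you only have an image and don't know its DPI, you can estimate it from the pixel dimensions and the physical page size. The sketch below applies the 300 and 150 DPI thresholds mentioned above; the function names and advice labels are made up for illustration.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Pixels per inch actually captured, using the weaker of the two axes."""
    return min(width_px / width_in, height_px / height_in)


def resolution_advice(dpi):
    if dpi >= 300:
        return "good"
    if dpi >= 150:
        return "usable, expect some errors"
    return "rescan if possible"


# A US Letter page (8.5 x 11 in) captured as 2550 x 3300 pixels.
scan_dpi = effective_dpi(2550, 3300, 8.5, 11)
# The same page captured at half that size in each direction.
photo_dpi = effective_dpi(1275, 1650, 8.5, 11)

print(scan_dpi, resolution_advice(scan_dpi))
print(photo_dpi, resolution_advice(photo_dpi))
```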

Straighten Skewed Pages

A page scanned at an angle is much harder for OCR to read. If your scanner doesn’t auto-straighten, try our PDF Rotate tool to correct orientation before running OCR.
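
To get a feel for how skew is measured, the sketch below fits a straight line through a few points along the bottom of one text line and reports its tilt in degrees. It is a simplified illustration, not how any particular scanner or tool detects skew; the sample points are made up, and the function needs at least two points with different x values.

```python
import math


def skew_angle_degrees(baseline_points):
    """Least-squares tilt of a text baseline, in degrees.

    baseline_points: (x, y) pixel positions along the bottom of one text line.
    With image coordinates (y grows downward), a positive angle means the
    line drifts downward toward the right.
    """
    n = len(baseline_points)
    mean_x = sum(x for x, _ in baseline_points) / n
    mean_y = sum(y for _, y in baseline_points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    return math.degrees(math.atan(sxy / sxx))


# The baseline drops 7 pixels for every 200 pixels of width.
points = [(0, 100), (200, 107), (400, 114), (600, 121)]
angle = skew_angle_degrees(points)
print(round(angle, 2))
```

A tilt of roughly two degrees, as in this example, is small enough to miss by eye but is still worth correcting before OCR.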

Use Good Lighting for Phone Photos

If you’re photographing a document instead of scanning it, avoid shadows and glare. Even lighting and a flat, well-lit surface dramatically improve OCR accuracy on phone-captured pages.

Check Multi-Column Layouts

Newspapers, academic papers, and brochures with multiple columns can sometimes have their reading order mixed up. Spot-check the extracted text order on these layouts after OCR.
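
The reading-order problem is easy to see in a small example. Sorting text blocks purely top to bottom interleaves the columns, while grouping blocks into columns first gives the order a reader expects. The sketch below assumes each block carries the pixel position of its top-left corner; the `column_gap` value and the block layout are hypothetical.

```python
def reading_order(blocks, column_gap=50):
    """Order text blocks column by column, top to bottom within each column.

    blocks: list of dicts with 'x', 'y' (top-left corner, pixels) and 'text'.
    Blocks whose x positions differ by less than column_gap from the
    previous block share a column.
    """
    columns = []
    for block in sorted(blocks, key=lambda b: b["x"]):
        if columns and block["x"] - columns[-1][-1]["x"] < column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b["y"]))
    return [b["text"] for b in ordered]


# Two columns (A on the left, B on the right), two paragraphs each.
blocks = [
    {"x": 60, "y": 100, "text": "A1"},
    {"x": 640, "y": 100, "text": "B1"},
    {"x": 62, "y": 400, "text": "A2"},
    {"x": 640, "y": 400, "text": "B2"},
]
order = reading_order(blocks)
print(order)
```

If the extracted text from a multi-column page reads like A1, B1, A2, B2 instead, the columns were interleaved and the text needs reordering.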

Pro Tips for Working with Scanned Documents

  • OCR before you sign. If you need to find a specific clause to sign or initial, run OCR first so you can search the document, then use our PDF Sign tool.
  • Compress after OCR. Searchable PDFs can be slightly larger due to the added text layer — run the result through PDF Compress if file size matters.
  • Combine multiple scans first. If you scanned a multi-page document as separate files, use PDF Merge to combine them before running OCR — it's faster than processing each file separately.
  • Keep both versions. Store the OCR'd searchable version for daily use, but keep the original scan as your archival copy.
  • Don't expect perfect handwriting recognition. If your documents include handwritten notes or signatures, expect those areas to need manual review even after OCR.

A scanned PDF doesn’t have to stay a dead-end image file. With free, browser-based OCR, you can turn any scanned document into a searchable, selectable, accessible PDF in minutes — no software installation, no per-page fees, and no file ever leaving your device.