Convert PDF to spreadsheet — Excel vs Google Sheets vs Numbers


By ScoutMyTool Editorial Team · Last updated: 2026-05-20

Tabular data trapped in a PDF is one of the most common "I need to do something with this" problems in financial workflows. Excel, Google Sheets, and Apple Numbers each handle the PDF-to-spreadsheet bridge differently, and the right choice depends on whether the source is born-digital or scanned, whether you need to collaborate after extraction, and whether the content is sensitive enough to keep on-machine. This article compares the three across the dimensions that matter, plus when to reach for a dedicated table-extraction tool instead.

Excel vs Sheets vs Numbers — feature comparison

Aspect	Excel	Google Sheets	Apple Numbers
Direct PDF import	Yes: Data → Get Data → From File → From PDF	No (open with Google Docs via Drive, or copy-paste)	Limited (copy-paste only)
Preserves table layout	Good — column detection works on most PDFs	Mediocre — multi-column flattens	Limited
Handles scanned PDFs	No (OCR the PDF first)	Yes, via Drive OCR when opened with Google Docs	No (OCR the PDF first)
Formulas in source PDF	Lost (PDF stores results, not formulas)	Lost	Lost
Collaboration after extraction	Single-user, or co-authoring via Microsoft 365	Real-time multi-user collaboration	iCloud sharing, limited multi-user
Cost	$70/year Personal / $100 Family	Free with Google account	Free with Mac
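
Excel's importer and Numbers both depend on an embedded text layer, so it helps to know which kind of PDF you have before you pick a path. The sketch below is a rough heuristic, not a PDF parser. It scans the file's streams (Flate-decoding them with zlib where possible) for a text object, meaning a `BT ... Tj/TJ ... ET` sequence. A page that is only a scanned image usually has none. It assumes an unencrypted file and will miss text in streams that use filters other than Flate. A real library is more reliable; pypdf's `PdfReader(...).pages[i].extract_text()` is one option. An OCR'd scan counts as having text, which is what you want here.

```python
import re
import zlib

# Raw bytes between a "stream" keyword and "endstream"; the lookbehind keeps
# "endstream" itself from being mistaken for the start of a new stream.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text object that actually paints glyphs: BT ... Tj or TJ ... ET.
TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?T[jJ].*?\bET\b", re.DOTALL)


def stream_contents(pdf_bytes: bytes):
    """Yield each stream's bytes, Flate-decoded when that succeeds."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            yield zlib.decompress(raw)
        except zlib.error:
            yield raw  # uncompressed or another filter: scan the bytes as-is


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream contains a glyph-painting text object."""
    return any(TEXT_OBJECT_RE.search(data) for data in stream_contents(pdf_bytes))
```

Call it as `has_text_layer(open(path, "rb").read())`. A result of `False` suggests the file is a scan that needs OCR before Excel or Numbers can import it.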

Step by step — import financial PDF to Excel

1. OCR if scanned. Run the PDF through Make PDF Searchable first; skip this step for born-digital PDFs.
2. Open the import. In Excel, choose Data → Get Data → From File → From PDF, select the PDF, and preview the detected tables in the Navigator.
3. Review the table preview. If columns come through merged or split, choose Transform Data and fix them in the Power Query Editor (Split Column, Merge Columns), cleaning cells as needed.
4. Load into a worksheet. Apply currency and date formats per column after import.
5. Validate against the source. Sum one numeric column and compare it with the "Total" row in the source PDF; a mismatch means the extraction needs re-checking.
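
The validation step can also be scripted when the extracted table ends up in a CSV. Below is a minimal sketch. It assumes amounts are decorated only with `$` and `,` (US style), and `check_total` is an illustrative name, not part of any tool. It uses `Decimal` rather than floats so cents add up exactly.

```python
from decimal import Decimal


def to_decimal(cell: str) -> Decimal:
    """Turn an extracted amount such as '$1,234.56' into Decimal('1234.56')."""
    return Decimal(cell.strip().replace("$", "").replace(",", ""))


def check_total(amounts, stated_total, tolerance=Decimal("0")):
    """Sum one extracted column and compare it with the PDF's Total row.

    Returns (matches, computed_sum, difference). A nonzero difference
    suggests a dropped, duplicated, or misread row.
    """
    computed = sum((to_decimal(a) for a in amounts), Decimal("0"))
    difference = computed - to_decimal(stated_total)
    return abs(difference) <= tolerance, computed, difference


# Example: three extracted rows should add up to the statement's Total.
matches, computed, diff = check_total(["$1,200.00", "$34.56", "-100.00"], "$1,134.56")
print(matches, computed, diff)  # → True 1134.56 0.00
```

The default `tolerance` of zero fits statements that are summed exactly. Raise it only if the source rounds its own totals.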

When to use a dedicated PDF-to-CSV tool instead

Three cases. First, the PDF table has an unusual layout (multi-row headers, merged cells, footnotes) that Excel's importer mis-parses; dedicated tools (ScoutMyTool PDF to CSV, Tabula, Camelot) give finer control over extraction rules. Second, you need to extract from many PDFs in a batch; scriptable tools (Camelot, tabula-py) scale to hundreds of files. Third, the PDF is confidential and must not leave your machine; ScoutMyTool runs client-side, while Excel's cloud-based PDF import may upload the file to Microsoft for processing.
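
For the batch case, the script around the extractor matters as much as the extractor itself. In the minimal sketch below, `batch_extract` walks a folder, calls an extractor you supply on each PDF, writes one CSV per table, and records failures instead of stopping. The extractor is a parameter on purpose. With Camelot installed, it could be a wrapper such as `lambda p: [t.df.values.tolist() for t in camelot.read_pdf(str(p), pages="all")]`, but treat that wrapper as an assumption to check against the Camelot documentation. The folder layout and `_tableN.csv` naming are illustrative.

```python
import csv
from pathlib import Path
from typing import Callable, List

Table = List[List[str]]  # one extracted table as rows of cell strings


def batch_extract(pdf_dir: Path, out_dir: Path,
                  extract: Callable[[Path], List[Table]]) -> dict:
    """Run `extract` on every PDF in pdf_dir and write one CSV per table.

    Returns {"written": [csv paths], "failed": {pdf file name: error text}}.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written, failed = [], {}
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        try:
            tables = extract(pdf_path)
        except Exception as exc:  # keep going; report the bad file at the end
            failed[pdf_path.name] = str(exc)
            continue
        for i, rows in enumerate(tables, start=1):
            csv_path = out_dir / f"{pdf_path.stem}_table{i}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as fh:
                csv.writer(fh).writerows(rows)
            written.append(csv_path)
    return {"written": written, "failed": failed}
```

Collecting failures rather than raising suits a run over hundreds of files. A single unreadable PDF shows up in the report instead of aborting the batch.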

For one-off conversion of clean financial PDFs, Excel's built-in importer is fast and produces good results. For complex layouts, batch processing, or confidential content, a dedicated tool is worth the extra setup. Either way, once the data is in CSV it can go to whichever spreadsheet you prefer for analysis; Excel, Sheets, and Numbers all import CSV equally well.

Common cleanup tasks after extraction

Extracted PDF data almost always needs cleanup before analysis. Three issues come up most. First, currency parsing: extracted values may include "$" and "," characters, so the spreadsheet treats them as text, and SUM silently skips them. Strip the characters and convert the result to a number before summing, for example =VALUE(SUBSTITUTE(SUBSTITUTE(TRIM(A2),"$",""),",","")). Second, date parsing: dates extracted as strings may need DATEVALUE before they behave as real dates in formulas. Third, wrapped cells: a row whose leading column is empty often means a record wrapped onto two lines in the source, so merge it upward to recover the original record.
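
The same three fixes can be applied in code when the extracted CSV goes to a script instead of a spreadsheet. Below is a minimal standard-library sketch. The accepted date formats and the treatment of `(45.00)` as a negative amount are assumptions about a typical US statement. `merge_wrapped_rows` applies the empty-leading-column rule literally, so first confirm that your source never leaves that column blank on purpose.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import List, Optional

# Assumed source date formats; extend this tuple to match your statements.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d %b %Y")


def parse_amount(cell: str) -> Optional[Decimal]:
    """'$1,234.56' -> 1234.56; '(45.00)' -> -45.00; '' -> None."""
    text = cell.strip().replace("$", "").replace(",", "")
    if not text:
        return None
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]  # accounting-style negative in parentheses
    value = Decimal(text)
    return -value if negative else value


def parse_date(cell: str) -> Optional[date]:
    """Return a date for the first format that matches, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cell.strip(), fmt).date()
        except ValueError:
            continue
    return None


def merge_wrapped_rows(rows: List[List[str]]) -> List[List[str]]:
    """Fold a row whose first cell is empty into the row above it."""
    merged: List[List[str]] = []
    for row in rows:
        if merged and row and not row[0].strip():
            previous = merged[-1]
            previous.extend([""] * (len(row) - len(previous)))  # pad if shorter
            for i, cell in enumerate(row):
                if cell.strip():
                    previous[i] = f"{previous[i]} {cell.strip()}".strip()
        else:
            merged.append(list(row))  # copy so the input rows are not mutated
    return merged
```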

For high-volume extraction, automate the cleanup with Excel Power Query (Windows and recent Mac versions) or with a Google Apps Script or pandas script. Power Query saves the cleanup steps as a reusable transformation; the next time you extract a similar PDF, the same cleanup applies in one click. The setup pays back at 5+ similar extractions; below that, manual cleanup is faster than building the automation.
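
Power Query records cleanup as a list of Applied Steps that replay on the next file. The sketch below mimics that idea in Python as an analogy, not as Power Query itself. Each step is a function from rows to rows, and a recipe is an ordered list of steps. The step names, and the assumption that multi-page statements repeat their header row on every page, are illustrative.

```python
from typing import Callable, List

Rows = List[List[str]]
Step = Callable[[Rows], Rows]


def strip_cells(rows: Rows) -> Rows:
    """Trim stray spaces left by extraction."""
    return [[cell.strip() for cell in row] for row in rows]


def drop_blank_rows(rows: Rows) -> Rows:
    """Remove rows whose cells are all empty."""
    return [row for row in rows if any(row)]


def drop_repeated_headers(rows: Rows) -> Rows:
    """Keep the first row as the header and drop later exact copies of it."""
    if not rows:
        return rows
    header = rows[0]
    return [header] + [row for row in rows[1:] if row != header]


# The recorded recipe: build it once, then reuse it for every similar PDF.
STATEMENT_CLEANUP: List[Step] = [strip_cells, drop_blank_rows, drop_repeated_headers]


def apply_steps(rows: Rows, steps: List[Step]) -> Rows:
    """Replay each step in order, like refreshing a saved query."""
    for step in steps:
        rows = step(rows)
    return rows
```

Because the recipe is plain data, the next statement only needs `apply_steps(new_rows, STATEMENT_CLEANUP)`. That one call is where the payback over repeated manual cleanup comes from.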

FAQ

Excel's "Get Data from PDF" works on born-digital PDFs but not on scans — why?: Excel's PDF import reads the embedded text layer to find tables. Scanned PDFs have no text layer; each page is an image of text, and Excel cannot extract values it cannot read as characters. Fix: OCR the PDF first with ScoutMyTool Make PDF Searchable or Adobe Acrobat OCR. Either tool adds an invisible text layer, after which Excel's importer can find the tables. Google Drive also runs OCR when you open a scan with Google Docs, but the result is a Google Doc rather than a searchable PDF. OCR adds roughly 1–3 seconds per page, so for batches, run it once and reuse the OCR'd files for all downstream extraction.
My imported table has merged columns or split columns — how do I fix this?: Column detection in PDF tables relies on ruling lines between cells ("lattice" mode, in Tabula and Camelot terms) or on whitespace gaps between text ("stream" mode). PDFs with strong vertical lines between columns import cleanly; PDFs that separate columns with whitespace alone sometimes get the boundaries wrong. In Excel, choose Transform Data from the import preview and split or merge the affected columns in the Power Query Editor. For visual control, ScoutMyTool PDF to CSV lets you drag column boundaries before extraction.
How do I preserve currency / date formatting on import?: A PDF stores only the text as displayed, and a CSV stores only plain text; neither carries spreadsheet number formats. A cell showing "$1,234.56" in the PDF arrives as that string. Depending on the tool and locale, the spreadsheet either converts it to the number 1234.56 (usually without the original currency format) or leaves it as text that SUM skips. After import, convert any text values to numbers and apply currency or date formats yourself. Most spreadsheet tools auto-detect common date and currency patterns and apply a default format, but the currency symbol and thousands separator may not match the source. For invoice and statement imports where formatting matters downstream, apply a consistent format template across the imported sheet.
Which is best for financial-data extraction at volume — Excel, Sheets, or Numbers?: Excel for individual-file work and complex post-processing. The Get Data from PDF preview lets you inspect, transform, and clean the data before committing the import, and Power Query (built into Excel) handles repeated cleanup at scale. Sheets for collaboration after extraction. Bring the data in through Drive's Open with → Google Docs (especially useful for scanned PDFs, where Drive's OCR helps), paste the table into Sheets, then share it for multi-user editing. Numbers for casual one-off conversions: more limited than Excel and Sheets, but with a simpler interface for non-power users on Mac. For high-volume financial extraction, also consider tabula-py or Camelot (Python libraries), which are scriptable and handle edge cases the GUI tools struggle with.
Can I do PDF-to-spreadsheet entirely in the browser without uploading?: Yes, for CSV output. ScoutMyTool PDF to CSV runs in your browser tab using PDF.js, so it extracts tables to CSV without uploading the PDF. Once the CSV is in your downloads folder, import it into Excel, Sheets, or Numbers as you would any CSV. This client-side path suits financial statements, payroll exports, and other tabular content containing PII that should not pass through a vendor's server. Drive's Open with → Google Docs path, by contrast, uploads the PDF to Google for OCR and conversion. That is fine for non-sensitive content, but not for confidential financials.
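
The sketch below is a simplified version of stream-style detection, which shows why whitespace-only tables sometimes merge or split columns. It works on text lines that preserve horizontal layout, such as output from Poppler's `pdftotext -layout`. It treats any run of at least `min_gap` character positions that is blank in every line as a column boundary. Real tools such as Tabula and Camelot work from glyph coordinates rather than characters, so read this as an illustration of the idea, not their algorithm. The sample statement lines are made up.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans of the detected columns.

    A boundary is any run of at least `min_gap` positions that is a space
    in every line (shorter lines are treated as padded with spaces).
    """
    if not lines:
        return []
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(line[i] == " " for line in padded) for i in range(width)]

    spans, start, last_content = [], None, None
    for i, is_blank in enumerate(blank):
        if is_blank:
            continue
        if start is None:
            start = i
        elif i - last_content - 1 >= min_gap:
            # A wide enough all-blank gap closes the current column.
            spans.append((start, last_content + 1))
            start = i
        last_content = i
    if start is not None:
        spans.append((start, last_content + 1))
    return spans


def split_columns(lines, min_gap=2):
    """Cut every line at the detected spans and trim each cell."""
    spans = find_column_spans(lines, min_gap)
    return [[line[a:b].strip() for a, b in spans] for line in lines]


statement = [
    f"{'Date':<12}{'Description':<19}{'Amount':>6}",
    f"{'01/15/2026':<12}{'Payment to ACME':<19}{'100.00':>6}",
    f"{'01/16/2026':<12}{'Refund':<19}{'20.00':>6}",
]
for row in split_columns(statement):
    print(row)
```

`min_gap` plays the role of a column boundary you would otherwise drag by hand. With `min_gap=3`, the two-space gap between the date and description columns no longer counts, and those two columns come back merged.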


