How to convert a PDF to a spreadsheet (CSV, Excel, Sheets)

By ScoutMyTool Editorial Team · Last updated: 2026-05-21

I once retyped a 200-row financial table out of a PDF by hand because I assumed there was no other way — and introduced three transcription errors that took longer to find than the typing took. There was a better way, of course: extract the table straight into a spreadsheet. The catch is that PDF tables fight back, because a PDF does not really store a table at all — it stores text at coordinates, and the grid is something you infer by eye. This guide explains how to get tables out of a PDF into CSV, Excel, or Google Sheets reliably: choosing the right format, handling merged cells and multi-page tables, fixing the number formats that always come across wrong, and verifying the data so you never ship a column that no longer adds up.

CSV, Excel, or Sheets — which target

Format	Opens in	Keeps	Use when
CSV	Excel, Sheets, anything	Raw values only	Maximum portability, import anywhere
XLSX (Excel)	Excel, Sheets	Cells, sheets, formatting	You want a ready-to-use workbook
Google Sheets	Sheets (import CSV/XLSX)	Cells; formulas re-applied	Collaboration in the cloud
TSV	Excel, Sheets, scripts	Raw values, tab-delimited	Data contains commas
Multiple CSVs	Anything	One file per table/page	Several distinct tables
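
The "Data contains commas" row can be illustrated with a minimal sketch using Python's standard csv module. The sample rows and the helper name to_delimited are made up for illustration. A CSV writer has to quote any field that contains a comma, while a tab-delimited file can leave the same field bare.

```python
import csv
import io

# Hypothetical sample rows: both data cells contain commas.
rows = [
    ["Customer", "Amount"],
    ["Acme, Inc.", "1,234.50"],
]

def to_delimited(rows, delimiter):
    """Serialise rows to a delimited string with the standard csv writer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_delimited(rows, ",")   # fields with commas get wrapped in quotes
tsv_text = to_delimited(rows, "\t")  # no quoting needed, commas are plain data

# Reading the CSV back with a proper parser restores the original cells.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

Either form survives a round trip through a real CSV parser. TSV mainly helps when a downstream importer splits naively on commas and ignores the quotes.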

Step by step — PDF table to spreadsheet

1. Confirm the PDF has real text. Try selecting text in the table. If you cannot, it is a scan: OCR it first with PDF OCR before extracting, and plan to verify the numbers.
2. Pick your target format. CSV for importing elsewhere, Excel for a ready-to-use workbook. Convert with PDF to CSV or PDF to Excel.
3. Fix headers and merged cells first. Spanning headers confuse row/column detection; correct the header row before touching the data so everything below lines up.
4. Stitch multi-page tables. If a table runs across pages, combine the chunks into one and remove the repeated header rows so you have a single clean table. See extracting complex tables.
5. Coerce number and date types. Strip currency symbols and thousands separators, set column types, and check decimal/locale formats so values become real numbers your spreadsheet can sum.
6. Verify against the source. Match row/column counts, spot-check cells near merges and page breaks, and confirm any column totals still add up; a broken total is the fastest sign of a shifted value.
7. Save or import. Keep the CSV/XLSX, or import the CSV into Google Sheets for collaboration. Archive the source PDF so you can re-extract if needed.
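
The stitching step can be sketched in a few lines of Python. This assumes each page has already been extracted into a list of rows, and that the first row of the first page is the header. The function names and sample data are hypothetical, not part of any particular tool.

```python
def _normalise(row):
    """Compare rows loosely: ignore surrounding whitespace and letter case."""
    return [cell.strip().lower() for cell in row]

def stitch_pages(chunks):
    """Combine per-page row lists into one table, keeping the header once.

    Assumption: the first row of the first non-empty chunk is the header,
    and any later row equal to it is a repeated page header.
    """
    chunks = [chunk for chunk in chunks if chunk]  # skip pages with no rows
    if not chunks:
        return []
    header = chunks[0][0]
    table = [header]
    for chunk in chunks:
        for row in chunk:
            if _normalise(row) == _normalise(header):
                continue  # drop the header wherever it reappears
            table.append(row)
    return table

# Hypothetical two-page extraction; page 2 repeats the header with a stray space.
page_1 = [["Item", "Qty"], ["Bolt", "10"]]
page_2 = [["Item ", "Qty"], ["Nut", "25"]]
stitched = stitch_pages([page_1, page_2])
```

A data row that happens to match the header exactly would also be dropped, so compare the stitched row count against the PDF afterwards.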

Related reading and tools

PDF to Excel: the workbook-focused conversion.
PDF to spreadsheet: the broader overview.
Extracting complex tables: merged cells and multi-page tables.
Extract invoice data: a common structured-data case.
PDF to CSV tool: extract tables in your browser.
PDF to Excel tool: get a ready workbook.
All ScoutMyTool PDF tools: the full toolkit.

FAQ

Why is getting a table out of a PDF so awkward?

Because a PDF does not store a "table". It stores characters placed at x/y coordinates on a page, with lines drawn separately. There is usually no underlying grid telling software which text belongs in which cell; the table is something your eye assembles from positioning. Extraction tools have to infer the rows and columns from spacing and rule lines, which works well on clean, ruled tables and struggles on dense, borderless, or irregular ones. That inference step is why the same tool can nail one PDF and mangle another, and why verifying the output matters. Born-digital PDFs extract far better than scans, which have no text layer at all until they are OCR'd.

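To make the inference step concrete, here is a deliberately naive sketch of how rows and columns might be guessed from coordinates alone. It assumes words arrive as (x, y, text) with y increasing down the page. The Word type, the tolerance values, and the sample data are illustrative assumptions; real extractors use more robust heuristics.

```python
from typing import NamedTuple

class Word(NamedTuple):
    x: float   # left edge of the word (assumed units: points)
    y: float   # vertical position, assumed to increase down the page
    text: str

def group_rows(words, y_tolerance=2.0):
    """Put words whose y positions differ by at most y_tolerance in one row."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][-1].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [sorted(row, key=lambda w: w.x) for row in rows]

def column_anchors(words, x_tolerance=5.0):
    """Guess column start positions: a new column begins after a gap in x."""
    anchors = []
    for x in sorted(w.x for w in words):
        if not anchors or x - anchors[-1] > x_tolerance:
            anchors.append(x)
    return anchors

def to_grid(words, y_tolerance=2.0, x_tolerance=5.0):
    """Assign every word to the nearest column anchor within its row."""
    anchors = column_anchors(words, x_tolerance)
    grid = []
    for row in group_rows(words, y_tolerance):
        cells = [""] * len(anchors)
        for word in row:
            col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - word.x))
            cells[col] = (cells[col] + " " + word.text).strip()
        grid.append(cells)
    return grid

# Hypothetical words with slightly uneven positions, as real PDFs often have.
sample_words = [
    Word(50, 100, "Item"), Word(200, 100, "Price"),
    Word(50, 120.5, "Widget"), Word(201, 120, "9.99"),
    Word(50, 140, "Gadget"),
]
grid = to_grid(sample_words)
```

Even this toy version needs two tolerance values. A cell with two words spaced far apart would be split into a spurious extra column, which is the kind of failure that borderless tables trigger.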
Should I export to CSV or Excel?

CSV is the universal lowest common denominator: plain text, comma-separated, and readable by Excel, Google Sheets, and any script or database importer. It carries only raw values, with no formatting, no multiple sheets, and no formulas, which is exactly what you want when you are going to import the data somewhere else. XLSX (the Excel format) preserves cells across multiple sheets and can carry formatting and formulas, so it is better when you want a ready-to-use workbook rather than raw data. A simple rule: choose CSV if the data is heading into another system, XLSX if a person will work in it as a spreadsheet. You can always open a CSV in Excel and save it as XLSX later.

How do merged cells and multi-page tables come across?

These are the two classic failure points. Merged or spanning header cells confuse row/column detection, so a header that spans three columns may land in one cell with the others blank; check and fix headers first. Tables that continue across pages often get the header repeated on each page, or get split into separate chunks; you usually need to stitch the pages into one table and drop the repeated headers. Clean, single-page, fully ruled tables extract almost perfectly; the more a table relies on visual grouping rather than explicit grid lines, the more cleanup you should expect afterward.

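One common repair for a spanning header is to copy its text into the blank cells to its right, then join it with the sub-header row beneath. The sketch below assumes a two-row header and that blank top cells belong to the span on their left. The function names and sample rows are hypothetical.

```python
def fill_spanning_headers(header_row):
    """Copy each non-empty header cell rightward into the blanks that follow it."""
    filled = []
    current = ""
    for cell in header_row:
        if cell.strip():
            current = cell.strip()
        filled.append(current)
    return filled

def combine_header_rows(top, bottom):
    """Merge a spanning top row with the sub-header row under it."""
    top = fill_spanning_headers(top)
    return [f"{t} {b.strip()}".strip() for t, b in zip(top, bottom)]

# Hypothetical extraction: "2025" spanned three columns and landed in one cell.
top_row = ["Region", "2025", "", ""]
sub_row = ["", "Q1", "Q2", "Q3"]
headers = combine_header_rows(top_row, sub_row)
```

This guesses wrong when a column has no header at all, so check the result against the PDF before using it.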
Why are my numbers showing up as text or with the wrong format?

Extraction pulls the characters as they appear, so a value like "$1,234.50" or "1.234,50" (European format) comes across as a text string, not a number, and a spreadsheet will not sum it until it is converted. After import, set the column types, strip currency symbols and thousands separators, and watch for locale differences in decimal and thousands marks. Dates are similar: "03/04/2026" is ambiguous (3 April or 4 March, depending on locale) and may import as text or as the wrong date. Plan a quick post-import pass to coerce numbers and dates to real types; it is faster than fighting the extractor to format them perfectly.

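A post-import coercion pass might look like the sketch below. It assumes you know the source's locale and state it explicitly instead of guessing. The names parse_amount and parse_date and their parameters are illustrative, not from any tool.

```python
import re
from datetime import datetime
from decimal import Decimal

def parse_amount(text, decimal_mark="."):
    """Convert an extracted amount string to a Decimal.

    decimal_mark says which character is the decimal separator in the source:
    "." for 1,234.50 and "," for 1.234,50. It is stated by the caller, not guessed.
    """
    thousands_mark = "," if decimal_mark == "." else "."
    cleaned = re.sub(r"[^\d,.\-]", "", text)  # drop currency symbols and spaces
    cleaned = cleaned.replace(thousands_mark, "").replace(decimal_mark, ".")
    return Decimal(cleaned)  # raises decimal.InvalidOperation on garbage input

def parse_date(text, day_first):
    """Parse a slash-separated date with the day/month order made explicit."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()

us_amount = parse_amount("$1,234.50")
eu_amount = parse_amount("1.234,50", decimal_mark=",")
april_third = parse_date("03/04/2026", day_first=True)
march_fourth = parse_date("03/04/2026", day_first=False)
```

Decimal avoids the rounding surprises of binary floats in currency columns, and an unparseable cell raises an error instead of silently becoming zero.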
Can I extract tables from a scanned PDF?

Not directly. A scan is an image with no text layer, so there is nothing to extract until you OCR it. Run OCR first to recognise the text, then extract the table; accuracy depends on scan quality and the table's complexity. Expect to verify numbers carefully, since OCR most often misreads exactly the digits and decimal points that matter in a data table. For a critical scanned table, OCR plus a careful manual check against the original is the safe path; never trust an unverified OCR-then-extract of financial or scientific data.

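A cheap first filter after OCR is to flag any cell in a numeric column that does not look like a number, which catches letter-for-digit swaps such as "O" for "0". The sketch below assumes US-style formatting, with an optional comma thousands separator and a dot decimal. The pattern and function name are illustrative.

```python
import re

# Assumed shape of a valid value: optional sign, digits with optional comma
# thousands grouping, optional decimal part (e.g. 1,234.50 or 45.10).
NUMERIC = re.compile(r"-?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?")

def suspicious_cells(column):
    """Return (row_index, value) for cells that do not parse as a number."""
    return [
        (index, value)
        for index, value in enumerate(column)
        if not NUMERIC.fullmatch(value.strip())
    ]

# Hypothetical OCR output: a capital O and a lowercase l replaced digits.
ocr_column = ["1,234.50", "1,2O4.50", "98l.00", "45.10"]
flagged = suspicious_cells(ocr_column)
```

This only catches misreads that produce an invalid number. A "3" read as "8" still passes, so the manual check against the original remains necessary.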
How do I verify the extracted data is correct?

Always reconcile against the source before you rely on it. Check that the row and column counts match the PDF, spot-check several cells (especially around merged headers and page breaks), and if the table has totals, confirm the extracted numbers still sum to them; a column that no longer adds up is the fastest way to catch a shifted or dropped value. For wide tables, verify that the first and last columns made it. Treating extraction as "extract, then verify" rather than "extract and trust" is what keeps a convenience step from quietly introducing errors into your analysis.

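The count and total checks are easy to automate. The sketch below assumes the data rows hold already-cleaned numeric strings, and that you have read the expected shape and the stated totals off the PDF yourself. The function name and parameters are hypothetical.

```python
from decimal import Decimal

def find_problems(data_rows, expected_rows, expected_cols, stated_totals):
    """Return a list of discrepancies; an empty list means every check passed.

    stated_totals maps a column index to the total printed in the PDF.
    """
    problems = []
    if len(data_rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(data_rows)}")
    for index, row in enumerate(data_rows):
        if len(row) != expected_cols:
            problems.append(
                f"row {index} has {len(row)} columns, expected {expected_cols}"
            )
    for col, stated in stated_totals.items():
        actual = sum(
            (Decimal(row[col]) for row in data_rows if len(row) > col),
            Decimal("0"),
        )
        if actual != stated:
            problems.append(f"column {col} sums to {actual}, PDF states {stated}")
    return problems

# Hypothetical table whose PDF shows 2 data rows and a total of 15.50.
good = [["A", "10.00"], ["B", "5.50"]]
dropped = [["A", "10.00"]]  # one row went missing during extraction
totals = {1: Decimal("15.50")}

ok_report = find_problems(good, 2, 2, totals)
bad_report = find_problems(dropped, 2, 2, totals)
```

A passing report does not prove the extraction is right, since two errors can cancel out. It is a quick screen before the manual spot-checks.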
Is it safe to convert a confidential PDF table online?

Financial statements, customer lists, and internal data are sensitive, so prefer a tool that processes files locally. ScoutMyTool extracts tables to CSV/Excel entirely in your browser tab, so the document never leaves your machine. For anything you would not publish openly, confirm that the tool does not upload files before you use it, or use an offline tool.

Get your data out — without retyping

Extract PDF tables straight to CSV or Excel with ScoutMyTool’s in-browser tools — your financial and customer data never leaves your machine.

Open PDF to CSV →
