How to extract numerical data from PDF charts and graphs
By ScoutMyTool Editorial Team · Last updated: 2026-05-22
Introduction
People ask how to "extract the data from this chart" expecting a button that reads a graph back into numbers, and the honest answer is that one does not exist, because a chart in a PDF is a picture of the data, not the data itself. The numbers that drew it are generally not stored in the file. But you can still recover them by the right route: find the underlying data table or source dataset (exact), read on-chart data labels (exact), or, as a last resort, digitize the points off the axes (approximate). This guide explains why charts resist extraction, the realistic ways to get the numbers, and how to verify what you recover before trusting it.
Routes to the numbers, by accuracy
| Route | Accuracy | When |
|---|---|---|
| Find the underlying data table | Exact | Paper/report includes a data table or appendix |
| Get the source dataset | Exact | Authors/source publish the data |
| Read values from labels | Exact (labeled points only) | Chart has data labels on points/bars |
| Digitize points off the axes | Approximate | Only the chart image exists |
Step by step: recover chart data
- Look for a data table first. Scan the document and any appendix for the underlying figures; if present, extract that table to a spreadsheet with PDF to CSV (see extracting complex tables). That gives you exact data, and you are done.
- Seek the source dataset. For research or reports, the data may be published or available from the authors, which is exact and far better than reading the chart.
- Read on-chart data labels. If values are printed on bars or points, read them (OCR with PDF OCR can help on a low-res chart); these are exact.
- Isolate the chart if digitizing. Extract the chart image with Extract Images or render the page with PDF to PNG for a clean, high-resolution image to work from.
- Digitize as a last resort. Establish the axis scale from reference points, then read each data point's position, accepting that the result is approximate.
- Verify against the chart. Plot your recovered numbers and confirm they reproduce the chart's shape and any shown totals; cross-check a few points.
- Label the provenance. Mark digitized data as approximate so downstream users know it was read off a graph, not sourced exactly.
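The digitizing step above comes down to a linear mapping from pixel positions to axis values. The sketch below is a minimal illustration in Python, assuming linear (not logarithmic) axes and two reference ticks per axis whose values you can read. The `AxisCalibration` class, the `digitize` function, and all pixel positions are hypothetical names and made-up numbers for illustration, not part of any ScoutMyTool tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Linear mapping from a pixel coordinate to a data value on one axis."""
    pixel_a: float  # pixel position of the first reference tick
    value_a: float  # data value printed at that tick
    pixel_b: float  # pixel position of the second reference tick
    value_b: float  # data value printed at that tick

    def to_value(self, pixel: float) -> float:
        # Linear interpolation (or extrapolation) between the two ticks.
        if self.pixel_a == self.pixel_b:
            raise ValueError("reference ticks must be at different pixels")
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def digitize(points_px, x_axis, y_axis):
    """Convert (x_pixel, y_pixel) pairs to approximate (x, y) data values."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points_px]


# Hypothetical calibration: made-up pixel positions for illustration only.
x_axis = AxisCalibration(pixel_a=100, value_a=2010, pixel_b=500, value_b=2020)
# Image rows grow downward, so the larger value sits at the smaller pixel row.
y_axis = AxisCalibration(pixel_a=400, value_a=0, pixel_b=100, value_b=60)

points_px = [(100, 350), (300, 250), (500, 130)]
recovered = digitize(points_px, x_axis, y_axis)
```

With these made-up reference points, the pixel (300, 250) maps to roughly the year 2015 and a value of 30. Real results are only as good as the placement of your reference ticks and data points, so treat the output as approximate and label it that way.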
Related reading and tools
- PDF to spreadsheet: extract a data table exactly.
- Extracting complex tables: the table behind the chart.
- Extract images from a PDF: isolate the chart to digitize.
- Academic research workflow: sourcing the underlying dataset.
- Extract footnotes: another inference-based extraction.
- PDF to CSV tool: extract the data table in your browser.
- All ScoutMyTool PDF tools: the full toolkit.
FAQ
- Can I extract the exact data from a chart in a PDF?
- Usually not directly, and this is the key thing to understand. A chart in a PDF is a picture (or vector drawing) of the data, not the data itself; the numbers that produced it are generally not stored in the file. So there is no "extract the data" button that reads a chart back into numbers, the way there is for a data table. What you can do is recover the data by other means: find a data table or appendix that accompanies the chart, obtain the source dataset, read values from on-chart data labels if present, or, when only the image exists, digitize the points by reading them off the axes, which is approximate. Set expectations accordingly: a chart is the answer drawn, not the data stored.
- What is the most accurate way to get the numbers?
- Find the actual data rather than reverse-engineering the picture. Many reports and papers include the underlying figures (a data table, an appendix, or supplementary files) alongside the chart; if so, extract those (they are exact). For published research, the source dataset is often available from the authors or a repository. If the chart has data labels printed on the bars or points, read those; they are exact too. Only when none of that exists, and you have just the image, do you fall back to digitizing, which estimates values from pixel positions and is inherently approximate. Always prefer the real numbers over reading them off a graph.
- How does "digitizing" a chart work, and how accurate is it?
- Digitizing means estimating data values from the chart image: you establish the axis scale (by marking known reference points on each axis), then read each data point's position and convert it to a value using that scale. Specialised chart-digitizer tools assist this. Accuracy is limited by the chart's resolution, how precisely you can place points, and the chart type: a clean line or scatter plot digitizes reasonably well; a cramped or low-resolution one less so. The result is an estimate, fine for approximate re-analysis or recreating a trend, but not a substitute for exact source data. Treat digitized values as approximate and label them as such.
- Does it matter whether the chart is vector or a raster image?
- For getting the numbers, not much: neither stores the source data in a readily extractable way. A vector chart is drawn from shape instructions (lines, points) and a raster chart is pixels, but in both cases the file has the visual, not a table of values behind it. (In rare cases a vector chart's coordinates could in principle be reverse-engineered, but this is unreliable and not how charts are normally recovered.) So the routes are the same regardless: prefer the underlying data, read labels, or digitize the image. If the chart is a low-resolution raster, OCR can at least help read printed data labels and axis numbers.
- What if the chart sits next to a data table in the PDF?
- Then you are in luck: extract the table, not the chart. Many documents show a chart and its data table together (or in an appendix), and the table holds the exact numbers. Extract it to a spreadsheet as you would any PDF table, and verify against the chart visually. This is by far the best outcome and worth checking for first: scan the document for a corresponding table before attempting to read the chart. A surprising number of "extract data from this chart" tasks are really "extract this nearby table" tasks once you look.
- How do I verify the data I recovered?
- Sanity-check against the chart: do your recovered values reproduce the chart's shape (the same peaks, trends, and relative magnitudes)? If a total or known value is shown, confirm your numbers match it. For digitized data, cross-check a few points you can read confidently against your estimates. For an extracted data table, verify the table reproduces the chart. The point is that recovered chart data, especially digitized data, is error-prone, so plot it back or compare it to the original before using it in analysis. Recovered-and-verified beats recovered-and-trusted, particularly when the numbers feed a decision.
- Is it safe to process a confidential document's charts online?
- Prefer a tool that processes files locally if the document is confidential. ScoutMyTool extracts tables, images, and text, and OCRs, entirely in your browser tab, so the document never leaves your machine. For anything you would not publish openly, confirm the tool does not upload before using it.
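The verification answer above can be turned into a small automated check. The following Python sketch compares digitized values against a few points you can read confidently (for example, printed data labels) and against a total shown on the chart. The function name `verify_recovered`, the 5% tolerance, and every number in the example are assumptions chosen for illustration; pick a tolerance that suits your chart's resolution.

```python
def verify_recovered(recovered, reference, shown_total=None, tolerance=0.05):
    """Compare recovered values against trusted reference points.

    recovered:   dict mapping label -> digitized value
    reference:   dict mapping label -> value read confidently from the chart
    shown_total: total printed on the chart, if any
    tolerance:   allowed relative error (0.05 is an assumed default)
    Returns a list of problem descriptions; an empty list means all passed.
    """
    problems = []
    for label, expected in reference.items():
        if label not in recovered:
            problems.append(f"{label}: missing from recovered data")
            continue
        actual = recovered[label]
        # Relative error, falling back to absolute error when expected is 0.
        denom = abs(expected) if expected != 0 else 1.0
        if abs(actual - expected) / denom > tolerance:
            problems.append(f"{label}: recovered {actual}, expected {expected}")
    if shown_total is not None:
        total = sum(recovered.values())
        denom = abs(shown_total) if shown_total != 0 else 1.0
        if abs(total - shown_total) / denom > tolerance:
            problems.append(f"total: recovered {total}, chart shows {shown_total}")
    return problems


# Hypothetical values for illustration only.
recovered = {"Q1": 10.2, "Q2": 29.6, "Q3": 54.1, "Q4": 41.0}
reference = {"Q2": 30.0, "Q4": 45.0}  # points that carry printed labels
problems = verify_recovered(recovered, reference, shown_total=135.0)
```

In this made-up example the total and Q2 fall within tolerance, while Q4 is flagged because the digitized 41.0 is about 9% below the labeled 45.0. A numeric check like this complements, rather than replaces, plotting the recovered values and comparing the shape by eye.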
Get the real numbers, not a guess
Extract the underlying data table or isolate the chart with ScoutMyTool's in-browser tools; your document never leaves your machine.
Open PDF to CSV →