PDF Number Extractor

Highlight QA serials, BOM IDs, and inspection numbers locally with WASM parsing and CSV export.

What it does

Drop PDF files, filter the detected numbers with regex or length rules, and export a clean list of matches. Parsing happens in-browser with WASM so QA serials, BOM IDs, and inspection numbers never leave your machine.

When to use it (and when not to)

Use it when:

  • You need to pull serial numbers or IDs out of production PDFs for QA or traceability.
  • You want to pre-fill spreadsheets with inspection IDs without retyping.
  • You prefer an offline workflow with no uploads or queues.

Avoid it when:

  • Your PDFs are scanned images that require OCR rather than text parsing.
  • You need full-text search across thousands of pages at once.

Inputs and outputs

Inputs

| Input | Description |
| --- | --- |
| PDF files | Single or multiple PDFs dropped into the browser. |
| Regex or filter rules | Include or exclude patterns to isolate the numbers you need. |
| Minimum/maximum length | Optional constraints to avoid noise such as page numbers. |
| Export options | Choose CSV export or copy to clipboard. |
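How the include, exclude, and length inputs might combine can be sketched as below. This is an illustration only, not the utility's actual code: the `FilterRules` shape and the `applyFilters` name are assumptions made for this example.

```typescript
// Hypothetical rule shape; the utility's real settings may be structured differently.
interface FilterRules {
  include?: RegExp;   // keep only candidates matching this pattern
  exclude?: RegExp;   // drop candidates matching this pattern
  minLength?: number; // inclusive lower bound on candidate length
  maxLength?: number; // inclusive upper bound on candidate length
}

// Reset lastIndex first so a pattern with the global flag gives stable results across calls.
function patternMatches(pattern: RegExp, value: string): boolean {
  pattern.lastIndex = 0;
  return pattern.test(value);
}

// Return the candidates that satisfy every rule that is set.
function applyFilters(candidates: string[], rules: FilterRules): string[] {
  return candidates.filter((value) => {
    if (rules.minLength !== undefined && value.length < rules.minLength) return false;
    if (rules.maxLength !== undefined && value.length > rules.maxLength) return false;
    if (rules.include && !patternMatches(rules.include, value)) return false;
    if (rules.exclude && patternMatches(rules.exclude, value)) return false;
    return true;
  });
}

// Short values such as page numbers fall below minLength; BOM IDs are excluded explicitly.
const filteredIds = applyFilters(
  ["12", "SN-2025-104", "BOM-00417", "3", "SN-2025-105"],
  { include: /^[A-Z]+-/, exclude: /^BOM-/, minLength: 4 }
);
// filteredIds is ["SN-2025-104", "SN-2025-105"]
```

Checking length before the regex rules keeps the cheap test first, which is one reasonable ordering; the result is the same in any order.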

Outputs

| Output | Format | Notes |
| --- | --- | --- |
| Highlighted matches | On-screen | Visual overlay of matched numbers for quick QA. |
| CSV export | CSV | List of extracted numbers per file ready for spreadsheets. |
| Clipboard copy | Text | Quick copy of matches without downloading a file. |
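A per-file CSV export could be assembled roughly as follows. The two-column `file,match` layout and the names `FileMatches`, `escapeCsvField`, and `toCsv` are assumptions for illustration; the utility's real column layout is not documented here. The quoting rules follow the common RFC 4180 convention.

```typescript
// Hypothetical shape for the matches found in one PDF.
interface FileMatches {
  fileName: string;
  matches: string[];
}

// Quote a field only when it contains a comma, a double quote, or a line break.
function escapeCsvField(field: string): string {
  if (/[",\r\n]/.test(field)) {
    return '"' + field.replace(/"/g, '""') + '"';
  }
  return field;
}

// Emit a header row, then one row per (file, match) pair.
function toCsv(files: FileMatches[]): string {
  const rows: string[] = ["file,match"];
  files.forEach((file) => {
    file.matches.forEach((match) => {
      rows.push(escapeCsvField(file.fileName) + "," + escapeCsvField(match));
    });
  });
  return rows.join("\r\n");
}

const csvText = toCsv([
  { fileName: "report, rev A.pdf", matches: ["SN-2025-104", "SN-2025-105"] },
]);
// csvText has three rows; the file name is quoted because it contains a comma.
```

One row per match keeps the file association explicit, so the list can be sorted or filtered in a spreadsheet without losing track of the source PDF.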

How to use

  1. Drag and drop the PDF files into the utility.
  2. Add regex or length filters to target the IDs you need.
  3. Review the highlighted matches on-screen.
  4. Export the results as CSV or copy them to your clipboard.
  5. Clear the session when finished; nothing is uploaded.

Example dataset: Drop in an inspection report containing IDs like SN-2025-104, use a regex such as SN-\d{4}-\d{3}, and export the CSV.

  • Expected output: A CSV listing matches in the SN-2025-104 style, plus on-screen highlights for visual confirmation.
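The example pattern can be tried against plain text outside the utility. The sketch below uses standard JavaScript regex behaviour; the `extractMatches` helper and the sample sentence are made up for illustration. It also shows why word boundaries (`\b`) can help: without them, the pattern matches the first digits of a longer number.

```typescript
// Collect every match of `pattern` in `text`.
function extractMatches(text: string, pattern: RegExp): string[] {
  // Rebuild the pattern with the global flag so exec() advances through the text.
  const flags = "g" + (pattern.ignoreCase ? "i" : "") + (pattern.multiline ? "m" : "");
  const globalPattern = new RegExp(pattern.source, flags);
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = globalPattern.exec(text)) !== null) {
    found.push(match[0]);
    // Guard against zero-length matches, which would otherwise loop forever.
    if (match[0].length === 0) globalPattern.lastIndex++;
  }
  return found;
}

const sampleText = "Unit SN-2025-104 passed, SN-2025-1055 pending, page 12 of 40.";

const looseMatches = extractMatches(sampleText, /SN-\d{4}-\d{3}/);
// looseMatches is ["SN-2025-104", "SN-2025-105"]; the second is a truncated ID.

const strictMatches = extractMatches(sampleText, /\bSN-\d{4}-\d{3}\b/);
// strictMatches is ["SN-2025-104"]
```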

Accuracy and verification

  • Works on text-based PDFs; scanned images need OCR elsewhere.
  • Filters are case-sensitive only if you set them that way; double-check patterns before exporting.
  • No network calls are made and nothing is stored remotely, so save your exported results locally if you need an audit trail.
  • Spot-check a few lines against the PDF to ensure the regex did not over-filter.
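One way to spot-check for over-filtering is to look at what a pattern rejects, not only what it keeps. The sketch below is an assumption-laden illustration (the `splitByPattern` helper and the sample values are hypothetical); it relies only on standard JavaScript regex behaviour, where a pattern is case-sensitive unless the `i` flag is set.

```typescript
// Kept and dropped values side by side, so rejected candidates can be reviewed.
interface ReviewSplit {
  kept: string[];
  dropped: string[];
}

function splitByPattern(candidates: string[], pattern: RegExp): ReviewSplit {
  const result: ReviewSplit = { kept: [], dropped: [] };
  candidates.forEach((value) => {
    pattern.lastIndex = 0; // keep results stable if the pattern has the global flag
    if (pattern.test(value)) {
      result.kept.push(value);
    } else {
      result.dropped.push(value);
    }
  });
  return result;
}

const detectedValues = ["SN-2025-104", "sn-2025-105", "42"];

const caseSensitiveSplit = splitByPattern(detectedValues, /^SN-\d{4}-\d{3}$/);
// kept: ["SN-2025-104"], dropped: ["sn-2025-105", "42"]

const caseInsensitiveSplit = splitByPattern(detectedValues, /^SN-\d{4}-\d{3}$/i);
// kept: ["SN-2025-104", "sn-2025-105"], dropped: ["42"]
```

A lowercase serial in the dropped list, as in the first call, is the kind of over-filtering worth catching before export.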

FAQ

  • Does it OCR scanned PDFs? No. It reads existing text; use OCR first if the PDF is an image.
  • Can I process multiple files? Yes. Drop a stack of PDFs and export combined results.
  • Is any data uploaded? No. Parsing happens locally in your browser.
  • Can I tune the regex? Yes. Standard JavaScript regex syntax is supported.
  • Can I save the highlights? Export the CSV and keep the PDF copy; highlights are for on-screen review only.

Changelog

  • Initial documentation.

Feedback / bug report

  • Open a GitHub issue
  • Email or DM with the slug pdf-number-extractor so we can reproduce the issue