Browser-native file parsing

Messy files in.
Clean records out.

DPE is a headless, schema-driven parser that turns CSV, fixed-width print-outs, spreadsheets, XML/JSON, Word & ODT documents and raw EDI into normalized JSON — entirely in the browser. One declarative schema. One call. Twelve formats.


12: formats, one API
0: servers — data never leaves the page
~18 KB: engine, deps on demand

supplier_prices.csv

=== SCRAP YARD DAILY REPORT ===
Generated: 2026-05-21 06:00:00
----------------------------------------
id|name|price|weight_lbs
SCRAP-001|Copper Wire #1|3.85|250
SCRAP-002|Copper Wire #2|3.55|180
SCRAP-003|Brass Yellow|2.45|420
SCRAP-004|Aluminum Cans|0.55|1200
----------------------------------------
END OF REPORT - 4 records

parse()

result.data (JSON)

[
  { "id": "SCRAP-001", "buy": 3.85 },
  { "id": "SCRAP-002", "buy": 3.55 },
  { "id": "SCRAP-003", "buy": 2.45 },
  { "id": "SCRAP-004", "buy": 0.55 }
]

01 — Live engine

Try it on a real specimen.

No mock-ups. The panels below run the actual DPE engine in your browser, on genuinely messy supplier files. Pick a specimen, hit parse, watch the pipeline resolve it.

Pipe-delimited daily report with banner & footer cruft — stripped, then remapped to clean keys.


02 — Coverage

Twelve formats. One entry point.

The format key declares what the file is — the extension is irrelevant. DPE picks the right parser, delegates the heavy lifting to battle-tested libraries, and hands back the same normalized shape every time.

Delimited & fixed-width text

The legacy stuff: comma/pipe/tab CSVs and rigid print-outs from systems that predate JSON.

csv
prn
txt
fixed

PapaParse · column slicing
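To make "column slicing" concrete, here is a minimal sketch of cutting a fixed-width print-out at character offsets. The `columns` layout and the `sliceFixedWidth` helper are hypothetical names chosen for illustration; they are not DPE's schema keys or internals.

```javascript
// Hypothetical layout: a name, a 0-based start offset and a width per column.
// These keys are illustrative, not DPE's schema vocabulary.
const columns = [
  { name: 'id', start: 0, width: 10 },
  { name: 'name', start: 10, width: 16 },
  { name: 'price', start: 26, width: 6 },
];

// Cut every non-blank line at the configured offsets and trim the padding.
function sliceFixedWidth(text, layout) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const record = {};
      for (const { name, start, width } of layout) {
        record[name] = line.slice(start, start + width).trim();
      }
      return record;
    });
}

// Build two padded print-out lines so the offsets above are easy to verify.
const printOut = [
  'SCRAP-001'.padEnd(10) + 'Copper Wire #1'.padEnd(16) + '3.85'.padStart(6),
  'SCRAP-002'.padEnd(10) + 'Copper Wire #2'.padEnd(16) + '3.55'.padStart(6),
].join('\n');

console.log(sliceFixedWidth(printOut, columns));
```

Values stay strings in this sketch; any type coercion would be a separate step.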

Structured

Tree formats with dot-path extraction — reach deep into a nested envelope and pull the record array out.

xml
json

fast-xml-parser · native JSON
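For a feel of what dot-path extraction does, the sketch below walks a path through an already-parsed tree and pulls the record array out. The `getPath` helper and the envelope's field names are assumptions made for illustration, not DPE's code or any real feed layout.

```javascript
// Walk a dot-separated path through nested objects.
// Returns undefined as soon as a step is missing instead of throwing.
function getPath(tree, path) {
  return path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), tree);
}

// Hypothetical envelope: roughly what a price feed might look like after
// the XML or JSON parser has turned it into plain objects.
const envelope = {
  Feed: {
    Header: { Supplier: 'Example Supplier' },
    Body: {
      Items: {
        Item: [{ Sku: 'SCRAP-001' }, { Sku: 'SCRAP-002' }],
      },
    },
  },
};

// Reach past the envelope and grab only the record array.
const records = getPath(envelope, 'Feed.Body.Items.Item');
console.log(records);
```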

Spreadsheets

What buyers actually email you. Pick a sheet by name or take the first; rows become objects.

xlsx
xls
ods

SheetJS

Documents

Price tables buried inside a Word or OpenDocument file. Paragraphs and table rows, extracted.

docx
odt

Mammoth · JSZip

Passthrough escape hatch

For anything DPE doesn't structurally parse — raw EDIFACT, X12, or some bespoke dump. Splits the file line-by-line into {line, text} so you can still strip envelopes and map by position. Nothing is ever a dead end.

passthrough
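The passthrough idea is easy to picture: split the text into {line, text} records, then strip and map by position yourself. The sketch below assumes 1-based line numbers and uses a made-up bespoke dump; `toLines` is a hypothetical helper, not DPE's implementation.

```javascript
// Split raw text into {line, text} records.
// Assumption: line numbers are 1-based.
function toLines(rawText) {
  return rawText
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text }));
}

// Made-up bespoke dump: header, two item lines, trailer.
const dump = [
  'HDR|2026-05-21',
  'ITM|SCRAP-001|3.85',
  'ITM|SCRAP-002|3.55',
  'TRL|2',
].join('\n');

// Strip the envelope lines, then map the remaining fields by position.
const items = toLines(dump)
  .filter(({ text }) => text.startsWith('ITM|'))
  .map(({ line, text }) => {
    const parts = text.split('|');
    return { line, id: parts[1], buy: Number(parts[2]) };
  });

console.log(items);
```

Keeping the original line number on each record makes it straightforward to point back at the source file when a row looks wrong.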

03 — How it works

A four-stage pipeline, fully inspectable.

Every stage of the transformation is returned to you — so when a real-world file misbehaves, you can see exactly where it went sideways.

inputText
Read

Blob, File, ArrayBuffer or string — normalized to text, honoring your encoding (windows-1252, cp437…).
→
filteredText
Strip

dropRegex kills banner lines, page headers and footers before the parser ever sees them.
→
raw
Parse

The format-specific parser produces structured records — pre-mapping, exactly as the file describes them.
→
data
Map

Rename and reshape to your target keys — by name, dot-path, or 1-indexed position.

…and meta reports the receipts: rowsIn, rowsOut, droppedLines, encoding, mappingMode, durationMs.
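The four stages can be mimicked in a few lines of plain JavaScript. This is a sketch under stated assumptions, not the engine: the parse step is a naive pipe split rather than PapaParse, the extra `^-{5,}` pattern for the dashed rules is our addition, and the way `meta` counts are computed here is a guess at their meaning.

```javascript
// Read: assume the file has already been decoded to text.
const inputText = [
  '=== SCRAP YARD DAILY REPORT ===',
  'Generated: 2026-05-21 06:00:00',
  '----------------------------------------',
  'id|name|price|weight_lbs',
  'SCRAP-001|Copper Wire #1|3.85|250',
  'SCRAP-002|Copper Wire #2|3.55|180',
  '----------------------------------------',
  'END OF REPORT - 2 records',
].join('\n');

// Strip: drop every line that matches any of the patterns.
const dropRegex = ['^=', '^Generated:', '^-{5,}', '^END OF'].map(
  (pattern) => new RegExp(pattern)
);
const keptLines = inputText
  .split('\n')
  .filter((line) => !dropRegex.some((re) => re.test(line)));
const filteredText = keptLines.join('\n');

// Parse: the first kept line is the header, the rest are rows (values stay strings).
const [header, ...rows] = keptLines.map((line) => line.split('|'));
const raw = rows.map((cells) =>
  Object.fromEntries(header.map((name, i) => [name, cells[i]]))
);

// Map: target key on the left, source column on the right.
const mapping = { id: 'id', buy: 'price' };
const data = raw.map((record) =>
  Object.fromEntries(
    Object.entries(mapping).map(([target, source]) => [target, record[source]])
  )
);

// Assumed meaning of the counters: rows before and after mapping, lines removed by Strip.
const meta = {
  rowsIn: raw.length,
  rowsOut: data.length,
  droppedLines: inputText.split('\n').length - keptLines.length,
};

console.log({ filteredText, raw, data, meta });
```

Because each intermediate value is kept, a bad result can be traced to the stage that produced it.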

04 — Quick start

DPE is free and MIT-licensed, hosted in the open on GitHub. Clone the repository, or grab it as a ZIP straight from that same repo page — whichever you prefer. Everything you need to wire it up ships inside the download, so the setup notes travel with the code.

There's no package to install — DPE isn't published to npm. You take the built engine out of the download and vendor it into your own project, exactly as the included guide walks you through. No CDN to depend on, no account to create, nothing phoning home. Questions, or want to talk a use case through first? That's what dpe@smisco.biz is for.

One call. Two ways to load it.

DPE ships as built artifacts — drop in the IIFE for a plain <script> page, or vendor the ESM build into your bundler.

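<!-- load deps from a CDN, then the engine -->
<script src="…/papaparse.min.js"></script>
<script src="dist/dpe.iife.js"></script>

<script>
// Classic scripts have no top-level await, so the call lives in an async function.
// parseReport is an illustrative name; pass it a File, e.g. from an <input type="file">.
async function parseReport(file) {
  const result = await DataParser.parse(file, {
    format: 'csv',
    delimiter: '|',
    dropRegex: ['^=', '^Generated:', '^END OF'],
    mapping: { id: 'id', name: 'name', buy: 'price' }
  });

  console.log(result.data);   // clean, mapped rows
  console.log(result.meta);   // rowsIn, rowsOut, droppedLines, durationMs…
  return result;
}
</script>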

// vendor dist/dpe.esm.js; your bundler resolves the deps
import DataParser from './vendor/dpe.esm.js';

// Illustrative wrapper (the name is ours): `return` is only valid inside a function
export async function loadPrices(blob) {
  const result = await DataParser.parse(blob, {
    format: 'xlsx',
    sheetName: 'Prices',
    mapping: { id: 'sku', buy: 'buy_price' }
  });

  if (result.errors.length) console.warn(result.errors);
  return result.data;
}

What comes back

{
  data,         // mapped records
  raw,          // pre-mapping records
  inputText,    // as received
  filteredText, // after dropRegex
  errors,       // {severity, message, row?}
  meta          // diagnostics
}

The promise rejects only on unrecoverable failures: a bad schema, an unreadable file, a parser blowing up. Row-level hiccups land in errors and the promise still resolves. Set strict: true to turn those into rejections as well.
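One way to handle both outcomes in calling code is sketched below. `safeParse` is a hypothetical helper, and the stub parser only stands in for the engine so the snippet runs on its own; the `'warning'` severity value and the shape of a rejection are assumptions, since the text above does not spell them out.

```javascript
// Hypothetical helper: `parser` is anything exposing parse(input, schema)
// that resolves to the result shape described above.
async function safeParse(parser, input, schema) {
  try {
    const result = await parser.parse(input, schema);
    // The promise resolved: row-level issues, if any, are listed in result.errors.
    return { ok: true, data: result.data, rowIssues: result.errors };
  } catch (error) {
    // The promise rejected: an unrecoverable failure (bad schema, unreadable file…).
    return { ok: false, error };
  }
}

// Stand-in for the engine so the sketch is self-contained.
// The severity value 'warning' is an assumption.
const stubParser = {
  async parse() {
    return {
      data: [{ id: 'SCRAP-001' }],
      errors: [{ severity: 'warning', message: 'empty price', row: 3 }],
    };
  },
};

safeParse(stubParser, 'ignored', { format: 'csv' }).then((outcome) => {
  console.log(outcome.ok, outcome.rowIssues.length);
});
```

Returning a tagged outcome keeps the two failure levels separate at the call site instead of collapsing them into a single null.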

05 — Why DPE

Built for the files real suppliers send.

Stays in the browser

Client-side, start to finish. Sensitive pricing files never touch a server — there isn't one.

One declarative schema

Describe the file; don't write a parser. The same object that works in the Studio is what runs in production.

Universal mapping

Remap any format to your keys — by name, dot-path (Pricing.Buy.#text), or 1-indexed column position.
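The three addressing modes can be pictured with one small resolver. This is a sketch only: `resolveSource` and `applyMapping` are hypothetical helpers, and writing a position as a plain number in the mapping is an assumption about notation, not DPE's documented syntax.

```javascript
// Resolve one mapping source against one record.
// - number: 1-indexed column position (the record is an array of cells)
// - string: a key name, or a dot-path through nested objects
function resolveSource(record, source) {
  if (typeof source === 'number') return record[source - 1];
  return source
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), record);
}

// Build a clean record: target key on the left, source on the right.
function applyMapping(record, mapping) {
  return Object.fromEntries(
    Object.entries(mapping).map(([target, source]) => [
      target,
      resolveSource(record, source),
    ])
  );
}

// By name (a flat CSV-style row).
const byName = applyMapping(
  { id: 'SCRAP-001', price: '3.85' },
  { id: 'id', buy: 'price' }
);

// By dot-path (a nested XML-style record with a #text node).
const byPath = applyMapping(
  { Sku: 'SCRAP-001', Pricing: { Buy: { '#text': '3.85' } } },
  { id: 'Sku', buy: 'Pricing.Buy.#text' }
);

// By 1-indexed position (a headerless row of cells).
const byPosition = applyMapping(
  ['SCRAP-001', 'Copper Wire #1', '3.85'],
  { id: 1, buy: 3 }
);

console.log(byName, byPath, byPosition);
```

All three calls produce the same target shape, which is the point of mapping: downstream code sees one set of keys regardless of the source format.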

Cruft, gone

dropRegex strips the banners and page headers that pad legacy print-outs, before parsing.

Honest about failure

Warnings vs. errors, kept distinct. Three typed rejection shapes when things truly break — never a silent null.

Tiny & dependency-aware

An ~18 KB coordination layer. The heavy parsers load only for the formats you actually use.

06 — Schema Studio

Tune the schema without writing code.

Drop in a real supplier file, dial in the delimiter, columns, drop-patterns and mapping, and watch the four pipeline stages update live. When it's right, copy the exact schema object straight into your app — what you see in the Studio is what parse() receives in production.

Open Schema Studio ↗
