Browser-native file parsing

Messy files in.
Clean records out.

DPE is a headless, schema-driven parser that turns CSV, fixed-width print-outs, spreadsheets, XML/JSON, Word & ODT documents and raw EDI into normalized JSON — entirely in the browser. One declarative schema. One call. Twelve formats.

12 formats, one API
0 servers: data never leaves the page
~18 KB engine, deps on demand

01 — Live engine

Try it on a real specimen.

No mock-ups. The panels below run the actual DPE engine in your browser, on genuinely messy supplier files. Pick a specimen, hit parse, watch the pipeline resolve it.

Pipe-delimited daily report with banner & footer cruft — stripped, then remapped to clean keys.

[Live demo panels: raw input (supplier_prices.csv), schema, and parsed output]

02 — Coverage

Twelve formats. One entry point.

The format key declares what the file is — the extension is irrelevant. DPE picks the right parser, delegates the heavy lifting to battle-tested libraries, and hands back the same normalized shape every time.

Delimited & fixed-width text

The legacy stuff: comma/pipe/tab CSVs and rigid print-outs from systems that predate JSON.

  • csv
  • prn
  • txt
  • fixed

PapaParse · column slicing
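
To make "column slicing" concrete, here is a minimal sketch in plain JavaScript of how a fixed-width print-out can be cut into named fields. The `sliceFixedWidth` helper and the `{ name, start, end }` column layout are hypothetical, invented for this illustration; they are not DPE's schema syntax, and the sample rows are made up.

```javascript
// Hypothetical helper (not part of DPE): cut each line into named fields.
// Assumption for this sketch: `start` and `end` are 1-indexed, inclusive
// character positions.
function sliceFixedWidth(text, columns) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '') // ignore blank lines
    .map((line) => {
      const record = {};
      for (const { name, start, end } of columns) {
        record[name] = line.slice(start - 1, end).trim();
      }
      return record;
    });
}

// Made-up sample: id in columns 1-6, name in 7-18, price in 19-28.
const printout = [
  'A100  Widget small     12.50',
  'B2000 Widget large    104.00',
].join('\n');

const fixedRows = sliceFixedWidth(printout, [
  { name: 'id', start: 1, end: 6 },
  { name: 'name', start: 7, end: 18 },
  { name: 'price', start: 19, end: 28 },
]);

console.log(fixedRows);
```

Values stay as trimmed strings here; converting `price` to a number would be a separate step.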

Structured

Tree formats with dot-path extraction — reach deep into a nested envelope and pull the record array out.

  • xml
  • json

fast-xml-parser · native JSON
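
Dot-path extraction can be pictured as a simple walk down the parsed tree. The sketch below is an illustration only: `getByPath` is a hypothetical helper, not DPE's API, and the envelope is made-up data. The `#text` key mirrors the default name fast-xml-parser gives to text nodes.

```javascript
// Hypothetical helper (not part of DPE): follow a dot-separated path
// through a parsed tree. Returns undefined if any segment is missing.
function getByPath(tree, path) {
  return path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), tree);
}

// Made-up envelope, shaped like a parsed XML or JSON response.
const envelope = {
  Response: {
    Catalog: {
      Item: [
        { Sku: 'A100', Pricing: { Buy: { '#text': '12.50' } } },
        { Sku: 'B2000', Pricing: { Buy: { '#text': '104.00' } } },
      ],
    },
  },
};

// Pull the record array out of the envelope, then reach into each record.
const records = getByPath(envelope, 'Response.Catalog.Item');
const buyPrices = records.map((r) => getByPath(r, 'Pricing.Buy.#text'));

console.log(buyPrices);
```

A missing segment yields `undefined` rather than throwing, which keeps one malformed record from aborting the whole file.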

Spreadsheets

What buyers actually email you. Pick a sheet by name or take the first; rows become objects.

  • xlsx
  • xls
  • ods

SheetJS
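
"Rows become objects" amounts to treating the first row as headers and zipping every later row against it. The sketch below shows that idea in plain JavaScript. `rowsToObjects` is a hypothetical helper and the grid is made-up data. SheetJS offers `XLSX.utils.sheet_to_json` for this kind of conversion; whether DPE calls it is not stated here.

```javascript
// Hypothetical helper (not part of DPE): first row = headers, rest = records.
// Cells missing from a short row are filled with null.
function rowsToObjects(grid) {
  const [header, ...body] = grid;
  return body.map((cells) =>
    Object.fromEntries(
      header.map((key, i) => [String(key).trim(), cells[i] ?? null])
    )
  );
}

// Made-up sheet contents as an array of rows.
const sheet = [
  ['sku', 'buy_price'],
  ['A100', 12.5],
  ['B2000', 104],
  ['C3'], // short row: buy_price is missing
];

const sheetRecords = rowsToObjects(sheet);
console.log(sheetRecords);
```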

Documents

Price tables buried inside a Word or OpenDocument file. Paragraphs and table rows, extracted.

  • docx
  • odt

Mammoth · JSZip

Passthrough escape hatch

For anything DPE doesn't structurally parse — raw EDIFACT, X12, or some bespoke dump. Splits the file line-by-line into {line, text} so you can still strip envelopes and map by position. Nothing is ever a dead end.

  • passthrough
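
The passthrough shape is easy to picture with a few lines of JavaScript. This is a sketch under assumptions: `toPassthroughRecords` is a hypothetical stand-in rather than DPE's code, line numbers are assumed to start at 1, and the EDIFACT-style segments are made-up sample data.

```javascript
// Hypothetical stand-in (not DPE's code): one {line, text} record per line.
// Assumption: `line` is 1-indexed.
function toPassthroughRecords(text) {
  return text
    .split(/\r?\n/)
    .map((lineText, index) => ({ line: index + 1, text: lineText }));
}

// Made-up EDIFACT-style message, one segment per line.
const edi = [
  "UNH+1+PRICAT:D:96A:UN'",
  "LIN+1++A100:SA'",
  "PRI+AAA:12.50'",
  "UNT+4+1'",
].join('\n');

const ediRecords = toPassthroughRecords(edi);

// Strip the envelope segments, keep the body.
const ediBody = ediRecords.filter((r) => !/^(UNH|UNT)\+/.test(r.text));

console.log(ediBody);
```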

03 — How it works

A four-stage pipeline, fully inspectable.

Every stage of the transformation is returned to you — so when a real-world file misbehaves, you can see exactly where it went sideways.

  1. inputText

    Read

    Blob, File, ArrayBuffer or string — normalized to text, honoring your encoding (windows-1252, cp437…).

  2. filteredText

    Strip

    dropRegex kills banner lines, page headers and footers before the parser ever sees them.

  3. raw

    Parse

    The format-specific parser produces structured records — pre-mapping, exactly as the file describes them.

  4. data

    Map

    Rename and reshape to your target keys — by name, dot-path, or 1-indexed position.

…and meta reports the receipts: rowsIn, rowsOut, droppedLines, encoding, mappingMode, durationMs
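
The strip stage is the easiest one to reason about in isolation. The sketch below re-creates it in plain JavaScript, assuming each `dropRegex` entry is a regular-expression source tested against one line at a time. `stripLines` is a hypothetical helper, not DPE's implementation, and the report text is made up.

```javascript
// Hypothetical re-creation of the strip stage (not DPE's code):
// drop every line that matches any of the given patterns.
function stripLines(inputText, dropRegex) {
  const patterns = dropRegex.map((source) => new RegExp(source));
  const kept = [];
  let droppedLines = 0;
  for (const line of inputText.split(/\r?\n/)) {
    if (patterns.some((re) => re.test(line))) {
      droppedLines += 1;
    } else {
      kept.push(line);
    }
  }
  return { filteredText: kept.join('\n'), droppedLines };
}

// Made-up pipe-delimited report with banner and footer cruft.
const reportText = [
  '==== SUPPLIER DAILY REPORT ====',
  'Generated: 2024-01-01',
  'id|name|price',
  'A100|Widget small|12.50',
  'END OF REPORT',
].join('\n');

const stripped = stripLines(reportText, ['^=', '^Generated:', '^END OF']);

console.log(stripped.filteredText);
console.log(stripped.droppedLines);
```

Comparing the text before and after this step is the same check that the inputText and filteredText stages make possible on a real file.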

04 — Quick start

DPE is free and MIT-licensed, hosted in the open on GitHub. Clone the repository, or grab it as a ZIP straight from that same repo page — whichever you prefer. Everything you need to wire it up ships inside the download, so the setup notes travel with the code.

There's no package to install, because DPE isn't published to npm. You take the built engine out of the download and vendor it into your own project, exactly as the included guide walks you through. The engine itself needs no CDN and no account, and nothing phones home; only the third-party parsers it delegates to are loaded separately. Questions, or want to talk a use case through first? That's what dpe@smisco.biz is for.

One call. Two ways to load it.

DPE ships as built artifacts — drop in the IIFE for a plain <script> page, or vendor the ESM build into your bundler.

<!-- load deps from a CDN, then the engine -->
<script src="…/papaparse.min.js"></script>
<script src="dist/dpe.iife.js"></script>

<!-- type="module" so top-level await is allowed; a classic script would throw a SyntaxError -->
<script type="module">
// `file` is a File or Blob you already hold, e.g. from an <input type="file">
const result = await DataParser.parse(file, {
  format: 'csv',
  delimiter: '|',
  dropRegex: ['^=', '^Generated:', '^END OF'],
  mapping: { id: 'id', name: 'name', buy: 'price' }
});

console.log(result.data);   // clean, mapped rows
console.log(result.meta);   // rowsIn, rowsOut, droppedLines, durationMs…
</script>

// vendor dist/dpe.esm.js; your bundler resolves the deps
import DataParser from './vendor/dpe.esm.js';

// wrapped in an async function (the name is illustrative) so `await` and `return` are valid
async function parsePrices(blob) {
  const result = await DataParser.parse(blob, {
    format: 'xlsx',
    sheetName: 'Prices',
    mapping: { id: 'sku', buy: 'buy_price' }
  });

  if (result.errors.length) console.warn(result.errors);
  return result.data;
}

05 — Why DPE

Built for the files real suppliers send.

Stays in the browser

Client-side, start to finish. Sensitive pricing files never touch a server — there isn't one.

One declarative schema

Describe the file; don't write a parser. The same object that works in the Studio is what runs in production.

Universal mapping

Remap any format to your keys — by name, dot-path (Pricing.Buy.#text), or 1-indexed column position.
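
The three mapping modes can be sketched with one small resolver. This is an illustration under assumptions: the `{ target: source }` shape follows the quick-start snippets, but `resolveSource`, `applyMapping`, and the use of a number for a 1-indexed position are hypothetical choices for this example, not DPE's documented behavior.

```javascript
// Hypothetical resolver (not DPE's code) for one mapping entry:
//   number -> 1-indexed position in an array row
//   string -> own key name if present, otherwise a dot-path
function resolveSource(row, source) {
  if (typeof source === 'number') {
    return Array.isArray(row) ? row[source - 1] : undefined;
  }
  if (Object.prototype.hasOwnProperty.call(row, source)) return row[source];
  return source
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), row);
}

// Build one output object per row, keyed by the mapping's target names.
function applyMapping(rows, mapping) {
  return rows.map((row) =>
    Object.fromEntries(
      Object.entries(mapping).map(([target, source]) => [
        target,
        resolveSource(row, source),
      ])
    )
  );
}

// By name (made-up CSV-style row).
const byName = applyMapping(
  [{ id: 'A100', name: 'Widget', price: '12.50' }],
  { id: 'id', name: 'name', buy: 'price' }
);

// By dot-path (made-up XML-style row).
const byPath = applyMapping(
  [{ Sku: 'A100', Pricing: { Buy: { '#text': '12.50' } } }],
  { id: 'Sku', buy: 'Pricing.Buy.#text' }
);

// By 1-indexed position (made-up headerless row).
const byPosition = applyMapping([['A100', 'Widget', '12.50']], { id: 1, buy: 3 });

console.log(byName, byPath, byPosition);
```

All three calls produce the same `buy` value from differently shaped rows, which is the point of mapping every format to one set of target keys.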

Cruft, gone

dropRegex strips the banners and page headers that pad legacy print-outs, before parsing.

Honest about failure

Warnings vs. errors, kept distinct. Three typed rejection shapes when things truly break — never a silent null.

Tiny & dependency-aware

An ~18 KB coordination layer. The heavy parsers load only for the formats you actually use.

06 — Schema Studio

Tune the schema without writing code.

Drop in a real supplier file, dial in the delimiter, columns, drop-patterns and mapping, and watch the four pipeline stages update live. When it's right, copy the exact schema object straight into your app — what you see in the Studio is what parse() receives in production.

Schema Studio →