Browser-native file parsing
DPE is a headless, schema-driven parser that turns CSV, fixed-width print-outs, spreadsheets, XML/JSON, Word & ODT documents and raw EDI into normalized JSON — entirely in the browser. One declarative schema. One call. Twelve formats.
01 — Live engine
No mock-ups. The panels below run the actual DPE engine in your browser, on genuinely messy supplier files. Pick a specimen, hit parse, watch the pipeline resolve it.
Pipe-delimited daily report with banner & footer cruft — stripped, then remapped to clean keys.
02 — Coverage
The format key declares what the file is — the extension is irrelevant. DPE picks the right parser, delegates the heavy lifting to battle-tested libraries, and hands back the same normalized shape every time.
The legacy stuff: comma/pipe/tab CSVs and rigid print-outs from systems that predate JSON.
PapaParse · column slicing
Tree formats with dot-path extraction — reach deep into a nested envelope and pull the record array out.
fast-xml-parser · native JSON
What buyers actually email you. Pick a sheet by name or take the first; rows become objects.
SheetJS
Price tables buried inside a Word or OpenDocument file. Paragraphs and table rows, extracted.
Mammoth · JSZip
For anything DPE doesn't structurally parse — raw EDIFACT, X12, or some bespoke dump. Splits the file line-by-line into {line, text} so you can still strip envelopes and map by position. Nothing is ever a dead end.
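As a rough illustration of that fallback, the sketch below splits raw text into {line, text} records and then strips an envelope by position. The helper name `splitIntoLines` is hypothetical and not part of DPE's API, the sample dump is made up, and treating `line` as a 1-indexed line number is an assumption.

```javascript
// Hypothetical helper, not part of DPE's API.
// Assumption: `line` is a 1-indexed line number.
function splitIntoLines(text) {
  return text
    .split(/\r\n|\r|\n/)
    .map((content, index) => ({ line: index + 1, text: content }));
}

// Made-up bespoke dump: a header line, one record, a trailer line.
const dump = 'HDR|SUPPLIER-X\nA100|Widget|4.50\nTRL|1';
const records = splitIntoLines(dump);

// Strip the envelope by position: drop the first and last lines.
const body = records.slice(1, -1);
console.log(body);
```

Because every record keeps its original line number, anything dropped or mapped later can still be traced back to the source file.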
03 — How it works
Every stage of the transformation is returned to you — so when a real-world file misbehaves, you can see exactly where it went sideways.
Blob, File, ArrayBuffer or string — normalized to text, honoring your encoding (windows-1252, cp437…).
dropRegex kills banner lines, page headers and footers before the parser ever sees them.
The format-specific parser produces structured records — pre-mapping, exactly as the file describes them.
Rename and reshape to your target keys — by name, dot-path, or 1-indexed position.
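The first two stages can be sketched in a few lines of plain JavaScript. This is an illustration of the idea rather than DPE's internals: `toText` and `stripLines` are hypothetical names, the sample report is made up, and only strings and byte buffers are handled here (a Blob or File would first need `await blob.arrayBuffer()`). `TextDecoder` is a standard Web API, but which encoding labels it accepts depends on the runtime.

```javascript
// Hypothetical helpers illustrating the first two stages; not DPE's internals.

// Stage 1: normalize input to text. Strings pass through; byte buffers are decoded.
function toText(input, encoding = 'utf-8') {
  if (typeof input === 'string') return input;
  return new TextDecoder(encoding).decode(input);
}

// Stage 2: drop every line that matches any pattern, before any parsing happens.
function stripLines(text, dropRegex = []) {
  const patterns = dropRegex.map((source) => new RegExp(source));
  const kept = [];
  const dropped = [];
  for (const line of text.split(/\r?\n/)) {
    (patterns.some((p) => p.test(line)) ? dropped : kept).push(line);
  }
  return { kept, dropped };
}

// Made-up report with a banner, a timestamp line and a footer.
const bytes = new TextEncoder().encode(
  '==== DAILY REPORT ====\nGenerated: today\nid|name|price\n1|Widget|4.50\nEND OF REPORT'
);
const { kept, dropped } = stripLines(toText(bytes), ['^=', '^Generated:', '^END OF']);
console.log(kept);    // the header row and the data row
console.log(dropped); // the banner, the "Generated:" line and the footer
```

Returning the dropped lines alongside the kept ones mirrors the point made above: when a file misbehaves, you can check whether a pattern removed more than it should have.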
04 — Quick start
DPE is free and MIT-licensed, hosted in the open on GitHub. Clone the repository, or grab it as a ZIP straight from that same repo page — whichever you prefer. Everything you need to wire it up ships inside the download, so the setup notes travel with the code.
There's no package to install — DPE isn't published to npm. You take the built engine out of the download and vendor it into your own project, exactly as the included guide walks you through. No CDN to depend on, no account to create, nothing phoning home. Questions, or want to talk a use case through first? That's what dpe@smisco.biz is for.
DPE ships as built artifacts — drop in the IIFE for a plain <script> page, or vendor the ESM build into your bundler.
<!-- load the dependencies first (from a CDN or your own vendored copies), then the engine -->
<script src="…/papaparse.min.js"></script>
<script src="dist/dpe.iife.js"></script>
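<script>
  // await needs an async function in a classic script.
  // `file` is a File or Blob, e.g. from an <input type="file"> change event.
  async function parseReport(file) {
    const result = await DataParser.parse(file, {
      format: 'csv',
      delimiter: '|',
      dropRegex: ['^=', '^Generated:', '^END OF'],
      mapping: { id: 'id', name: 'name', buy: 'price' }
    });
    console.log(result.data); // clean, mapped rows
    console.log(result.meta); // rowsIn, rowsOut, droppedLines, durationMs…
    return result;
  }
</script>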
// vendor dist/dpe.esm.js; your bundler resolves the deps
import DataParser from './vendor/dpe.esm.js';

// `blob` is the uploaded workbook; parsePrices is an illustrative wrapper name
export async function parsePrices(blob) {
  const result = await DataParser.parse(blob, {
    format: 'xlsx',
    sheetName: 'Prices',
    mapping: { id: 'sku', buy: 'buy_price' }
  });
  if (result.errors.length) console.warn(result.errors);
  return result.data;
}
05 — Why DPE
Client-side, start to finish. Sensitive pricing files never touch a server — there isn't one.
Describe the file; don't write a parser. The same schema object that works in the Studio is what runs in production.
Remap any format to your keys — by name, dot-path (Pricing.Buy.#text), or 1-indexed column position.
dropRegex strips the banners and page headers that pad legacy print-outs, before parsing.
Warnings vs. errors, kept distinct. Three typed rejection shapes when things truly break — never a silent null.
An ~18 KB coordination layer. The heavy parsers load only for the formats you actually use.
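To make the three remapping styles concrete, here is a minimal sketch of how a selector could be resolved against a parsed record. It is not DPE's implementation: `resolve` and `applyMapping` are hypothetical names, the sample records are made up, and two things are assumed: that each mapping entry reads `targetKey: selector`, and that a numeric selector means a 1-indexed column position.

```javascript
// Hypothetical resolver, not DPE's implementation.
// Assumption: each mapping entry reads `targetKey: selector`.
function resolve(record, selector) {
  // A number is treated as a 1-indexed column position into an array record.
  if (typeof selector === 'number') return record[selector - 1];
  // A string is walked as a dot-path; a plain name is a path with one segment.
  return selector.split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]),
    record
  );
}

function applyMapping(record, mapping) {
  const out = {};
  for (const [target, selector] of Object.entries(mapping)) {
    out[target] = resolve(record, selector);
  }
  return out;
}

// Tree-shaped record, as an XML parser might produce it (made-up data).
const xmlRecord = { Sku: 'A100', Pricing: { Buy: { '#text': '4.50', currency: 'EUR' } } };
const fromTree = applyMapping(xmlRecord, { id: 'Sku', buy: 'Pricing.Buy.#text' });
// fromTree → { id: 'A100', buy: '4.50' }

// Positional record, as a fixed-width or header-less row might look.
const row = ['A100', 'Widget', '4.50'];
const fromRow = applyMapping(row, { id: 1, buy: 3 });
// fromRow → { id: 'A100', buy: '4.50' }
```

A missing path segment resolves to `undefined` instead of throwing, which keeps one malformed record from aborting the whole run. Note that this naive split would misread a column name that itself contains a dot.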
06 — Schema Studio
Drop in a real supplier file, dial in the delimiter, columns, drop-patterns and mapping, and watch the four pipeline stages update live. When it's right, copy the exact schema object straight into your app — what you see in the Studio is what parse() receives in production.