Data Parser Engine — Schema Studio

Reference

What is a schema?

A schema is a small object that tells DPE what kind of file you're handing it and how to extract records from it. The engine reads the file, follows the schema, and returns mapped JSON records plus diagnostics.

The schema you build in this Studio is exactly what gets passed to DataParser.parse(file, schema) in production. The DPE schema output tab shows that object — copy it from there when you're done.

The output pipeline

A parse runs through four stages, each visible as its own output tab:

RAW input: The file's original text as received, byte-for-byte. For spreadsheet and document files this tab shows "Binary format", because raw text isn't applicable.
Post-filter: Text after every line matching a dropRegex pattern has been stripped. Identical to RAW if no dropRegex is set.
Pre-mapping: Records as the format parser produced them, before mapping reshapes the fields. This is what shows up at result.raw.
Data: The final mapped records (result.data). Identical to Pre-mapping if no mapping is defined.

A fifth tab, DPE schema, shows the schema object that was just submitted to the engine — the artifact this session is producing. It refreshes on Parse, not as you edit the form.

Saving & loading your work

No accounts, nothing stored. DPE parses your files entirely in your browser — nothing is uploaded and nothing is kept on a server. You save your work as files on your own disk and reload them here. Every example you load exercises these controls.

Download schema: On the DPE schema tab, Download saves the schema as dpe-schema.json — the same object Copy produces. It's the portable engine schema: drop it into any DPE consumer, or re-upload it here later.
Upload schema: In 1. Input, "upload a saved schema" reads a dpe-schema.json and repopulates the whole form (including the mapping). Then load a data file and Parse.
Download results: On the Data tab, after a parse: Download JSON saves the records losslessly (handles nested objects); Download CSV saves them as CSV.
CSV dialect: Header row of field names, every value double-quoted, comma-delimited, UTF-8. For other dialects — or nested data — use JSON export.
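
As a rough illustration of the CSV dialect described above, the sketch below writes a header row and double-quotes every value. It is not DPE's exporter: the helper name toCsv is hypothetical, and quoting the header row, doubling embedded quotes, writing null as an empty string, and joining rows with \n are all assumptions made for the example.

```javascript
// Hypothetical helper, not DPE's exporter: header row, every value
// double-quoted, comma-delimited.
function toCsv(records) {
  if (records.length === 0) return '';
  // Field names in first-seen order across all records.
  const fields = [...new Set(records.flatMap((r) => Object.keys(r)))];
  // Assumption: embedded quotes are doubled; null/undefined become "".
  const quote = (v) => '"' + String(v ?? '').replace(/"/g, '""') + '"';
  const rows = records.map((r) => fields.map((f) => quote(r[f])).join(','));
  return [fields.map(quote).join(','), ...rows].join('\n');
}

const csv = toCsv([
  { sku: 'A1', name: 'Widget, large', buy: 2.5 },
  { sku: 'B2', name: 'Bolt 3" zinc', buy: null },
]);
console.log(csv);
```

A nested object passed through this helper would be flattened to the string "[object Object]", which is the kind of loss the JSON export avoids.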

Round-trip: tune a schema → download it → next time, upload the schema, load the data file, Parse, download the results. Naming and file management are yours.

Form options

format: Required. The file type. Extension is ignored. Choices: csv, prn, txt, fixed, xml, json, passthrough, xls, xlsx, ods, docx, odt.
layout: Only for prn and txt. delimited = fields separated by a character (comma, pipe, tab). fixed = fields at known column positions.
encoding: Character encoding for text formats. Default utf-8. Use windows-1252, iso-8859-1, cp437, etc. for legacy dumps.
dropRegex: One regex per line. Each is compiled with the m flag. Any line matching any pattern is removed before the parser sees the file. Use it to strip page headers, dates, banner separators, and decorative === lines. Applies to all text formats; ignored (with a warning) for spreadsheets / documents.
mapping: Output shaping. Syntax: output = input, one per line or comma-separated. Source may be a field name (sku), a dot path on nested objects (Material.Category), an XML attribute (@_id), a text node (Pricing.Buy.#text), or a 1-indexed positional integer (3). Missing values become null.
mapping mode: replace (default) — output contains only the mapped target keys. extend — output contains all source fields with mapped target keys overlaid on top.
strict mode: If on, any error rejects the parse instead of being collected into result.errors.
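
To make the mapping syntax and the two mapping modes concrete, here is a minimal sketch of how "output = input" text could be turned into an object and applied to a record. The function names (parseMapping, resolveSource, applyMapping) are hypothetical and this is not DPE's implementation. In particular, resolving a positional source against an object by key order is an assumption, based on the passthrough recipe treating 1 as line and 2 as text.

```javascript
// Hypothetical helpers illustrating the documented mapping rules.

// "output = input", one per line or comma-separated.
function parseMapping(text) {
  const mapping = {};
  for (const entry of text.split(/[\n,]/)) {
    const eq = entry.indexOf('=');
    if (eq === -1) continue; // blank or malformed entries are skipped here
    const target = entry.slice(0, eq).trim();
    const source = entry.slice(eq + 1).trim();
    // All-digit sources are treated as 1-indexed positions.
    mapping[target] = /^\d+$/.test(source) ? Number(source) : source;
  }
  return mapping;
}

// Resolve one source against a record; missing values become null.
function resolveSource(record, source) {
  if (typeof source === 'number') {
    // Assumption: objects are addressed by key order, arrays by slot.
    const values = Array.isArray(record) ? record : Object.values(record);
    const value = values[source - 1];
    return value === undefined ? null : value;
  }
  let node = record;
  for (const key of source.split('.')) {
    if (node === null || typeof node !== 'object' || !(key in node)) return null;
    node = node[key];
  }
  return node === undefined ? null : node;
}

// replace: only mapped keys. extend: source fields plus mapped keys on top.
function applyMapping(record, mapping, mode = 'replace') {
  const out = mode === 'extend' && !Array.isArray(record) ? { ...record } : {};
  for (const [target, source] of Object.entries(mapping)) {
    out[target] = resolveSource(record, source);
  }
  return out;
}

const mapping = parseMapping('code = sku, price = Pricing.Buy.#text\nnote = comment');
const record = { sku: 'A1', Pricing: { Buy: { '#text': 2.5 } } };
console.log(applyMapping(record, mapping));           // replace mode
console.log(applyMapping(record, mapping, 'extend')); // extend mode
```

In replace mode the result holds only code, price, and note (note is null because the record has no comment field); in extend mode the original sku and Pricing fields are kept alongside them.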

Format-specific knobs

Delimited — csv, prn+delimited, txt+delimited

delimiter: Single character. Use the literal character (for example , or | or ;), \t for tab, or auto to let PapaParse autodetect.
quote char: Encloses field values that contain the delimiter. Default ".
first row is headers: If yes, the first row's values become field names (records are objects). If no, records are arrays — pair with positional mapping.

Fixed-width — fixed, prn+fixed, txt+fixed

fieldDefinitions: JSON array, required. Each entry: { name, start, end, trim? }. start is inclusive, end is exclusive, and both are 0-based. trim defaults to true.
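
The offsets follow the same convention as JavaScript's String.prototype.slice (0-based, end exclusive). The sketch below shows how one line could be cut with a fieldDefinitions array; sliceFixed is a hypothetical helper for illustration, not the engine's code.

```javascript
// Hypothetical helper: cut one fixed-width line into a record.
function sliceFixed(line, fieldDefinitions) {
  const record = {};
  for (const { name, start, end, trim = true } of fieldDefinitions) {
    const value = line.slice(start, end); // start inclusive, end exclusive
    record[name] = trim ? value.trim() : value;
  }
  return record;
}

const fieldDefinitions = [
  { name: 'sku', start: 0, end: 6 },
  { name: 'name', start: 6, end: 18 },
  { name: 'buy', start: 18, end: 26, trim: false },
];

// Build a 26-character sample line: 6 + 12 + 8 columns.
const line = 'A1'.padEnd(6) + 'Widget'.padEnd(12) + '2.50'.padStart(8);
const fixedRecord = sliceFixed(line, fieldDefinitions);
console.log(fixedRecord);
```

With trim left at its default, sku and name lose their padding; buy keeps its leading spaces because trim is set to false.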

XML / JSON

rootPath: Dot path to the array (or single record) to extract from the parsed tree, e.g. Envelope.Body.Records.Record. Omit to use the root.
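
A minimal sketch of what following a rootPath could look like, assuming the parsed tree is a plain nested object. extractRecords is a hypothetical name, and returning an empty array for a path that does not exist is an assumption; the engine may report an error instead.

```javascript
// Hypothetical helper: walk a dot path and always return an array of records.
function extractRecords(tree, rootPath) {
  let node = tree;
  if (rootPath) {
    for (const key of rootPath.split('.')) {
      // Assumption: an unresolvable path yields no records.
      if (node === null || typeof node !== 'object' || !(key in node)) return [];
      node = node[key];
    }
  }
  // A single record is wrapped so callers always get an array.
  return Array.isArray(node) ? node : [node];
}

const tree = {
  Envelope: { Body: { Records: { Record: [{ sku: 'A1' }, { sku: 'B2' }] } } },
};
const many = extractRecords(tree, 'Envelope.Body.Records.Record');
const single = extractRecords({ Order: { sku: 'A1' } }, 'Order');
const whole = extractRecords([{ sku: 'A1' }]); // rootPath omitted: use the root
console.log(many.length, single.length, whole.length);
```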

Spreadsheet — xls, xlsx, ods

sheetName: Defaults to the workbook's first sheet.

Passthrough

No knobs. Output is [{ line, text }], one record per input line, with line numbers starting at 1. Useful when you want to inspect (or positionally map) a text file DPE doesn't structurally parse: raw EDIFACT / X12 dumps, log files, anything line-oriented.
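
The output shape can be pictured with a few lines of code. This is only a sketch: the passthrough function below is hypothetical, and how the engine treats a trailing newline or blank lines is not covered here.

```javascript
// Hypothetical helper: one { line, text } record per input line, 1-indexed.
function passthrough(text) {
  return text.split(/\r?\n/).map((content, index) => ({
    line: index + 1,
    text: content,
  }));
}

const passthroughRecords = passthrough('ST*850*0001\nBEG*00*SA*PO123');
console.log(passthroughRecords);
```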

Recipes

Messy delimited file with header/footer cruft

{
    format: 'csv',
    delimiter: '|',
    dropRegex: ['^=', '^Generated:', '^-{3,}', '^END OF']
}

The dropRegex patterns strip banner separators, date stamps, dash lines, and trailing summaries. The parser only sees actual data rows.
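
To see the effect of those four patterns, the sketch below applies them to a small invented report. dropLines is a hypothetical helper that approximates the documented rule (any line matching any pattern is removed); it is not the engine's own filter, and the sample text is made up for the example.

```javascript
// Hypothetical helper approximating the Post-filter stage.
function dropLines(text, patterns) {
  // Each pattern is compiled with the m flag, as the reference describes.
  const regexes = patterns.map((p) => new RegExp(p, 'm'));
  return text
    .split(/\r?\n/)
    .filter((line) => !regexes.some((re) => re.test(line)))
    .join('\n');
}

const raw = [
  '==== INVENTORY REPORT ====',
  'Generated: 2024-01-01',
  'sku|name|buy',
  '---------',
  'A1|Widget|2.50',
  'END OF REPORT',
].join('\n');

const filtered = dropLines(raw, ['^=', '^Generated:', '^-{3,}', '^END OF']);
console.log(filtered);
```

Only the header row and the data row survive, which is what the delimited parser then receives.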

Passthrough for EDIFACT / X12

{
    format: 'passthrough',
    mapping: { segment: 1, payload: 2 }
}

Each line of the EDI dump becomes a record { line, text }. The positional mapping renames those two fields by index (1 = line, 2 = text).

Positional mapping for headerless CSV

{
    format: 'csv',
    hasHeaders: false,
    mapping: { sku: 1, name: 2, buy: 3, sell: 4 }
}

With hasHeaders: false, PapaParse returns each row as an array. Positional sources (1-indexed) map array slots to target names.

Nested XML with attribute and text-node mapping

{
    format: 'xml',
    rootPath: 'Envelope.Body.Records.Record',
    mapping: {
        id: '@_id',
        category: 'Material.Category',
        buy: 'Pricing.Buy.#text'
    }
}

rootPath drills into the parsed tree. The @_ prefix accesses XML attributes (fast-xml-parser convention). #text accesses the text node of an element that also has attributes.
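
For orientation, here is roughly what one record looks like once rootPath has selected it, and what the three mapping sources pick out. The object is hand-written for illustration (the element names beyond those in the recipe, the currency attribute, and all values are invented), and whether 2.50 arrives as a number or a string depends on parser options that this page does not specify.

```javascript
// Hand-written stand-in for one parsed <Record>, e.g. from:
//   <Record id="R-100">
//     <Material><Category>Steel</Category></Material>
//     <Pricing><Buy currency="USD">2.50</Buy></Pricing>
//   </Record>
// Attributes carry the "@_" prefix; an element with both attributes and
// text keeps its text under "#text".
const xmlRecord = {
  '@_id': 'R-100',
  Material: { Category: 'Steel' },
  Pricing: { Buy: { '@_currency': 'USD', '#text': 2.5 } },
};

// What the recipe's mapping selects, written as direct property access.
const mappedXml = {
  id: xmlRecord['@_id'],                 // '@_id'
  category: xmlRecord.Material.Category, // 'Material.Category'
  buy: xmlRecord.Pricing.Buy['#text'],   // 'Pricing.Buy.#text'
};
console.log(mappedXml);
```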
