
command dump

zmworm edited this page Jun 13, 2026 · 6 revisions

dump

Serialize a document into a replayable batch script — the round-trip mechanism for editing a document by emit → modify → replay.

Synopsis

officecli dump <file> <path> [--format batch] [-o <out>] [--json]

Description

Walks the document and emits a JSON BatchItem[] array that, when replayed via officecli batch, reconstructs the source document. Supports .docx and, since v1.0.85, .pptx. For .pptx, unsupported elements surface as warnings instead of aborting the dump.

The dump is portable: unstable IDs (paraId, rsidR, textId) and derived effective.* readbacks are filtered out. The OpenXML SDK regenerates those IDs on save, so the dump omits them instead of trying to preserve them.
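To illustrate that filtering, here is a minimal Python sketch that drops the unstable ID keys named above and any effective.* readback from a flat property map. The function name strip_unstable and the flat dict shape are assumptions for illustration; this is not OfficeCLI's internal code.

```python
# Hypothetical sketch: OfficeCLI's real filter is internal and may differ.
UNSTABLE_ID_KEYS = {"paraId", "rsidR", "textId"}  # IDs the SDK regenerates on save
DERIVED_PREFIX = "effective."                     # computed readbacks, not authored values


def strip_unstable(props: dict[str, str]) -> dict[str, str]:
    """Return a copy of props without unstable IDs or derived effective.* keys."""
    return {
        key: value
        for key, value in props.items()
        if key not in UNSTABLE_ID_KEYS and not key.startswith(DERIVED_PREFIX)
    }


if __name__ == "__main__":
    raw = {"paraId": "1A2B3C4D", "rsidR": "00AB12CD", "style": "Heading1",
           "text": "Summary", "effective.size": "16pt"}
    print(strip_unstable(raw))  # {'style': 'Heading1', 'text': 'Summary'}
```

Authored values such as style and text pass through untouched, so the resulting script replays identically regardless of the IDs the source file happened to carry.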

Arguments

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| file | path | Yes | - | Document path. .docx: full coverage, including embedded OLE objects, floating/anchored charts, chart userShapes overlays, multi-section header/footer references, data-bound content controls, multi-paragraph SDT inlined-parts, legacy form fields, cross-paragraph field chains in table cells, picture margin-edge relative positions, text-wrapping break clear, and footnote/endnote indent overrides. .pptx: text, tables, pictures, charts, notes, theme/master/layout (raw), OLE/3D/video/audio/SmartArt via add-part round-trip, morph/p14/p15 transitions, and motion-path animations. |
| path | string | Yes | - | DOM path to dump. / emits the whole document; a subtree path emits just that subtree without bundling sibling resources. Supported: /, /body, /body/p[N], /body/tbl[N], and the resource parts /theme, /settings, /numbering, /styles. Subtree emit uses last() XPath predicates, so the script is safe to replay onto non-blank documents. |
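If you generate dump calls from a script, a pre-flight check against the path list above can catch typos before invoking the CLI. This is a minimal sketch: is_supported_dump_path is a hypothetical helper, it covers only the .docx paths listed in the table, and it assumes N is a 1-based index as in XPath.

```python
import re

# Dump paths documented as supported; N is assumed to be a 1-based index.
_SUPPORTED_PATH = re.compile(
    r"/|/body|/body/(?:p|tbl)\[[1-9][0-9]*\]|/(?:theme|settings|numbering|styles)"
)


def is_supported_dump_path(path: str) -> bool:
    """Hypothetical pre-flight check: True if `path` matches a documented form."""
    return _SUPPORTED_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    for candidate in ["/", "/body/p[3]", "/body/tbl[1]", "/numbering", "/body/sectPr"]:
        print(candidate, is_supported_dump_path(candidate))
```

fullmatch anchors the pattern at both ends, so a prefix such as /body does not accidentally accept /body/sectPr.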

Options

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| --format | string | No | batch | Output format. Only batch is currently supported. |
| -o / --out | path | No | - | Write output to a file instead of stdout. On success, the output file path is printed to stdout. |
| --json | bool | No | false | Wrap the result in the standard JSON envelope (the batch payload itself is always JSON). |

What's emitted

v1.0.73 hardened the round-trip extensively. Constructs that now survive a dump and replay include:

  • Fields: TOC fields with \t/\b switches, PAGE fields, complex-field HYPERLINKs, MERGEFIELD whitespace quoting
  • Nesting: fldSimple/oMath inside hyperlinks/ins/del/footnotes
  • Hyperlinks: tooltip, tgtFrame, history
  • Tables: tables in headers/footers, cantSplit rows, tcW percent semantics, asymmetric tcMar padding
  • Sections and page: columns + vAlign on inline section breaks, header/footer types from sections, page-background color
  • Paragraph and run formatting: paragraph-mark-only run formatting (markRPr.*), eastAsianLayout, lineRule (atLeast/exact/auto), char-based indents, w14 ligatures/numForm/numSpacing
  • Symbols and breaks: w:sym runs, noBreakHyphen/softHyphen, soft <w:br/> line breaks
  • Wrappers and content controls: ruby/smartTag/customXml wrappers, ListItem SDT
  • Annotations and revisions: bookmarks (cross-paragraph spans), comment dates, ins/del track-change attribution

| Layer | Mechanism |
|---|---|
| /styles | Emitted before the body so paragraph styleId references resolve on replay |
| /body paragraphs | Single-run paragraphs collapse into one add p row; multi-run paragraphs split into a paragraph row plus run child rows |
| Tables and mixed body content | Typed add rows |
| Section page layout | set on the root path / for page width/height/margins/columns, etc. |
| Inline section breaks | Section breaks inside the body are emitted alongside their paragraph |
| docDefaults and document protection | Emitted alongside section layout |
| Headers and footers | Seed paragraph plus appended content, per part |
| Comments / footnote refs / endnote refs | Anchored to the body paragraphs they reference |
| Numbering | Emitted wholesale via raw-set when the document has list templates |
| Settings part | Emitted wholesale via raw-set |
| Theme part | Emitted wholesale via raw-set |
| Charts | Typed add (chartType + data string), not raw-set |
| Pictures | Inlined as data URIs through the src= prop |

Format keys are forwarded as-is: the add side falls back to OOXML schema reflection and accepts arbitrary props, so the emitter needs no per-key allowlist.
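The /body paragraphs rule (collapse single-run paragraphs, split multi-run ones) and the as-is forwarding of format keys can be sketched as follows. This is a hypothetical illustration only: the row keys (command, parent, type, props), the run type name r, and addressing the new paragraph with p[last()] are assumptions loosely modeled on the path description, not the real BatchItem schema.

```python
from typing import Any

# Hypothetical row shape; the real BatchItem field names may differ.
Row = dict[str, Any]


def paragraph_rows(parent: str, para_props: dict[str, str],
                   runs: list[dict[str, str]]) -> list[Row]:
    """Emit rows for one paragraph using the collapse/split rule."""
    if len(runs) == 1:
        # Single run: fold the run's text/formatting into one `add p` row.
        return [{"command": "add", "parent": parent, "type": "p",
                 "props": {**para_props, **runs[0]}}]
    # Zero or several runs: a paragraph row, then one child row per run.
    rows: list[Row] = [{"command": "add", "parent": parent, "type": "p",
                        "props": dict(para_props)}]
    # Address the paragraph just added via last() (an assumption in this sketch).
    para_path = f"{parent}/p[last()]"
    for run in runs:
        # Props are copied through unchanged: no per-key allowlist.
        rows.append({"command": "add", "parent": para_path, "type": "r",
                     "props": dict(run)})
    return rows
```

Collapsing the common single-run case keeps scripts short, while the split form preserves per-run formatting such as a bold word in the middle of a sentence.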

Examples

```shell
# Whole document to stdout
officecli dump report.docx /

# Write to a batch file
officecli dump report.docx / -o report.batch.json

# Subtree: just one paragraph
officecli dump report.docx /body/p[3]

# Subtree: a single table or a resource part
officecli dump report.docx /body/tbl[1]
officecli dump report.docx /numbering

# Round-trip: dump → batch
officecli dump report.docx / -o /tmp/r.json
officecli create rebuilt.docx --type docx
officecli batch rebuilt.docx --input /tmp/r.json
```
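The round-trip example can also be scripted, with an optional edit step between dump and replay (the emit, modify, replay loop from the introduction). This is a minimal Python sketch, assuming officecli is on PATH; it uses only the commands and flags from the examples, hardcodes --type docx as they do, and the helper names round_trip_commands, edit_script and round_trip are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Optional

# A caller-supplied function that receives and returns the BatchItem[] list.
EditFn = Callable[[list], list]


def round_trip_commands(src: str, rebuilt: str, script: str) -> list[list[str]]:
    """Build the dump, create and batch invocations (docx only, as in the example)."""
    return [
        ["officecli", "dump", src, "/", "-o", script],
        ["officecli", "create", rebuilt, "--type", "docx"],
        ["officecli", "batch", rebuilt, "--input", script],
    ]


def edit_script(script: str, edit: EditFn) -> int:
    """Load the emitted array, apply `edit`, write it back; return the item count."""
    path = Path(script)
    items = json.loads(path.read_text(encoding="utf-8"))
    items = edit(items)
    path.write_text(json.dumps(items, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(items)


def round_trip(src: str, rebuilt: str, script: str, edit: Optional[EditFn] = None) -> None:
    """Emit, optionally modify, then replay; check=True raises on a non-zero exit."""
    dump, create, replay = round_trip_commands(src, rebuilt, script)
    subprocess.run(dump, check=True)
    if edit is not None:
        edit_script(script, edit)
    subprocess.run(create, check=True)
    subprocess.run(replay, check=True)
```

For example, round_trip("report.docx", "rebuilt.docx", "/tmp/r.json", edit=lambda items: items) replays the script unchanged. Because batch defaults to continue-on-error, a zero exit status alone may not prove every row applied.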

Notes

  • --out - is treated as stdout (not a file literally named -).
  • With --json, the envelope's data carries outputFile + itemCount metadata, not a bare path.
  • TOC PAGEREF page numbers are preserved on round-trip but not recalculated — run refresh afterward to update them.
  • Envelope warnings: auxiliary parts not covered by the dump emitter (e.g. unsupported pptx custom parts, docx custom XML islands) surface as warnings in the JSON envelope. Replay still succeeds; the warning tells you what won't round-trip.
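When you run dump with --json from a script, the envelope metadata and any warnings can be pulled out in one place. This is a minimal sketch: data.outputFile and data.itemCount come from the notes above, but the top-level warnings key and its list shape are assumptions, and summarize_envelope is a hypothetical helper.

```python
import json
from typing import Any


def summarize_envelope(raw: str) -> dict[str, Any]:
    """Extract dump metadata from a --json envelope.

    data.outputFile and data.itemCount are documented; the top-level
    "warnings" key is an assumption.
    """
    envelope = json.loads(raw)
    data = envelope.get("data") or {}
    return {
        "outputFile": data.get("outputFile"),
        "itemCount": data.get("itemCount"),
        "warnings": list(envelope.get("warnings") or []),
    }
```

Feed it the stdout of, for instance, subprocess.run(["officecli", "dump", "deck.pptx", "/", "-o", "deck.json", "--json"], capture_output=True, text=True, check=True). A non-empty warnings list flags parts that will not round-trip even though replay succeeds.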

See Also

  • batch — replay the emitted JSON (defaults to continue-on-error)
  • refresh — recalculate TOC / PAGE fields after replay
  • Word reference

Based on OfficeCLI v1.0.97
