
logicossoftware/ts-mdocx


ts-mdocx


TypeScript implementation of the MDOCX (MarkDown Open Container eXchange) file format.

MDOCX is a single-file container format for bundling Markdown documents with referenced binary media (images, audio, video, etc.), suitable for exchange, archival, and transport.

Features

  • 📦 Full MDOCX v1 support - Read and write MDOCX files per the RFC specification
  • 🌐 Browser & Node.js - Works in both environments with platform-specific builds
  • 🗜️ Multiple compression formats - ZIP, ZSTD, LZ4, Brotli, or uncompressed
  • 🔗 Media reference resolution - Resolve mdocx://media/<ID> URIs and relative paths
  • 📋 Media listing - List media items without decoding markdown contents
  • Validation - Detailed validation with configurable options
  • 🛠️ Builder API - Fluent interface for constructing documents
  • 💻 CLI tool - Extract, create, validate, and inspect MDOCX files (Node.js only)
  • 📝 TypeScript-first - Full type definitions with JSDoc documentation

Installation

npm install ts-mdocx

Or with Bun:

bun add ts-mdocx

Requires Node.js 20+ or Bun 1.0+. Also works in modern browsers.

Quick Start

Reading an MDOCX file

import { readFile } from 'node:fs/promises';
import { readMdocx } from 'ts-mdocx';

const bytes = await readFile('document.mdocx');
const doc = await readMdocx(bytes);

console.log('Metadata:', doc.metadata);
console.log('Markdown files:', doc.markdown.files.length);
console.log('Media items:', doc.media.items.length);

// Access markdown content
for (const file of doc.markdown.files) {
  const text = new TextDecoder().decode(file.content);
  console.log(`${file.path}: ${text.substring(0, 100)}...`);
}

Reading with Bun

Bun can run TypeScript directly and has built-in file APIs:

import { readMdocx } from 'ts-mdocx';

const bytes = await Bun.file('document.mdocx').bytes();
const doc = await readMdocx(bytes);

console.log('Title:', doc.metadata?.title);
for (const file of doc.markdown.files) {
  console.log(`${file.path}: ${new TextDecoder().decode(file.content)}`);
}

Using in a Browser

The library works in modern browsers. Bundlers (Vite, webpack, esbuild) automatically use the browser build.

import { readMdocx, writeMdocxAsync, createBuilder } from 'ts-mdocx';

// Read from a File input
async function handleFileUpload(file: File) {
  const buffer = await file.arrayBuffer();
  const bytes = new Uint8Array(buffer);
  const doc = await readMdocx(bytes);
  
  console.log('Loaded document with', doc.markdown.files.length, 'files');
  return doc;
}

// Create and download a document
async function downloadDocument() {
  const doc = createBuilder()
    .title('Browser Document')
    .addMarkdown('readme.md', '# Hello from the browser!')
    .build();

  // Use writeMdocxAsync in browsers (required for SHA256)
  const bytes = await writeMdocxAsync(doc.markdown, doc.media, {
    metadata: doc.metadata,
    markdownCompression: 'zip'
  });

  // Trigger download
  const blob = new Blob([bytes], { type: 'application/octet-stream' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = 'document.mdocx';
  a.click();
  URL.revokeObjectURL(url);
}

Note: In browsers, always use writeMdocxAsync() instead of writeMdocx(). The synchronous version requires Node.js crypto APIs for SHA256 computation.
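The async requirement comes from Web Crypto: in browsers, SHA-256 is only available through the promise-returning `crypto.subtle.digest`. Below is a minimal sketch of hashing media bytes that way. The helpers `sha256WebCrypto` and `toHex` are hypothetical, not ts-mdocx exports. The sketch runs in browsers and in Node.js 20+, where `globalThis.crypto` is available.

```typescript
// Hypothetical helper: hash bytes with the Web Crypto API.
// crypto.subtle.digest returns a Promise, so any caller that needs the
// hash (such as SHA256 auto-population) has to be async as well.
async function sha256WebCrypto(data: Uint8Array): Promise<Uint8Array> {
  const digest = await globalThis.crypto.subtle.digest('SHA-256', data);
  return new Uint8Array(digest);
}

// Render a digest as lowercase hex for logging or comparison.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

sha256WebCrypto(new TextEncoder().encode('abc')).then((hash) => {
  console.log(toHex(hash)); // 32-byte digest printed as 64 hex characters
});
```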

Writing an MDOCX file

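import { readFile, writeFile } from 'node:fs/promises';
import { writeMdocxAsync, createBuilder } from 'ts-mdocx';

// Load the media to embed (assumes assets/logo.png exists locally)
const logoBytes = await readFile('assets/logo.png');

// Using the builder API
const doc = createBuilder()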
  .title('My Document')
  .description('A sample MDOCX file')
  .root('docs/index.md')
  .addMarkdown('docs/index.md', '# Welcome\n\nThis is the main page.')
  .addMarkdown('docs/chapter1.md', '# Chapter 1\n\nContent here...')
  .addMedia('logo', logoBytes, { mimeType: 'image/png', path: 'assets/logo.png' })
  .build();

// Write with ZSTD compression (recommended)
const bytes = await writeMdocxAsync(doc.markdown, doc.media, {
  metadata: doc.metadata,
  markdownCompression: 'zstd',
  mediaCompression: 'zstd'
});

await writeFile('output.mdocx', bytes);

Simple document creation

import { createSimpleDocument, writeMdocx } from 'ts-mdocx';

const doc = createSimpleDocument(
  'readme.md',
  '# Hello World\n\nThis is a simple document.',
  { title: 'Hello', author: 'Jane Doe' }
);

const bytes = writeMdocx(doc.markdown, doc.media, {
  metadata: doc.metadata,
  markdownCompression: 'zip'
});

API Reference

Reading

readMdocx(bytes, limits?)

Read an MDOCX file from bytes.

const doc = await readMdocx(bytes);
// Returns: MdocxDocument

Parameters:

  • bytes: Uint8Array - The MDOCX file contents
  • limits?: ReadLimits - Optional size limits for security

Returns: Promise<MdocxDocument>

listMediaContents(bytes, limits?)

List media items from an MDOCX file without decoding markdown contents.

import { listMediaContents } from 'ts-mdocx';

const items = await listMediaContents(bytes);
for (const item of items) {
  console.log(item.id, item.mimeType, item.data.byteLength);
}

Parameters:

  • bytes: Uint8Array - The MDOCX file contents
  • limits?: ReadLimits - Optional size limits for security

Returns: Promise<MediaItem[]>

Writing

writeMdocx(markdown, media, options?)

Write an MDOCX file synchronously.

const bytes = writeMdocx(markdownBundle, mediaBundle, {
  metadata: { title: 'My Doc' },
  markdownCompression: 'zip',
  mediaCompression: 'zip',
  autoPopulateSha256: true  // Default: true
});

writeMdocxAsync(markdown, media, options?)

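Write an MDOCX file asynchronously. Preferred when using ZSTD compression, because it initializes the ZSTD codec automatically. Required in browsers to auto-populate SHA256 hashes.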

const bytes = await writeMdocxAsync(markdownBundle, mediaBundle, {
  markdownCompression: 'zstd'
});

Options:

  • metadata?: MdocxMetadata - Document metadata
  • markdownCompression?: MdocxCompression - Compression for markdown section (default: 'zip')
  • mediaCompression?: MdocxCompression - Compression for media section (default: 'zip')
  • autoPopulateSha256?: boolean - Auto-compute SHA256 for media items (default: true)
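With `autoPopulateSha256` enabled, media items that lack a hash get one computed at write time. Below is a minimal sketch of that step as a Node.js build might perform it, synchronously via `node:crypto`. `populateSha256` is a hypothetical name, not a ts-mdocx export, and the interface is a trimmed local copy of `MediaItem` from the Types section.

```typescript
import { createHash } from 'node:crypto';

// Trimmed local copy of the MediaItem fields this step touches.
interface MediaItem {
  id: string;
  data: Uint8Array;
  sha256?: Uint8Array;
}

// Hypothetical helper: fill in missing sha256 fields synchronously,
// leaving any hash the caller already supplied untouched.
function populateSha256(items: MediaItem[]): MediaItem[] {
  return items.map((item) =>
    item.sha256
      ? item
      : { ...item, sha256: new Uint8Array(createHash('sha256').update(item.data).digest()) }
  );
}
```

Returning new objects instead of mutating the input keeps caller-owned bundles unchanged. That is a design choice of this sketch, not necessarily the library's behavior.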

Builder API

createBuilder()

Create a new document builder with fluent API.

const doc = createBuilder()
  .title('Document Title')
  .description('Description')
  .root('index.md')
  .addMarkdown('index.md', '# Hello')
  .addMarkdown('chapter1.md', '# Chapter 1', { 
    mediaRefs: ['image1'],
    attributes: { author: 'John' }
  })
  .addMedia('image1', imageBytes, {
    mimeType: 'image/png',
    path: 'images/photo.png'
  })
  .build();

createSimpleDocument(path, content, metadata?)

Quick creation of a single-file document.

const doc = createSimpleDocument('readme.md', '# Hello', { title: 'Readme' });

MarkdownBundleBuilder

Build markdown bundles directly.

const bundle = new MarkdownBundleBuilder()
  .root('main.md')
  .addFile('main.md', '# Main')
  .addFile('sub.md', '# Sub', { attributes: { key: 'value' } })
  .build();

MediaBundleBuilder

Build media bundles with auto-generated IDs.

const bundle = new MediaBundleBuilder()
  .addItem('logo', logoBytes, { mimeType: 'image/png' })
  .addFromPath('assets/icon.svg', iconBytes)  // Auto-generates ID from path
  .build();

Validation

validateMdocx(doc)

Simple validation returning error messages.

const errors = validateMdocx(doc);
if (errors.length > 0) {
  console.error('Validation failed:', errors);
}

validateMdocxDetailed(doc, options?)

Detailed validation with structured results.

const result = validateMdocxDetailed(doc, {
  verifyHashes: true,
  checkPaths: true,
  checkDuplicates: true,
  warnOnMissingOptional: true,
  includeInfo: false
});

console.log('Valid:', result.valid);
console.log('Errors:', result.errorCount);
console.log('Warnings:', result.warningCount);

for (const issue of result.issues) {
  console.log(`${issue.severity}: ${issue.message} [${issue.path}]`);
}
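The `checkDuplicates` option suggests a uniqueness check over media items. Below is a minimal sketch of such a check. It assumes that media IDs and media paths must each be unique within a bundle. `findDuplicateMedia` is hypothetical and does not reproduce the library's own messages or severities.

```typescript
// Trimmed local copy of the MediaItem fields this check needs.
interface MediaItem {
  id: string;
  path?: string;
}

// Hypothetical duplicate check: report every media ID or path that
// appears more than once, in the order the repeats are encountered.
function findDuplicateMedia(items: MediaItem[]): string[] {
  const seenIds = new Set<string>();
  const seenPaths = new Set<string>();
  const problems: string[] = [];
  for (const item of items) {
    if (seenIds.has(item.id)) problems.push(`duplicate media id: ${item.id}`);
    seenIds.add(item.id);
    if (item.path !== undefined) {
      if (seenPaths.has(item.path)) problems.push(`duplicate media path: ${item.path}`);
      seenPaths.add(item.path);
    }
  }
  return problems;
}
```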

isValidMdocx(doc)

Quick validity check.

if (isValidMdocx(doc)) {
  // Document is valid
}

Media Reference Resolution

MediaResolver

Resolve media references within a document.

const resolver = new MediaResolver(doc);

// Get by ID
const byId = resolver.getById('logo');

// Get by path
const byPath = resolver.getByPath('assets/logo.png');

// Resolve any reference (mdocx:// URI or path)
const fromUri = resolver.resolve('mdocx://media/logo');
const markdownFile = doc.markdown.files[0];
const fromPath = resolver.resolve('assets/logo.png', markdownFile);

// Check existence
if (resolver.hasId('logo')) {
  // The document contains a media item with ID 'logo'
}

// Get all media referenced by a markdown file
const refs = resolver.getReferencedMedia(markdownFile);
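Relative references need a base. Below is a minimal sketch of resolving one against the directory of the referencing markdown file. This is an assumption about the semantics: the README does not state whether `resolve` works relative to the file or to the container root. `resolveRelativeMediaPath` is hypothetical.

```typescript
// Hypothetical sketch: resolve a relative media reference against the
// directory of the markdown file that contains it. Returns null for
// absolute references or ones that climb above the container root.
function resolveRelativeMediaPath(fromMarkdownPath: string, ref: string): string | null {
  if (ref.startsWith('/')) return null;
  // Start from the referencing file's directory (drop the file name).
  const segments = fromMarkdownPath.split('/').slice(0, -1);
  for (const part of ref.split('/')) {
    if (part === '' || part === '.') continue; // skip empty and "current dir" parts
    if (part === '..') {
      if (segments.length === 0) return null; // would escape the container
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  return segments.join('/');
}
```

The result is a container path that can then be looked up by path, as with `getByPath` above.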

parseMediaReference(ref)

Parse a reference string.

const idRef = parseMediaReference('mdocx://media/logo');
// { type: 'id', id: 'logo' }

const pathRef = parseMediaReference('assets/image.png');
// { type: 'path', path: 'assets/image.png' }
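The two result shapes above form a discriminated union. Below is a minimal sketch of the parsing rule as it appears from those examples: the `mdocx://media/` prefix yields an ID, and anything else is treated as a path. `parseRef` and `MediaReference` are hypothetical local names. The real function may also validate or decode IDs, which this sketch does not do.

```typescript
type MediaReference =
  | { type: 'id'; id: string }
  | { type: 'path'; path: string };

const MEDIA_URI_PREFIX = 'mdocx://media/';

// Hypothetical re-implementation of the rule shown above:
// "mdocx://media/<ID>" becomes an ID reference, anything else a path.
function parseRef(ref: string): MediaReference {
  if (ref.startsWith(MEDIA_URI_PREFIX)) {
    return { type: 'id', id: ref.slice(MEDIA_URI_PREFIX.length) };
  }
  return { type: 'path', path: ref };
}
```

Callers can then `switch` on `ref.type` and let TypeScript narrow to `id` or `path`.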

extractMediaReferences(content)

Extract all media references from markdown content.

const refs = extractMediaReferences(markdownContent);
// ['mdocx://media/logo', 'assets/image.png', ...]
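Below is a minimal sketch of how reference extraction can work for the common case of Markdown image syntax, `![alt](target "optional title")`. `extractImageTargets` is hypothetical. The library's `extractMediaReferences` may also recognize links, HTML `<img>` tags, or reference-style images, which this regex ignores.

```typescript
// Hypothetical sketch: collect the targets of inline Markdown images.
// Group 1 captures the target up to whitespace or ")"; an optional
// quoted title after the target is matched and discarded.
function extractImageTargets(markdown: string): string[] {
  const pattern = /!\[[^\]]*\]\(\s*([^\s)]+)(?:\s+"[^"]*")?\s*\)/g;
  const targets: string[] = [];
  for (const match of markdown.matchAll(pattern)) {
    targets.push(match[1]);
  }
  return targets;
}
```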

Compression

MDOCX supports multiple compression algorithms:

| ID | Algorithm | Best for |
|---|---|---|
| 'none' | None (raw gob bytes) | Debugging, already-compressed content |
| 'zip' | ZIP/DEFLATE | Maximum interoperability (library default) |
| 'zstd' | Zstandard | Recommended: best speed/ratio balance |
| 'lz4' | LZ4 | Maximum speed |
| 'br' | Brotli | Maximum compression ratio |

ZSTD Initialization

ZSTD compression requires async initialization:

import { initZstd, isZstdCompressionAvailable, writeMdocxAsync } from 'ts-mdocx';

// Manual initialization
await initZstd();

// Or use writeMdocxAsync which auto-initializes
const bytes = await writeMdocxAsync(markdown, media, {
  markdownCompression: 'zstd'
});
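Auto-initialization like this is usually implemented as a "run once on first use" guard. Below is a minimal sketch of that pattern, not ts-mdocx's actual code. `createLazyInit` and `ensureCodec` are hypothetical names, and the counter stands in for loading a WASM codec.

```typescript
// Hypothetical sketch: concurrent callers share one in-flight promise,
// so the expensive setup runs at most once (and can retry if it fails).
function createLazyInit(init: () => Promise<void>): () => Promise<void> {
  let ready: Promise<void> | null = null;
  return () => {
    if (ready === null) {
      ready = init().catch((err) => {
        ready = null; // allow a later call to retry after a failure
        throw err;
      });
    }
    return ready;
  };
}

// Usage: an async writer would `await ensureCodec()` before compressing.
let initCalls = 0;
const ensureCodec = createLazyInit(async () => {
  initCalls += 1; // stand-in for loading a compression codec
});
```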

Browser Support

The library provides a separate browser build that uses web-native APIs:

| Feature | Node.js | Browser |
|---|---|---|
| SHA256 | node:crypto (sync) | Web Crypto API (async) |
| Brotli | node:zlib (sync) | brotli-wasm (async) |
| ZIP | fflate | fflate |
| ZSTD | fzstd / @bokuweb/zstd-wasm | fzstd / @bokuweb/zstd-wasm |
| LZ4 | lz4js | lz4js |

Browser considerations:

  • Use writeMdocxAsync() instead of writeMdocx() for auto-populating SHA256 hashes
  • All compression formats work in browsers
  • The CLI is Node.js-only
  • Bundlers automatically select the browser build via package.json exports

CLI Usage

# Using npm script
npm run mdocx -- <command> [options] <file>

# Or if installed globally
mdocx <command> [options] <file>

Commands

validate <file.mdocx>

Validate an MDOCX file.

mdocx validate document.mdocx
mdocx validate --verbose document.mdocx

info <file.mdocx>

Display document information.

mdocx info document.mdocx

Output:

MDOCX Info: document.mdocx
──────────────────────────────────────────────────
Version:          1
Valid:            ✓ Yes
Has Metadata:     Yes
Title:            My Document
Root Path:        docs/index.md

Markdown Files (3):
  • docs/index.md (1.2 KB)
  • docs/guide.md (856 B)
  • docs/reference.md (2.1 KB)

Media Items (2):
  • logo [image/png] (15.3 KB)
  • diagram [image/svg+xml] (8.7 KB)

Totals:
  Markdown:       4.16 KB
  Media:          24.00 KB
  File Size:      12.50 KB

list <file.mdocx>

List files in the container.

mdocx list document.mdocx

extract <file.mdocx> <output-dir>

Extract contents to a directory.

mdocx extract document.mdocx ./output

create <output.mdocx> <files...>

Create an MDOCX file from markdown files.

mdocx create output.mdocx readme.md chapter1.md chapter2.md
mdocx create --compression zstd output.mdocx docs/*.md

dump <file.mdocx>

Dump document structure as JSON.

mdocx dump document.mdocx > structure.json

Options

  • --help, -h - Show help
  • --version, -v - Show version
  • --compression <type> - Compression type: zip, zstd, lz4, br, none
  • --verbose - Verbose output

Types

Core Types

interface MdocxDocument {
  header: MdocxHeader;
  metadata?: MdocxMetadata;
  markdown: MarkdownBundle;
  media: MediaBundle;
}

interface MarkdownBundle {
  bundleVersion: number;  // Must be 1
  rootPath?: string;
  files: MarkdownFile[];
}

interface MarkdownFile {
  path: string;
  content: Uint8Array;
  mediaRefs?: string[];
  attributes?: Record<string, string>;
}

interface MediaBundle {
  bundleVersion: number;  // Must be 1
  items: MediaItem[];
}

interface MediaItem {
  id: string;
  path?: string;
  mimeType?: string;
  data: Uint8Array;
  sha256?: Uint8Array;
  attributes?: Record<string, string>;
}

type MdocxCompression = 'none' | 'zip' | 'zstd' | 'lz4' | 'br';
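These interfaces are plain data, so a bundle can also be assembled without the builder. Below is a minimal sketch that does so from path-to-text pairs. It assumes UTF-8 content, matching the `TextDecoder` usage in the reading examples. `markdownBundleFromStrings` is hypothetical, and the interfaces are local copies of the ones above.

```typescript
// Local copies of the interfaces above.
interface MarkdownFile {
  path: string;
  content: Uint8Array;
  mediaRefs?: string[];
  attributes?: Record<string, string>;
}

interface MarkdownBundle {
  bundleVersion: number; // Must be 1
  rootPath?: string;
  files: MarkdownFile[];
}

// Hypothetical helper: UTF-8 encode each file and wrap them in a
// version-1 bundle, setting rootPath only when one is given.
function markdownBundleFromStrings(
  files: Record<string, string>,
  rootPath?: string
): MarkdownBundle {
  const encoder = new TextEncoder();
  return {
    bundleVersion: 1,
    ...(rootPath !== undefined ? { rootPath } : {}),
    files: Object.entries(files).map(([path, text]) => ({
      path,
      content: encoder.encode(text),
    })),
  };
}
```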

RFC Compliance

This implementation follows the MDOCX RFC specification v1.0:

  • ✅ Fixed 32-byte header with magic, version, flags
  • ✅ Optional UTF-8 JSON metadata block
  • ✅ Markdown bundle section (gob-encoded)
  • ✅ Media bundle section (gob-encoded)
  • ✅ All compression algorithms (ZIP, ZSTD, LZ4, Brotli)
  • ✅ Uncompressed length prefix for compressed payloads
  • ✅ SHA256 integrity hashes for media items
  • ✅ mdocx://media/<ID> URI scheme support
  • ✅ Configurable read limits for security

Security

The library implements security measures per RFC §11:

  • Size limits - Configurable limits on metadata, section, and uncompressed sizes
  • Decompression bomb protection - Enforces strict bounds during decompression
  • Path validation - Rejects absolute paths and .. segments
  • Gob decoding limits - Bounded readers prevent excessive allocations
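Below is a minimal sketch of the path rule listed above: container paths must be relative and must not contain `..` segments. `isSafeContainerPath` is hypothetical. Treating backslashes and Windows drive letters as unsafe is an extra assumption of this sketch, not something the README states.

```typescript
// Hypothetical sketch of the path rule: relative paths only, no "..".
function isSafeContainerPath(path: string): boolean {
  if (path.length === 0) return false;
  // Assumption: also reject backslash-rooted and drive-letter paths.
  if (path.startsWith('/') || path.startsWith('\\') || /^[A-Za-z]:/.test(path)) return false;
  // Split on both separators so "a\\..\\b" is caught as well.
  return !path.split(/[\\/]/).includes('..');
}
```

Checking whole segments (rather than searching for the substring `..`) keeps legitimate names such as `notes..md` valid.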

Default limits:

  • Metadata: 1 MiB
  • Markdown section uncompressed: 256 MiB
  • Media section uncompressed: 2 GiB
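Since compressed payloads carry an uncompressed length prefix, a reader can reject oversized sections before inflating anything. Below is a minimal sketch of that guard using the default values above. `DEFAULT_LIMITS` and `checkDeclaredSize` are hypothetical and do not reflect the actual `ReadLimits` field names. A complete reader would also cap the bytes actually produced during decompression, in case the prefix lies.

```typescript
const MiB = 1024 * 1024;

// Hypothetical limit table mirroring the documented defaults.
const DEFAULT_LIMITS = {
  metadataBytes: 1 * MiB,
  markdownUncompressedBytes: 256 * MiB,
  mediaUncompressedBytes: 2 * 1024 * MiB, // 2 GiB
} as const;

// Compare the declared uncompressed length against the limit before
// allocating or decompressing, so a tiny payload cannot claim (and
// trigger) a multi-gigabyte output.
function checkDeclaredSize(declared: number, limit: number, section: string): void {
  if (!Number.isSafeInteger(declared) || declared < 0) {
    throw new RangeError(`${section}: invalid declared size ${declared}`);
  }
  if (declared > limit) {
    throw new RangeError(`${section}: declared size ${declared} exceeds limit ${limit}`);
  }
}
```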

Development

# Install dependencies
npm install
# or
bun install

# Build
npm run build
# or
bun run build

# Run tests
npm test
# or
bun test

# Watch mode
npm run test:watch

# Lint
npm run lint

Running with Bun

Bun can run TypeScript source files directly without building:

# Run CLI directly from source
bun run src/cli.ts info document.mdocx

# Run tests with Bun's test runner
bun test

License

MIT

Changelog

See CHANGELOG.md.

Related Projects

  • go-mdocx - Reference Go implementation
