splitmarks

Split a PDF file at top-level bookmarks into separate PDF files, named after each bookmark.

Installation

Download (recommended)

Download the standalone executable for your platform from the latest release:

Windows: splitmarks.exe
macOS: splitmarks
Linux: splitmarks

On macOS/Linux, make it executable after downloading:

chmod +x splitmarks

Install from source

Requires Python 3.10+:

pip install git+https://github.com/jet52/splitmarks.git

Or clone and install in development mode:

git clone https://github.com/jet52/splitmarks.git
cd splitmarks
pip install -e .

Usage

splitmarks input.pdf [-o OUTPUT_DIR] [-m MATCH] [-v|-vv] [--dry-run] [--no-clobber] [--version]

Arguments

Argument	Description
`input_pdf`	PDF file to split
`-o, --output-dir DIR`	Output directory (default: current directory)
`-m, --match TEXT`	Only extract bookmarks containing TEXT (case-insensitive)
`-v`	Show progress (page counts, bookmark counts)
`-vv`	Also show nested bookmark tree for each output file
`--dry-run`	Preview splits without creating files
`--no-clobber`	Avoid filename collisions: prefix each output file with the case number taken from the input filename, or with an auto-incrementing number starting at 00000000
`--version`	Show version number and exit

Examples

Preview what files would be created:

splitmarks document.pdf --dry-run

Split a PDF into the current directory:

splitmarks document.pdf

Split into a specific directory with verbose output:

splitmarks document.pdf -o ./split_files -v

Extract only bookmarks containing "Memo":

splitmarks document.pdf --match Memo

Extract all briefs (case-insensitive matching):

splitmarks document.pdf -m brief -o ./briefs

Preview with full bookmark tree:

splitmarks document.pdf --dry-run -vv

Batch extract memos from multiple PDFs, avoiding filename collisions:

for f in ./packets/*.pdf; do
  splitmarks "$f" --match Memo --no-clobber -o ./memos
done
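
On platforms without a POSIX shell, the same batch run could be driven from Python. This is only a sketch: it assumes `splitmarks` is on the PATH, and the helper names `build_command` and `split_all` are made up for this example. Unlike the shell loop, it does nothing when the directory contains no PDFs.

```python
import subprocess
from pathlib import Path


def build_command(pdf: Path, match: str, output_dir: Path) -> list[str]:
    """Build the argument list for one splitmarks run (flags as documented above)."""
    return ["splitmarks", str(pdf), "--match", match,
            "--no-clobber", "-o", str(output_dir)]


def split_all(packet_dir: Path, match: str, output_dir: Path) -> int:
    """Run splitmarks on every PDF in packet_dir and return how many were processed."""
    count = 0
    for pdf in sorted(packet_dir.glob("*.pdf")):
        # check=True raises CalledProcessError if splitmarks exits with an error.
        subprocess.run(build_command(pdf, match, output_dir), check=True)
        count += 1
    return count
```

For example, `split_all(Path("./packets"), "Memo", Path("./memos"))` corresponds to the loop above.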

How It Works

Opens the PDF and reads its bookmark outline
Splits at top-level bookmarks (each becomes a separate file)
Calculates page ranges for each section (from one bookmark to the next)
Creates a separate PDF file for each section, named after the bookmark title
Preserves nested bookmarks within each split file
Removes unreferenced resources (images, fonts) so each file contains only what its pages need
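
The page-range step can be illustrated with a short, library-free sketch. It is an illustration under assumptions, not the code in splitmarks.py: the function name `page_ranges`, the 0-based page numbers, and the rule that the last section runs to the end of the document are all assumed here.

```python
def page_ranges(bookmarks: list[tuple[str, int]],
                page_count: int) -> list[tuple[str, int, int]]:
    """Turn (title, start_page) pairs into (title, start, end) ranges.

    Pages are 0-based and `end` is exclusive. `bookmarks` must be sorted by
    start page. Assumption: the last section extends to the end of the PDF.
    """
    ranges = []
    for i, (title, start) in enumerate(bookmarks):
        # A section ends where the next top-level bookmark begins.
        if i + 1 < len(bookmarks):
            end = bookmarks[i + 1][1]
        else:
            end = page_count
        ranges.append((title, start, end))
    return ranges
```

For three bookmarks starting on pages 0, 12 and 30 of a 42-page file, this yields the ranges 0-12, 12-30 and 30-42. Pages before the first bookmark are not assigned to any section in this sketch, and two bookmarks on the same page would produce an empty range; how splitmarks treats those cases is not stated here.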

Filename Handling

Bookmark titles are sanitized for use as filenames:

Spaces and unsafe characters (/\:*?"<>|) are replaced with hyphens
Unicode is normalized
Long names are truncated at word boundaries (max 200 chars)
Duplicate names get a counter: Title.pdf, Title-1.pdf, Title-2.pdf
With --no-clobber, the case number prefix is joined to the name with an underscore: 20250390_Bench-Memo.pdf
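
The rules above might be sketched roughly as follows. This is a hedged illustration rather than the tool's actual code: the names `sanitize_title` and `unique_name` are hypothetical, NFKC is an assumed normalization form (the README only says Unicode is normalized), and collapsing runs of unsafe characters into a single hyphen and falling back to `untitled` are assumptions as well.

```python
import re
import unicodedata

# Whitespace plus the unsafe characters listed above: / \ : * ? " < > |
UNSAFE = re.compile(r'[\s/\\:*?"<>|]+')
MAX_LEN = 200


def sanitize_title(title: str, max_len: int = MAX_LEN) -> str:
    """Turn a bookmark title into a filename stem (without the .pdf suffix)."""
    # Assumption: NFKC; the normalization form used by splitmarks is not stated.
    name = unicodedata.normalize("NFKC", title)
    # Assumption: a run of unsafe characters collapses to one hyphen.
    name = UNSAFE.sub("-", name).strip("-")
    if len(name) > max_len:
        # Cut at the last hyphen (a former word boundary) within the limit.
        cut = name.rfind("-", 0, max_len + 1)
        name = name[:cut] if cut > 0 else name[:max_len]
    # Hypothetical fallback for titles that sanitize to nothing.
    return name or "untitled"


def unique_name(stem: str, used: set[str]) -> str:
    """Return stem.pdf, or stem-1.pdf, stem-2.pdf, ... if the name is taken."""
    candidate = f"{stem}.pdf"
    counter = 1
    while candidate in used:
        candidate = f"{stem}-{counter}.pdf"
        counter += 1
    used.add(candidate)
    return candidate
```

Calling `unique_name("Title", used)` three times with the same `used` set reproduces the sequence Title.pdf, Title-1.pdf, Title-2.pdf from the list above.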

Requirements

Standalone executables: No dependencies required.

Install from source: Python 3.10+ and pikepdf >= 8.0.0
