Split a PDF file at top-level bookmarks into separate PDF files, named after each bookmark.
Download the standalone executable for your platform from the latest release:
- Windows: `splitmarks.exe`
- macOS: `splitmarks`
- Linux: `splitmarks`

On macOS/Linux, make it executable after downloading:

    chmod +x splitmarks

Installing from source requires Python 3.10+:

    pip install git+https://github.com/jet52/splitmarks.git

Or clone and install in development mode:

    git clone https://github.com/jet52/splitmarks.git
    cd splitmarks
    pip install -e .

Usage:

    splitmarks input.pdf [-o OUTPUT_DIR] [-m MATCH] [-v|-vv] [--dry-run] [--no-clobber] [--version]

| Argument | Description |
|---|---|
| `input_pdf` | PDF file to split |
| `-o`, `--output-dir DIR` | Output directory (default: current directory) |
| `-m`, `--match TEXT` | Only extract bookmarks containing TEXT (case-insensitive) |
| `-v` | Show progress (page counts, bookmark counts) |
| `-vv` | Also show nested bookmark tree for each output file |
| `--dry-run` | Preview splits without creating files |
| `--no-clobber` | Avoid collisions: prepend case number from filename, or auto-increment from 00000000 |
| `--version` | Show version number and exit |

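The `-m`/`--match` option can be pictured as a case-insensitive substring test applied to each top-level bookmark title. The sketch below is illustrative only: `filter_titles` is a hypothetical helper, not a function from splitmarks, and using `str.casefold` to ignore case is an assumption about how the comparison is done.

```python
def filter_titles(titles, text=None):
    """Return the titles that contain `text`, ignoring case.

    Hypothetical helper for illustration; splitmarks' own matching
    code may differ.
    """
    if text is None:
        return list(titles)  # no filter given: keep every bookmark
    needle = text.casefold()
    return [t for t in titles if needle in t.casefold()]


titles = ["Bench Memo", "Appellant Brief", "Appellee BRIEF", "Order"]
print(filter_titles(titles, "brief"))  # ['Appellant Brief', 'Appellee BRIEF']
```

Under this reading, `-m brief` and `-m Brief` select the same bookmarks.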
Preview what files would be created:

    splitmarks document.pdf --dry-run

Split a PDF into the current directory:

    splitmarks document.pdf

Split into a specific directory with verbose output:

    splitmarks document.pdf -o ./split_files -v

Extract only bookmarks containing "Memo":

    splitmarks document.pdf --match Memo

Extract all briefs (case-insensitive matching):

    splitmarks document.pdf -m brief -o ./briefs

Preview with full bookmark tree:

    splitmarks document.pdf --dry-run -vv

Batch extract memos from multiple PDFs, avoiding filename collisions:

    for f in ./packets/*.pdf; do
        splitmarks "$f" --match Memo --no-clobber -o ./memos
    done

- Opens the PDF and reads its bookmark outline
- Splits at top-level bookmarks (each becomes a separate file)
- Calculates page ranges for each section (from one bookmark to the next)
- Creates a separate PDF file for each section, named after the bookmark title
- Preserves nested bookmarks within each split file
- Removes unreferenced resources (images, fonts) so each file contains only what its pages need
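
The page-range step in the list above can be sketched as follows. This is a minimal illustration, not the splitmarks source: `page_ranges` is a hypothetical function, and it assumes the start page of each top-level bookmark is already known as a sorted, 0-based page index.

```python
def page_ranges(starts, total_pages):
    """Map sorted 0-based start pages to inclusive (first, last) ranges.

    Each section runs from its bookmark's page to the page just before
    the next bookmark; the last section runs to the end of the document.
    Hypothetical helper for illustration only.
    """
    ranges = []
    for i, first in enumerate(starts):
        # The next bookmark's start page, or the page count for the last one.
        next_start = starts[i + 1] if i + 1 < len(starts) else total_pages
        ranges.append((first, next_start - 1))
    return ranges


# Three top-level bookmarks in a 45-page document.
print(page_ranges([0, 12, 30], 45))  # [(0, 11), (12, 29), (30, 44)]
```

The sketch ignores edge cases that this README does not describe, such as pages before the first bookmark or two bookmarks that start on the same page.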
Bookmark titles are sanitized for use as filenames:
- Spaces and unsafe characters (`/\:*?"<>|`) are replaced with hyphens
- Unicode is normalized
- Long names are truncated at word boundaries (max 200 chars)
- Duplicate names get a counter: `Title.pdf`, `Title-1.pdf`, `Title-2.pdf`
- With `--no-clobber`: case number prefix uses underscore: `20250390_Bench-Memo.pdf`

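
The sanitizing rules above could look roughly like this in Python. It is a sketch under stated assumptions, not the tool's actual code: `sanitize_title` and `unique_name` are hypothetical names, NFKC is an assumed normalization form (the README only says Unicode is normalized), and collapsing a run of unsafe characters into a single hyphen and falling back to `untitled` for an empty result are also assumptions.

```python
import re
import unicodedata

# Whitespace plus the unsafe characters listed above: / \ : * ? " < > |
UNSAFE = re.compile(r'[\s/\\:*?"<>|]+')
MAX_LEN = 200


def sanitize_title(title, max_len=MAX_LEN):
    """Turn a bookmark title into a filename stem (hypothetical helper)."""
    name = unicodedata.normalize("NFKC", title)  # assumed normalization form
    name = UNSAFE.sub("-", name).strip("-")      # runs collapse to one hyphen
    if len(name) > max_len:
        cut = name[:max_len]
        # Prefer to cut at the last hyphen, which marks a word boundary.
        head, sep, _ = cut.rpartition("-")
        name = head if sep else cut
    return name or "untitled"                    # assumed fallback


def unique_name(stem, used):
    """Append -1, -2, ... until the stem is unused, then add .pdf."""
    candidate = stem
    counter = 0
    while candidate in used:
        counter += 1
        candidate = f"{stem}-{counter}"
    used.add(candidate)
    return candidate + ".pdf"


used = set()
stem = sanitize_title("Bench Memo")
print([unique_name(stem, used) for _ in range(3)])
# ['Bench-Memo.pdf', 'Bench-Memo-1.pdf', 'Bench-Memo-2.pdf']
```

Tracking used stems in a set keeps the counter logic independent of the filesystem, which is convenient for a dry run; whether splitmarks does the same is not stated here.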
Standalone executables: No dependencies required.
Install from source: Python 3.10+ and pikepdf >= 8.0.0