
Splits PDF files at top-level bookmarks, preserving nested bookmarks in the output files. Can optionally extract only selected bookmarks.


jet52/splitmarks


splitmarks

Split a PDF file at top-level bookmarks into separate PDF files, named after each bookmark.

Installation

Download (recommended)

Download the standalone executable for your platform from the latest release:

  • Windows: splitmarks.exe
  • macOS: splitmarks
  • Linux: splitmarks

On macOS/Linux, make it executable after downloading:

chmod +x splitmarks

Install from source

Requires Python 3.10+:

pip install git+https://github.com/jet52/splitmarks.git

Or clone and install in development mode:

git clone https://github.com/jet52/splitmarks.git
cd splitmarks
pip install -e .

Usage

splitmarks input.pdf [-o OUTPUT_DIR] [-m MATCH] [-v|-vv] [--dry-run] [--no-clobber] [--version]

Arguments

Argument                Description
input_pdf               PDF file to split
-o, --output-dir DIR    Output directory (default: current directory)
-m, --match TEXT        Only extract bookmarks containing TEXT (case-insensitive)
-v                      Show progress (page counts, bookmark counts)
-vv                     Also show nested bookmark tree for each output file
--dry-run               Preview splits without creating files
--no-clobber            Avoid collisions: prepend case number from filename, or auto-increment from 00000000
--version               Show version number and exit
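
For readers who want to see how these flags fit together, here is a minimal argparse sketch that mirrors the documented arguments. It is an illustration only, not the project's actual code: the function name build_parser, the dest name verbose, and the placeholder version string are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the project's code)."""
    parser = argparse.ArgumentParser(prog="splitmarks")
    parser.add_argument("input_pdf", help="PDF file to split")
    parser.add_argument("-o", "--output-dir", default=".", metavar="DIR",
                        help="Output directory (default: current directory)")
    parser.add_argument("-m", "--match", metavar="TEXT",
                        help="Only extract bookmarks containing TEXT (case-insensitive)")
    # action="count" makes -v yield 1 and -vv yield 2.
    parser.add_argument("-v", dest="verbose", action="count", default=0,
                        help="Show progress; repeat for the nested bookmark tree")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview splits without creating files")
    parser.add_argument("--no-clobber", action="store_true",
                        help="Avoid filename collisions")
    # The version string below is a placeholder, not the real release number.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


args = build_parser().parse_args(["document.pdf", "-m", "brief", "-o", "./briefs", "-vv"])
print(args.match, args.output_dir, args.verbose)  # brief ./briefs 2
```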

Examples

Preview what files would be created:

splitmarks document.pdf --dry-run

Split a PDF into the current directory:

splitmarks document.pdf

Split into a specific directory with verbose output:

splitmarks document.pdf -o ./split_files -v

Extract only bookmarks containing "Memo":

splitmarks document.pdf --match Memo

Extract all briefs (case-insensitive matching):

splitmarks document.pdf -m brief -o ./briefs

Preview with full bookmark tree:

splitmarks document.pdf --dry-run -vv

Batch extract memos from multiple PDFs, avoiding filename collisions:

for f in ./packets/*.pdf; do
  splitmarks "$f" --match Memo --no-clobber -o ./memos
done
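
The batch loop above relies on --no-clobber to keep outputs from different packets apart. A rough Python sketch of that prefixing idea follows. It assumes a case number is a standalone run of exactly 8 digits in the input filename (consistent with the 00000000 fallback, but not confirmed by this README), and the helper names are hypothetical. How the real tool advances the fallback counter is not documented here, so the sketch simply takes it as a parameter.

```python
import re
from pathlib import Path

# Assumption: a "case number" is a standalone run of exactly 8 digits.
CASE_NUMBER = re.compile(r"(?<!\d)\d{8}(?!\d)")


def collision_prefix(input_pdf: str, fallback_counter: int) -> str:
    """Return the case number found in the input filename, else a zero-padded counter."""
    found = CASE_NUMBER.search(Path(input_pdf).stem)
    return found.group(0) if found else f"{fallback_counter:08d}"


def prefixed_name(input_pdf: str, output_name: str, fallback_counter: int = 0) -> str:
    """Join the prefix and the output filename with an underscore."""
    return f"{collision_prefix(input_pdf, fallback_counter)}_{output_name}"


print(prefixed_name("./packets/20250390.pdf", "Bench-Memo.pdf"))  # 20250390_Bench-Memo.pdf
print(prefixed_name("./packets/scan.pdf", "Bench-Memo.pdf"))      # 00000000_Bench-Memo.pdf
```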

How It Works

  1. Opens the PDF and reads its bookmark outline
  2. Splits at top-level bookmarks (each becomes a separate file)
  3. Calculates page ranges for each section (from one bookmark to the next)
  4. Creates a separate PDF file for each section, named after the bookmark title
  5. Preserves nested bookmarks within each split file
  6. Removes unreferenced resources (images, fonts) so each file contains only what its pages need
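
Steps 2 and 3, together with the --match filter, can be sketched in plain Python without touching a PDF library. This is an illustration under stated assumptions rather than the tool's implementation: the (title, start_page) input shape, the Section type, and both function names are hypothetical, and pages that precede the first bookmark are simply ignored here.

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    title: str
    first_page: int  # zero-based, inclusive
    last_page: int   # zero-based, inclusive


def section_ranges(bookmarks: list[tuple[str, int]], page_count: int) -> list[Section]:
    """Turn (title, start_page) pairs for top-level bookmarks into page ranges."""
    ordered = sorted(bookmarks, key=lambda b: b[1])
    sections = []
    for i, (title, start) in enumerate(ordered):
        # A section ends just before the next bookmark, or on the last page.
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else page_count - 1
        # Two bookmarks on the same page each get at least that one page.
        sections.append(Section(title, start, max(start, end)))
    return sections


def filter_sections(sections: list[Section], match: Optional[str]) -> list[Section]:
    """Keep sections whose title contains the text, ignoring case."""
    if match is None:
        return sections
    needle = match.casefold()
    return [s for s in sections if needle in s.title.casefold()]


outline = [("Bench Memo", 0), ("Appellant Brief", 5), ("Appellee Brief", 20)]
for section in filter_sections(section_ranges(outline, page_count=30), "brief"):
    print(section.title, section.first_page, section.last_page)
# Appellant Brief 5 19
# Appellee Brief 20 29
```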

Filename Handling

Bookmark titles are sanitized for use as filenames:

  • Spaces and unsafe characters (/\:*?"<>|) are replaced with hyphens
  • Unicode is normalized
  • Long names are truncated at word boundaries (max 200 chars)
  • Duplicate names get a counter: Title.pdf, Title-1.pdf, Title-2.pdf
  • With --no-clobber, the case number prefix is joined to the name with an underscore: 20250390_Bench-Memo.pdf
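
The rules above can be approximated with a short Python sketch. Several details are assumptions, since this README does not specify them: NFKC as the normalization form, collapsing runs of unsafe characters into a single hyphen, the "untitled" fallback for empty titles, and the function names themselves.

```python
import re
import unicodedata

# Whitespace plus the unsafe characters listed above: / \ : * ? " < > |
UNSAFE = re.compile(r'[\s/\\:*?"<>|]+')
MAX_LEN = 200


def sanitize(title: str) -> str:
    """Turn a bookmark title into a filename stem (without the .pdf extension)."""
    # Assumption: NFKC; the README only says Unicode is "normalized".
    name = unicodedata.normalize("NFKC", title)
    # Assumption: a run of unsafe characters becomes one hyphen.
    name = UNSAFE.sub("-", name).strip("-")
    if len(name) > MAX_LEN:
        # Cut at the last hyphen (a former word boundary) that fits the limit.
        cut = name.rfind("-", 0, MAX_LEN + 1)
        name = (name[:cut] if cut > 0 else name[:MAX_LEN]).rstrip("-")
    return name or "untitled"  # hypothetical fallback for empty titles


def unique_name(stem: str, used: set[str]) -> str:
    """Return stem.pdf, or stem-1.pdf, stem-2.pdf, ... if the name is taken."""
    candidate = f"{stem}.pdf"
    counter = 0
    while candidate in used:
        counter += 1
        candidate = f"{stem}-{counter}.pdf"
    used.add(candidate)
    return candidate


used: set[str] = set()
print(unique_name(sanitize("Bench Memo"), used))  # Bench-Memo.pdf
print(unique_name(sanitize("Bench Memo"), used))  # Bench-Memo-1.pdf
```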

Requirements

Standalone executables: No dependencies required.

Install from source: Python 3.10+ and pikepdf >= 8.0.0
