Local AI music generation server with browser UI, powered by GGML. Describe a song, get stereo 48kHz audio. Runs on CPU, CUDA, Metal, Vulkan.
Grab one GGUF of each type from Hugging Face and drop them in the models/ folder:
https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF/tree/main
| Type | Pick one | Size |
|---|---|---|
| LM | acestep-5Hz-lm-4B-Q8_0.gguf | 4.2 GB |
| Text encoder | Qwen3-Embedding-0.6B-Q8_0.gguf | 748 MB |
| DiT | acestep-v15-turbo-Q8_0.gguf | 2.4 GB |
| VAE | vae-BF16.gguf (always this one) | 322 MB |
Three LM sizes available: 0.6B (fast), 1.7B, 4B (best quality). Multiple DiT variants: turbo (8 steps), sft (50 steps, higher quality), base, shift1, shift3, continuous.
Alternative: ./models.sh downloads the default set automatically (needs pip install hf).
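Before starting the server, it can help to confirm that models/ holds one file of each type. The sketch below is a hypothetical helper, not part of the repository; its glob patterns are assumptions inferred from the file names in the table above and may need adjusting for other variants.

```shell
#!/bin/sh
# check_models: hypothetical helper that reports which model types are
# missing from a directory. The patterns are assumptions based on the
# example file names in the table, not an official naming contract.
check_models() (
    dir=${1:-models}
    missing=""
    for spec in \
        "LM:acestep-5Hz-lm-*.gguf" \
        "text-encoder:Qwen3-Embedding-*.gguf" \
        "DiT:acestep-v15-*.gguf" \
        "VAE:vae-BF16.gguf"
    do
        kind=${spec%%:*}      # text before the first colon
        pattern=${spec#*:}    # text after the first colon
        found=no
        # $pattern is left unquoted so the shell expands the glob in $dir.
        for f in "$dir"/$pattern; do
            [ -f "$f" ] && found=yes && break
        done
        [ "$found" = yes ] || missing="$missing $kind"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
)
```

Running `check_models models` prints `ok` when all four types are present, or a line such as `missing: DiT` with a non-zero exit status otherwise.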
git clone --recurse-submodules https://github.com/Serveurperso/acestep.cpp
cd acestep.cpp
Pre-built binaries (until CI is set up): https://www.serveurperso.com/temp/acestep.cpp-win64/
To build from source, install Visual C++ Build Tools (select "Desktop development with C++" workload) and optionally the CUDA Toolkit and/or the Vulkan SDK.
buildcuda.cmd # NVIDIA GPU
buildvulkan.cmd # AMD/Intel GPU (Vulkan)
buildall.cmd # all backends (CUDA + Vulkan + CPU, runtime loading)

./buildcuda.sh # NVIDIA GPU
./buildvulkan.sh # AMD/Intel GPU (Vulkan)
./buildcpu.sh # CPU only (with BLAS)
./buildall.sh # all backends (CUDA + Vulkan + CPU, runtime loading)

macOS auto-enables Metal and Accelerate BLAS with any of the above.
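If you are unsure which script to run on Linux, the choice can be sketched as a small decision function. This is only a heuristic under stated assumptions: it treats `nvcc` on the PATH as a sign that the CUDA Toolkit is installed and `glslc` as a sign that the Vulkan SDK is installed. Neither check is something the build scripts themselves are known to perform, and `pick_build_script` and `have` are hypothetical names.

```shell
#!/bin/sh
# pick_build_script: pure decision logic.
#   $1 = "yes" if the CUDA Toolkit looks available, else "no"
#   $2 = "yes" if the Vulkan SDK looks available, else "no"
pick_build_script() {
    if [ "$1" = yes ] && [ "$2" = yes ]; then
        echo ./buildall.sh
    elif [ "$1" = yes ]; then
        echo ./buildcuda.sh
    elif [ "$2" = yes ]; then
        echo ./buildvulkan.sh
    else
        echo ./buildcpu.sh
    fi
}

# have: print "yes" if a command exists on the PATH, otherwise "no".
have() {
    if command -v "$1" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Usage (assumed tool names, see the note above):
#   script=$(pick_build_script "$(have nvcc)" "$(have glslc)")
#   echo "Suggested build script: $script"
```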
./server.sh # Linux / macOS
server.cmd # Windows

Open http://localhost:8085 in your browser. The WebUI handles everything: write a caption, set lyrics and metadata, generate, play, and download tracks.
Models are loaded on first request (zero GPU at startup) and swapped automatically when you pick a different one in the UI.
Drop LoRA adapters in the loras/ folder and restart the server.
Supports PEFT directories and ComfyUI single .safetensors files.
Select the active LoRA from the WebUI.
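To see what the loras/ folder contains before restarting, a quick listing like the one below may help. It is a sketch, not the server's own detection logic: it assumes a PEFT adapter is a directory containing `adapter_config.json` (the file PEFT writes when saving an adapter) and that a ComfyUI LoRA is a single file ending in `.safetensors`. `classify_lora` and `list_loras` are hypothetical helpers.

```shell
#!/bin/sh
# classify_lora: print "peft", "safetensors", or "unknown" for one path.
classify_lora() {
    if [ -d "$1" ] && [ -f "$1/adapter_config.json" ]; then
        echo peft
    elif [ -f "$1" ]; then
        case $1 in
            *.safetensors) echo safetensors ;;
            *) echo unknown ;;
        esac
    else
        echo unknown
    fi
}

# list_loras: print "<kind><TAB><name>" for every entry in a directory.
list_loras() (
    dir=${1:-loras}
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue   # skip the literal glob if dir is empty
        printf '%s\t%s\n' "$(classify_lora "$entry")" "${entry##*/}"
    done
)
```

Entries reported as `unknown` are worth a second look before you expect them to appear in the WebUI.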
--models <dir> Model directory (required)
--loras <dir> LoRA adapters directory
--host <addr> Listen address (default: 127.0.0.1)
--port <N> Listen port (default: 8080)
--max-batch <N> LM batch limit 1-9 (default: 1)
--vae-chunk <N> VAE tile size (default: 256, lower = less VRAM)
--mp3-bitrate <N> MP3 kbps (default: 128)
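As an illustration of how these options combine, the sketch below assembles a command line and checks the documented 1-9 range for --max-batch before printing it. The binary path `./build/ace-server` is an assumption (the text above only names the server.sh and server.cmd launchers), and `server_cmd` is a hypothetical helper that prints the command rather than running it.

```shell
#!/bin/sh
# Assumed location of the server binary; override if yours differs.
ACE_SERVER_BIN=${ACE_SERVER_BIN:-./build/ace-server}

# server_cmd: print a server command line.
#   $1 = model directory (required by --models)
#   $2 = port (defaults to the documented 8080)
#   $3 = LM batch limit, must be a single digit 1-9 (defaults to 1)
server_cmd() (
    models=$1
    port=${2:-8080}
    batch=${3:-1}
    case $batch in
        [1-9]) ;;
        *) echo "max-batch must be 1-9" >&2; return 2 ;;
    esac
    echo "$ACE_SERVER_BIN --models $models --port $port --max-batch $batch"
)

# Usage:
#   server_cmd models 8085 4
```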
API endpoints
The server exposes three POST endpoints and two GET endpoints:
POST /lm - Generate lyrics and audio codes from a caption. Returns JSON.
POST /synth - Render audio codes into MP3 or WAV (?wav=1).
Accepts JSON or multipart (with source audio for cover/repaint modes).
POST /understand - Reverse pipeline: audio in, metadata + lyrics + codes out. Accepts multipart (audio file) or JSON (codes-only).
GET /health - Returns {"status":"ok"}.
GET /props - Available models, server config, default parameters.
See docs/ARCHITECTURE.md for the full API reference and AceRequest JSON specification.
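The endpoints above can be exercised from a shell with curl. The helpers below are a minimal sketch: the function names and the `ACE_HOST`/`ACE_PORT` variables are hypothetical, and the JSON payloads themselves are not shown because their fields are defined in docs/ARCHITECTURE.md rather than here. Set `ACE_PORT` to match your server (the documented --port default is 8080, while the browser URL earlier uses 8085).

```shell
#!/bin/sh
ACE_HOST=${ACE_HOST:-127.0.0.1}
ACE_PORT=${ACE_PORT:-8080}

# ace_url: build an endpoint URL. $1 = path, $2 = optional query string.
ace_url() {
    if [ -n "${2:-}" ]; then
        echo "http://$ACE_HOST:$ACE_PORT$1?$2"
    else
        echo "http://$ACE_HOST:$ACE_PORT$1"
    fi
}

# GET /health and GET /props; -f makes curl fail on HTTP error codes.
ace_health() { curl -fsS "$(ace_url /health)"; }
ace_props()  { curl -fsS "$(ace_url /props)"; }

# ace_lm: POST a JSON request file ($1) to /lm, print the JSON response.
ace_lm() {
    curl -fsS -X POST -H 'Content-Type: application/json' \
        --data-binary "@$1" "$(ace_url /lm)"
}

# ace_synth: POST a JSON file ($1) to /synth and save the audio to $2.
# Pass "wav" as $3 to request WAV output via ?wav=1 instead of MP3.
ace_synth() {
    query=""
    if [ "${3:-}" = wav ]; then query="wav=1"; fi
    curl -fsS -X POST -H 'Content-Type: application/json' \
        --data-binary "@$1" -o "$2" "$(ace_url /synth "$query")"
}
```

A typical session would call `ace_health` first, then `ace_lm request.json > lm.json`. What exactly /synth expects as input is specified in docs/ARCHITECTURE.md, so check there before feeding it the /lm response.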
CLI tools (advanced)
For scripting without the server, ace-lm and ace-synth work as a pipe:
# LM generates lyrics + codes
./build/ace-lm \
--request /tmp/request.json \
--lm models/acestep-5Hz-lm-4B-Q8_0.gguf
# DiT + VAE render to audio
./build/ace-synth \
--request /tmp/request0.json \
--embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
--dit models/acestep-v15-turbo-Q8_0.gguf \
--vae models/vae-BF16.gguf

See docs/ARCHITECTURE.md for the full JSON reference, task types, batching, and understand pipeline.
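The two commands above can be chained in a small wrapper. This is a sketch under one notable assumption: that ace-lm writes its output next to the input with an index appended to the base name (request.json becomes request0.json), which is inferred from the paths in the example and not confirmed here. `lm_output_path` and `run_pipeline` are hypothetical helpers, and the model file names are the ones from the table.

```shell
#!/bin/sh
# lm_output_path: derive the assumed ace-lm output path.
#   $1 = request path ending in .json, $2 = index (defaults to 0)
lm_output_path() {
    echo "${1%.json}${2:-0}.json"
}

# run_pipeline: run ace-lm, then ace-synth on its (assumed) output file.
#   $1 = request JSON, $2 = model directory (defaults to models)
run_pipeline() (
    request=$1
    models=${2:-models}
    ./build/ace-lm \
        --request "$request" \
        --lm "$models/acestep-5Hz-lm-4B-Q8_0.gguf" || return 1
    codes=$(lm_output_path "$request")
    # Stop early if the assumed naming scheme did not hold.
    [ -f "$codes" ] || { echo "expected $codes from ace-lm" >&2; return 1; }
    ./build/ace-synth \
        --request "$codes" \
        --embedding "$models/Qwen3-Embedding-0.6B-Q8_0.gguf" \
        --dit "$models/acestep-v15-turbo-Q8_0.gguf" \
        --vae "$models/vae-BF16.gguf"
)

# Usage:
#   run_pipeline /tmp/request.json models
```

The explicit existence check keeps a naming mismatch from surfacing later as a less obvious ace-synth error.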
docs/ARCHITECTURE.md covers the complete AceRequest JSON reference, all task types (text2music, cover, repaint, lego, extract, complete), FSM constrained decoding, custom GGML operators, quantization, and architecture internals.
- A Musician's Guide - non-technical guide for music makers
- Tutorial - design philosophy, model architecture, input control, inference hyperparameters
Independent C++ implementation based on ACE-Step 1.5 by ACE Studio and StepFun. All model weights are theirs; this is just a native backend.
@misc{gong2026acestep,
title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
year={2026},
note={GitHub repository}
}