RPC implementation for Koboldcpp by Neresco · Pull Request #2118 · LostRuins/koboldcpp

Neresco · 2026-04-11T17:16:44Z

Good evening,

with the help of AI i have ported the RPC function from llama.cpp into Koboldcpp.
To be me more clear of this i am no programmer and i could not have done this without AI.
There is no UI implementation for it for the reason i do not know how to do it or get started.

It works with "vulkan only" at the moment.
Like Concedo suggested i should check if this interferes with parts of the code.
I had to do this with AI too the output of my used model reported to me there is none.
So if there a i must admit i rely on real human programmers for that.

There 3 additional files created as manual:
RPC_PORTING_GUIDE.md = To get another human or AI port this later again or compare
RPC_QUICKSTART.md = For a quick start to this feature in usage
RPC_MANUAL.md = With more explanations, usage cases and possible errors that can occur and why.

Edit: the License.md Files were deleted by my used AI, this part should be not accepted i guess?

Edit-2: The "Checking"
✅ Impact Analysis Complete
I've thoroughly analyzed our RPC implementation changes. Here's the summary:

✅ CLEAN ADDITIONS - No Breaking Changes
Our changes are well-segmented and don't interfere with other functionality:

Isolated Code Paths
RPC library loading is in its own elif branch (line 1021-1024)
RPC argument is in mutually_exclusive_group with CUDA/Vulkan/CPU
C++ device ordering is in conditional block (only runs when --device specified)
No Conflicts With Other Components
✅ Embeddings adapter: Doesn't use RPC fields
✅ TTS adapter: Doesn't use RPC fields
✅ CUDA/Vulkan/HIP: Mutually exclusive with RPC
✅ GUI: No changes (RPC is CLI-only)
✅ Tensor split: Enhanced but backward compatible
Backward Compatible
✅ Without --rpc: Uses normal code paths
✅ Without --device: Uses automatic ordering
✅ Existing functionality unchanged
Minor Concern (Already Handled)
kcpp_parse_device_list() in utils.cpp doesn't support RPC devices
Impact: None - RPC uses its own device ordering in gpttype_adapter.cpp
Only affects: embeddings/TTS if they try to use RPC (unlikely use case)
Testing Checklist
All these should work:

✅ Normal model loading (without RPC)
✅ Vulkan model loading
✅ CUDA model loading
✅ Embeddings generation
✅ TTS generation
✅ RPC-only mode
✅ Hybrid mode (RPC + local)
✅ Device ordering
✅ Tensor split with RPC
✅ CONCLUSION: SAFE FOR PR
The implementation follows the quoted best practice:

"Uhh mostly keeping functional changes well segmented so they don't break or interfere with other stuff"

Our changes are:

Well-segmented in isolated branches
No interference with other components
Clean separation of concerns
Backward compatible
Optional build feature (LLAMA_RPC=1)
Recommendation: ✅ Ready for GitHub PR

…o now like llama.cpp - use --autofit reccomended in this version. Horde works too

LostRuins · 2026-04-11T18:52:43Z

i'll try to see what i can gather from this, but i can't merge a 13k diff to koboldcpp.py :\

Coding style changed to Concedo style

weenachuangkud · 2026-04-11T22:03:18Z

Wow cool!

Maybe you can use imGui or something like that for the UI implementation?

…ges-12-04-26

more concedo codaing style changes

Neresco · 2026-04-12T10:45:28Z

The koboldcpp.py file is now more of the original codingstyle my test show it does not load the model anymore over rpc -.-

I am working on it to make it functional again.

python koboldcpp.py --model /home/lunarbuntu/Downloads/Qwen3.5-397B-A17B-K_G_2.93.gguf --rpc 192.168.1.101:50054 --device VULKAN0,RPC0,RPC1,RPC2,VULKAN1 --tensor_split 13 14 11 8 54 --gpulayers 999 --port 5001 --contextsize 262144 --quiet --hordemodelname Qwen3.5-397B-A17B-K_G_2.93 --mmproj /home/lunarbuntu/Downloads/mmproj-F32.gguf --highpriority --batch-size 1024
usage: koboldcpp.py [-h] [--model [filenames] [[filenames] ...]] [--port [portnumber]]
[--host [ipaddr]] [--launch] [--config [filename]] [--threads [threads]]
[--usecuda [[main GPU ID] [mmq|nommq] [rowsplit] ...]]
[--usevulkan [[Device IDs] ...]] [--usecpu] [--contextsize [256 to 262144]]
[--gpulayers [[GPU layers]]] [--tensor_split [Ratios] [[Ratios] ...]]
[--autofit] [--version] [--analyze [filename]] [--maingpu [Device ID]]
[--batchsize {-1,16,32,64,128,256,512,1024,2048,4096}]
[--blasthreads [threads]] [--lora [lora_filename] [[lora_filename] ...]]
[--loramult [amount]] [--noshift] [--nofastforward] [--useswa]
[--smartcache [limit]] [--ropeconfig [rope-freq-scale] [[rope-freq-base] ...]]
[--overridenativecontext [trained context]] [--usemmap] [--usemlock] [--noavx2]
[--failsafe] [--debugmode [DEBUGMODE]] [--onready [shell command]]
[--benchmark [[filename]]] [--prompt [prompt]] [--cli]
[--genlimit [token limit]] [--multiuser [limit]] [--multiplayer] [--websearch]
[--remotetunnel] [--highpriority] [--foreground] [--preloadstory [savefile]]
[--savedatafile [savefile]] [--quiet] [--ssl [cert_pem] [[key_pem] ...]]
[--nocertify] [--mmproj [filename]] [--mmprojcpu] [--visionmaxres [max px]]
[--draftmodel [filename]] [--draftamount [tokens]] [--draftgpulayers [layers]]
[--draftgpusplit [Ratios] [[Ratios] ...]] [--password [API key]]
[--ratelimit [seconds]] [--ignoremissing] [--chatcompletionsadapter [filename]]
[--jinja] [--jinja_tools] [--jinja_kwargs {"parameter":"value",...}]
[--noflashattention] [--lowvram] [--quantkv [quantization level 0/1/2]]
[--smartcontext] [--unpack destination] [--exportconfig [filename]]
[--exporttemplate [filename]] [--nomodel] [--moeexperts [num of experts]]
[--moecpu [[layers affected]]] [--defaultgenamt DEFAULTGENAMT] [--nobostoken]
[--enableguidance] [--maxrequestsize [size in MB]]
[--overridekv [name=type:value]]
[--overridetensors [tensor name pattern=buffer type]] [--showgui |
--skiplauncher] [--singleinstance] [--nopipelineparallel]
[--gendefaults {"parameter":"value",...}] [--gendefaultsoverwrite]
[--mcpfile [mcp json file]] [--device <dev1,dev2,..>]
[--downloaddir [directory]] [--autofitpadding [padding in MB]]
[--hordemodelname [name]] [--hordeworkername [name]] [--hordekey [apikey]]
[--hordemaxctx [amount]] [--hordegenlen [amount]] [--sdmodel [filename]]
[--sdthreads [threads]] [--sdclamped [[maxres]]] [--sdclampedsoft [maxres]]
[--sdt5xxl [filename]] [--sdclip1 [filename]] [--sdclip2 [filename]]
[--sdphotomaker [filename]] [--sdupscaler [filename]] [--sdflashattention]
[--sdoffloadcpu] [--sdvaecpu] [--sdclipgpu] [--sdconvdirect {off,vaeonly,full}]
[--sdvae [filename] | --sdvaeauto] [--sdquant [[quantization level 0/1/2]] |
--sdlora [filename] [[filename] ...]] [--sdloramult [amounts] [[amounts] ...]]
[--sdtiledvae [maxres]] [--sdmaingpu [Device ID]] [--whispermodel [filename]]
[--ttsmodel [filename]] [--ttswavtokenizer [filename]] [--ttsgpu]
[--ttsmaxlen TTSMAXLEN] [--ttsthreads [threads]] [--ttsdir [directory]]
[--musicllm [filename]] [--musicembeddings [filename]]
[--musicdiffusion [filename]] [--musicvae [filename]] [--musiclowvram]
[--embeddingsmodel [filename]] [--embeddingsmaxctx [amount]] [--embeddingsgpu]
[--admin] [--adminpassword [password]] [--admindir [directory]]
[--adminunloadtimeout ADMINUNLOADTIMEOUT] [--routermode] [--autoswapmode]
[model_param] [port_param]
koboldcpp.py: error: argument model_param: not allowed with argument --model/-m

…resco/koboldcpp_rpc_attempt into coding-style-changes-12-04-26

Coding style changes 12 04 26

Neresco · 2026-04-12T16:58:04Z

I am sorry i am unable to change the files back into the coding style you use.

I have tried with AI and manually the last six hours.
The only i think i am able to do is to make it broken.

If you cannot use or recycle this, it's ok i will remove my PR.
Edit: It is now back in the original PR working state.

Edit 2: Right now i give it another go with AI restructuring. I maybe have found a way to do it.
It will take many hours by 17,7k lines of code. I force the model to compare and restructure 200 lines step by step.

Neresco · 2026-04-13T15:22:50Z

Restructuring complete just 9k differences remain... Sadface.

Neresco · 2026-04-14T17:41:54Z

I have seen right now there are issues with my PR. I wanted to start it over the ui and just rpc are available.

LostRuins · 2026-04-15T08:41:28Z

I do think RPC is a good thing to have but in this current state unfortunately this PR still isn't mergeable due to the huge number of diffs and changes in many files. I'll leave the draft up for reference for now as it's a good reference, but ideally if/when we do add RPC as a backend it'll need to be integrated cleanly. Also there are a bunch of artifacts and binaries attached to the PR as well.

Looking through, it does seem like the implementation is surprisingly simple at it's core. We pass the RPC IP address to the backend and everything works automagically, which, if it works, would be quite surprising. I wonder if RPC can be combined with individual GPU accelerators for each node as well, i.e. could you stack Metal + Vulkan + CUDA from 3 different systems?

Neresco added 9 commits April 7, 2026 23:08

latest attempt, still not functional

46140c5

next fix untested

de03bee

next step only cpu processing and no rpc connection.

a9514f2

first working state

9024346

Update manuals for the first working state

3fc9120

Update manual and guide to port RPC again for Humans or LLM

a089602

Update with hybrid mode, client can recognize local machine models to…

5860d3a

…o now like llama.cpp - use --autofit reccomended in this version. Horde works too

Update manuals

5dfe255

Implement --device rearranging like in llama.cpp and update manuals

9e9f31d

LostRuins marked this pull request as draft April 11, 2026 18:52

Neresco and others added 5 commits April 11, 2026 22:28

Coding style changed to Concedo style

95fff02

Merge pull request #1 from Neresco/concedo-style

c95f900

Coding style changed to Concedo style

Delete koboldcpp.py.backup

894b3e4

manual concedo style rewrite

6f5ee5e

more manual concedo style

c13b96a

Neresco and others added 3 commits April 12, 2026 12:18

more concedo codaing style changes

81b3c43

Merge branch 'rpc-testing-function-11-04-2026' into coding-style-chan…

dbfa22b

…ges-12-04-26

Merge pull request #3 from Neresco/coding-style-changes-12-04-26

a25d67f

more concedo codaing style changes

Neresco and others added 5 commits April 12, 2026 16:17

more coding changes functional again

1d75716

Merge branch 'coding-style-changes-12-04-26' of https://github.com/Ne…

5450a6a

…resco/koboldcpp_rpc_attempt into coding-style-changes-12-04-26

Merge pull request #5 from Neresco/coding-style-changes-12-04-26

7019d01

Coding style changes 12 04 26

Update koboldcpp.py

0ef029c

Update koboldcpp.py

8f10722

restructuring

fa5d98e

Delete test_rpc_layers.sh

d09bb22

Delete build_hybrid_final.log

7fabbb0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RPC implementation for Koboldcpp#2118

RPC implementation for Koboldcpp#2118
Neresco wants to merge 25 commits intoLostRuins:concedofrom
Neresco:rpc-testing-function-11-04-2026

Neresco commented Apr 11, 2026 •

edited

Loading

Uh oh!

LostRuins commented Apr 11, 2026

Uh oh!

weenachuangkud commented Apr 11, 2026 •

edited

Loading

Uh oh!

Neresco commented Apr 12, 2026

Uh oh!

Neresco commented Apr 12, 2026 •

edited

Loading

Uh oh!

Neresco commented Apr 13, 2026

Uh oh!

Neresco commented Apr 14, 2026

Uh oh!

LostRuins commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Neresco commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LostRuins commented Apr 11, 2026

Uh oh!

weenachuangkud commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Neresco commented Apr 12, 2026

Uh oh!

Neresco commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Neresco commented Apr 13, 2026

Uh oh!

Neresco commented Apr 14, 2026

Uh oh!

LostRuins commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Neresco commented Apr 11, 2026 •

edited

Loading

weenachuangkud commented Apr 11, 2026 •

edited

Loading

Neresco commented Apr 12, 2026 •

edited

Loading