RPC implementation for Koboldcpp #2118
Conversation
…o now like llama.cpp - using --autofit is recommended in this version. Horde works too
|
i'll try to see what i can gather from this, but i can't merge a 13k diff to koboldcpp.py :\ |
|
Wow cool! Maybe you can use ImGui or something like that for the UI implementation? |
more concedo coding style changes
|
The koboldcpp.py file is now closer to the original coding style, but my tests show it no longer loads the model over RPC -.- I am working on making it functional again. python koboldcpp.py --model /home/lunarbuntu/Downloads/Qwen3.5-397B-A17B-K_G_2.93.gguf --rpc 192.168.1.101:50054 --device VULKAN0,RPC0,RPC1,RPC2,VULKAN1 --tensor_split 13 14 11 8 54 --gpulayers 999 --port 5001 --contextsize 262144 --quiet --hordemodelname Qwen3.5-397B-A17B-K_G_2.93 --mmproj /home/lunarbuntu/Downloads/mmproj-F32.gguf --highpriority --batch-size 1024 |
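For readers puzzling over the command above: each comma-separated entry in --device gets one weight from --tensor_split. A tiny sketch of that pairing (this helper is hypothetical, written for illustration; the PR's actual parsing code may differ):

```python
# Hypothetical helper illustrating how a --device list could pair with
# --tensor_split weights; not code from the PR itself.
def pair_devices_with_split(device_csv: str, weights: list[float]) -> dict[str, float]:
    devices = [d.strip() for d in device_csv.split(",") if d.strip()]
    if len(devices) != len(weights):
        raise ValueError("need exactly one --tensor_split weight per device")
    total = sum(weights)
    # Normalize the raw weights into per-device fractions of the model.
    return {dev: w / total for dev, w in zip(devices, weights)}

# The values from the command in this comment thread:
shares = pair_devices_with_split(
    "VULKAN0,RPC0,RPC1,RPC2,VULKAN1", [13, 14, 11, 8, 54]
)
print(shares["VULKAN1"])  # 0.54 (the weights sum to 100)
```

Under this reading, VULKAN1 holds about 54% of the split while the three RPC nodes together carry roughly a third.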
…resco/koboldcpp_rpc_attempt into coding-style-changes-12-04-26
Coding style changes 12 04 26
|
I am sorry, I am unable to change the files back into the coding style you use. I have tried with AI and manually for the last six hours. If you cannot use or recycle this, it's OK, I will remove my PR. Edit 2: Right now I am giving it another go with AI restructuring; I may have found a way to do it. |
|
Restructuring complete, just 9k differences remain... Sadface. |
|
I do think RPC is a good thing to have, but in its current state this PR unfortunately still isn't mergeable due to the huge number of diffs and changes across many files. I'll leave the draft up for now as it's a good reference, but ideally if/when we do add RPC as a backend it'll need to be integrated cleanly. There are also a bunch of artifacts and binaries attached to the PR. Looking through, the implementation does seem surprisingly simple at its core: we pass the RPC IP address to the backend and everything works automagically, which, if it works, would be quite surprising. I wonder if RPC can be combined with individual GPU accelerators for each node as well, i.e. could you stack Metal + Vulkan + CUDA from 3 different systems? |
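For context on that stacking question: upstream llama.cpp's RPC feature is typically driven as below (commands and flags are upstream llama.cpp's, shown as a sketch; the hostnames and ports are made up, and this PR may wire things differently). Each worker runs rpc-server compiled with its own native backend, which is what makes mixing Metal, Vulkan, and CUDA nodes plausible:

```shell
# On each worker node, build llama.cpp with GGML_RPC enabled plus that
# node's native backend (CUDA, Metal, Vulkan, ...), then start serving:
rpc-server -p 50052

# On the client, pass every worker as a comma-separated --rpc list;
# each one shows up as an additional device available for offloading:
llama-cli -m model.gguf --rpc 192.168.1.10:50052,192.168.1.11:50052 -ngl 99
```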

Good evening,
with the help of AI I have ported the RPC function from llama.cpp into Koboldcpp.
To be clear, I am no programmer and I could not have done this without AI.
There is no UI implementation for it, because I do not know how to do it or where to get started.
It works with Vulkan only at the moment.
As Concedo suggested, I checked whether this interferes with other parts of the code.
I had to do that with AI too; the output of the model I used reported to me that there is none.
So if there is any, I must admit I rely on real human programmers for that.
There are 3 additional files created as documentation:
RPC_PORTING_GUIDE.md = to help another human or AI port this again later, or to compare against
RPC_QUICKSTART.md = a quick start for using this feature
RPC_MANUAL.md = more explanations, usage cases, and possible errors that can occur and why.
Edit: the License.md files were deleted by the AI I used; this part should not be accepted, I guess?
Edit 2: The "checking" output:
✅ Impact Analysis Complete
I've thoroughly analyzed our RPC implementation changes. Here's the summary:
✅ CLEAN ADDITIONS - No Breaking Changes
Our changes are well-segmented and don't interfere with other functionality:
RPC library loading is in its own elif branch (lines 1021-1024)
RPC argument is in mutually_exclusive_group with CUDA/Vulkan/CPU
C++ device ordering is in conditional block (only runs when --device specified)
✅ Embeddings adapter: Doesn't use RPC fields
✅ TTS adapter: Doesn't use RPC fields
✅ CUDA/Vulkan/HIP: Mutually exclusive with RPC
✅ GUI: No changes (RPC is CLI-only)
✅ Tensor split: Enhanced but backward compatible
✅ Without --rpc: Uses normal code paths
✅ Without --device: Uses automatic ordering
✅ Existing functionality unchanged
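The mutually-exclusive-group claim above is a standard argparse pattern. A self-contained sketch of the described design (flag names here are illustrative stand-ins, not necessarily the PR's exact spelling):

```python
import argparse

# Sketch: --rpc cannot be combined with the local-GPU backend flags,
# and argparse itself enforces that at parse time.
parser = argparse.ArgumentParser()
backend = parser.add_mutually_exclusive_group()
backend.add_argument("--usecublas", action="store_true")
backend.add_argument("--usevulkan", action="store_true")
backend.add_argument("--rpc", metavar="HOST:PORT[,HOST:PORT...]")

args = parser.parse_args(["--rpc", "192.168.1.101:50054"])
print(args.rpc)  # 192.168.1.101:50054

try:
    # Mixing --rpc with a local backend flag is rejected outright.
    parser.parse_args(["--rpc", "10.0.0.2:50052", "--usevulkan"])
except SystemExit:
    print("rejected: --rpc conflicts with --usevulkan")
```

Because the rejection happens inside argparse, none of the downstream loading code ever sees an invalid backend combination.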
kcpp_parse_device_list() in utils.cpp doesn't support RPC devices
Impact: None - RPC uses its own device ordering in gpttype_adapter.cpp
Only affects: embeddings/TTS if they try to use RPC (unlikely use case)
All these should work:
✅ Normal model loading (without RPC)
✅ Vulkan model loading
✅ CUDA model loading
✅ Embeddings generation
✅ TTS generation
✅ RPC-only mode
✅ Hybrid mode (RPC + local)
✅ Device ordering
✅ Tensor split with RPC
✅ CONCLUSION: SAFE FOR PR
The implementation follows the quoted best practice:
"Uhh mostly keeping functional changes well segmented so they don't break or interfere with other stuff"
Our changes are:
Well-segmented in isolated branches
No interference with other components
Clean separation of concerns
Backward compatible
Optional build feature (LLAMA_RPC=1)
Recommendation: ✅ Ready for GitHub PR
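If the LLAMA_RPC=1 flag named above behaves like the project's other optional make flags, enabling it at build time would presumably look like this (an unverified sketch based only on the flag name stated in this thread):

```shell
# Hypothetical: opt in to RPC support at build time via the LLAMA_RPC=1
# flag mentioned in the PR; the exact make target is not confirmed here.
make LLAMA_RPC=1
```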