[CPP_RPC] Implement tracker into the tvm_rpc utility#19830

Closed
cbalint13 wants to merge 1 commit into
apache:mainfrom
cbalint13:tvm_rpc

@cbalint13 cbalint13 commented Jun 18, 2026

Contributor

Implements the tracker service in the existing tvm_rpc tool.
This is a standalone, portable alternative to the current Python-only tvm.exec.rpc_tracker counterpart.


Implementation details

  • The implementation follows the full TVM tracker specification, including the mandatory priority scheduling
  • It should be considered experimental: testing so far has been limited to a setup with two remote edge nodes
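To make the priority scheduling point concrete, here is a minimal sketch of how such a scheduler can pair free server resources with pending client requests. It is not the PR's `PriorityScheduler`: the class name `PrioritySchedulerSketch`, its methods, and the rule "highest priority first, then first come first served" are assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not the PR's PriorityScheduler.
class PrioritySchedulerSketch {
 public:
  using Callback = std::function<void(const std::string& matchkey)>;

  // A server reports a free resource, identified by its matchkey.
  void Put(std::string matchkey) {
    free_.push_back(std::move(matchkey));
    Schedule();
  }

  // A client asks for a resource; a higher priority is served first.
  void Request(int priority, Callback cb) {
    pending_.push(Pending{priority, next_seq_++, std::move(cb)});
    Schedule();
  }

  std::size_t free_count() const { return free_.size(); }
  std::size_t pending_count() const { return pending_.size(); }

 private:
  struct Pending {
    int priority;
    unsigned long seq;  // arrival order, used to break priority ties
    Callback cb;
  };
  struct Order {
    // Returns true when a should be served after b.
    bool operator()(const Pending& a, const Pending& b) const {
      if (a.priority != b.priority) return a.priority < b.priority;
      return a.seq > b.seq;
    }
  };

  // Pair free resources with pending requests until one side runs out.
  void Schedule() {
    while (!free_.empty() && !pending_.empty()) {
      std::string key = std::move(free_.front());
      free_.pop_front();
      Callback cb = pending_.top().cb;  // copy, since top() is const
      pending_.pop();
      cb(key);
    }
  }

  std::deque<std::string> free_;
  std::priority_queue<Pending, std::vector<Pending>, Order> pending_;
  unsigned long next_seq_ = 0;
};
```

With two pending requests of priority 0 and 5, the first `Put` is handed to the priority 5 request and the second to the priority 0 request. A production scheduler would also need to handle callbacks that fail, for example because the requesting client has already disconnected.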

Brief usage:

$ ./tvm_rpc
{...} Command line usage
 server       - Start the server
   {...}
 tracker      - Start the tracker
  --host        - The listen address of the RPC tracker, Default=0.0.0.0 (any)
  --port        - The port of the RPC tracker, Default=9190
  --port-end    - The end search port of the RPC tracker, Default=9199
  --silent      - Whether to run in silent mode. Default=False

  Examples
  ./tvm_rpc server --host=0.0.0.0 --port=9090 --port-end=9099 --tracker=127.0.0.1:9190 --key=rasp

  ./tvm_rpc tracker --host=0.0.0.0 --port=9190 --port-end=9199
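The `--port` and `--port-end` options describe a search range for the listening port. A minimal sketch of that search follows. `FindFreePort` and its `try_bind` callback are hypothetical names, not taken from the PR, and treating `--port-end` as an exclusive bound is an assumption of this sketch.

```cpp
#include <functional>

// Hypothetical helper, not taken from the PR: walk the [port, port_end)
// range and return the first port for which try_bind succeeds, or -1 when
// every candidate fails. The exclusive upper bound is an assumption.
int FindFreePort(int port, int port_end,
                 const std::function<bool(int)>& try_bind) {
  for (int candidate = port; candidate < port_end; ++candidate) {
    if (try_bind(candidate)) return candidate;
  }
  return -1;  // the caller should treat this as a startup failure
}
```

Passing the bind attempt in as a callback keeps the search logic testable without opening real sockets. The `-1` result makes the "no port available" case explicit so it cannot be silently ignored.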

Live example:

$ ./tvm_rpc tracker
[11:37:03] {...}/main.cc:136: host        = 0.0.0.0
[11:37:03] {...}/main.cc:137: port        = 9190
[11:37:03] {...}/main.cc:138: port_end    = 9199
[11:37:03] {...}/main.cc:139: silent      = False
[11:37:03] {...}/main.cc:378: Starting CPP Tracker, Press Ctrl+C to stop.
[11:37:03] {...}/rpc_tracker.cc:231: Bind to 0.0.0.0:9190

<-- a remote edge [192.168.1.10] tvm_rpc server having 'opi5' key connects to tracker

[11:37:06] {...}/rpc_tracker.cc:265: New session from 192.168.1.10:46392
[11:37:06] {...}/rpc_tracker.cc:315: Handshake with 192.168.1.10:46392 successful
[11:37:06] {...}/rpc_tracker.cc:345: Received key 'server:opi5' from 192.168.1.10:46392

<-- a tuner process [127.0.0.1] launches multiple peering sessions using the 'opi5' key
---> the tracker always responds to the tuner with an available session from its priority scheduler

[11:38:18] {...}/rpc_tracker.cc:265: New session from 127.0.0.1:38706
[11:38:18] {...}/rpc_tracker.cc:315: Handshake with 127.0.0.1:38706 successful
[11:38:18] {...}/rpc_tracker.cc:341: Request using key 'opi5' from 127.0.0.1:38706
[11:38:18] {...}/rpc_tracker.cc:419: Offering matchkey 'opi5:0.692392@192.168.1.10:9000' to 127.0.0.1:38706
[11:38:18] {...}/rpc_tracker.cc:357: End session with 127.0.0.1:38706

[11:38:20] {...}/rpc_tracker.cc:265: New session from 127.0.0.1:38716
[11:38:20] {...}/rpc_tracker.cc:315: Handshake with 127.0.0.1:38716 successful
[11:38:20] {...}/rpc_tracker.cc:341: Request using key 'opi5' from 127.0.0.1:38716
[11:38:20] {...}/rpc_tracker.cc:419: Offering matchkey 'opi5:0.218548@192.168.1.10:9000' to 127.0.0.1:38716
[11:38:20] {...}/rpc_tracker.cc:357: End session with 127.0.0.1:38716

{...}
{...}
{...}

<-- another remote edge [192.168.1.5] tvm_rpc server having 'riscv' key connects to the tracker

[11:44:58] {...}/rpc_tracker.cc:265: New session from 192.168.1.5:43390
[11:44:58] {...}/rpc_tracker.cc:315: Handshake with 192.168.1.5:43390 successful
[11:44:58] {...}/rpc_tracker.cc:345: Received key 'server:riscv' from 192.168.1.5:43390

<-- a remote `python3 -m tvm.exec.query_rpc_tracker --host 127.0.0.1 --port 9190` query
---> tracker gives a summary of its available peers and usage stats

[11:45:11] {...}/rpc_tracker.cc:265: New session from 127.0.0.1:41410
[11:45:11] {...}/rpc_tracker.cc:315: Handshake with 127.0.0.1:41410 successful
[11:45:11] {...}/rpc_tracker.cc:351: Summary requested from 127.0.0.1:41410
[11:45:11] {...}/rpc_tracker.cc:357: End session with 127.0.0.1:41410

---> the query peer receives the summary and displays it as:

$ python3 -m tvm.exec.query_rpc_tracker --host 127.0.0.1 --port 9190
Tracker address 127.0.0.1:9190

Server List
------------------------------
server-address           key
------------------------------
192.168.1.10:9000        server:opi5
192.168.1.5:9000         server:riscv
------------------------------

Queue Status
-----------------------------
key     total  free  pending
-----------------------------
opi5    1      1     0      
riscv   1      1     0      
-----------------------------
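The handshake, request, and summary lines in the session above correspond to a small wire protocol. As a hedged sketch: to my recollection of the Python reference implementation, each peer first exchanges a 4-byte little-endian magic number, and every later message is a JSON document prefixed with its 4-byte little-endian length. The constants below (`0x2F271` for the tracker magic, `6` for the summary request code) are recalled from the Python sources and should be verified against them; all function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Assumed constants, recalled from TVM's Python RPC code; verify before use.
constexpr std::int32_t kTrackerMagicSketch = 0x2F271;
constexpr int kSummaryCodeSketch = 6;

// Encode a 32-bit integer as 4 little-endian bytes.
std::string PackInt32LE(std::int32_t value) {
  std::string out(4, '\0');
  const auto bits = static_cast<std::uint32_t>(value);
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<char>((bits >> (8 * i)) & 0xFF);
  }
  return out;
}

// Frame a JSON payload as: 4-byte little-endian length, then the bytes.
std::string FrameJson(const std::string& json) {
  return PackInt32LE(static_cast<std::int32_t>(json.size())) + json;
}

// Bytes a client would send to ask for the summary, i.e. the JSON "[6]".
std::string SummaryRequest() {
  return FrameJson("[" + std::to_string(kSummaryCodeSketch) + "]");
}
```

Under these assumptions, a query client would send `PackInt32LE(kTrackerMagicSketch)`, read back the same four bytes, and then send `SummaryRequest()`. Building the frames as plain byte strings keeps the encoding independent of any particular socket wrapper.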

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements a C++ RPC Tracker for the TVM RPC application. It introduces the RPCTracker and PriorityScheduler classes, adds command-line support for starting a tracker via ./tvm_rpc tracker, and extends socket utilities in src/support/socket.h to retrieve remote IP addresses. There are no review comments, so I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@cbalint13 cbalint13 force-pushed the tvm_rpc branch 5 times, most recently from 0f219b1 to 53d1997 June 18, 2026 10:04

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a C++ RPC Tracker implementation for TVM, adding command-line support to start the tracker and extending socket utilities with remote IP retrieval. The code review identified several critical issues in the new tracker implementation, including exception safety risks and use-after-move bugs in connection handling, scheduling blocks caused by failed callbacks, inverted socket state checks, unhandled socket binding failures, and potential undefined behavior from uninitialized memory in socket utilities.
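The flagged code itself is not shown in this conversation, so the following only illustrates the general class of use-after-move problem in connection handling, with hypothetical names (`ConnSketch`, `AcceptSession`). The safe ordering is to copy whatever is still needed out of an object before handing it off with `std::move`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a connection object; not the PR's type.
struct ConnSketch {
  std::string peer;  // e.g. "192.168.1.10:46392"
};

// Store the connection and return the log line describing it.
std::string AcceptSession(std::vector<ConnSketch>& sessions, ConnSketch conn) {
  // Risky ordering: calling sessions.push_back(std::move(conn)) first and
  // reading conn.peer afterwards would observe a moved-from string, whose
  // content is valid but unspecified.
  std::string peer = conn.peer;         // copy what is needed before the move
  sessions.push_back(std::move(conn));  // conn must not be read after this
  return "New session from " + peer;
}
```

Compiler and linter checks such as clang-tidy's `bugprone-use-after-move` can catch many instances of this pattern automatically.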

Comment thread apps/cpp_rpc/rpc_tracker.cc Outdated
Comment thread apps/cpp_rpc/rpc_tracker.cc
Comment thread apps/cpp_rpc/rpc_tracker.cc
Comment thread apps/cpp_rpc/rpc_tracker.cc Outdated
Comment thread apps/cpp_rpc/rpc_tracker.cc Outdated
Comment thread apps/cpp_rpc/rpc_tracker.cc
Comment thread apps/cpp_rpc/rpc_tracker.cc Outdated
Comment thread src/support/socket.h Outdated
@cbalint13 cbalint13 marked this pull request as draft June 18, 2026 11:08
@cbalint13 cbalint13 marked this pull request as ready for review June 18, 2026 12:26
tqchen commented Jun 18, 2026

Member

Thanks @cbalint13! I feel this direction starts to increase complexity as we bring in more variants. One original motivation for the server is that some environments may not have a Python environment, while for the tracker we do not have that concern. Having a single implementation helps maintainability and reduces the overall complexity of the project. So personally I think such an implementation would be better kept off-tree.

main.cc
rpc_env.cc
rpc_server.cc
rpc_tracker.cc

Member

Given that the tracker does not have a portability concern (it usually runs on a server with Python), I think it is better to limit the in-tree implementation to Python only.

Contributor Author

@tqchen
Yes indeed, my goal here was a stand-alone tool that can run on edge/small devices.
Embedding it into the existing tvm_rpc makes it lightweight and handy, and it also runs cross-platform.

@cbalint13

Contributor Author

> Thanks @cbalint13! I feel this direction starts to increase complexity as we bring in more variants. One original motivation for the server is that some environments may not have a Python environment, while for the tracker we do not have that concern. Having a single implementation helps maintainability and reduces the overall complexity of the project. So personally I think such an implementation would be better kept off-tree.

That is OK. I will leave it here as a closed draft, so any user who finds it useful can pick it up.

My thought was:

  • a single standalone tool, tvm_rpc, that can handle the tracking part too
  • the tracker can run in Python-less environments, just like tvm_rpc
  • the tracker API calls are still available via FFI, so a tuner could launch the tracker for itself too (why not)
  • if good enough, it can in time phase out the Python counterpart

@cbalint13 cbalint13 marked this pull request as draft June 18, 2026 13:17
@cbalint13 cbalint13 closed this Jun 18, 2026
2 participants