From d9d6dd9f9fbb620e7162c52eddee9634ffd85046 Mon Sep 17 00:00:00 2001
From: tlopex <820958424@qq.com>
Date: Sun, 5 Apr 2026 21:08:42 -0400
Subject: [PATCH 1/2] fnish1

---
 docs/arch/index.rst | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/docs/arch/index.rst b/docs/arch/index.rst
index cb02c109b7d8..d525beee278f 100644
--- a/docs/arch/index.rst
+++ b/docs/arch/index.rst
@@ -248,6 +248,31 @@ On the Python side, users interact with the VM through ``relax.VirtualMachine(ex
 which provides both a direct invocation interface and a stateful set-input / invoke / get-output
 interface suitable for RPC-based remote execution.
 
+Disco: Distributed Runtime
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Disco is TVM's distributed runtime for executing models across multiple devices. When a model is
+too large to fit on a single GPU, the ``relax.distributed`` module annotates how tensors should be
+partitioned and placed across a mesh of devices at compile time. Disco then takes over at runtime:
+it manages a group of workers, dispatches the compiled program to all of them simultaneously, and
+coordinates inter-device communication through collective operations such as allreduce, allgather,
+broadcast, and scatter.
+
+The central abstraction is the ``Session``, which owns the workers and exposes a SPMD-style
+programming interface. Every object that lives on workers is represented by a ``DRef`` — a
+distributed reference that maps to a concrete value on each worker. When the controller invokes a
+``DPackedFunc`` through the session, all workers execute the same PackedFunc call in lockstep, each
+operating on its own local shard. Compiled VM modules can be loaded into a session as ``DModule``
+objects and called in the same fashion. The session also provides collective primitives backed by
+NCCL or RCCL, so that workers can exchange partial results without routing data through the
+controller.
+
+Three session backends cover different deployment topologies. ``ThreadedSession`` spawns workers as
+threads within a single process — this is the most common choice for multi-GPU inference on a
+single machine. ``ProcessSession`` launches workers as separate OS processes connected by pipes,
+providing stronger isolation. ``SocketSession`` extends the model to multi-node clusters by
+connecting workers across machines via TCP sockets.
+
 tvm/node
 --------
 The node module adds additional features on top of the `runtime::Object` for IR data structures.

From eab0965bb81f2182f4b03ad89fac99b651cdc7bf Mon Sep 17 00:00:00 2001
From: tlopex <820958424@qq.com>
Date: Sun, 5 Apr 2026 21:14:22 -0400
Subject: [PATCH 2/2] fnis2

---
 docs/arch/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/arch/index.rst b/docs/arch/index.rst
index d525beee278f..95ba2789bcff 100644
--- a/docs/arch/index.rst
+++ b/docs/arch/index.rst
@@ -261,7 +261,7 @@ broadcast, and scatter.
 The central abstraction is the ``Session``, which owns the workers and exposes a SPMD-style
 programming interface. Every object that lives on workers is represented by a ``DRef`` — a
 distributed reference that maps to a concrete value on each worker. When the controller invokes a
-``DPackedFunc`` through the session, all workers execute the same PackedFunc call in lockstep, each
+``DPackedFunc`` through the session, all workers execute the same PackedFunc call synchronously, each
 operating on its own local shard. Compiled VM modules can be loaded into a session as ``DModule``
 objects and called in the same fashion. The session also provides collective primitives backed by
 NCCL or RCCL, so that workers can exchange partial results without routing data through the