`vllm.v1.worker.encoder_cudagraph_defs` ¶

Data transfer objects for encoder CUDA graph management.

Classes:

EncoderCudaGraphCaptureInputs –

Everything needed for one CUDA graph capture.
EncoderCudaGraphConfig –

Configuration for encoder CUDA graph management.
EncoderCudaGraphReplayBuffers –

New buffer values for graph replay, computed by the model from actual batch inputs.
EncoderItemSpec –

Description of a single encoder input item.

`EncoderCudaGraphCaptureInputs` `dataclass` ¶

Everything needed for one CUDA graph capture.

Returned by prepare_encoder_cudagraph_capture_inputs().

Attributes:

values (dict[str, Tensor]) –

Precomputed tensor buffers that will be recorded into the CUDA graph.

Source code in vllm/v1/worker/encoder_cudagraph_defs.py

@dataclass
class EncoderCudaGraphCaptureInputs:
    """Everything needed for one CUDA graph capture.

    Returned by ``prepare_encoder_cudagraph_capture_inputs()``.
    """

    values: dict[str, torch.Tensor]
    """Precomputed tensor buffers that will be recorded into the
    CUDA graph.  The manager stores references to these exact
    tensor objects and copies new data into them before each
    ``graph.replay()`` call (buffer identity invariant)."""

`values` `instance-attribute` ¶

Precomputed tensor buffers that will be recorded into the CUDA graph. The manager stores references to these exact tensor objects and copies new data into them before each graph.replay() call (buffer identity invariant).
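The buffer identity invariant can be illustrated with a small, library-free sketch. It is not the manager's actual code: plain Python lists stand in for `torch.Tensor` objects, `make_graph` is a hypothetical stand-in for graph capture, and the key name `pixel_values` is made up for illustration. The point it shows is that new data must be written into the captured object in place, because rebinding the name to a fresh object is invisible to whatever was recorded.

```python
# Assumption: plain lists stand in for torch.Tensor buffers, and
# make_graph() is a hypothetical stand-in for CUDA graph capture.
captured = {"pixel_values": [0.0, 0.0, 0.0, 0.0]}


def make_graph(buffers):
    # "Capture": remember the exact buffer objects, not copies of them.
    recorded = dict(buffers)

    def replay():
        # "Replay": read whatever the recorded objects currently contain.
        return {key: sum(buf) for key, buf in recorded.items()}

    return replay


replay = make_graph(captured)

# Correct: copy new data into the same object (identity preserved).
captured["pixel_values"][:] = [1.0, 2.0, 3.0, 4.0]
in_place_result = replay()

# Incorrect: rebinding the key to a new object; the recorded buffer
# is untouched, so replay still sees the previous contents.
captured["pixel_values"] = [5.0, 5.0, 5.0, 5.0]
rebound_result = replay()
```

Both calls to `replay()` read the originally recorded list, so the rebinding in the second step has no effect on the result.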

`EncoderCudaGraphConfig` `dataclass` ¶

Configuration for encoder CUDA graph management.

Provided by the model at init time via get_encoder_cudagraph_config(). Values are fixed for the lifetime of the manager.

Attributes:

buffer_keys (list[str]) –

Keys for the tensor buffers recorded into the CUDA graph.
enable_dual_path_graph (bool) –

If True, the manager captures two independent graph sets (global + local) and runs dual-path graph selection during inference.
global_token_per_image (int) –

Tokens per global image (e.g. 272 for DeepSeek-OCR).
local_token_per_patch (int) –

Tokens per local patch (e.g. 100 for DeepSeek-OCR).
max_frames_per_video (int) –

Maximum number of frames per video.
modalities (list[str]) –

Supported modalities (e.g. ["image"]).
out_hidden_size (int) –

Output hidden dim of the vision encoder.
padding_logics (dict[str, EncoderCudaGraphPaddingLogic]) –

Optional per-buffer replay padding/copy logic.

Source code in vllm/v1/worker/encoder_cudagraph_defs.py

@dataclass
class EncoderCudaGraphConfig:
    """Configuration for encoder CUDA graph management.

    Provided by the model at init time via
    ``get_encoder_cudagraph_config()``. Values are fixed for the
    lifetime of the manager.
    """

    modalities: list[str]
    """Supported modalities (e.g. ["image"])."""

    buffer_keys: list[str]
    """Keys for the tensor buffers recorded into the CUDA graph.
    Before replay the manager zeros then slice-copies new data
    into these buffers."""

    out_hidden_size: int
    """Output hidden dim of the vision encoder.
    Used for DP gather buffer allocation."""

    padding_logics: dict[str, EncoderCudaGraphPaddingLogic] = field(
        default_factory=dict
    )
    """Optional per-buffer replay padding/copy logic.
    If absent for a key, the manager zeros the capture buffer and slice-copies
    the replay buffer into it."""

    max_frames_per_video: int = 1
    """Maximum number of frames per video.
    Only relevant when "video" is in ``modalities``.
    Image-only models can use the default of 1."""

    enable_dual_path_graph: bool = False
    """If True, the manager captures two independent graph sets
    (global + local) and runs dual-path graph selection during inference."""

    global_token_per_image: int = 0
    """Tokens per global image (e.g. 272 for DeepSeek-OCR).
    Only used when ``enable_dual_path_graph`` is True."""

    local_token_per_patch: int = 0
    """Tokens per local patch (e.g. 100 for DeepSeek-OCR).
    Only used when ``enable_dual_path_graph`` is True."""

`buffer_keys` `instance-attribute` ¶

Keys for the tensor buffers recorded into the CUDA graph. Before replay the manager zeros then slice-copies new data into these buffers.
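As a rough illustration of "zeros then slice-copies", the sketch below applies that rule to a fixed-size buffer. It is an assumption-laden stand-in rather than the manager's implementation: lists replace tensors, and `zero_then_slice_copy` is a hypothetical helper name. With real tensors, the analogous in-place operations would be `Tensor.zero_()` followed by `Tensor.copy_()` on a leading slice.

```python
def zero_then_slice_copy(capture_buf: list, replay_buf: list) -> None:
    """Overwrite capture_buf in place: zero it, then copy replay_buf
    into its leading slice. Hypothetical helper for illustration."""
    if len(replay_buf) > len(capture_buf):
        raise ValueError("replay data does not fit in the captured buffer")
    # Zero every element so stale data from a previous replay
    # cannot leak into the padded tail.
    capture_buf[:] = [0.0] * len(capture_buf)
    # Copy the new data into the front; the tail stays zero.
    capture_buf[: len(replay_buf)] = replay_buf


capture_buf = [9.0] * 6      # captured at the graph's maximum size
same_object = capture_buf    # second reference to check identity
zero_then_slice_copy(capture_buf, [1.0, 2.0, 3.0])
```

After the call, the buffer keeps its captured length and identity; only its contents change.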

`enable_dual_path_graph = False` `class-attribute` `instance-attribute` ¶

If True, the manager captures two independent graph sets (global + local) and runs dual-path graph selection during inference.

`global_token_per_image = 0` `class-attribute` `instance-attribute` ¶

Tokens per global image (e.g. 272 for DeepSeek-OCR). Only used when enable_dual_path_graph is True.

`local_token_per_patch = 0` `class-attribute` `instance-attribute` ¶

Tokens per local patch (e.g. 100 for DeepSeek-OCR). Only used when enable_dual_path_graph is True.

`max_frames_per_video = 1` `class-attribute` `instance-attribute` ¶

Maximum number of frames per video. Only relevant when "video" is in modalities. Image-only models can use the default of 1.

`modalities` `instance-attribute` ¶

Supported modalities (e.g. ["image"]).

`out_hidden_size` `instance-attribute` ¶

Output hidden dim of the vision encoder. Used for DP gather buffer allocation.

`padding_logics = field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

Optional per-buffer replay padding/copy logic. If absent for a key, the manager zeros the capture buffer and slice-copies the replay buffer into it.
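To make the defaults concrete, here is a minimal sketch of two configurations. The dataclass is a local mirror of the fields listed above so that the snippet runs without vLLM installed; `padding_logics` is typed as `dict[str, Any]` because `EncoderCudaGraphPaddingLogic` is not defined on this page. The buffer key `pixel_values` and the `out_hidden_size` values are hypothetical; only the token counts 272 and 100 come from the field descriptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EncoderCudaGraphConfig:
    # Local mirror of the documented fields (assumption: for illustration only).
    modalities: list[str]
    buffer_keys: list[str]
    out_hidden_size: int
    padding_logics: dict[str, Any] = field(default_factory=dict)
    max_frames_per_video: int = 1
    enable_dual_path_graph: bool = False
    global_token_per_image: int = 0
    local_token_per_patch: int = 0


# Image-only model: every optional field keeps its default.
image_only = EncoderCudaGraphConfig(
    modalities=["image"],
    buffer_keys=["pixel_values"],  # hypothetical key name
    out_hidden_size=1024,          # hypothetical value
)

# Dual-path model: the two token counts only matter once the flag is set.
dual_path = EncoderCudaGraphConfig(
    modalities=["image"],
    buffer_keys=["pixel_values"],  # hypothetical key name
    out_hidden_size=1280,          # hypothetical value
    enable_dual_path_graph=True,
    global_token_per_image=272,    # DeepSeek-OCR example from the docs
    local_token_per_patch=100,     # DeepSeek-OCR example from the docs
)
```

Because `padding_logics` uses `default_factory=dict`, each instance gets its own empty dict rather than sharing one mutable default.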

`EncoderCudaGraphReplayBuffers` `dataclass` ¶

New buffer values for graph replay, computed by the model from actual batch inputs.

Returned by prepare_encoder_cudagraph_replay_buffers(). Keys match EncoderCudaGraphConfig.buffer_keys.

Attributes:

values (dict[str, Tensor | None]) –

Data to copy into the captured buffers before replay.

Source code in vllm/v1/worker/encoder_cudagraph_defs.py

@dataclass
class EncoderCudaGraphReplayBuffers:
    """New buffer values for graph replay, computed by the model from
    actual batch inputs.

    Returned by ``prepare_encoder_cudagraph_replay_buffers()``.
    Keys match ``EncoderCudaGraphConfig.buffer_keys``.
    """

    values: dict[str, torch.Tensor | None]
    """Data to copy into the captured buffers before replay.
    ``None`` values leave the corresponding captured buffer
    unchanged."""

`values` `instance-attribute` ¶

Data to copy into the captured buffers before replay. None values leave the corresponding captured buffer unchanged.
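The handling of `None` entries can be sketched as follows. This is a simplified model under stated assumptions, not the manager's code: lists stand in for tensors, `apply_replay_values` is a hypothetical helper, and the keys `pixel_values` and `rotary_pos` are invented for the example. It uses the default zero-then-slice-copy behavior for non-`None` values and skips `None` values entirely.

```python
from typing import Optional


def apply_replay_values(
    captured: dict[str, list],
    replay_values: dict[str, Optional[list]],
) -> None:
    """Copy replay data into captured buffers in place; skip None entries."""
    for key, new_data in replay_values.items():
        if new_data is None:
            # None means "leave this captured buffer unchanged".
            continue
        buf = captured[key]
        buf[:] = [0.0] * len(buf)      # zero the whole buffer
        buf[: len(new_data)] = new_data  # copy into the leading slice


captured = {
    "pixel_values": [7.0, 7.0, 7.0, 7.0],  # hypothetical key
    "rotary_pos": [0.5, 0.5],              # hypothetical key
}
apply_replay_values(
    captured,
    {"pixel_values": [1.0, 2.0], "rotary_pos": None},
)
```

Here `pixel_values` is refreshed and zero-padded, while `rotary_pos` keeps the contents it had at capture time.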

`EncoderItemSpec` `dataclass` ¶

Description of a single encoder input item.

Returned by get_encoder_cudagraph_item_specs() to describe each image or video in a batch without the manager needing to understand model-specific input formats.

Attributes:

global_output_tokens (int) –

Number of output tokens from the global image path.
input_size (int) –

Number of input patches/rows for this item.
local_output_tokens (int) –

Number of output tokens from the local patch path.
output_tokens (int) –

Number of output tokens after encoder processing (e.g. after spatial merge).

Source code in vllm/v1/worker/encoder_cudagraph_defs.py

@dataclass
class EncoderItemSpec:
    """Description of a single encoder input item.

    Returned by ``get_encoder_cudagraph_item_specs()`` to describe each
    image or video in a batch without the manager needing to understand
    model-specific input formats.
    """

    input_size: int
    """Number of input patches/rows for this item."""

    output_tokens: int
    """Number of output tokens after encoder processing (e.g. after
    spatial merge)."""

    global_output_tokens: int = 0
    """Number of output tokens from the global image path.
    Only used when ``EncoderCudaGraphConfig.enable_dual_path_graph`` is True."""

    local_output_tokens: int = 0
    """Number of output tokens from the local patch path.
    Only used when ``EncoderCudaGraphConfig.enable_dual_path_graph`` is True."""

`global_output_tokens = 0` `class-attribute` `instance-attribute` ¶

Number of output tokens from the global image path. Only used when EncoderCudaGraphConfig.enable_dual_path_graph is True.

`input_size` `instance-attribute` ¶

Number of input patches/rows for this item.

`local_output_tokens = 0` `class-attribute` `instance-attribute` ¶

Number of output tokens from the local patch path. Only used when EncoderCudaGraphConfig.enable_dual_path_graph is True.

`output_tokens` `instance-attribute` ¶

Number of output tokens after encoder processing (e.g. after spatial merge).
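A short sketch of how item specs might be built and aggregated for a batch is given below. The dataclass is a local mirror of the documented fields so the snippet runs standalone. The merge factor of 4 (a 2x2 spatial merge) and the helper `spec_for_patches` are assumptions made for illustration; the real relationship between `input_size` and `output_tokens` is model-specific.

```python
from dataclasses import dataclass


@dataclass
class EncoderItemSpec:
    # Local mirror of the documented fields (assumption: for illustration only).
    input_size: int
    output_tokens: int
    global_output_tokens: int = 0
    local_output_tokens: int = 0


MERGE_FACTOR = 4  # hypothetical: a 2x2 spatial merge folds 4 patches into 1 token


def spec_for_patches(num_patches: int) -> EncoderItemSpec:
    """Describe one image by its patch count, assuming a fixed merge factor."""
    return EncoderItemSpec(
        input_size=num_patches,
        output_tokens=num_patches // MERGE_FACTOR,
    )


# Two images of different sizes in one batch.
specs = [spec_for_patches(n) for n in (64, 256)]

# Totals that a size-based graph selection could be keyed on.
total_input = sum(s.input_size for s in specs)
total_output = sum(s.output_tokens for s in specs)
```

Describing items only by sizes like these is what lets a caller reason about batch shape without parsing model-specific inputs.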
