mrope

Multi-modal RoPE (M-RoPE).

For text-only inputs (position_ids of shape (B, T)), this is bit-exact with RotaryPosEncoding: the three (T, H, W) frequency channels all carry the same positions, so the interleaved selection picks the same value at every index. The module is kept separate so the vision path can build on it later.
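The text-only equivalence can be checked with a small pure-Python sketch. This is not the module's code: the helper names (rope_angles, channel_for_index, mrope_angles) are hypothetical, the batch dimension is dropped for brevity, and the exact interleaving rule (a T, H, W, T, H, W, ... pattern with leftover indices going to T) is an assumption made for illustration. The equivalence itself does not depend on that rule, since any per-index selection among three identical channels returns the same value.

```python
def rope_angles(positions, d_rot, base=10000):
    """Plain RoPE angles for one sequence: angles[t][i] = positions[t] * base**(-2*i/d_rot)."""
    half = d_rot // 2
    inv_freq = [base ** (-2.0 * i / d_rot) for i in range(half)]
    return [[p * f for f in inv_freq] for p in positions]


def channel_for_index(i, mrope_section):
    """ASSUMED layout: index i goes to H or W in a T,H,W,T,H,W,... pattern
    while that channel still has budget; every other index goes to T."""
    _, h, w = mrope_section
    if i % 3 == 1 and i < 3 * h:
        return 1  # H channel
    if i % 3 == 2 and i < 3 * w:
        return 2  # W channel
    return 0      # T channel


def mrope_angles(positions_thw, d_rot, mrope_section, base=10000):
    """Pick, for each frequency index, the angle computed from its assigned channel."""
    per_channel = [rope_angles(p, d_rot, base) for p in positions_thw]
    half = d_rot // 2
    seq_len = len(positions_thw[0])
    return [
        [per_channel[channel_for_index(i, mrope_section)][t][i] for i in range(half)]
        for t in range(seq_len)
    ]


positions = [0, 1, 2, 5]
text_only = rope_angles(positions, d_rot=16)
# Text-only input: all three channels carry the same positions.
multimodal = mrope_angles([positions, positions, positions], d_rot=16, mrope_section=(4, 2, 2))
assert multimodal == text_only  # exact equality, not approximate
```

The comparison uses exact equality because both paths run the same floating-point operations on the same inputs, which is the sense in which the result is bit-exact.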

MRotaryPosEncoding(d_model: int, base: int = 10000, seq_dim: int = 1, partial_rotary_factor: float = 1.0, mrope_section: Optional[Sequence[int]] = None)

Bases: RotaryPosEncoding

RoPE with optional 3-channel (T, H, W) position ids.

Pass positions as (B, T) for text-only inputs, or as (3, B, T) for multimodal inputs; in the multimodal case the three channels are interleaved according to mrope_section.
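The two accepted shapes can be illustrated with a short sketch that normalizes either form to (3, B, T). The helper to_three_channels is hypothetical and works on nested Python lists rather than tensors; whether the module performs this expansion internally is an assumption. It only mirrors the statement that text-only positions behave as three channels carrying the same values.

```python
def nesting_depth(x):
    """Count list nesting levels; assumes non-empty, rectangular nested lists."""
    depth = 0
    while isinstance(x, list) and x:
        depth += 1
        x = x[0]
    return depth


def to_three_channels(position_ids):
    """Return positions shaped (3, B, T) from either (B, T) or (3, B, T) input."""
    depth = nesting_depth(position_ids)
    if depth == 2:
        # Text-only (B, T): reuse the same positions for the T, H and W channels.
        return [position_ids, position_ids, position_ids]
    if depth == 3 and len(position_ids) == 3:
        # Already (3, B, T): pass through unchanged.
        return position_ids
    raise ValueError("expected positions shaped (B, T) or (3, B, T)")


# Text-only batch with B=2, T=3.
text_positions = [[0, 1, 2], [0, 1, 2]]
expanded = to_three_channels(text_positions)

# Multimodal batch: separate T, H and W positions (values are made up).
mm_positions = [
    [[0, 1, 1], [0, 1, 1]],  # T
    [[0, 1, 2], [0, 1, 2]],  # H
    [[0, 2, 1], [0, 2, 1]],  # W
]
unchanged = to_three_channels(mm_positions)
```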