mrope
Multi-modal RoPE (M-RoPE).
For text-only inputs (position_ids of shape (B, T)), this is bit-exact with
RotaryPosEncoding: the three (T, H, W) frequency channels all carry the same
positions, so the interleaved selection picks the same value at every index.
The module is factored out on its own so that the vision path can reuse it later.
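The equivalence can be checked with a small pure-Python sketch. It assumes the standard RoPE frequency schedule base ** (-2j / rotary_dim) and a hypothetical interleaving rule: frequency index j is offered to channel j % 3 (0 = T, 1 = H, 2 = W), and H or W keeps it only while j < 3 * mrope_section[channel]; everything else falls back to T. The helper names (inv_freqs, channel_for, mrope_angles, rope_angles) are illustrative and not part of this module, whose exact selection rule may differ.

```python
from typing import List, Sequence


def inv_freqs(rotary_dim: int, base: float = 10000.0) -> List[float]:
    # Standard RoPE schedule: one inverse frequency per rotated pair of dims.
    return [base ** (-2.0 * j / rotary_dim) for j in range(rotary_dim // 2)]


def channel_for(j: int, mrope_section: Sequence[int]) -> int:
    # Hypothetical interleaving (an assumption, not this module's verified rule):
    # index j is offered to channel j % 3 (0=T, 1=H, 2=W); H and W keep it only
    # while j < 3 * their section size, otherwise it falls back to T.
    c = j % 3
    if c > 0 and j < 3 * mrope_section[c]:
        return c
    return 0


def mrope_angles(pos3: Sequence[int], rotary_dim: int,
                 mrope_section: Sequence[int], base: float = 10000.0) -> List[float]:
    # pos3 = (t, h, w) position of a single token; one rotation angle per frequency.
    return [pos3[channel_for(j, mrope_section)] * f
            for j, f in enumerate(inv_freqs(rotary_dim, base))]


def rope_angles(pos: int, rotary_dim: int, base: float = 10000.0) -> List[float]:
    # Plain 1-D RoPE angles, for comparison.
    return [pos * f for f in inv_freqs(rotary_dim, base)]


section = (4, 2, 2)  # hypothetical split of rotary_dim // 2 = 8 frequencies
print([channel_for(j, section) for j in range(8)])  # [0, 1, 2, 0, 1, 2, 0, 0]

# Text-only: all channels hold the same position, so the angles match exactly.
for p in range(5):
    assert mrope_angles((p, p, p), 16, section) == rope_angles(p, 16)
```

Because the comparison only relies on every channel holding the same position, the equality holds for any channel-selection rule, not just the one assumed here.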
MRotaryPosEncoding(d_model: int, base: int = 10000, seq_dim: int = 1, partial_rotary_factor: float = 1.0, mrope_section: Optional[Sequence[int]] = None)
Bases: RotaryPosEncoding
RoPE with optional three-channel (T, H, W) position ids.
Pass positions as (B, T) for text-only input, or as (3, B, T) for
multimodal input; the three channels are interleaved according to mrope_section.
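A minimal sketch of how the two accepted position layouts might be normalized before the angles are computed, with nested Python lists standing in for tensors: (B, T) ids are replicated into three identical channels, and (3, B, T) ids pass through after a shape check. The helpers as_three_channel and check_mrope_section are hypothetical, and the rule that mrope_section must sum to int(d_model * partial_rotary_factor) // 2 is an assumption, not something this page states.

```python
from typing import List, Sequence


def as_three_channel(position_ids: list) -> List[List[List[int]]]:
    """Return position ids as a (3, B, T) nested list (hypothetical helper).

    (B, T) text-only ids are copied into identical T, H, W channels;
    (3, B, T) multimodal ids are checked and returned unchanged.
    """
    is_3d = (bool(position_ids) and bool(position_ids[0])
             and isinstance(position_ids[0][0], list))
    if is_3d:
        if len(position_ids) != 3:
            raise ValueError(f"expected 3 channels (T, H, W), got {len(position_ids)}")
        return position_ids
    # Text-only: every channel carries the same positions (copied, not aliased).
    return [[list(row) for row in position_ids] for _ in range(3)]


def check_mrope_section(d_model: int, partial_rotary_factor: float,
                        mrope_section: Sequence[int]) -> None:
    # Assumption: the rotated width is int(d_model * partial_rotary_factor), and the
    # three section sizes must cover its rotary_dim // 2 frequencies exactly.
    half = int(d_model * partial_rotary_factor) // 2
    if len(mrope_section) != 3 or sum(mrope_section) != half:
        raise ValueError(
            f"mrope_section {tuple(mrope_section)} must have 3 entries summing to {half}")


text_ids = [[0, 1, 2, 3]]                 # (B=1, T=4)
ids3 = as_three_channel(text_ids)
assert ids3[0] == ids3[1] == ids3[2] == text_ids

check_mrope_section(128, 1.0, (16, 24, 24))  # 16 + 24 + 24 == 64: accepted
```

Copying the (B, T) rows instead of aliasing them keeps the three channels independent if a caller later edits one of them in place.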