layers
RotaryPosEncoding(d_model: int, base: int = 10000, seq_dim: int = 1, partial_rotary_factor: float = 1.0)
Unified RoPE supporting standard, Qwen, and NeoX variants.
Differences between variants are controlled by constructor args:
- base: 10000 (standard/NeoX) or 1e6 (Qwen)
- partial_rotary_factor: 1.0 (standard/Qwen) or <1.0 (NeoX)
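To make the effect of the two arguments concrete, here is a minimal pure-Python sketch of rotary encoding for a single vector. It is not this library's implementation: the helper names `rope_inv_freq` and `apply_rope`, the `dim` parameter, and the half-split pairing of dimensions (i with i + rot_dim/2) are assumptions made for illustration.

```python
import math

def rope_inv_freq(dim, base=10000.0, partial_rotary_factor=1.0):
    """Inverse frequencies for the rotated part of a vector of width dim.

    Hypothetical helper: only the first int(dim * partial_rotary_factor)
    dimensions are rotated, rounded down to an even number.
    """
    rot_dim = int(dim * partial_rotary_factor)
    rot_dim -= rot_dim % 2  # dimensions are rotated in pairs
    return [base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]

def apply_rope(x, pos, base=10000.0, partial_rotary_factor=1.0):
    """Rotate one vector x (list of floats) at integer position pos.

    Assumes half-split pairing: dimension i is paired with i + half.
    Dimensions beyond the rotated part pass through unchanged.
    """
    inv_freq = rope_inv_freq(len(x), base, partial_rotary_factor)
    half = len(inv_freq)
    out = list(x)
    for i, freq in enumerate(inv_freq):
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
standard = apply_rope(x, pos=3)                           # base 10000, full rotation
qwen_like = apply_rope(x, pos=3, base=1e6)                # larger base, slower rotation
neox_like = apply_rope(x, pos=3, partial_rotary_factor=0.5)  # last 4 dims untouched
```

A larger base lowers every frequency except the first, so positions rotate more slowly, while a partial_rotary_factor below 1.0 leaves the trailing dimensions position-independent.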
QwenRotaryPosEncoding(d_model: int, base: int = 1000000, seq_dim: int = 1)
NeoXRotaryPosEncoding(d_model: int, base: int = 10000, seq_dim: int = 1, partial_rotary_factor: float = 1.0)
MRotaryPosEncoding(d_model: int, base: int = 10000, seq_dim: int = 1, partial_rotary_factor: float = 1.0, mrope_section: Optional[Sequence[int]] = None)
Bases: RotaryPosEncoding
RoPE with optional 3-channel (T, H, W: temporal, height, width) position ids.
Pass positions as (B, T) for text-only input, or as (3, B, T) for
multimodal input; the three channels are interleaved according to mrope_section.
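The two position layouts can be sketched with nested lists standing in for tensors. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, and `mrope_section` is assumed to give the number of rotary frequencies assigned to each channel in contiguous T, H, W chunks. The actual interleaving order used by MRotaryPosEncoding may differ.

```python
def channel_per_frequency(mrope_section):
    """Map each rotary frequency index to a channel (0=T, 1=H, 2=W).

    ASSUMPTION: contiguous chunks sized by mrope_section; the real
    layout in the library may interleave the channels differently.
    """
    channels = []
    for channel, size in enumerate(mrope_section):
        channels.extend([channel] * size)
    return channels

def expand_text_positions(positions):
    """Turn text-only (B, T) positions into (3, B, T) by repeating them."""
    return [[list(row) for row in positions] for _ in range(3)]

def select_positions(positions_3bt, mrope_section):
    """Return a (B, T, F) nested list: the position used by each frequency."""
    channels = channel_per_frequency(mrope_section)
    batch = len(positions_3bt[0])
    seq_len = len(positions_3bt[0][0])
    return [
        [[positions_3bt[c][b][t] for c in channels] for t in range(seq_len)]
        for b in range(batch)
    ]

# Text-only: one sequence of three tokens, shape (B=1, T=3).
text_positions = [[0, 1, 2]]
text_selected = select_positions(expand_text_positions(text_positions), [1, 2, 1])

# Multimodal: shape (3, B=1, T=2), with distinct T, H, W ids per token.
mm_positions = [[[0, 0]], [[0, 1]], [[5, 6]]]
mm_selected = select_positions(mm_positions, [1, 1, 1])
```

For text-only input all three channels carry the same ids, so every frequency sees the same position regardless of layout and the result reduces to ordinary RoPE.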