gpt_neox