Architecture¶
The full Constellation Research Stack architecture is captured in the canonical Notion doc; this page mirrors the Orion-specific section so the deployed docs site can stand on its own.
Goals¶
- Researcher-friendly entry point. orion train configs/myrun.py works end-to-end with smart defaults.
- Powerful but typed configs with aggressive pre-flight validation. Pydantic + Tyro. Configs are versioned, validated, and serialized into the run record. Pre-flight checks catch doomed runs before they spin up a GPU (a minimal sketch follows this list).
- Distributed-first. Single-node 8-GPU on Polaris and 100+ GPU multi-node in the cloud are the same code path. SkyPilot picks the host; Lightning + DDP/FSDP (Monarch opt-in) handles parallelism.
- Resumable from any failure. Checkpoints contain everything needed to bit-exactly resume, including the complete list of data hashes consumed up to that step.
- Rich, queryable run artifacts. Every checkpoint is queryable: data, config, benchmark scores, lineage.
- Benchmarks as first-class artifacts and as the in-training eval mechanism. The same Benchmark class powers ad hoc evaluation, periodic-during-training callbacks, and final eval. Supports partial subsets for fast in-training metrics.
- Native multi-stage training. Pretrain → finetune → finetune is a declarative pipeline of RunConfigs with full lineage tracking.
- torch_brain is wrapped, not extended. We use its modules, but our own subclasses live in orion.models.*, so iteration speed isn't gated on upstream PRs.
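To make the pre-flight idea concrete, here is a minimal, hypothetical sketch of the kind of validation Pydantic makes cheap. DataConfigSketch and its fields are illustrative stand-ins, not Orion's actual classes; the point is that a doomed value fails at config-construction time, before any GPU is provisioned.

from pydantic import BaseModel, ValidationError, field_validator

class DataConfigSketch(BaseModel):
    # Hypothetical stand-in for a data config; not Orion's real DataConfig.
    train_window: float
    train_split: float

    @field_validator("train_split")
    @classmethod
    def split_in_unit_interval(cls, v: float) -> float:
        if not 0.0 < v < 1.0:
            raise ValueError(f"train_split must be in (0, 1), got {v}")
        return v

try:
    DataConfigSketch(train_window=10.0, train_split=1.85)
except ValidationError as err:
    print(err)  # rejected at construction time, before any infrastructure spins up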
Configuration¶
from orion.config import RunConfig, TrainingConfig, DataConfig, ModelConfig, InfraConfig, BenchCompute
from orion.models import POYO
from ursa import QuerySpec
run = RunConfig(
    name="cognitive_load_v3_baseline",
    data=DataConfig(
        ursa_query=QuerySpec(
            participants=["p042", "p043", "p044"],
            modalities=["eeg", "pupil"],
            derived=["standard_v2026q2.eeg_clean", "standard_v2026q2.pupil_dilation"],
        ),
        train_window=10.0,
        train_split=0.85,
    ),
    model=ModelConfig(cls=POYO, kwargs=dict(dim=512, depth=6, num_latents=64)),
    training=TrainingConfig(
        batch_size=32, max_steps=200_000,
        optimizer="sparse_lamb", lr=3e-4, scheduler="onecycle",
        ckpt_every_n_steps=2_000, precision="bf16-mixed",
        bench_every_n_steps=10_000, bench_suites=["cognitive_load_eval@1"],
        bench_compute=BenchCompute(mode="async", partial_subset=0.1),
    ),
    infra=InfraConfig(accelerator="gpu", nodes=1, gpus_per_node=8),
)
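The goals above name Pydantic + Tyro, so one plausible shape for the orion train entry point is to bind the same RunConfig schema to command-line flags via tyro.cli; whether Orion does exactly this is an assumption, not something stated here.

import tyro
from orion.config import RunConfig

# Hedged sketch: tyro.cli builds a RunConfig from argv, so CLI overrides such as
#   orion train --training.lr 1e-4 --training.max_steps 100000
# could share the exact schema used by config files.
run = tyro.cli(RunConfig)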
Rich checkpoints¶
A checkpoint is a directory:
ckpt-step000050000/
├── manifest.json # version, hash, parent ckpt hash
├── config.json # full RunConfig snapshot
├── code_version.json # orion+virgo+ursa versions, git commits, dirty flag, module versions
├── env.lock # uv.lock snapshot
├── model/{weights.safetensors,ema.safetensors}
├── optimizer/state.pt
├── scheduler/state.pt
├── grad_scaler/state.pt
├── distributed/rank0.pt ... rankN.pt # FSDP shards if applicable
├── rng/{cpu.pt,cuda_<i>.pt}
├── dataloader/cursor.pt # Lance scan position
├── data_hashes/manifest.json # the list of ALL recording_hashes + time_windows + derived_asset_ids
├── metrics/snapshot.json
├── benchmarks/ # results from periodic bench callbacks
└── lineage.json # parent run, parent ckpt, training steps consumed
The data_hashes/manifest.json is the load-bearing piece: with it, train/test overlap detection is a set intersection on Ursa’s catalog, not a forensic exercise.
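A minimal sketch of that intersection, assuming the manifest keeps its recording hashes under a flat recording_hashes key (Orion's actual field names may differ):

import json

def overlapping_recordings(ckpt_dir: str, held_out_hashes: set[str]) -> set[str]:
    # held_out_hashes would be resolved from a benchmark's QuerySpec against Ursa's catalog.
    with open(f"{ckpt_dir}/data_hashes/manifest.json") as f:
        trained_on = set(json.load(f)["recording_hashes"])
    return trained_on & held_out_hashes

# A non-empty result for overlapping_recordings("ckpt-step000050000", held_out)
# means the checkpoint has already seen part of the held-out set.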
Benchmarks¶
A benchmark is the canonical way Orion measures anything. The same class powers ad hoc evaluation, periodic-during-training callbacks, and final eval.
@orion.benchmark(name="cognitive_load_eval", version=1)
class CognitiveLoadEvalV1:
    def held_out_data(self) -> QuerySpec: ...
    def metrics(self) -> list[Metric]: ...
    def run(self, model, data, *, partial_subset: float = 1.0) -> BenchmarkResult: ...
A BenchmarkResult is content-addressed by (suite_name, suite_version, checkpoint_hash, dataset_hash, partial_subset, partial_seed) and stored in Ursa’s benchmark_results table.
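A sketch of how such a content address could be derived; the actual key derivation used by Orion and Ursa is not specified here, only the fields it covers.

import hashlib
import json

def benchmark_result_key(suite_name: str, suite_version: int, checkpoint_hash: str,
                         dataset_hash: str, partial_subset: float, partial_seed: int) -> str:
    # Stable hash over the identifying tuple: rerunning the same suite version on the
    # same checkpoint, data, and subset settings maps to the same result row.
    payload = json.dumps([suite_name, suite_version, checkpoint_hash,
                          dataset_hash, partial_subset, partial_seed])
    return hashlib.sha256(payload.encode()).hexdigest()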
BenchCompute modes: inline (cheapest, blocks training), async (the default: weights are snapshotted to a side SkyPilot job, training continues, and the metric arrives later), and separate (its own instance per bench).
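In config terms (only mode and partial_subset appear in this doc; treat the rest as a sketch):

from orion.config import BenchCompute

BenchCompute(mode="inline")                      # blocks the training loop while it runs
BenchCompute(mode="async", partial_subset=0.1)   # default: weights snapshot to a side SkyPilot job
BenchCompute(mode="separate")                    # dedicated instance per bench suite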
Multi-stage training¶
from orion.config import RunConfig, TrainingPipeline, FinetuneFrom
pretrain = RunConfig(name="cognitive_load_v3_pretrain", ...)
finetune_p042 = RunConfig(
    name="cognitive_load_v3_finetune_p042",
    init_from=FinetuneFrom(run=pretrain, checkpoint="best", weight_overrides=["readout"]),
    ...,
)
pipeline = TrainingPipeline(
    name="cognitive_load_v3_full",
    stages=[pretrain, finetune_p042],
)
Lineage propagates automatically: parent, children, full_history, data_hashes superset.
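As an operational sketch of full_history, walk the parent links in each checkpoint's lineage.json back to the root; the parent_ckpt field name is an assumption based on the checkpoint layout above.

import json
import os

def full_history(ckpt_dir: str) -> list[str]:
    # Newest checkpoint first; each lineage.json is assumed to point at its parent
    # checkpoint directory, or null at the root of the pipeline.
    history = []
    current = ckpt_dir
    while current is not None:
        history.append(current)
        with open(os.path.join(current, "lineage.json")) as f:
            current = json.load(f).get("parent_ckpt")
    return history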
Phasing¶
See Linear for issue-level detail.