Orion

Planned — not yet implemented. Orion is design-only today (src/orion/ is a skeleton). Every behavior on this page is the intended design from the architecture sketch (orion-architecture.md, 2026-05-22) and the Research Stack Architecture (Notion) §5; nothing here can be run yet. Implementation is tracked in the Linear Orion project.

Orion is the research / training / benchmarking layer of Constellation’s research stack — the package researchers will actively wield. It sits on top of Ursa (the data catalog) and Virgo (preprocessing), and wraps torch_brain model classes under orion.models.* so iteration speed isn’t gated on upstream PRs.

The design deliberately collapses the surface to two top-level types and one entry point (design-doc §0):

RunConfig — a Pydantic-validated, Tyro-CLI-overridable description of one training run. Its data field is a VirgoQuery; there is no separate data-config layer.
VirgoQuery — the data spec, owned by Virgo. Researchers build it with pi.query(...), compose multiple sources with VirgoQuery + VirgoQuery, and Orion materializes it lazily via pi.stream(query).
orion.train() — the single entry point, polymorphic over RunConfig | list[RunConfig]. A list covers sweeps and multi-stage grouping; there is no TrainingPipeline class and no pipeline CLI verb.
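
Because Orion is not implemented yet, the shape of these three pieces can only be sketched. The example below is a minimal, dependency-free illustration and not the Orion API. It uses stdlib dataclasses as a stand-in for the Pydantic models, and the `sources` and `lr` fields and the concatenation semantics of `+` are assumptions made for illustration. It shows two things: queries compose with `+`, and a single `train()` accepts either one config or a list.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class VirgoQuery:
    """Stand-in for Virgo's data spec: an ordered tuple of source names."""
    sources: tuple[str, ...]

    def __add__(self, other: VirgoQuery) -> VirgoQuery:
        # Assumed semantics: composing two queries concatenates their sources.
        return VirgoQuery(self.sources + other.sources)


@dataclass(frozen=True)
class RunConfig:
    """Stand-in for the planned RunConfig; `data` is the query itself."""
    name: str
    data: VirgoQuery
    lr: float = 1e-3  # hypothetical hyperparameter


def train(config: Union[RunConfig, list[RunConfig]]) -> list[str]:
    """Accept one config or a list, normalise to a list, and 'run' each."""
    configs = [config] if isinstance(config, RunConfig) else list(config)
    # A real trainer would launch here; the sketch only reports what it would run.
    return [
        f"{c.name}: lr={c.lr}, sources={','.join(c.data.sources)}"
        for c in configs
    ]


# Compose two sources into one query, then reuse it across a small sweep.
query = VirgoQuery(("dataset_a",)) + VirgoQuery(("dataset_b",))
base = RunConfig(name="base", data=query)
sweep = [replace(base, name=f"lr{i}", lr=lr) for i, lr in enumerate([1e-3, 1e-4])]

single = train(base)   # one run
many = train(sweep)    # a sweep is just a list of configs
```

Normalising the argument to a list at the top of `train()` is what lets sweeps and multi-stage groups share one code path without a separate pipeline class.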

What Orion will provide

Pure-PyTorch Trainer (no Lightning) — wraps the model in FSDP2, writes sharded checkpoints via torch.distributed.checkpoint, dispatches a small Callback protocol, and runs under torchrun with elastic restart (same WORLD_SIZE). Parallelism is pluggable behind a ParallelismStrategy protocol.
VirgoQuery-driven dataloader — resolves a VirgoQuery to dict[recording_id, Data] through Virgo’s ProcessingInterface, then model.tokenize(data) inside the dataset converts each temporaldata.Data into the flat tensor dict the model consumes. torch_brain samplers are re-exported; an Augmentation chain runs on Data before tokenize.
Typed loss list — TrainingConfig.loss is a list of BrainFrame-style Loss objects (each a Pydantic model + nn.Module), composed at build time rather than hard-coded in the model.
Rich self-describing checkpoints — bit-exact resume state (model, optimizer, scheduler, RNG, dataloader cursor) plus a data_hashes/manifest.json that answers “what data was this trained on?” and is registered back into Ursa.
Benchmarks as first-class artifacts — an @orion.benchmark decorator produces content-addressed BenchmarkResults stored in Ursa; configurable compute boundaries (inline / async / separate) and partial subsets enable fast in-training metrics.
One config, two launch targets — a discriminated InfraConfig (LocalInfra / SlurmInfra / SkyPilotInfra) launches the same run on Polaris via Slurm or bursts it to the cloud via SkyPilot.
Self-hosted ClearML run registry — surfaced through orion.registry.* and orion-mcp. At MVP a stub backend writes JSON locally; richer search/diff/lineage light up later.
Implicit multi-stage & lineage — init_from=FinetuneFrom(run="upstream_name") plus a RunConfig.pipeline lineage key chains pretrain → finetune runs with full lineage propagation; no orchestration class required.
Aggressive pre-flight validation — pi.dry_run(query) and config checks run before any GPU is touched, so doomed runs never reach a node.
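
Two items in the list above depend on content addressing: the data manifest that records what a run was trained on, and the content-addressed benchmark results. The design does not say how addresses are computed, so the sketch below shows one common approach (SHA-256 over canonical JSON) rather than Orion's actual scheme. The manifest layout, the benchmark name, and the metric values are all hypothetical.

```python
import hashlib
import json


def content_address(payload: dict) -> str:
    """Return a SHA-256 hex digest of the payload's canonical JSON form."""
    # Sorting keys and fixing separators makes the encoding deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest: recording id -> hash of the data that was read.
manifest = {"rec_002": "b7e1", "rec_001": "a3f9"}

# Hypothetical benchmark result that refers to the manifest by its address.
result = {
    "benchmark": "example_decoding",
    "data_manifest": content_address(manifest),
    "metrics": {"r2": 0.5},
}
result_id = content_address(result)

# The same content in a different key order yields the same address.
reordered = {"rec_001": "a3f9", "rec_002": "b7e1"}
same = content_address(reordered) == content_address(manifest)

# Any change to the content yields a different address.
changed = content_address({**result, "metrics": {"r2": 0.6}}) != result_id
```

With an address derived from content, re-registering an identical result is naturally idempotent, and a result can be traced back to the exact data manifest it was computed against.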

How a run flows

flowchart LR
    RC["RunConfig<br/>(data = VirgoQuery)"] --> PF["preflight<br/>pi.dry_run(query)"]
    PF --> L["launcher<br/>(Local / Slurm / SkyPilot)"]
    L --> TR["torchrun"]
    TR --> FIT["Trainer.fit()"]
    DL["make_dataloader<br/>pi.stream(query)"] --> FIT
    FIT --> OUT["checkpoints<br/>+ benchmark results"]
    OUT --> URSA["Ursa<br/>(register_checkpoint /<br/>register_benchmark_result)"]
    URSA -. "raw + processed Data" .-> DL

The flow within the package is one-way: a RunConfig carrying a VirgoQuery is validated by preflight, handed to the launcher, run under torchrun, and Trainer.fit() consumes batches from a dataloader backed by pi.stream(query). Checkpoints and benchmark results flow back into Ursa’s catalog, which is also where the data came from.
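
The ordering described above can be sketched as a small stand-in, assuming nothing about the real implementation. Every name here is hypothetical: the reduced `RunConfig`, `PreflightError`, `preflight`, and `run` are illustrations only, and a plain list stands in for Ursa's catalog. The sketch shows that preflight runs before any launch step, so a config that cannot succeed is rejected without leaving anything in the catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Reduced stand-in for the planned RunConfig; field names are hypothetical."""
    name: str
    sources: tuple        # stand-in for the VirgoQuery carried in `data`
    infra: str = "local"  # stand-in for the InfraConfig discriminator


class PreflightError(ValueError):
    """Raised before launch when a run cannot possibly succeed."""


def preflight(config: RunConfig, known_sources: set) -> None:
    # Stand-in for pi.dry_run(query): every source must resolve.
    missing = [s for s in config.sources if s not in known_sources]
    if missing:
        raise PreflightError(f"unresolvable sources: {missing}")
    if config.infra not in {"local", "slurm", "skypilot"}:
        raise PreflightError(f"unknown infra: {config.infra!r}")


def run(config: RunConfig, known_sources: set, catalog: list) -> list:
    """Walk the stages in order and return a trace of what ran."""
    trace = []
    preflight(config, known_sources)             # fails fast, before any launch
    trace.append("preflight")
    trace.append(f"launch:{config.infra}")       # the launcher would start torchrun here
    trace.append("fit")                          # Trainer.fit() would consume batches here
    catalog.append(f"checkpoint:{config.name}")  # outputs are registered back in the catalog
    trace.append("register")
    return trace


catalog = []
known = {"dataset_a", "dataset_b"}

ok_trace = run(RunConfig("demo", ("dataset_a",)), known, catalog)

try:
    run(RunConfig("doomed", ("dataset_missing",)), known, catalog)
    rejected = False
except PreflightError:
    rejected = True  # the doomed run never reached the launch step
```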

Where this fits

Orion is one of three packages in Constellation’s research stack:

Ursa — data catalog & storage layer
Virgo — DAG-based preprocessing & the VirgoQuery language
Orion (this site) — research / training / benchmarking

Cross-cutting concerns are documented once and linked, not duplicated here:

Observability, timestamps, CI, MCP, cloud-agnostic launch → Research Stack Architecture (Notion) §5.
Secrets & notifications → constellation-utils docs.

Status & phasing

Planned. Implementation is sequenced in the Linear Orion project (design-doc §18); M2 is split into a true-MVP M2a and a production-grade M2b:

M1 — Foundations (in progress) — repo skeleton, core deps, OrionModel mixin, creds & notifications, orion.status.* namespace.
M2a — First end-to-end run (true MVP) — RunConfig, preflight, single-device Trainer, OrionDataset, basic rich checkpoint + data_hashes/manifest.json, LocalInfra, stub registry. Demo: train POYO 100 steps on r2-test, kill, resume.
M2b — Production-grade MVP — DDP, multi-rank manifest merge, Augmentation, Slurm-on-Polaris & multi-node SkyPilot, full ClearML, v0.1.0 tag.
M3 — Benchmarking framework.
M4 — Multi-stage training & lineage.
M5 — Production-scale (module version routing, multi-node).
M6 — Polish & onboarding.
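
The M2a demo (train, kill, resume) only works if the checkpoint carries everything that influences later steps, which is why the planned resume state includes RNG state and the dataloader cursor alongside the model. The toy below illustrates that property with the standard library only. A float stands in for the model, `random.Random` stands in for the framework RNG, and the state layout is an assumption, not Orion's checkpoint format.

```python
import random


def fresh_state(seed: int) -> dict:
    """Initial toy training state: weight, data cursor, and RNG state."""
    return {"weight": 0.0, "cursor": 0, "rng": random.Random(seed).getstate()}


def train_steps(state: dict, n_steps: int) -> dict:
    """Advance the toy 'model' by n_steps, starting from a saved state."""
    rng = random.Random()
    rng.setstate(state["rng"])  # restore the RNG exactly where it stopped
    weight, cursor = state["weight"], state["cursor"]
    for _ in range(n_steps):
        weight += rng.uniform(-1.0, 1.0)  # stand-in for an optimizer step
        cursor += 1                       # stand-in for the dataloader cursor
    return {"weight": weight, "cursor": cursor, "rng": rng.getstate()}


# Reference: 100 steps with no interruption.
uninterrupted = train_steps(fresh_state(0), 100)

# Interrupted: stop after 40 steps, then resume from the saved state.
checkpoint = train_steps(fresh_state(0), 40)
resumed = train_steps(checkpoint, 60)

# Resume is exact only because the RNG state was part of the checkpoint.
matches = resumed == uninterrupted
```

Dropping the `rng` entry from the saved state and reseeding on resume would still run, but the resumed trajectory would diverge from the uninterrupted one.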
