grex

A nested meta-repo manager. Track many git repos as a single graph, sync them in parallel, and drive every operation from a shell, CI, or an LLM agent speaking MCP.

grex is what you reach for when one git repo is no longer enough — when you have a tree of related repos (a workspace, a fleet of services, a set of dotfiles + plugins + tools) and you want one declarative source of truth that says which repos belong, where they live on disk, and how they're kept in sync.

It is not a dev-environment installer, not a package manager, not mise / asdf. It manages repos, not language toolchains.

In 30 seconds

cargo install grex-cli                       # binary is `grex`
grex init                                    # creates grex.jsonl in cwd
grex add https://github.com/you/svc-a        # registers + clones a sub-repo
grex add https://github.com/you/svc-b
grex sync                                    # parallel pull/clone for all
grex status --json                           # machine-readable state

grex.jsonl (intent) and grex.lock.jsonl (resolved state) are the only files you commit to your meta-repo. Everything else grex does — clone, pull, run actions, talk MCP — is reproducible from those two files.

What you get

  • One CLI, twelve frozen verbs. init add rm ls status sync update doctor serve import run exec. Universal --json --plain --dry-run --parallel <N> --filter <EXPR> on every verb. See the CLI reference.
  • Pack contract. Any git repo with a .grex/pack.yaml is a pack. Three built-in pack-types ship; the plugin API lets you add more without forking. Read the pack spec.
  • Reproducible manifest. Newline-delimited JSON, schema-versioned per row. See manifest.
  • MCP server built-in. grex serve speaks native MCP 2025-06-18 over stdio — every non-serve verb becomes a tool call, no custom dialect. See MCP reference.
  • Parallel scheduler with a Lean4 invariant proof. Bounded semaphore + per-pack .grex-lock + fd-lock manifest guard; "no double-lock" is mechanised. See concurrency.
  • Migration from REPOS.json meta-repos via grex import --from-repos-json. See migration.

API reference (rustdoc): grex-core · grex-mcp.

Heads up: the published crate is grex-cli; the installed binary is grex. If pemistahl's unrelated grex (regex-from-test-cases) is already on your PATH, pass --force to cargo install grex-cli or rename the other binary first.

goals

Philosophy, competitive positioning, and scope for grex v1.

Philosophy (7 principles)

  1. Git repo is a universal container for machine-configurable state. Configs, tools, env declarations, symlink trees, install manifests all ride on git: free versioning, distribution, diffing, authorship.

  2. Pack = git repo + .grex/ directory. The .grex/ dir is the contract grex understands. Pack content outside .grex/ is opaque to grex. A pack is just a git repo that opts into the protocol.

  3. Every pack is a meta-pack. Uniform model. Packs can nest child packs. Leaf packs just have zero children. No special-casing in code.

  4. Repo sync is a universal op, orthogonal to pack-type. Every pack gets grex sync (git fetch/pull + recurse into children) for free. Install / update / teardown are per-pack-type.

  5. Extensibility is vital. grex cannot precompile every install or config logic. The action vocabulary and pack-types are plugin interfaces. v1 ships a small built-in set compiled in; v2 opens external plugin loading.

  6. Future-proof core, pragmatic content. Stable schemas + trait APIs at v1. Action vocabulary stays small (YAGNI) but grows via plugin contributions over time.

  7. Agent-native. Embedded MCP stdio JSON-RPC server exposing all CLI verbs 1:1. Not a subprocess wrapper — handlers call the same library entrypoints the CLI dispatcher calls.

Cross-cutting: blazingly fast via Rust + tokio. All built-in actions are native Rust (no shell fork). A shell escape hatch exists (exec action, scripted pack-type) but is the last resort, not the default.

Competitive positioning

| Axis | codyaverett/metarepo | grex |
|---|---|---|
| Domain | git repos only | any resource via pack protocol |
| Concurrency | sequential | tokio parallel, bounded semaphore |
| State | intent-only | intent + lockfile (separate files) |
| Atomic writes | no | yes (temp + rename always) |
| MCP | subprocess wrapper | embedded in-process server |
| Lean4 proof | no | 1 scheduler invariant v1 |
| Nesting | via sub-meta | uniform (every pack = meta) |
| Extension | code changes only | trait-based plugin registry |
| Cross-plat | yes | yes + explicit Win/Linux/Mac CI matrix |

v1 shippable scope

Core (always compiled)

  • Manifest (JSONL, intent events)
  • Lockfile (JSONL, resolved SHA + state, separate file)
  • Scheduler (tokio + bounded semaphore + per-pack .grex-lock + fd-lock global)
  • Sync engine (git clone/pull, recurse into children)
  • Gitignore automation (managed block markers)
  • MCP server (stdio JSON-RPC 2.0, methods = CLI verbs 1:1)
  • Pack discovery (.grex/pack.yaml parse)
  • Action executor + in-process action plugin registry
  • Pack-type executor + in-process pack-type plugin registry
  • Atomic file writes (temp + rename always)
  • Lean4 invariant proof (no double-lock on same resource path)
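
To illustrate the "Gitignore automation (managed block markers)" item above, here is a minimal sketch of an idempotent managed-block rewrite. The marker strings and the function name `sync_managed_block` are hypothetical; this document does not specify the real markers, and the shipped implementation may differ.

```rust
// Hypothetical marker lines; the real markers are not specified in this document.
const BEGIN: &str = "# >>> grex managed >>>";
const END: &str = "# <<< grex managed <<<";

/// Returns the new .gitignore text: every line outside the managed block is
/// kept verbatim, and a fresh managed block holding `entries` is appended.
/// Note: an unterminated BEGIN marker would drop the rest of the file; a real
/// implementation should treat that as an error instead.
pub fn sync_managed_block(existing: &str, entries: &[&str]) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut inside = false;
    for line in existing.lines() {
        if line == BEGIN {
            inside = true;
            continue;
        }
        if line == END {
            inside = false;
            continue;
        }
        if !inside {
            out.push(line.to_string()); // user-authored line: preserve
        }
    }
    out.push(BEGIN.to_string());
    out.extend(entries.iter().map(|e| e.to_string()));
    out.push(END.to_string());
    let mut text = out.join("\n");
    text.push('\n');
    text
}
```

Running the function on its own output with the same entries yields identical text, which is the property a "running it twice is a no-op" guarantee relies on.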

Built-in pack-types (3)

  • meta — nests children, no own actions.
  • declarative — runs Tier 1 actions from pack.yaml.
  • scripted — escape hatch; runs .grex/hooks/{setup,sync,teardown}.{sh,ps1}.

Built-in actions (7 Tier 1, grounded in real E:\repos scripts)

  1. symlink — create/update symlink w/ backup, idempotent, cross-platform.
  2. env — set env var (user / machine / session scope).
  3. mkdir — idempotent dir creation (parents).
  4. rmdir — remove dir, optional backup.
  5. require — prereq / idempotency gate (path-exists, cmd-available, reg-key, os, psversion, symlink-ok).
  6. when — platform / conditional gate wrapping nested actions.
  7. exec — shell escape (array-form cmd, no shell-parse by default).

CLI verbs (12, frozen contract)

init add rm ls status sync update doctor serve import run exec

Stable public APIs (breaking changes forbidden post-v1 without major bump)

  1. .grex/pack.yaml schema (with schema_version: "1").
  2. grex.jsonl manifest schema.
  3. grex.lock.jsonl lockfile schema.
  4. ActionPlugin Rust trait.
  5. PackTypePlugin Rust trait.
  6. Fetcher Rust trait.
  7. CLI verb surface.
  8. MCP method surface (= CLI verbs 1:1).

v2 backlog (NOT v1)

  • External plugin loading (dylib via libloading or WASM via wasmtime / extism).
  • Retro-futurist ratatui TUI dashboard.
  • Additional pack-types (software-list, env-bundle, dotfiles) via plugin.
  • Additional actions (pkg-install, url-download, archive-extract, file-append, patch, json-merge, template, path-add, shell-rc-inject) via plugin.
  • Extra Lean4 proofs (idempotency, commutativity, crash-safety of manifest fold).
  • SQLite optional backend for very large workspaces.
  • Self-update (grex upgrade).
  • Pack registry (grex.dev).
  • Embedded scripting (Lua / Rhai) — middle ground between declarative YAML and shell escape.

Non-goals (permanent)

  • Monorepo conversion.
  • Git submodule full replacement.
  • Cross-VCS support (hg, svn, fossil, perforce).
  • Language-specific build orchestration.
  • Generic CI runner.

Grounded reality — action-vocab rationale

Scanned real-world E:\repos scripts: 3 PowerShell scripts, 945 LOC total. Pattern frequencies:

| Pattern | Count | v1 Action |
|---|---|---|
| symlink-create | 8 | symlink |
| idempotency-check | 9 | require |
| env-set | 7 | env |
| exec-cmd (chain scripts) | 5 | exec |
| dir-create | 2 | mkdir |
| platform-gate | 2 | when |
| dir-remove (backup pattern) | 1 | rmdir |
| package installs | 0 | deferred v2 plugin |
| JSON merges | 0 | deferred v2 plugin |
| archive extracts | 0 | deferred v2 plugin |

The 7-primitive Tier 1 vocab is grounded, not speculated. Everything else is deferred to v2 plugin contributions.

architecture

Crate layout, trait surfaces, and data-flow for grex v1.

Workspace

Single crate grex (lib + bin). Sub-crates avoided in v1 to keep the plugin trait crate vendored in the same compilation unit. v2 may split grex-plugin-api into its own crate for ABI stability.

grex/
├── Cargo.toml
├── rust-toolchain.toml
├── src/
│   ├── main.rs                # thin bin entrypoint
│   ├── lib.rs                 # public surface re-exports
│   ├── cli/
│   │   ├── mod.rs             # clap::Command composition
│   │   ├── init.rs            # grex init
│   │   ├── add.rs             # grex add
│   │   ├── rm.rs              # grex rm
│   │   ├── ls.rs              # grex ls
│   │   ├── status.rs          # grex status
│   │   ├── sync.rs            # grex sync
│   │   ├── update.rs          # grex update
│   │   ├── doctor.rs          # grex doctor
│   │   ├── serve.rs           # grex serve --mcp
│   │   ├── import.rs          # grex import
│   │   ├── run.rs             # grex run <action>
│   │   ├── exec.rs            # grex exec <cmd>
│   │   └── output.rs          # all print! / table / color
│   ├── manifest/
│   │   ├── mod.rs
│   │   ├── event.rs           # intent events
│   │   ├── state.rs           # folded pack state
│   │   ├── fold.rs            # event stream → HashMap<Id, State>
│   │   ├── lock.rs            # grex.lock.jsonl
│   │   ├── io.rs              # atomic temp+rename, fd-lock
│   │   └── compact.rs
│   ├── pack/
│   │   ├── mod.rs             # Pack struct, tree walk
│   │   ├── schema.rs          # pack.yaml schema v1
│   │   └── discovery.rs       # load/resolve children
│   ├── plugin/
│   │   ├── mod.rs             # registries, trait re-exports, v1 co-located builtins
│   │   ├── action.rs          # ActionPlugin trait
│   │   ├── packtype.rs        # PackTypePlugin trait
│   │   └── fetcher.rs         # Fetcher trait (git backend)
│   ├── log.rs                 # ActionLogger trait (plugin diagnostics)
│   ├── env.rs                 # EnvResolver trait ($VAR expansion surface)
│   ├── lockfile/
│   │   └── hash.rs            # compute_actions_hash (sha256 over canonical actions+sha)
│   ├── actions/               # 7 built-in action plugins
│   │   ├── symlink.rs
│   │   ├── env.rs
│   │   ├── mkdir.rs
│   │   ├── rmdir.rs
│   │   ├── require.rs
│   │   ├── when.rs
│   │   └── exec.rs
│   ├── packtypes/             # 3 built-in pack-type plugins
│   │   ├── meta.rs
│   │   ├── declarative.rs
│   │   └── scripted.rs
│   ├── fetchers/
│   │   └── git.rs             # gix or git2 behind Fetcher trait
│   ├── gitignore/
│   │   └── mod.rs             # managed-block read/write
│   ├── mcp/
│   │   ├── mod.rs             # stdio JSON-RPC 2.0 loop
│   │   ├── methods.rs         # verb → method dispatch
│   │   └── schema.rs
│   └── concurrency/
│       ├── mod.rs             # tokio runtime bootstrap
│       ├── scheduler.rs       # semaphore + per-pack lock
│       └── packlock.rs        # <path>/.grex-lock
├── tests/
│   ├── integration_add.rs
│   ├── integration_rm.rs
│   ├── sync_recursive.rs
│   ├── sync_parallel.rs
│   ├── gitignore_preserves_user_lines.rs
│   ├── crash_recovery.rs
│   ├── mcp_stdio.rs
│   ├── import_legacy.rs
│   ├── doctor_drift.rs
│   ├── pack_types_end_to_end.rs
│   └── property_manifest.rs
├── proof/
│   ├── lakefile.lean
│   └── Grex/
│       └── Scheduler.lean
└── .github/workflows/
    ├── ci.yml
    ├── lean.yml
    └── release.yml

Core trait sketches

Full contracts in plugin-api.md. Condensed here:

use async_trait::async_trait;
use serde_json::Value;
use std::path::Path;

pub enum Os { Windows, Linux, Macos }

// v1: PackCtx is realized as ExecCtx in code (2026-04-20).
pub struct ExecCtx<'a> {
    pub vars: &'a VarEnv,                // implements EnvResolver
    pub pack_root: &'a Path,
    pub workspace: &'a Path,
    pub platform: Os,                    // type-safe; decision 2026-04-20
    // deferred to M5: pack_id, dry_run, logger: &dyn ActionLogger
}

// v1 shipped shape (2026-04-20; aligned with shipped trait in M4-B review fix).
// Sync fn, typed &Action (not &Value), returns ExecStep. Async + &Value form is
// the v2-facing target reserved for external plugin loading (dylib/WASM).
pub trait ActionPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn execute(&self, action: &Action, ctx: &ExecCtx<'_>) -> Result<ExecStep, ExecError>;
}

#[async_trait]
pub trait PackTypePlugin: Send + Sync {
    fn name(&self) -> &str;
    async fn install(&self, ctx: &ExecCtx<'_>, pack: &Pack) -> anyhow::Result<()>;
    async fn update(&self, ctx: &ExecCtx<'_>, pack: &Pack)  -> anyhow::Result<()>;
    async fn teardown(&self, ctx: &ExecCtx<'_>, pack: &Pack) -> anyhow::Result<()>;
    async fn sync(&self, ctx: &ExecCtx<'_>, pack: &Pack)    -> anyhow::Result<()>;
}

pub struct FetchReport {
    pub sha: Option<String>,
    pub branch: Option<String>,
}

#[async_trait]
pub trait Fetcher: Send + Sync {
    fn scheme(&self) -> &str;            // "git"
    async fn clone(&self, url: &str, dst: &Path) -> anyhow::Result<FetchReport>;
    async fn pull(&self, dst: &Path)              -> anyhow::Result<FetchReport>;
}
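
To make the ActionPlugin shape above more concrete, the following sketch implements a toy plugin and an in-process registry using only the standard library. The stand-in types (`Action`, `ExecStep`, `ExecError`, and a reduced `ExecCtx`), the `Mkdir` plugin, and the `Registry` type are all assumptions made so the example compiles on its own; they are not the definitions shipped in grex.

```rust
use std::collections::HashMap;
use std::path::Path;

// Stand-in types: minimal versions that exist only so this sketch compiles.
pub struct Action { pub name: String, pub path: String }
#[derive(Debug, PartialEq)]
pub struct ExecStep { pub changed: bool }
#[derive(Debug)]
pub struct ExecError(pub String);
pub struct ExecCtx<'a> { pub pack_root: &'a Path }

pub trait ActionPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn execute(&self, action: &Action, ctx: &ExecCtx<'_>) -> Result<ExecStep, ExecError>;
}

/// Hypothetical mkdir-like plugin: idempotent directory creation
/// relative to the pack root (an assumption made for this sketch).
pub struct Mkdir;

impl ActionPlugin for Mkdir {
    fn name(&self) -> &str { "mkdir" }

    fn execute(&self, action: &Action, ctx: &ExecCtx<'_>) -> Result<ExecStep, ExecError> {
        let dir = ctx.pack_root.join(&action.path);
        if dir.is_dir() {
            return Ok(ExecStep { changed: false }); // already present: no-op
        }
        std::fs::create_dir_all(&dir).map_err(|e| ExecError(e.to_string()))?;
        Ok(ExecStep { changed: true })
    }
}

/// In-process registry keyed by action name.
#[derive(Default)]
pub struct Registry { plugins: HashMap<String, Box<dyn ActionPlugin>> }

impl Registry {
    pub fn register(&mut self, plugin: Box<dyn ActionPlugin>) {
        self.plugins.insert(plugin.name().to_string(), plugin);
    }

    /// Looks up the plugin by the action's name and runs it.
    pub fn run(&self, action: &Action, ctx: &ExecCtx<'_>) -> Result<ExecStep, ExecError> {
        let plugin = self
            .plugins
            .get(&action.name)
            .ok_or_else(|| ExecError(format!("unknown action: {}", action.name)))?;
        plugin.execute(action, ctx)
    }
}
```

Keying the registry by `name()` is one plausible way to resolve an action key from pack.yaml to a plugin; unknown keys surface as an error rather than being silently skipped.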

Verb → module map

| CLI verb | Entry module | Primary collaborators |
|---|---|---|
| init | cli::init | manifest::io, gitignore, concurrency |
| add | cli::add | manifest, pack::discovery, plugin::packtype, fetchers::git, gitignore |
| rm | cli::rm | manifest (tombstone), plugin::packtype::teardown, gitignore |
| ls | cli::ls | manifest::fold, manifest::lock |
| status | cli::status | manifest, per-pack-type status dispatch |
| sync | cli::sync | fetchers::git, concurrency::scheduler, recursion |
| update | cli::update | sync + pack-type.install if lockfile delta |
| doctor | cli::doctor | manifest integrity, gitignore diff, schema validate |
| serve | cli::serve | mcp::* |
| import | cli::import | legacy REPOS.json ingest → manifest::event::Add |
| run | cli::run | plugin::action, cli::output |
| exec | cli::exec | tokio::process, concurrency::scheduler |
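
The frozen verb surface in the map above can be modelled as a closed enum. This is a sketch only: the `Verb` type and its methods are hypothetical names, not the shipped clap composition; the module paths follow the table, and `is_mcp_tool` reflects the statement that every non-serve verb becomes an MCP tool call.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verb { Init, Add, Rm, Ls, Status, Sync, Update, Doctor, Serve, Import, Run, Exec }

impl Verb {
    /// The twelve verbs, in the order the CLI contract lists them.
    pub const ALL: [Verb; 12] = [
        Verb::Init, Verb::Add, Verb::Rm, Verb::Ls, Verb::Status, Verb::Sync,
        Verb::Update, Verb::Doctor, Verb::Serve, Verb::Import, Verb::Run, Verb::Exec,
    ];

    pub fn name(self) -> &'static str {
        match self {
            Verb::Init => "init",
            Verb::Add => "add",
            Verb::Rm => "rm",
            Verb::Ls => "ls",
            Verb::Status => "status",
            Verb::Sync => "sync",
            Verb::Update => "update",
            Verb::Doctor => "doctor",
            Verb::Serve => "serve",
            Verb::Import => "import",
            Verb::Run => "run",
            Verb::Exec => "exec",
        }
    }

    /// Entry module per the verb → module map.
    pub fn entry_module(self) -> String {
        format!("cli::{}", self.name())
    }

    /// `serve` hosts the MCP server itself, so it is not exposed as a tool.
    pub fn is_mcp_tool(self) -> bool {
        self != Verb::Serve
    }
}

impl FromStr for Verb {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Verb::ALL
            .iter()
            .copied()
            .find(|v| v.name() == s)
            .ok_or_else(|| format!("unknown verb: {s}"))
    }
}
```

For example, `"sync".parse::<Verb>()` yields `Verb::Sync`, while a string outside the frozen set is rejected.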

Data flow (ASCII)

        ┌──────────────┐
argv ──►│  clap parse  │
        └──────┬───────┘
               │ verb + args
               ▼
        ┌──────────────┐     ┌────────────────────┐
        │  dispatcher  │────►│ manifest::load     │
        └──────┬───────┘     │  fold events       │
               │             └────────┬───────────┘
               │                      │ HashMap<PackId, State>
               ▼                      │
        ┌──────────────┐              │
        │ pack::walk   │◄─────────────┘
        │ (load .grex/ │
        │  pack.yaml,  │
        │  recurse     │
        │  children)   │
        └──────┬───────┘
               │ PackTree
               ▼
        ┌──────────────┐
        │ concurrency  │  tokio runtime
        │  scheduler   │  semaphore(N)
        └──────┬───────┘  per-pack .grex-lock
               │
   ┌───────────┼───────────┐
   ▼           ▼           ▼
 fetcher   packtype    action
 (git       plugin    plugin
  pull)     dispatch  exec
               │
               ▼
        ┌──────────────┐
        │ manifest::   │  atomic temp+rename
        │  append      │  fd-lock RW
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐
        │ lockfile     │  resolved state
        │  update      │
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐
        │ gitignore    │  managed-block sync
        │  sync        │
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐
        │ cli::output  │  pretty | plain | json
        └──────────────┘

pack::walk traverses two distinct edges in the pack graph:

  • children edge — ownership. The walker clones missing children, recurses into them, and applies their lifecycle transitively.
  • depends_on edge — verification only. The walker checks each named/URL'd prerequisite resolves to a present, satisfied pack in the workspace; it does NOT clone or recurse. Unresolved depends_on entries are a hard error at plan phase, before the scheduler dispatches any action. See pack-spec.md §children vs depends_on.
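
The two edge kinds can be sketched as a small plan-phase walk. This is an illustration under simplifying assumptions: packs are keyed by name in an in-memory map that is already populated (cloning missing children is out of scope), the "satisfied" check is reduced to presence, and the `Pack`, `PlanError`, and `plan` names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified in-memory pack; names stand in for URLs and paths.
pub struct Pack {
    pub children: Vec<String>,
    pub depends_on: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum PlanError {
    UnresolvedDependency { pack: String, dep: String },
    Cycle(String),
    Unknown(String),
}

/// Walks `children` edges from `root` and returns the visit order.
/// `depends_on` entries are only checked for presence; they are never walked.
pub fn plan(root: &str, workspace: &HashMap<String, Pack>) -> Result<Vec<String>, PlanError> {
    let mut order = Vec::new();
    let mut on_path = HashSet::new();
    visit(root, workspace, &mut on_path, &mut order)?;
    Ok(order)
}

fn visit(
    name: &str,
    ws: &HashMap<String, Pack>,
    on_path: &mut HashSet<String>,
    order: &mut Vec<String>,
) -> Result<(), PlanError> {
    let pack = ws.get(name).ok_or_else(|| PlanError::Unknown(name.to_string()))?;
    // A pack already on the current walk path means the children graph has a cycle.
    if !on_path.insert(name.to_string()) {
        return Err(PlanError::Cycle(name.to_string()));
    }
    // Verification-only edge: hard error before anything is dispatched.
    for dep in &pack.depends_on {
        if !ws.contains_key(dep) {
            return Err(PlanError::UnresolvedDependency {
                pack: name.to_string(),
                dep: dep.clone(),
            });
        }
    }
    order.push(name.to_string());
    // Ownership edge: recurse.
    for child in &pack.children {
        visit(child, ws, on_path, order)?;
    }
    on_path.remove(name);
    Ok(())
}
```

Note that a prerequisite named in `depends_on` never appears in the returned order unless some pack also owns it as a child.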

Runtime invariants

  • I1 (Lean4 v1 proof): scheduler never holds two concurrent locks on the same pack path.
  • I2: every manifest append is preceded by acquiring the global fd-lock.
  • I3: .gitignore managed-block sync is idempotent — running it twice is a no-op on disk.
  • I4: compaction output is fold-equivalent to its input.
  • I5: pack tree walk terminates (cycle detection).

See concurrency.md for I1's Lean4 formalization.
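
As an informal, in-process companion to I1, the sketch below shows a lock table that refuses a second concurrent lock on the same pack path. It is not the shipped scheduler: grex uses on-disk .grex-lock files plus a semaphore, and the `LockTable` and `PackGuard` names here are hypothetical.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

/// Tracks which pack paths are currently locked by this process.
#[derive(Clone, Default)]
pub struct LockTable {
    held: Arc<Mutex<HashSet<PathBuf>>>,
}

/// RAII guard: the path is released when the guard is dropped.
pub struct PackGuard {
    table: LockTable,
    path: PathBuf,
}

impl LockTable {
    /// Returns a guard if no live guard holds `path`, otherwise `None`.
    pub fn try_lock(&self, path: &Path) -> Option<PackGuard> {
        let mut held = self.held.lock().unwrap();
        if held.insert(path.to_path_buf()) {
            Some(PackGuard { table: self.clone(), path: path.to_path_buf() })
        } else {
            None // someone already holds this pack path
        }
    }

    /// Number of pack paths currently locked.
    pub fn held_count(&self) -> usize {
        self.held.lock().unwrap().len()
    }
}

impl Drop for PackGuard {
    fn drop(&mut self) {
        self.table.held.lock().unwrap().remove(&self.path);
    }
}
```

Because membership is decided by a single `HashSet::insert` under the mutex, two tasks can never both observe the path as free.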

pack-spec

The .grex/ contract directory and pack.yaml schema v1. Normative.

Pack definition

A pack is a git repository containing a .grex/ directory at its root. grex reads and acts on the contract inside .grex/; everything else in the repo is opaque.

some-pack/                   # git repo root
├── .grex/                   # contract dir (required)
│   ├── pack.yaml            # required: pack manifest
│   ├── targets/             # optional: platform overrides
│   │   ├── windows.yaml
│   │   ├── linux.yaml
│   │   └── macos.yaml
│   ├── files/               # optional: payload files (configs, themes)
│   ├── hooks/               # optional: scripted-type escape hatch
│   │   ├── setup.sh / .ps1
│   │   ├── sync.sh / .ps1
│   │   └── teardown.sh / .ps1
│   └── .state/              # gitignored: runtime state cache
└── ...                      # opaque to grex

pack.yaml schema v1

Top-level fields

| Field | Type | Required | Notes |
|---|---|---|---|
| schema_version | string | yes | Must be "1". Future reader rejects unknown. |
| name | string | yes | Unique within the parent workspace. Slug-like. |
| type | string | yes | One of meta, declarative, scripted. |
| version | string | no | Pack's own semver; not enforced by grex v1. |
| depends_on | list[string\|url] | no | External prerequisites. Tool verifies presence; does NOT clone or walk. See below. |
| children | list[child-ref] | no | Owned sub-packs. Tool clones, walks, and syncs transitively. See below. |
| actions | list[action] | no | Ordered action list. Meaningful for type: declarative (and declarative children of meta). |
| teardown | list[action] | no | Optional explicit teardown. If omitted, default = reverse of actions. |

children vs depends_on — ownership split

The two edge types in the pack graph are distinct and tools must not conflate them:

  • children: owned sub-packs. grex clones them into the workspace, walks into each on sync, and applies their lifecycle transitively. Children appear in the pack tree output (grex ls). Removing a parent tears down its children.
  • depends_on: external prerequisites. grex verifies the named/URL'd packs are already present and satisfied in the workspace, but does NOT clone, walk, or modify them. They do not appear under the dependent pack in the pack tree. Failure to resolve a depends_on entry is a hard error at plan phase (before any action runs).

Every pack graph therefore has two edge kinds: a children edge (ownership / walk) and a depends_on edge (verification only). Cycle detection runs over both independently.

children child-ref shape

children:
  - url: git@github.com:user/warp-themes
    path: themes         # optional; default = last URL segment
    ref: v1.2.0          # optional; branch, tag, or SHA. Default: remote HEAD.

Children resolve as flat siblings of the parent pack root: a parent at ~/code/.grex/pack.yaml with a child path: themes materialises that child at ~/code/themes/.grex/pack.yaml. The bare-name rule on path is enforced at plan phase since v1.1.0 — see Validation rules for the regex and rejection shape.
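
The path defaulting and flat-sibling placement described above might look like the following sketch. The function names are hypothetical, and whether a trailing `.git` suffix is stripped from the URL segment is not stated in this document, so the sketch leaves the segment untouched.

```rust
use std::path::{Path, PathBuf};

/// Default `path` for a child: the last segment of its URL.
/// Handles both `https://host/user/repo` and `git@host:user/repo` forms.
pub fn default_child_path(url: &str) -> Option<&str> {
    let trimmed = url.trim_end_matches('/');
    let last = trimmed.rsplit(|c| c == '/' || c == ':').next()?;
    if last.is_empty() { None } else { Some(last) }
}

/// Children materialise as flat siblings under the parent pack root.
pub fn child_root(parent_root: &Path, url: &str, path: Option<&str>) -> Option<PathBuf> {
    let name = match path {
        Some(explicit) => explicit,
        None => default_child_path(url)?,
    };
    Some(parent_root.join(name))
}
```

With a parent root of `~/code` and `path: themes`, the child root resolves to `~/code/themes`, matching the example in the paragraph above.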

actions list

Each entry is a YAML object with exactly one known action key (symlink, env, mkdir, rmdir, require, when, exec) or a plugin-registered name. The value under the key is the action's arg-object, per that action's schema (see actions.md).

Targets / platform overrides

Files under .grex/targets/{windows,linux,macos}.yaml are merged over the base pack.yaml on the matching OS. Merge rules:

  • Top-level scalars (name, type, version): override replaces.
  • Lists (actions, children, depends_on): appended (base first, then override), unless the override sets actions_replace: true at top level.
  • The override file follows the same schema as pack.yaml (minus schema_version; inherited).

Alternative to separate files: inline when: gates in actions (platform dispatch via the when action — see below).
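
The merge rules for target override files can be sketched as follows. This is a simplified model: actions and children are opaque strings, the `PackDoc` type and `merge` function are hypothetical, and the sketch assumes `actions_replace: true` affects only the actions list (the text does not say whether it also applies to children or depends_on).

```rust
#[derive(Debug, Clone, PartialEq, Default)]
pub struct PackDoc {
    pub name: Option<String>,
    pub version: Option<String>,
    pub actions: Vec<String>, // simplified: each action as an opaque string
    pub children: Vec<String>,
    pub depends_on: Vec<String>,
    pub actions_replace: bool, // only meaningful on the override document
}

/// Merges a platform override over the base pack document.
pub fn merge(base: &PackDoc, over: &PackDoc) -> PackDoc {
    // Lists: base entries first, then override entries.
    let append = |b: &Vec<String>, o: &Vec<String>| -> Vec<String> {
        b.iter().chain(o.iter()).cloned().collect()
    };
    PackDoc {
        // Scalars: the override replaces the base value when it is set.
        name: over.name.clone().or_else(|| base.name.clone()),
        version: over.version.clone().or_else(|| base.version.clone()),
        actions: if over.actions_replace {
            over.actions.clone()
        } else {
            append(&base.actions, &over.actions)
        },
        children: append(&base.children, &over.children),
        depends_on: append(&base.depends_on, &over.depends_on),
        actions_replace: false,
    }
}
```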

files/ payload convention

Arbitrary files shipped inside the pack. Actions (e.g. symlink) reference them via paths relative to the pack root: files/config.yaml, files/themes/default.toml. grex resolves these against the pack's workdir at runtime.

.state/ runtime cache

Gitignored. Holds per-pack runtime cache (lock markers, resolved deps, per-platform resolution memo). grex doctor --compact may prune this.

The 3 built-in pack-types

meta

Nests children only. Has no own actions. Lifecycle:

  • install = clone all children, recursively dispatch their pack-type's install.
  • sync = git pull self, then recurse into children's sync.
  • update = sync + dispatch children's update if lockfile SHA changed.
  • teardown = recurse children teardown, then remove self dir (if owned).

Example:

schema_version: "1"
name: dev-env
type: meta
children:
  - url: git@github.com:user/warp-cfg
    path: warp-cfg
  - url: git@github.com:user/fonts-pack
    path: fonts

declarative

Runs the actions list from pack.yaml in order. All actions are idempotent (or gated by require). A declarative pack may also have children.

  • install = run actions top-to-bottom under the current OS.
  • sync = git pull self, then recurse into children. actions re-run only if lockfile SHA changed (covered by update).
  • update = sync + re-run actions if lockfile delta.
  • teardown = run teardown: list if present; else reverse-order rollback of actions.

Example:

schema_version: "1"
name: warp-cfg
type: declarative
version: "0.2.0"

actions:
  - require:
      any_of:
        - cmd_available: git
        - os: windows
      on_fail: error

  - when:
      os: windows
      actions:
        - mkdir: { path: "$HOME/.warp" }
        - symlink:
            src: files/config.yaml
            dst: "$HOME/.warp/config.yaml"
            backup: true
            normalize: true
        - env:
            name: WARP_HOME
            value: "$HOME/.warp"
            scope: user

  - when:
      os: macos
      actions:
        - symlink:
            src: files/config.yaml
            dst: "$HOME/Library/Application Support/warp/config.yaml"

teardown:
  - rmdir: { path: "$HOME/.warp", backup: true }

scripted

Escape hatch. Runs .grex/hooks/{setup,sync,teardown}.{sh,ps1} on the matching OS. grex picks .ps1 on Windows, .sh on Linux/macOS. If the expected hook is absent for the current OS, the lifecycle phase no-ops.

  • install = run hooks/setup.{sh,ps1} with cwd = pack workdir.
  • sync = git pull self, then run hooks/sync.{sh,ps1} if present.
  • update = sync + rerun setup if lockfile delta (no-op if no setup hook).
  • teardown = run hooks/teardown.{sh,ps1} if present.

Hooks receive env vars: GREX_PACK_NAME, GREX_PACK_PATH, GREX_PACK_OS, GREX_DRY_RUN.

Exit code non-zero = failure (propagates).
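
Hook selection for a scripted pack, as described above, can be sketched like this. The `select_hook` function and the local `Os` enum are illustrative names; returning `None` models the "phase no-ops when the hook is absent" rule.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Os { Windows, Linux, Macos }

/// Returns the hook script for `phase` ("setup", "sync" or "teardown"),
/// or `None` when the expected file is absent for this OS.
pub fn select_hook(pack_root: &Path, phase: &str, os: Os) -> Option<PathBuf> {
    // .ps1 on Windows, .sh on Linux and macOS.
    let ext = match os {
        Os::Windows => "ps1",
        Os::Linux | Os::Macos => "sh",
    };
    let hook = pack_root
        .join(".grex")
        .join("hooks")
        .join(format!("{phase}.{ext}"));
    if hook.is_file() { Some(hook) } else { None }
}
```

A pack that ships only `setup.sh` therefore installs on Linux and macOS but treats install as a no-op on Windows.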

schema_version: "1"
name: legacy-vim
type: scripted
# hooks/ directory ships setup.sh, setup.ps1, teardown.sh, teardown.ps1

Plain-git children (v1.1.1+)

A child path declared in a parent pack's children: list does not have to carry its own .grex/pack.yaml. When the walker resolves a child to a directory that contains .git/ but no .grex/pack.yaml, grex synthesizes an in-memory scripted-no-hooks pack manifest for it. No file is written to disk.

Synthetic packs are leaves by construction. They declare empty children: [], empty actions: [], and empty teardown: [], so the walker recurses no further past them. Sync against a synthetic pack runs git pull only — no setup, update, or teardown hooks fire (there are none to fire).

This lets the bootstrap pattern walk end-to-end on grex sync without authoring a .grex/pack.yaml for each child. The pattern is a REPOS.json-style flat-sibling layout: a parent meta-pack whose children are existing plain git repos that were not authored specifically for grex.

Surfacing

  • Lockfile: synthetic pack entries set synthetic: true (default false and #[serde(default)], so v1.1.0 lockfiles parse forward).
  • grex ls: synthetic entries are prefixed with ~ in tree mode and gain "synthetic": true in --json mode.
  • grex doctor: synthetic packs report OK (synthetic) instead of raising a missing-manifest error. JSON output gains "synthetic": true on the per-pack diagnostic.

Failure mode

If a declared child path resolves to a directory that has neither .grex/pack.yaml nor .git/, the walker still raises TreeError::ManifestNotFound. Synthesis only fires when .git/ is present and .grex/pack.yaml is absent; a path pointing at "nothing" is genuinely an error.
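
The three outcomes for a declared child directory can be sketched as a small classifier. The `ChildKind` enum and `classify_child` function are hypothetical; `TreeError::ManifestNotFound` reuses the error name mentioned above, but the enum here is a stand-in rather than the real type.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum ChildKind {
    /// The child ships its own .grex/pack.yaml.
    Declared,
    /// Plain git repo: grex synthesizes an in-memory manifest.
    Synthetic,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TreeError {
    ManifestNotFound,
}

/// Decides how the walker should treat a child directory.
pub fn classify_child(dir: &Path) -> Result<ChildKind, TreeError> {
    if dir.join(".grex").join("pack.yaml").is_file() {
        Ok(ChildKind::Declared)
    } else if dir.join(".git").is_dir() {
        Ok(ChildKind::Synthetic)
    } else {
        // Neither a manifest nor a git repo: nothing to walk.
        Err(TreeError::ManifestNotFound)
    }
}
```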

Example

Workspace layout:

~/code/                       # parent (meta) pack root, declares children
├── .grex/pack.yaml           # type: meta, children: [algo-leet, neetcode]
├── algo-leet/                # child #1 — plain git repo, no .grex/
│   └── .git/
└── neetcode/                 # child #2 — plain git repo, no .grex/
    └── .git/

grex sync ~/code walks algo-leet and neetcode as synthesized scripted-no-hooks packs, runs git pull in each, and exits 0. The lockfile records both with synthetic: true; grex ls shows them with the ~ prefix.

Validation rules

  • schema_version must be exactly "1".
  • type must be one of the 3 built-ins (or a registered plugin name when the plugin is loaded).
  • type in .grex/pack.yaml is the authoritative source of truth. Runtime manifest / lockfile entries record type as an observed snapshot only. On disagreement (manifest type ≠ pack.yaml type), pack.yaml wins and the manifest is corrected on the next sync. See manifest.md.
  • name regex: ^[a-z][a-z0-9-]*$ (letter-led; digits allowed in later positions).
  • children[].path must be a bare name matching the same regex as name. Rejected: path separators (/, \), ., .., the empty string "", a leading digit, and any uppercase letter. The empty-string rejection matters because an empty path would otherwise resolve the child at the parent's own pack root and silently overwrite it.
  • Unknown top-level keys rejected unless prefixed with x- (user annotations).
  • Unknown action keys rejected unless the plugin is registered.
  • Empty lists are VALID: actions: [], children: [], depends_on: [], teardown: [] all parse cleanly. Empty actions in a declarative pack is a no-op install. Empty children in a meta pack is a no-op sync. Do not reject empty lists.
  • Duplicate symlink.dst within the same pack is a validation error, caught at plan phase (before execution). Two or more symlink actions resolving to the same absolute dst path abort the plan with ActionArgsInvalid. Cross-pack duplicates are handled by conflict detection at the workspace level (separate concern; see concurrency.md).
  • YAML anchors (&name) and aliases (*name) are REJECTED during parse. Rationale: prevents billion-laughs / alias-bomb DoS. Implementation: parser config disables alias resolution, or the loader detects and errors before expansion.

grex doctor runs these checks on every registered pack.
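
The name and children[].path rule can be checked without a regex dependency. The sketch below hand-rolls the pattern `^[a-z][a-z0-9-]*$`; the function name `is_valid_name` is an assumption, not a grex API.

```rust
/// Checks `^[a-z][a-z0-9-]*$`: a lowercase ASCII letter first, then any mix
/// of lowercase ASCII letters, digits, and hyphens.
pub fn is_valid_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false, // empty, or does not start with a-z
    }
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}
```

Because separators, dots, and uppercase letters are outside the allowed character set, the same check covers the rejected shapes listed for children[].path.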

Opacity rule

grex reads only .grex/. It never inspects or touches content outside it. Pack authors may store anything adjacent — scripts, assets, source — and grex stays agnostic.

Relationship to the workspace manifest

A workspace (the directory where you run grex init) is itself a git repo. It has its own grex.jsonl + grex.lock.jsonl tracking which packs are registered. A workspace does not need its own .grex/pack.yaml unless it is also meant to be published as a pack.

manifest

grex.jsonl (intent log) and grex.lock.jsonl (resolved state). Both live at the workspace root. Both are newline-delimited JSON (LF on all platforms — writer normalizes).

Two-file split

| File | Purpose | Written by |
|---|---|---|
| grex.jsonl | Append-only intent log. User actions: register a pack, remove a pack, update a ref. | add, rm, update verbs. |
| grex.lock.jsonl | Append-only resolved state. Actual SHA + install state after each successful sync/install. | sync, update verbs. |

Split rationale: intent is portable across machines; lockfile pins the actual state on this machine. Commit intent to git; lockfile may be committed too (for reproducible bootstrap) or gitignored (for per-machine pinning).

grex.jsonl event schemas

Common envelope (all events):

{"op":"<verb>","ts":"<rfc3339>","id":"<pack-id>","schema_version":"1"}

add

{"op":"add","ts":"2026-04-19T10:00:00Z","id":"warp-cfg","schema_version":"1","url":"git@github.com:user/warp-cfg","path":"warp-cfg","type":"declarative","ref":"main"}

rm

{"op":"rm","ts":"2026-04-19T11:00:00Z","id":"warp-cfg","schema_version":"1"}

update

{"op":"update","ts":"2026-04-19T12:00:00Z","id":"warp-cfg","schema_version":"1","ref":"v0.2.0"}

sync (optional intent marker)

{"op":"sync","ts":"2026-04-19T13:00:00Z","id":"warp-cfg","schema_version":"1"}

Action event brackets — action_started / action_completed / action_halted

The sync path brackets each action it applies with events: action_started before the action runs, then either action_completed or action_halted once it returns. These sit alongside (and do not replace) the sync intent marker. Readers built against v1.0 continue to parse cleanly, because unknown op values are ignored per the forward-compat rule.

{"op":"action_started","ts":"2026-04-20T10:00:00Z","id":"warp-cfg","schema_version":"1","action":"symlink","idx":0}
{"op":"action_completed","ts":"2026-04-20T10:00:00Z","id":"warp-cfg","schema_version":"1","action":"symlink","idx":0,"changed":true}
{"op":"action_halted","ts":"2026-04-20T10:00:01Z","id":"warp-cfg","schema_version":"1","action":"exec","idx":1,"reason":"ExecNonZero","stderr":"<truncated to 2 KiB>"}

Semantics:

  • action_started is written under the manifest lock before the action runs.
  • action_completed is written under the manifest lock after the action returns Ok.
  • action_halted is written when the action returns Err, carrying a compact failure reason plus (for exec) a stderr tail capped at 2 KiB (see actions.md §exec).
  • An action_started with no matching action_completed / action_halted indicates a crash mid-action. The startup recovery scan (see concurrency.md §Recovery scan) reports these; cleanup is grex doctor territory (M4+).

ManifestLock is acquired per-action (not per-sync), so a long sync with many actions interleaves lock acquire/release rather than holding the global lock end-to-end.
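
The crash signal described in the semantics above (an action_started with no matching action_completed or action_halted) can be detected with a single pass over the event log. This sketch assumes brackets are matched on the (pack id, action idx) pair; the types and the `dangling_starts` function are hypothetical, not the real recovery scan.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Bracket { Started, Completed, Halted }

pub struct ActionEvent {
    pub id: String,  // pack id
    pub idx: usize,  // position of the action in the pack's list
    pub kind: Bracket,
}

/// Returns the (pack id, action idx) pairs that were started but never
/// completed or halted, in sorted order.
pub fn dangling_starts(events: &[ActionEvent]) -> Vec<(String, usize)> {
    let mut open: BTreeSet<(String, usize)> = BTreeSet::new();
    for ev in events {
        let key = (ev.id.clone(), ev.idx);
        match ev.kind {
            Bracket::Started => {
                open.insert(key);
            }
            Bracket::Completed | Bracket::Halted => {
                open.remove(&key);
            }
        }
    }
    open.into_iter().collect()
}
```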

Fold algorithm (pseudocode):

state = {}
for line in read_jsonl(grex.jsonl):
    id = line.id
    match line.op:
        "add":    state[id] = Pack::from(line)
        "update": state[id].patch(line)
        "rm":     state.remove(id)
        "sync":   no-op (intent marker)
        _:        no-op (action_* brackets and unknown ops are ignored)
return state

O(N) in event count. Deterministic regardless of compaction history.
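
The fold can also be written as a small Rust function over already-parsed events. This is a sketch, not the code in manifest/fold.rs: the `Event` and `PackState` types are simplified stand-ins, JSON parsing is omitted, and ignoring an update for an id that was never added is an assumption.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PackState {
    pub url: String,
    pub git_ref: Option<String>,
}

pub enum Event {
    Add { id: String, url: String, git_ref: Option<String> },
    Update { id: String, git_ref: String },
    Rm { id: String },
    /// Sync markers, action brackets, and ops this reader does not know.
    Other,
}

/// Folds the intent log into the live pack set, one pass, in log order.
pub fn fold(events: &[Event]) -> BTreeMap<String, PackState> {
    let mut state = BTreeMap::new();
    for ev in events {
        match ev {
            Event::Add { id, url, git_ref } => {
                state.insert(
                    id.clone(),
                    PackState { url: url.clone(), git_ref: git_ref.clone() },
                );
            }
            Event::Update { id, git_ref } => {
                // Assumption: an update for an unknown id is ignored.
                if let Some(pack) = state.get_mut(id) {
                    pack.git_ref = Some(git_ref.clone());
                }
            }
            Event::Rm { id } => {
                state.remove(id);
            }
            Event::Other => {}
        }
    }
    state
}
```

Using a `BTreeMap` keeps iteration order stable, which is convenient when the folded state is later written back out during compaction.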

grex.lock.jsonl resolved-state schema

{"id":"warp-cfg","sha":"abc123...","branch":"main","installed_at":"2026-04-19T13:05:00Z","actions_hash":"sha256:deadbeef..."}

Fields:

| Field | Required | Description |
|---|---|---|
| id | yes | Pack id; matches manifest id. |
| sha | yes | Git commit SHA of the pack workdir after sync. Empty string when the pack is not a git working tree or the HEAD probe failed (see notes below). |
| branch | no | Branch tracked; null if detached. |
| installed_at | yes | RFC3339 timestamp of last successful install/sync. |
| actions_hash | yes | SHA-256 content fingerprint of the pack's installable surface. Scope varies by pack type (see below). Used to detect whether update needs to re-run install logic. |

Notes on sha:

  • It is stored as the empty string when the pack is not a git working tree (e.g. a local-only root pack) OR when the HEAD probe failed.
  • actions_hash is computed with the same commit_sha value, so empty-SHA records are internally consistent. If a future sync successfully probes a non-empty SHA, the hash differs and the skip-on-hash short-circuit correctly re-executes the pack.
  • Probe failures are surfaced as a grex::walker tracing::warn! line so operators see the signal without the sync aborting.
  • Lockfile-write failures at end-of-sync are intentionally non-fatal (recorded as a report.event_log_warnings entry); the successful pack actions are not rolled back.

actions_hash scope by pack type (name retained; semantics explicitly broadened):

  • declarative: hash of normalized actions array + files/ tree.
  • meta: hash of the serialized children array + each child's resolved SHA (from the child's lockfile entry). Captures the fact that a meta pack's installable surface is the set of owned children at pinned revisions.
  • scripted: hash of normalized actions array (if any) + files/ tree + SHA-256 of each hook file in .grex/hooks/ (sorted by filename, then concatenated). Any hook edit re-triggers update.

Rationale for keeping the name actions_hash: the field's purpose — "has the installable content changed since last sync?" — is unchanged; only its per-type inputs differ. Renaming would force a lockfile schema bump for no semantic gain.

Fold for lockfile: last-line-wins per id.
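
A last-line-wins fold is short enough to sketch directly. The `LockLine` struct below is a hypothetical, trimmed-down stand-in for a parsed lockfile row (only three of the documented fields are kept), and parsing is assumed to have happened already.

```rust
use std::collections::BTreeMap;

/// Hypothetical parsed lockfile row, reduced to the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct LockLine {
    pub id: String,
    pub sha: String,
    pub actions_hash: String,
}

/// Keeps only the most recent line per id; later lines overwrite earlier ones.
pub fn fold_lockfile(lines: &[LockLine]) -> BTreeMap<String, LockLine> {
    let mut latest = BTreeMap::new();
    for line in lines {
        latest.insert(line.id.clone(), line.clone());
    }
    latest
}
```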

type field authority

The type recorded on add events and in lockfile entries is an observed snapshot of what the pack reported at that moment. The authoritative source of truth is .grex/pack.yaml's type field (see pack-spec.md §Validation rules). If the manifest type disagrees with pack.yaml on a subsequent sync, pack.yaml wins and the manifest is corrected by emitting a fresh add/update event reflecting the true type. Readers MUST NOT treat manifest type as normative when pack.yaml is available.

Atomic append

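Single-line append writes the event plus a trailing newline, then flushes with sync_data (fdatasync):

use std::fs::OpenOptions;
use std::io::Write;

fn append_event(line: &str) -> std::io::Result<()> {
    // Open in append mode so every write lands at the current end of file.
    let mut f = OpenOptions::new().append(true).open("grex.jsonl")?;
    f.write_all(line.as_bytes())?;
    f.write_all(b"\n")?;
    // Flush file contents to disk before reporting success.
    f.sync_data()?;
    Ok(())
}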

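Held under fd-lock. O_APPEND makes each write land at the current end of file, but the POSIX guarantee that writes of up to PIPE_BUF bytes are not interleaved applies to pipes and FIFOs, not regular files, so the fd-lock is the authoritative guard. Event size is still capped at 2 KiB as a conservative bound.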

Compaction (temp + rename)

Periodic or on grex doctor --compact:

  1. Acquire global fd-lock (exclusive).
  2. Fold events → state map.
  3. Emit minimal equivalent event set to grex.jsonl.tmp (one add per live id, tombstoned ids dropped entirely).
  4. fs::rename(grex.jsonl.tmp, grex.jsonl) — atomic on POSIX and Windows NTFS (MoveFileEx with REPLACE_EXISTING).
  5. Release fd-lock.

Invariant: fold(pre-compaction) == fold(post-compaction).

Lockfile compaction mirrors intent-log compaction: last-line-wins per id → one line per id → atomic rename.
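
Steps 3 and 4 (write to a temp file, then rename over the original) can be sketched with the standard library alone. The function name `rewrite_atomically` is hypothetical, and the fd-lock acquire/release of steps 1 and 5 is deliberately left out; the caller is assumed to hold the exclusive lock.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `lines` to `<path>.tmp`, fsyncs it, then renames it over `path`.
/// Assumes the caller already holds the exclusive manifest lock.
pub fn rewrite_atomically(path: &Path, lines: &[String]) -> io::Result<()> {
    // Build the sibling temp path by appending ".tmp" to the full file name.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    {
        let mut f = File::create(&tmp)?;
        for line in lines {
            f.write_all(line.as_bytes())?;
            f.write_all(b"\n")?;
        }
        // Make sure the new contents are durable before they become visible.
        f.sync_all()?;
    }

    // Readers see either the old file or the new one, never a mix.
    fs::rename(&tmp, path)
}
```

A crash before the rename leaves the original file untouched and at most a stray `.tmp` file behind.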

Locking

Global RW lock via fd-lock:

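use std::fs::OpenOptions;

fn with_exclusive_lock() -> std::io::Result<()> {
    let file = OpenOptions::new().read(true).write(true).open("grex.jsonl")?;
    let mut lock = fd_lock::RwLock::new(file);
    let _guard = lock.write()?;  // exclusive for append/compact
    // ... append or compact while `_guard` is alive; dropping it releases the lock ...
    Ok(())
}
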
  • Mutators (add, rm, update, sync write-phase, doctor --compact) take exclusive write lock.
  • Readers (ls, status, sync read-phase) take shared read lock.

Crash recovery (torn-line detection)

On every read:

  1. Parse line-by-line.
  2. If the final line fails JSON parse AND file does not end in \n, treat as torn write.
  3. Truncate file to length of last valid line.
  4. Emit tracing warning; continue.

Test: tests/crash_recovery.rs spawns a child, SIGKILL / TerminateProcess mid-append, asserts parent recovers.
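
The torn-line rule in steps 2 and 3 can be sketched as a pure function over the file bytes. The name `valid_len` is hypothetical, and the JSON check is passed in as a caller-supplied predicate so the block needs no parser; the real reader presumably uses its own JSON deserialiser there.

```rust
/// Returns how many leading bytes of the file should be kept.
/// `is_valid_line` stands in for "this line parses as a JSON event".
pub fn valid_len(bytes: &[u8], is_valid_line: impl Fn(&[u8]) -> bool) -> usize {
    // A file that is empty or ends in '\n' has no unterminated final line.
    if bytes.is_empty() || bytes.ends_with(b"\n") {
        return bytes.len();
    }
    // Offset where the final, unterminated line begins.
    let tail_start = bytes
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    if is_valid_line(&bytes[tail_start..]) {
        // Unterminated but parseable: not treated as torn.
        bytes.len()
    } else {
        // Torn write: keep everything up to the last complete line.
        tail_start
    }
}
```

The caller would then truncate the file to the returned length and emit the tracing warning.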

Schema versioning

Every event carries schema_version: "1". Breaking changes bump the version. The reader rejects unknown versions with an actionable error pointing to grex upgrade-schema (post-v1 migration command).

Lockfile entries carry an implicit schema version tied to the workspace config. Separate bump cadence from intent-log schema.

Migration from legacy REPOS.json

grex import --from-repos-json <path> reads flat [{"url":"...","path":"..."},...] → emits one add event per entry with type defaulted to meta (or user-specified via --default-type). Idempotent: re-running detects existing ids by path and no-ops.

Example sequence

{"op":"add","ts":"2026-04-19T10:00:00Z","id":"warp-cfg","schema_version":"1","url":"git@github.com:me/warp-cfg","path":"warp-cfg","type":"declarative","ref":"main"}
{"op":"add","ts":"2026-04-19T10:01:00Z","id":"fonts","schema_version":"1","url":"git@github.com:me/fonts","path":"fonts","type":"meta","ref":"main"}
{"op":"update","ts":"2026-04-19T11:00:00Z","id":"warp-cfg","schema_version":"1","ref":"v0.2.0"}
{"op":"rm","ts":"2026-04-19T12:00:00Z","id":"fonts","schema_version":"1"}

Corresponding lock after first successful sync:

{"id":"warp-cfg","sha":"abc123def","branch":"main","installed_at":"2026-04-19T10:00:05Z","actions_hash":"sha256:..."}
{"id":"fonts","sha":"fff111","branch":"main","installed_at":"2026-04-19T10:01:05Z","actions_hash":"sha256:..."}

Fold of intent log → live set = {warp-cfg} (fonts tombstoned). Subsequent sync rewrites lockfile entry for warp-cfg and drops the fonts line on compaction.

walker

How grex sync traverses your nested meta-pack tree under v1.2.0+ — phase by phase, with the rules that decide what to clone, what to recurse into, and what to refuse.

Canonical source: .omne/cfg/walker.md (SSOT, separate grex-inst repo). This page is the user-facing projection; the SSOT is normative for behaviour.

What is a meta pack?

A pack is any directory carrying <dir>/.grex/pack.yaml. There are two flavours:

  • meta pack: pack.yaml lists children:. Owns its own lockfile at <meta>/.grex/grex.lock.jsonl. Recursion enters here.
  • leaf pack: pack.yaml has no children:. Holds actions, no lockfile.

The directory where you run grex is the cwd-meta — the entry point for the recursion. There is no longer a single global "workspace root" anchor (retired in v1.2.0); every recursion frame computes destinations against ITS own meta dir.

Three changes vs. v1.1.x

  1. Parent-relative resolution. dest = current_meta.join(child.path). Each frame uses its own meta dir as the join anchor.
  2. Distributed lockfile. Each meta has its own <meta>/.grex/grex.lock.jsonl listing ONLY direct children. Sub-metas are autonomous — a parent has zero knowledge of grandchildren. See lockfile.
  3. Cargo-style parallel. Direct siblings sync in parallel; sub-meta recursion fires in parallel across siblings. Bounded by concurrency primitives.
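
Parent-relative resolution (change 1) amounts to a plain path join per frame. The sketch below uses a hypothetical helper `child_dest` and skips canonicalisation and the validator, which the real walker applies first.

```rust
use std::path::{Path, PathBuf};

/// Each recursion frame joins the child's declared path onto its own meta dir.
/// Validation and canonicalisation are assumed to have happened already.
pub fn child_dest(current_meta: &Path, child_path: &str) -> PathBuf {
    current_meta.join(child_path)
}
```

Note that `Path::join` replaces the base entirely when given an absolute path, which is one reason the validator has to reject absolute child paths before this join runs.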

The walker is manifest-graph-driven, not filesystem-driven. It only ever visits paths declared by some live manifest's children: list. Undeclared directories on disk — even those carrying their own .git/ — are NOT auto-discovered, NOT auto-registered. v1.1.1's sync-time auto-synthesis is retired; see §5-way classifier.

The three phases

sync(cwd_meta) runs three phases per recursion frame. Each frame is autonomous: load my own pack.yaml, sync only my direct children, then recurse.

Phase 1 — sync direct children (parallel)

For each child in manifest.children, in parallel:

  1. Compute dest = canonical(cwd_meta.join(child.path)). Pre-canonicalization rejects relative segments that would resolve outside cwd_meta.
  2. Re-verify no path segment is a symlink crossing the parent boundary (see §Symlink hardening and toctou).
  3. mkdir -p dest.parent() (idempotent — concurrent siblings sharing an ancestor like tools/ race-safely).
  4. Apply the 5-way classifier (next section).
  5. Upsert a LockEntry into <meta>/.grex/grex.lock.jsonl, keyed by canonical meta-relative POSIX path.

After all children settle, if any landed on the "untracked git" branch the walker returns Err(UntrackedGitRepos(list)) with the complete list — no partial completion. Phase 2 and Phase 3 do not run for this frame.

5-way classifier (Phase 1)

The walker examines dest and routes to exactly one of five branches (evaluated top-down, mutually exclusive):

| # | Pre-condition at dest | Action |
| --- | --- | --- |
| 1 | Does not exist | git clone child.url dest --branch child.ref |
| 2 | Exists AND is an empty directory | Treat as branch 1: retry the clone (recovers a failed mid-clone that left an empty dest). |
| 3 | dest/.git exists AND dest/.grex/pack.yaml does NOT | Push onto the untracked list. NO synthesis under v1.2.0+; user must run grex add <url> <path>. |
| 4 | dest exists, is non-empty, AND lacks .git/ | Return Err(DestOccupied(dest, content_summary)). Foreign content; refuses to clone-over. |
| 5 | dest/.git AND dest/.grex/pack.yaml BOTH present (registered pack) | git fetch + checkout child.ref. Skip-on-hash if actions_hash and SHA unchanged. |

Branches 1, 2, and 5 are the only ones that mutate dest. Branch 2 explicitly recovers a failed-mid-clone state, so a second sync always reaches branch 5 (idempotent). Branch 4 is a hard error — a typo or stale checkout that the walker refuses to silently destroy.
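
The classifier table can be read as a small decision function over the filesystem. The sketch below is an illustration only: the `Branch` enum and `classify` function are hypothetical names, the symlink and reparse-point checks are omitted, and a dest that exists but is not a directory simply surfaces as an I/O error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical labels for the five branches, in table order.
#[derive(Debug, PartialEq, Eq)]
pub enum Branch {
    Clone,          // 1: dest does not exist
    RetryClone,     // 2: dest is an empty directory
    UntrackedGit,   // 3: .git present, .grex/pack.yaml absent
    DestOccupied,   // 4: non-empty and no .git
    RegisteredPack, // 5: .git and .grex/pack.yaml both present
}

/// Evaluates the pre-conditions top-down and returns exactly one branch.
pub fn classify(dest: &Path) -> io::Result<Branch> {
    if !dest.exists() {
        return Ok(Branch::Clone);
    }
    // An empty directory is treated like a missing dest.
    if fs::read_dir(dest)?.next().is_none() {
        return Ok(Branch::RetryClone);
    }
    let has_git = dest.join(".git").exists();
    let has_pack = dest.join(".grex").join("pack.yaml").exists();
    Ok(match (has_git, has_pack) {
        (true, false) => Branch::UntrackedGit,
        (false, _) => Branch::DestOccupied,
        (true, true) => Branch::RegisteredPack,
    })
}
```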

Phase 2 — prune children removed from manifest

Read the lockfile. For each entry whose path is NOT in the current manifest's children: paths:

  1. If dest/.git does not exist → drop the lockfile entry, no rm -rf (idempotent — already gone).
  2. Prune-safety check (default-deny — bypass only with --force-prune):
    • HEAD SHA must match entry.sha.
    • Working tree must be clean (git status --porcelain --ignored empty — covers tracked edits AND ignored content like target/ or node_modules/).
    • No in-progress git op (rebase, merge, cherry-pick, revert, bisect — see force-prune §In-progress probe).
    • Recursive consent walk. If dest contains its own non-empty .grex/grex.lock.jsonl, recursively check every grandchild for the same three conditions. Any dirty/in-progress grandchild → refuse the prune unless --force-prune-recursive.
  3. rm -rf dest (delegated to platform-native helper).
  4. Delete the lockfile entry (atomic rewrite).

Cleanup is CLI-invocation-driven, not eager. Removing a child from pack.yaml triggers prune on the next grex sync / update, not on edit. See force-prune for the full safety contract and audit log.

Phase 3 — recurse into child metas (parallel, autonomous)

For each child, in parallel:

  1. Compute child_dest = cwd_meta.join(child.path).
  2. If child_dest/.grex/pack.yaml exists, parse it.
  3. If the parsed manifest has non-empty children:, recursively call sync(child_dest).

Each recursion is a fresh autonomous frame: it loads its own manifest, walks its own lockfile, syncs its own direct children. Sibling sub-meta syncs run in parallel; the per-pack .grex-lock (see concurrency §Per-pack PackLock) prevents two ops on the same pack path even across recursion frames.

Phase 2 prune semantics deliberately cascade safety checks down the sub-meta tree. A meta whose declared child has its OWN sub-children (grandchildren) cannot be silently pruned if any grandchild is dirty or has an in-progress git op.

Three flag levels graduate the override:

| Flag | Effect |
| --- | --- |
| (none, default) | Default-deny. Refuse on any SHA mismatch, dirty tracked file, dirty ignored file, in-progress op, or dirty grandchild. |
| --force-prune | Bypass clean-tree assertions at the named dest. Still respects in-progress ops and still refuses if any grandchild is dirty. |
| --force-prune-with-ignored | Allow ignored content (e.g. target/, node_modules/) to be destroyed without warning at the named dest. |
| --force-prune-recursive | Cascades the bypass to grandchildren. Required to prune past a dirty grandchild. See force-prune §Blast radius. |

grex rm --force <path> is the per-path equivalent of --force-prune: it bypasses checks 2 and 3 at the named dest only. It does NOT cascade past one level.

Validator rules — child.path

Applied at every recursion depth, identical rules:

| Rule | Behaviour |
| --- | --- |
| Forward slash / | Allowed (multi-segment paths). Each segment must match ^[a-z][a-z0-9-]*$. |
| Backslash \ | Normalised to / at parse-time on all platforms. |
| .. segment (any position) | Rejected. |
| Absolute path | Rejected. |
| Symlink crossing parent boundary | Rejected post-canonicalization. |
| Empty path | Rejected. |
| Duplicate path across two children: entries | Rejected at parse-time as DuplicateChildPath(path). |
| : in any segment | Rejected (NTFS Alternate Data Streams). |
| $ in any segment | Rejected (variable expansion / Windows special). |
| ~digit pattern (progra~1) | Rejected (Windows 8.3 short-name aliasing). |
| NUL byte / control chars \x01-\x1F, \x7F | Rejected. |
| Drive-letter prefix (C:, D:) | Rejected. |

Path segments are NFC-normalised at parse-time before deduplication. Two manifests declaring caf\u00E9/foo (NFC) and cafe\u0301/foo (NFD) collide post-normalisation.
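
The syntactic rules in the table can be sketched as a std-only validator. The `PathError` variants other than `DuplicateChildPath` are hypothetical names, NFC normalisation is omitted because the standard library has no Unicode normaliser, and the symlink rule is out of scope since it needs the filesystem. The segment pattern alone already rejects `:`, `$`, `~`, control characters and drive-letter prefixes.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum PathError {
    Empty,
    Absolute,
    ParentSegment,
    BadSegment(String),
    DuplicateChildPath(String),
}

/// Normalises `\` to `/`, then checks every segment against ^[a-z][a-z0-9-]*$.
pub fn validate_child_path(raw: &str) -> Result<String, PathError> {
    let path = raw.replace('\\', "/");
    if path.is_empty() {
        return Err(PathError::Empty);
    }
    if path.starts_with('/') {
        return Err(PathError::Absolute);
    }
    for seg in path.split('/') {
        if seg == ".." {
            return Err(PathError::ParentSegment);
        }
        let mut chars = seg.chars();
        let first_ok = matches!(chars.next(), Some('a'..='z'));
        let rest_ok =
            chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
        if !(first_ok && rest_ok) {
            return Err(PathError::BadSegment(seg.to_string()));
        }
    }
    Ok(path)
}

/// Validates a whole children: list and rejects duplicates after normalisation.
pub fn validate_children(raw_paths: &[&str]) -> Result<Vec<String>, PathError> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for raw in raw_paths {
        let p = validate_child_path(raw)?;
        if !seen.insert(p.clone()) {
            return Err(PathError::DuplicateChildPath(p));
        }
        out.push(p);
    }
    Ok(out)
}
```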

Untracked git policy (5-way branch 3)

v1.1.1's sync-time auto-synthesis (silently registering a plain .git/ discovered at a declared dest) is RETIRED. Under v1.2.0+ the walker NEVER synthesises a manifest from a plain .git/. A declared dest with .git/ but no .grex/pack.yaml is an error, never silently registered.

Contract:

  • The walker collects ALL untracked git repos across one sync invocation.
  • After Phase 1 completes for a frame, if any untracked were collected, the frame returns Err(UntrackedGitRepos(list)) with the COMPLETE list of offenders.
  • Phase 2 (prune) and Phase 3 (recurse) do NOT run for that frame.

User remediation: explicitly register each path with grex add <url> <path>. The walker has no opinion on which url is correct — that is operator-supplied by design.

The error message cites every untracked dir's absolute path so you can fix all in one batch rather than iteratively.

Symlink hardening

dest_has_git_repo(dest) refuses symlinked destinations outright via std::fs::symlink_metadata. Closes the symlink-redirection attack: a parent declaring path: code against a meta where <meta>/code -> $HOME cannot trick the walker into operating on $HOME/.git.

Reparse-point and gitfile policy. Maintainer-locked: REJECT ALL Windows junctions and non-symlink reparse points. v1.2.0+ rejects on Windows: IO_REPARSE_TAG_MOUNT_POINT (junctions, mklink /J), all reparse points except proper symlinks, and gitfile .git (regular file containing gitdir: ...). POSIX symlinks accepted with the boundary check; Windows proper symlinks accepted with the same check (they have a proper security model since Win10). Junctions and gitfile .git are unconditionally rejected — no flag, no override.

For the dirfd-binding TOCTOU mitigation that closes the path-swap window between canonicalize and clone, see toctou.

Cycle detection

Each recursion pushes pack_identity_for_child(child) (url:<url>@<ref>) onto an in-progress stack; a repeat returns TreeError::CycleDetected. Identity for the cwd-meta itself is path-keyed; for children it is URL+ref so the same repo at two distinct refs is distinct.
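
The in-progress stack can be sketched as follows. The `InProgress` type and its `enter`/`leave` methods are hypothetical, and `CycleDetected` here is a local stand-in for TreeError::CycleDetected; only the `url:<url>@<ref>` identity format is taken from the text.

```rust
/// Local stand-in for the cycle error; carries the repeated identity.
#[derive(Debug, PartialEq, Eq)]
pub struct CycleDetected(pub String);

/// Identity for a child: URL plus ref, so the same repo at two refs is distinct.
pub fn pack_identity(url: &str, git_ref: &str) -> String {
    format!("url:{url}@{git_ref}")
}

/// Stack of identities for the frames currently being recursed into.
#[derive(Default)]
pub struct InProgress {
    stack: Vec<String>,
}

impl InProgress {
    /// Pushes an identity, or reports a cycle if it is already on the stack.
    pub fn enter(&mut self, identity: String) -> Result<(), CycleDetected> {
        if self.stack.contains(&identity) {
            return Err(CycleDetected(identity));
        }
        self.stack.push(identity);
        Ok(())
    }

    /// Pops the most recent identity when its frame finishes.
    pub fn leave(&mut self) {
        self.stack.pop();
    }
}
```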

Lockfile keying

Lockfile entries within a meta are keyed by the canonical relative POSIX path of the child within that meta — single segment for direct children, but the writer always normalises through the path-keyed code path. v1.1.x bare-name keys remain valid as the degenerate single-segment case; readers fall back to bare-name lookup for legacy entries. See lockfile §Path keying and v1.1.1→v1.2.0 read-fallback for the full migration story.

Cross-references

  • Distributed lockfile schema, three readers, v1.1.1→v1.2.0 migration: lockfile
  • Bounded semaphore + per-pack lock + Lean4 invariant: concurrency
  • Force-prune semantics, audit log, blast radius: force-prune
  • BoundedDir TOCTOU primitive (cap-std + Linux openat2): toctou
  • Manifest event log + crash recovery: manifest
  • Pack layout + .grex/ contract: pack-spec

lockfile

grex.lock.jsonl — the resolved-state snapshot that pins each pack's last-synced commit, ref, install timestamp, and actions_hash. Companion to but distinct from the events.jsonl intent/audit log (see manifest).

Canonical source: .omne/cfg/lockfile.md (SSOT, separate grex-inst repo). This page is the user-facing projection.

Concept: pack.yaml = INTENT, lockfile = STATE

The two artifacts answer different questions:

| Artifact | Question answered | Authored by |
| --- | --- | --- |
| pack.yaml (+ events.jsonl) | "Which children at which paths at which refs do I want?" | User / pack author |
| grex.lock.jsonl | "Which commits am I currently at, with which actions applied?" | sync / update |

Same intent-vs-state separation as Cargo (Cargo.toml / Cargo.lock), npm (package.json / package-lock.json), Bundler, Poetry. The lockfile lives next to its meta's manifest — see §File location.

Why both are needed

  • Idempotency (skip-on-hash). sync re-resolves the ref → SHA, recomputes actions_hash, compares to the recorded entry, short-circuits if both match. Without a lockfile, every sync would re-execute every action.
  • Drift triangulation (3-leg). doctor compares declared (manifest) vs recorded (lockfile) vs present (disk). A 2-leg model cannot distinguish "user edited pack.yaml since last sync" from "someone hand-edited the working tree".
  • Concurrent-sync safety. Lockfile-write happens under the manifest fd-lock; sync reads it once at plan phase and writes once at the end.
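
The skip-on-hash decision from the first bullet reduces to comparing two values against the recorded entry. The `Recorded` struct, the `Plan` enum and the `plan` function below are hypothetical names used only to illustrate that comparison.

```rust
/// Hypothetical view of the recorded lockfile entry for one pack.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Recorded {
    pub sha: String,
    pub actions_hash: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    Skip,
    Execute,
}

/// Skips only when both the resolved SHA and the recomputed hash match.
pub fn plan(prior: Option<&Recorded>, sha: &str, actions_hash: &str) -> Plan {
    match prior {
        Some(p) if p.sha == sha && p.actions_hash == actions_hash => Plan::Skip,
        // No prior entry, or either value changed: run the pack's actions.
        _ => Plan::Execute,
    }
}
```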

File location

<meta>/.grex/grex.lock.jsonl

Distributed under v1.2.0+: EACH meta owns its own <meta>/.grex/grex.lock.jsonl, tracking ONLY that meta's direct children. There is no global workspace lockfile — each recursion frame in walker §Three phases reads and writes its OWN lockfile.

Both the lockfile (.grex/grex.lock.jsonl) and the event log (.grex/events.jsonl) live in the manifest folder .grex/. Their names are deliberately distinct (no shared grex.*.jsonl prefix) to prevent the lockfile-vs-event-log conflation that caused historical SSOT errors.

Three "lock" artifacts — disambiguation

The codebase has three artifacts whose names contain "lock". The rule of thumb: if it ends in .jsonl it carries state; if it does not, it is a mutex.

| Artifact | Path (v1.2.0+) | Purpose | Format |
| --- | --- | --- | --- |
| THE lockfile | <meta>/.grex/grex.lock.jsonl | Resolved-state snapshot (commit + actions_hash per pack) | JSONL |
| Event log | <meta>/.grex/events.jsonl | Append-only history of add/rm/update/sync events | JSONL |
| Manifest fd-lock | <meta>/.grex.lock | OS-level file mutex serialising lockfile + event-log writes | Empty file |

Other file mutexes (<meta>/.grex.sync.lock per-meta-sync, <dest>.grex-backend.lock per-repo, <pack>/.grex-lock per-pack) are documented in concurrency §Five cooperating mechanisms; none of them carry state — they exist solely for mutual exclusion.

LockEntry schema

{"id":"warp-cfg","sha":"abc123...","branch":"main","installed_at":"2026-04-19T13:05:00Z","actions_hash":"sha256:deadbeef...","path":"warp-cfg"}

| Field | Since | Notes |
| --- | --- | --- |
| id | v1.0 | Pack name: slug; matches Event::Add.id. |
| sha | v1.0 | Resolved commit SHA; empty string if pack is non-git or HEAD probe failed. |
| branch | v1.0 | Tracked branch; null if detached. |
| installed_at | v1.0 | RFC3339 timestamp of last successful install/sync. |
| actions_hash | v1.0 | SHA-256 over installable surface (scope per pack-type; see manifest §actions_hash scope). |
| schema_version | v1.0 | Bumped on breaking lockfile schema change. |
| synthetic | v1.1.1 | true for plain-git children synthesized by the walker (semantically dead under v1.2.0+; see below). |
| path | v1.2.0 | Option<String> #[serde(default)], parent-meta-relative POSIX, normalised at write-time. Lookup-map key. |

The path field is the lookup-map key under v1.2.0's nested-children layout. See walker §Lockfile keying.

Three readers

| Reader | What it does with the lockfile |
| --- | --- |
| sync | Skip-on-hash. Re-resolve commit, recompute actions_hash, compare against the prior LockEntry. Match → skip; mismatch → re-execute. |
| doctor | Drift triangulation. Joins three legs: declared (manifest fold), recorded (lockfile entry), present (disk readdir / git probe). Each pair-mismatch is a distinct drift class. |
| ls | State render. Per-pack synced / unsynced status. ls --long reads SHA + installed_at + actions_hash directly without folding the event log. |

Path keying and v1.1.1 → v1.2.0 read-fallback

Through v1.1.1, lockfile entries were keyed by bare pack id (manifest name:), and the flat-sibling rule guaranteed id was unique within the single global workspace lockfile. Under v1.2.0's nested child paths, two declared children at distinct paths (e.g. tools/foo and vendor/foo) MAY share the same name: — a bare-id key would collide.

v1.2.0 decision: path-keyed, per-meta lockfile. Each meta owns its own lockfile tracking ONLY its direct children. Within that lockfile the in-memory index keys entries by meta-relative pack path (canonical relative POSIX, normalised at write-time, NFC).

Read-time fallback for v1.1.1 lockfiles

When a v1.2.0 binary reads a v1.1.1 lockfile entry where path: None (deserialized via #[serde(default)]), the path is derived as Some(entry.id.clone()). This is sound because v1.1.1 enforced bare-name-only paths (the validator rejected /), so id == path for all v1.1.1 entries. The walker proceeds without rewriting the file. After the next successful sync the entry is rewritten with path: Some(...), and subsequent reads bypass the fallback.
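
The read-time fallback can be sketched as a key-derivation step when building the in-memory index. The `LockEntry` below is a hypothetical, cut-down version of the documented struct (no serde, only three fields), and `effective_path` and `index_by_path` are illustrative names.

```rust
use std::collections::HashMap;

/// Cut-down LockEntry: `path` is None for entries written by v1.1.1.
#[derive(Debug, Clone, PartialEq)]
pub struct LockEntry {
    pub id: String,
    pub sha: String,
    pub path: Option<String>,
}

/// Uses the on-disk path when present, otherwise falls back to the bare id.
pub fn effective_path(entry: &LockEntry) -> String {
    entry.path.clone().unwrap_or_else(|| entry.id.clone())
}

/// Builds the lookup map keyed by meta-relative path.
pub fn index_by_path(entries: Vec<LockEntry>) -> HashMap<String, LockEntry> {
    entries
        .into_iter()
        .map(|e| (effective_path(&e), e))
        .collect()
}
```

Keying by path is what lets two children that share a name, such as tools/foo and vendor/foo, coexist in one lockfile.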

This means v1.2.0 reads v1.1.1 lockfiles cleanly with no manual migration step required. The library function grex_core::lockfile::migrate_v1_1_1 and the planned grex migrate-lockfile CLI subcommand (v1.2.1) exist for users who want to eagerly upgrade the lockfile bytes to v1.2.0 schema (e.g. before committing to git). Both are opt-in.

Migration path summary

| Scenario | Behaviour |
| --- | --- |
| v1.2.0 binary reads v1.1.1 lockfile (no edit) | Read-fallback: path derived as id. Sync proceeds. |
| v1.2.0 binary writes after a successful sync | All entries written with path: Some(...). Subsequent reads use the on-disk path directly. |
| User wants to eagerly upgrade lockfile bytes | grex migrate-lockfile [--dry-run] [--workspace <path>] (v1.2.1); atomic temp+rename, idempotent. |
| User downgrades v1.2.0 → v1.1.x | v1.1.x reader ignores path: field (forward-compat: unknown fields skipped); id-keyed lookup still works. |

LockEntry.synthetic deprecation

The synthetic: bool field on LockEntry (introduced v1.1.1 to mark plain-git children synthesised by the walker) is semantically dead under v1.2.0+. No v1.2.0 code path sets synthetic: true. The field is retained on the struct for backward-compat reads — v1.1.x lockfiles continue to deserialize cleanly.

A future schema bump (post-v1.2.0) MAY drop the field. Until then, a successful v1.2.0 sync against a workspace with synthetic entries either (a) finds the path now registered via grex add (entry rewritten with synthetic: false), or (b) reports UntrackedGitRepos and refuses to proceed. Either way, no v1.2.0+ sync writes a fresh synthetic entry.

Lifecycle

  1. First sync. Walker reads pack.yaml graph → clones each child → runs install actions → writes one LockEntry per direct child of the cwd-meta into <cwd-meta>/.grex/grex.lock.jsonl. Sub-metas write their own lockfiles in their own .grex/ dirs.
  2. Re-sync (no edits). Walker re-resolves refs → for each pack, recomputes actions_hash and compares to the recorded entry. Match → skip; lockfile entry carried forward unchanged.
  3. Re-sync after pack.yaml edit. User changes a child's ref or an actions block → next sync's actions_hash differs → pack re-executes → LockEntry rewritten with new sha / actions_hash / installed_at.
  4. Child removed from pack.yaml children:. Next sync's walker Phase 2 reconciles the lockfile against the manifest, deletes the orphan dest (subject to prune-safety — see force-prune), and removes the lockfile entry.
  5. grex doctor. Reads lockfile + intent-log fold + disk state → flags drift across the three legs.
  6. Lockfile-write failure at end-of-sync. Intentionally non-fatal. Successful pack actions are not rolled back; the failure is recorded as a report.event_log_warnings entry.

Crash recovery

Lockfile writes use write-then-rename atomicity (write to <lockfile>.tmp, fsync, rename over the original). A crash mid-write leaves either the old or the new file fully intact — never a torn JSONL. The manifest fd-lock (see concurrency) serialises all writes, so concurrent torn writes are also impossible.

On read, parse-failure of any line surfaces as LockfileCorrupt(path, line_no, parse_error):

  • Severity Warning under grex doctor (which can repair the file by replaying from the event log + a clean re-sync).
  • Severity Error under grex sync (refuses to plan against a corrupt lockfile).

User remediation path: grex doctor → repair → re-sync.

Cross-references

  • Walker keying decision + parent-relative model: walker
  • File mutexes (sync-lock, backend-lock, pack-lock, manifest-lock) + Lean4 I1: concurrency
  • Schema field table + intent-log split + crash recovery: manifest
  • Force-prune audit log + safety contract: force-prune

concurrency

Tokio runtime, bounded semaphore, per-pack file lock, per-meta manifest lock. One Lean4-verified invariant.

Canonical source: .omne/cfg/concurrency.md (SSOT, separate grex-inst repo). This page is the user-facing projection.

Runtime

#[tokio::main(flavor = "multi_thread", worker_threads = ...)]
async fn main() -> anyhow::Result<()> { ... }

Worker threads default = num_cpus::get(), overridable via --parallel N or GREX_PARALLEL env. The same --parallel N cap is honoured by the rayon scheduler that drives sibling sync within one meta — see walker §Phase 1 and walker §Phase 3.

Five cooperating mechanisms

  1. Per-meta sync lock: <meta>/.grex.sync.lock (fd-lock, non-blocking, fail-fast). Held for the full grex sync lifetime of THAT meta's frame. Two concurrent grex sync invocations against the same meta are a hard error, not a queue. v1.x → v1.2.0: through v1.x this was a single <workspace>/.grex.sync.lock at the workspace root (one global lock per workspace). Under v1.2.0+ each meta owns its own fd-lock under its own dir, so distinct metas never serialize against each other. The walker's recursion acquires and releases one lock per meta frame, and cargo-style parallel sub-meta sync holds N concurrent fd-locks across the meta tree (one per meta currently being processed). Locking is per-meta, never global.
  2. Per-repo backend lock: <dest>.grex-backend.lock (fd-lock, sibling file NOT inside <dest> so it survives <dest> wipe). Held across clone + fetch + materialise_tree for one repo path.
  3. Bounded semaphore — caps in-flight pack ops across the process.
  4. Per-pack .grex-lock — prevents two ops on the same pack path across processes and tasks.
  5. Per-meta manifest RW lock (fd-lock) — serialises that meta's lockfile + event-log writes. v1.x → v1.2.0: under v1.2.0+ each meta has its own manifest fd-lock at <meta>/.grex.lock; the lock is per-meta, not global, so distinct metas may mutate their own lockfile + event log in parallel.

Lock acquisition order (fixed, deadlock-free): per-meta-sync → semaphore → pack-lock → repo-backend → manifest-lock. Never reversed.
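
A fixed total order is what rules out deadlock: if every task only ever acquires a lock ranked after the ones it already holds, no cycle of waiters can form. The sketch below encodes the stated order as a hypothetical `LockLevel` enum and checks a sequence of acquisitions against it; it is a reasoning aid, not part of the shipped code.

```rust
/// The five mechanisms in acquisition order; derived Ord follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LockLevel {
    MetaSync,
    Semaphore,
    PackLock,
    RepoBackend,
    ManifestLock,
}

/// True when each acquisition ranks strictly after the previous one.
pub fn respects_order(acquired: &[LockLevel]) -> bool {
    acquired.windows(2).all(|w| w[0] < w[1])
}
```

Skipping a level is allowed under this check; reversing or repeating one is not.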

TOCTOU closure

The sync pipeline revalidates the per-meta dirty-check twice:

  1. Before attempting to acquire the per-meta sync lock (fast reject).
  2. After acquiring the per-meta sync lock AND immediately before calling materialise_tree (authoritative — any drift between steps 1 and 2 surfaces here).

Rationale: a concurrent non-sync writer (e.g. the user editing a file) could dirty the tree between our initial check and the moment we begin applying actions. The second check closes the window.

The path-swap TOCTOU (attacker swapping a directory for a symlink between canonicalize(dest) and the actual filesystem write) is closed by the BoundedDir dirfd-binding primitive — see toctou.

Recovery scan

At sync startup, before acquiring the per-meta lock, grex runs an informational recovery scan that:

  • Lists stale .grex.sync.lock / <dest>.grex-backend.lock whose owning PID is gone.
  • Lists incomplete event brackets in the manifest (action_started with no matching action_completed / action_halted).

The scan only logs — it never mutates. Auto-cleanup is grex doctor territory.
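
The bracket check in the second bullet can be sketched as a single pass that tracks open starts. The `ActionEvent` enum is hypothetical, and so is the assumption that start and end events are correlated by a string action id; the text does not say which key the real event log uses.

```rust
use std::collections::BTreeSet;

/// Hypothetical bracket events, each tagged with an assumed action id.
pub enum ActionEvent {
    Started(String),
    Completed(String),
    Halted(String),
}

/// Returns the ids that were started but never completed or halted.
/// Read-only: it only reports, matching the informational scan.
pub fn incomplete_brackets(events: &[ActionEvent]) -> Vec<String> {
    let mut open = BTreeSet::new();
    for ev in events {
        match ev {
            ActionEvent::Started(id) => {
                open.insert(id.clone());
            }
            ActionEvent::Completed(id) | ActionEvent::Halted(id) => {
                open.remove(id);
            }
        }
    }
    open.into_iter().collect()
}
```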

Bounded semaphore

use std::future::Future;
use std::sync::Arc;
use tokio::sync::Semaphore;

pub struct Scheduler {
    permits: Arc<Semaphore>,
}

impl Scheduler {
    pub fn new(parallel: usize) -> Self {
        Self { permits: Arc::new(Semaphore::new(parallel)) }
    }

    pub async fn run<F, T>(&self, pack_path: &std::path::Path, fut: F) -> anyhow::Result<T>
    where
        F: Future<Output = anyhow::Result<T>> + Send,
        T: Send,
    {
        // Outer bound: cap process-wide in-flight pack ops.
        let _permit = self.permits.clone().acquire_owned().await?;
        // Inner guard: exclusive per pack path, held until `fut` completes.
        let _plock  = PackLock::open(pack_path)?.acquire_async().await?;  // v1.2.4+: legacy `PackLock::acquire` is a deprecated shim
        fut.await
    }
}

The semaphore caps process-wide in-flight pack ops; the per-pack lock prevents double-execution of the same pack path across recursion frames or invocations. Sibling parallelism inside one meta and sub-meta parallelism across metas both run under the same semaphore cap.

Per-pack PackLock

File: <pack_workdir>/.grex-lock. Held exclusively via fd-lock::RwLock::write. Non-blocking try-first; on contention the task yields and retries with backoff.

API note (v1.2.4+): the canonical async entry point is PackLock::acquire_async (and PackLock::acquire_cancellable for the cancellable variant). The original PackLock::acquire signature shown in the sketch below is deprecated and retained only as a thin shim for backward compatibility — new call sites should use acquire_async.

pub struct PackLock {
    _guard: fd_lock::RwLockWriteGuard<'static, std::fs::File>,
}

impl PackLock {
    // Deprecated since v1.2.4 — prefer `acquire_async` (shown for prose continuity only).
    pub async fn acquire(pack_path: &std::path::Path) -> anyhow::Result<Self> {
        let lock_path = pack_path.join(".grex-lock");
        let file = std::fs::OpenOptions::new()
            .create(true).read(true).write(true)
            .open(&lock_path)?;
        let lock = fd_lock::RwLock::new(file);
        // retry loop: try_write() → on WouldBlock sleep + retry
        // ...
    }
}

Released on Drop. The file is NOT deleted on release (avoids a TOCTOU race). grex doctor prunes stale .grex-lock files whose owning PID is gone.
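
The "try first, then retry with backoff" pattern mentioned for contention can be sketched generically. This is a blocking, std-only illustration with a hypothetical helper `retry_would_block`; the real PackLock path is async and yields to the runtime instead of calling thread::sleep, and its actual backoff parameters are not stated here.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Calls `try_once` until it stops returning WouldBlock or attempts run out.
/// The delay doubles after every contended attempt.
pub fn retry_would_block<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut try_once: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match try_once() {
            Err(e) if e.kind() == io::ErrorKind::WouldBlock && attempt < max_attempts => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            // Success, a non-contention error, or the final attempt's result.
            other => return other,
        }
    }
}
```

In a lock acquisition, `try_once` would wrap a non-blocking call such as fd-lock's try_write and map its result into `io::Result`.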

Per-meta manifest RW lock

Any events.jsonl or grex.lock.jsonl mutation takes exclusive fd_lock::RwLock::write on the meta-local <meta>/.grex.lock. Readers take shared read. See manifest. The three-way disambiguation between this fd-lock file (.grex.lock), the lockfile (.grex/grex.lock.jsonl), and the event log (.grex/events.jsonl) lives in lockfile §Three "lock" artifacts.

Because the lock is per-meta under v1.2.0+, distinct metas can mutate their own lockfile + event log in parallel. There is no global serialisation point at the manifest layer — the only cross-meta serialisation is the bounded process-wide semaphore on in-flight pack ops.

Scheduler pseudocode

schedule(packs, op):
    futures = []
    for pack in packs:
        fut = async {
            _sem_permit     = semaphore.acquire()            # bound parallelism
            _pack_lock      = PackLock::acquire_async(pack.path)   # per-pack exclusive (v1.2.4+; legacy `acquire` is a deprecated shim)
            result          = op.run_on(pack)
            _manifest_lock  = pack.meta.manifest.write_lock()  # innermost (per-meta)
            manifest.append(event_from(result))
            drop(_manifest_lock)                             # release innermost first
            result
        }
        futures.push(fut)
    return join_all(futures)

Key property: locks acquired outer-to-inner, released inner-to-outer. Manifest lock is the briefest; semaphore the longest.

Lean4 invariant I1 (no_double_lock)

Invariant I1: for any two concurrent tasks t1, t2 scheduled by Scheduler, if t1.pack_path == t2.pack_path, then their lock-holding windows do NOT overlap in time.

I1 = "Invariant 1" — first concurrency-series invariant. Distinct from walker I1 (boundary preservation) and architecture I1 (the same scheduler theorem re-cited from the architecture doc). See the invariant series cross-reference table in the SSOT.

Informal: PackLock::acquire_async (canonical entry point since v1.2.4; the legacy PackLock::acquire is a deprecated shim) is exclusive per path; the later arrival awaits the earlier's drop.

File: proof/Grex/Scheduler.lean.

Sketch:

namespace Grex.Scheduler

structure Task where
  path    : String
  started : Nat      -- logical clock
  ended   : Nat
  deriving Repr

def Schedule := List Task

def overlaps (a b : Task) : Prop :=
  a.started < b.ended ∧ b.started < a.ended

-- PackLock is modeled as FIFO queue per path:
-- acquire(p) returns only after all prior holders for p have released.
axiom pack_lock_exclusive
    (s : Schedule) (a b : Task) :
    a ∈ s → b ∈ s → a.path = b.path → a ≠ b → ¬ overlaps a b

-- I1: scheduler never holds two concurrent locks on the same pack path.
theorem no_double_lock
    (s : Schedule) (a b : Task)
    (ha : a ∈ s) (hb : b ∈ s) (hpath : a.path = b.path) (hne : a ≠ b) :
    ¬ overlaps a b :=
  pack_lock_exclusive s a b ha hb hpath hne

end Grex.Scheduler

CI job (.github/workflows/lean.yml):

- uses: leanprover/lean-action@v1
- run: cd proof && lake build

Zero sorry; zero unresolved axiom outside the stated model-bridging ones.
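The `overlaps` predicate from the Lean sketch can also be mirrored as a runtime check over a recorded schedule, for example in a test that replays tracing spans. This is a hedged sketch: the `Task` struct copies the Lean field names, while `double_locks` and `task` are hypothetical helpers, and distinctness is taken by position in the slice instead of the Lean `a ≠ b`.

```rust
/// Mirror of the Lean `Task` structure: one lock-holding window on a pack path.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    path: String,
    started: u64, // logical clock
    ended: u64,
}

/// Same predicate as the Lean `overlaps`: the two windows intersect.
/// Windows that merely touch (a.ended == b.started) do not overlap.
fn overlaps(a: &Task, b: &Task) -> bool {
    a.started < b.ended && b.started < a.ended
}

/// Returns the index pairs of tasks that share a path and overlap in time.
/// An empty result means the recorded schedule satisfies I1.
fn double_locks(schedule: &[Task]) -> Vec<(usize, usize)> {
    let mut violations = Vec::new();
    for i in 0..schedule.len() {
        for j in (i + 1)..schedule.len() {
            let (a, b) = (&schedule[i], &schedule[j]);
            if a.path == b.path && overlaps(a, b) {
                violations.push((i, j));
            }
        }
    }
    violations
}

/// Convenience constructor for examples and tests.
fn task(path: &str, started: u64, ended: u64) -> Task {
    Task { path: path.to_string(), started, ended }
}
```

A schedule where the second holder of `a` starts exactly when the first one ends produces no violations; moving its start one tick earlier produces the pair `(0, 1)`.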

Walker I8 reduction

Walker invariant I8 (parallel sync of disjoint sub-trees commutes — see walker §Three changes vs v1.1.x) reduces to concurrency I1 for its mutual-exclusion lemma. The shipped axiom sync_disjoint_commutes in proof/Grex/Walker.lean covers the disjoint-pack work commutativity that the rayon scheduler relies on; no new theorem is required for the v1.2.1 rayon sibling-sync swap.

Operational tuning

  • --parallel default = num_cpus::get(). Typical 4-16.
  • Git fetch is IO-bound → higher parallelism helps until network saturates.
  • Shell-out actions (exec) may be internally multi-threaded; consider a per-type cap in v1.x.
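A minimal sketch of resolving the permit count, assuming an optional `--parallel` value. It swaps in the standard library's `std::thread::available_parallelism` for the `num_cpus::get()` call named above so that the example has no dependencies; `resolve_parallel` is a hypothetical name, and treating `0` as "use the default" is an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Picks the scheduler permit count.
/// An explicit positive value wins; otherwise fall back to the detected
/// parallelism, and to 1 if detection fails.
fn resolve_parallel(flag: Option<usize>) -> usize {
    match flag {
        Some(n) if n > 0 => n,
        _ => thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1),
    }
}
```

`resolve_parallel(Some(4))` returns 4 regardless of the machine, while `resolve_parallel(None)` returns whatever the host reports, never less than 1.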

Telemetry

Each scheduled task emits a tracing span: pack_path, op, duration_ms, result. grex doctor can read the last-N spans from an on-disk journal for retrospective diagnosis.

Cross-references

  • Walker phases + parallel sibling/sub-meta scheduling: walker
  • Distributed lockfile + per-meta manifest fd-lock disambiguation: lockfile
  • Manifest event-log atomic append + crash recovery: manifest
  • TOCTOU BoundedDir (cap-std + Linux openat2): toctou
  • Force-prune audit log (writes through this manifest fd-lock): force-prune

force-prune

Default-deny safety contract for walker Phase 2 — and the --force-prune family of flags that override it. With audit log, blast-radius analysis, and a forward reference to the v1.2.1 --quarantine snapshot.

Canonical source: .omne/cfg/walker.md §Cleanup semantics (SSOT, separate grex-inst repo). A dedicated .omne/cfg/force-prune.md will land in the SSOT repo separately. This page is the user-facing projection.

When does prune fire?

Cleanup is CLI-invocation-driven, not eager. Removing a child from pack.yaml children: triggers prune on the next grex sync / update invocation, not on edit. Phase 2 reconciles each meta's lockfile against its current manifest and runs rm -rf on any orphans, subject to the safety contract below.

| Property | Behaviour |
| --- | --- |
| Trigger | Manifest edit removes child, then user runs any grex command that touches the meta. |
| Scope | rm -rf <meta>/<child.path> AND drop the lockfile entry. |
| Eagerness | CLI-invocation-driven (NOT filesystem-watcher-eager). |
| Idempotency | Re-running with the child still removed: lockfile already lacks entry, rm -rf is a no-op. |
| Cross-meta | Each meta cleans its OWN orphans only. |
| Safety | Default-deny on dirty / SHA-mismatched / in-progress dest; bypass via --force-prune. |

Safety contract

Phase 2 must NOT silently destroy modified or shared content. Before rm -rf, the walker verifies the dest still matches the state recorded in the lockfile.

Adversary scenario

User cp -rs a child folder into a sibling meta and re-registers it there, then removes the original entry from the source meta's pack.yaml. Without verification, the source meta's Phase 2 destroys the now-shared folder while the sibling meta still believes it owns it.

The five checks

Default behaviour (override only with the --force-prune family below):

  1. Missing .git/ at dest. Treated as already-gone — drop the lockfile entry, no rm -rf. Idempotent.

  2. HEAD SHA mismatch (git rev-parse HEAD ≠ LockEntry.sha). Abort with Err(DirtyDestRefuseToPrune(path, lockfile_sha, dest_sha)). The user has either rebased, fetched without resyncing, or the dest was swapped for foreign content.

  3. Dirty working tree (git status --porcelain --ignored non-empty). Abort with Err(DirtyDestRefuseToPrune(...)). The user has uncommitted edits OR gitignored content (build artefacts, deps caches, e.g. target/, node_modules/) that prune would silently destroy.

  4. Sub-meta consent walk. If dest contains <dest>/.grex/grex.lock.jsonl with non-empty entries, recursively check every grandchild for the same conditions. Any dirty/in-progress grandchild → refuse the prune unless --force-prune-recursive. grex remove --force <path> does NOT cascade past one level.

  5. In-progress git op probe. Refuse if any of these exist at <dest>/.git/:

    • rebase-merge/, rebase-apply/ (in-progress rebase)
    • MERGE_HEAD, CHERRY_PICK_HEAD, REVERT_HEAD (in-progress merge / cherry-pick / revert)
    • BISECT_LOG, sequencer/ (in-progress bisect / sequencer)

    Even if HEAD SHA matches lockfile and the working tree is clean, an in-progress git op blocks prune. No flag bypasses this except --force-prune-recursive combined with explicit per-path --force-prune.

  6. Match: clean tree, SHA equal to lockfile, no in-progress op, no dirty grandchild. This is the outcome when none of the five checks above fires; rm -rf proceeds.
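The default (no-flag) decision can be sketched as a pure function over the observed state of the dest. Everything here is illustrative: `Dest`, `Decision`, `classify`, and `has_in_progress_op` are hypothetical names, the checks are evaluated in the order they are listed above, and the real walker returns `Err(DirtyDestRefuseToPrune(...))` where this sketch returns a labelled refusal.

```rust
/// Marker names under <dest>/.git/ that signal an in-progress git operation.
const IN_PROGRESS_MARKERS: [&str; 7] = [
    "rebase-merge",
    "rebase-apply",
    "MERGE_HEAD",
    "CHERRY_PICK_HEAD",
    "REVERT_HEAD",
    "BISECT_LOG",
    "sequencer",
];

/// True if any entry name found in the .git directory is a known marker.
fn has_in_progress_op(git_dir_entries: &[&str]) -> bool {
    git_dir_entries.iter().any(|e| IN_PROGRESS_MARKERS.contains(e))
}

/// Observed state of one prune candidate (field names are assumptions).
#[derive(Debug, Clone, Copy)]
struct Dest<'a> {
    has_git_dir: bool,
    head_sha: &'a str,
    lock_sha: &'a str,
    dirty: bool,            // `git status --porcelain --ignored` was non-empty
    dirty_grandchild: bool, // outcome of the sub-meta consent walk
    in_progress_op: bool,   // outcome of the marker probe
}

#[derive(Debug, PartialEq)]
enum Decision {
    DropEntryOnly,        // check 1: dest already gone
    Refuse(&'static str), // checks 2 to 5
    Prune,                // match case
}

/// Default-deny classification with no --force-prune flag in effect.
fn classify(d: &Dest) -> Decision {
    if !d.has_git_dir {
        return Decision::DropEntryOnly;
    }
    if d.head_sha != d.lock_sha {
        return Decision::Refuse("head-sha-mismatch");
    }
    if d.dirty {
        return Decision::Refuse("dirty-working-tree");
    }
    if d.dirty_grandchild {
        return Decision::Refuse("dirty-grandchild");
    }
    if d.in_progress_op {
        return Decision::Refuse("in-progress-git-op");
    }
    Decision::Prune
}
```

Only a dest that passes every check reaches `Decision::Prune`; a missing `.git/` short-circuits to dropping the lockfile entry without deleting anything.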

The --force-prune flag family

| Flag | Effect |
| --- | --- |
| --force-prune | Bypass clean-tree assertions (checks 2 and 3) at the named dest. Still respects in-progress ops (check 5) and still refuses if any grandchild is dirty (check 4). |
| --force-prune-with-ignored | Allow ignored content (target/, node_modules/) to be destroyed without warning at the named dest. Useful when the only "dirty" content is a build cache. |
| --force-prune-recursive | Cascades the bypass to grandchildren. Required to prune past a dirty grandchild. |

grex remove --force <path> is the per-path equivalent of --force-prune: it bypasses checks 2 and 3 at the named dest only, never cascades.

The flag family is opt-in by design: a typo in pack.yaml should surface as a refusal, not as data loss.

Loss profile

Loss of ignored content (build artefacts, deps caches) is recoverable but expensive (re-compile / re-fetch). Loss of tracked dirty edits is unrecoverable. Loss of an in-progress rebase is unrecoverable from --force-prune-recursive's vantage point even though the underlying commits are still in .git/objects/ — the working state and rebase script are gone.

Audit log

Every --force-prune, --force-prune-with-ignored, or --force-prune-recursive invocation appends an entry to <meta>/.grex/events.jsonl BEFORE the rm -rf fires:

{"op":"force-prune","ts":"2026-04-30T10:00:00Z","id":"<pack-id>","schema_version":"1","path":"<meta-relative path>","lockfile_sha":"<sha>","dest_sha":"<sha>","dirty_files":<n>,"ignored_size":<bytes>}

The audit entry is fsync'd before the deletion proceeds. A crash mid-prune leaves a recoverable trail of what was about to be destroyed:

  • The fsync barrier guarantees the audit line hits stable storage before any unlink syscall fires.
  • On recovery, grex doctor can read the orphan entry and report "force-prune was about to delete <path>; the dest is gone — no recovery possible without git or filesystem-level undelete".
  • The audit lives in the same per-meta event log used by add / rm / update — see manifest §events.jsonl event schemas for the common envelope and atomic-append guarantees.
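The write-ahead ordering described above (append, fsync, then delete) can be sketched with the standard library alone. This is an illustration under stated assumptions, not grex's code: `audited_remove` is a hypothetical name, the audit line is passed in pre-serialised, and the sketch ignores the manifest fd-lock that serialises real appends.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Appends one audit line, forces it to stable storage, then deletes `dest`.
/// If the open, the append, or the fsync fails, the function returns early
/// and `dest` is left untouched.
fn audited_remove(events_log: &Path, audit_line: &str, dest: &Path) -> io::Result<()> {
    let mut log = OpenOptions::new()
        .create(true)
        .append(true)
        .open(events_log)?;

    // Write the line and its newline in one call to reduce the torn-line window.
    let mut record = String::with_capacity(audit_line.len() + 1);
    record.push_str(audit_line);
    record.push('\n');
    log.write_all(record.as_bytes())?;

    // fsync barrier: the record is flushed before any unlink is attempted.
    // (A freshly created log file may also need its parent directory synced;
    // that step is omitted in this sketch.)
    log.sync_all()?;

    fs::remove_dir_all(dest)
}
```

Because every fallible step before the delete uses `?`, a failed audit write means no deletion happens, which matches the "audit first, destroy second" contract.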

Blast radius

The blast radius of a --force-prune invocation is bounded as follows.

| Flag | Within scope (deletable) | Out of scope (untouched) |
| --- | --- | --- |
| --force-prune | The named dest's tracked dirty edits at the top level | Any grandchild with its own dirty edits or in-progress op (check 4 still refuses) |
| --force-prune-with-ignored | All of --force-prune plus ignored content (target/, node_modules/, etc.) at the named dest | Any grandchild's ignored content (check 4 still applies) |
| --force-prune-recursive | The full sub-tree, including grandchildren's tracked dirty edits and ignored content | Sibling metas (cleanup is per-meta; see walker §Phase 2) |

The walker NEVER deletes outside the cwd-meta's own tree. Sibling metas and parent metas are unreachable from any --force-prune invocation.

Recovery

Once rm -rf has fired, there is no in-band recovery path under v1.2.0. Options:

  1. Restore from backup.
  2. Use a filesystem-level undelete tool (extundelete, ntfsundelete, etc.); these typically succeed only on recent deletes against quiet filesystems.
  3. If the deleted dest was a git working tree, its git objects may still be available in a parent's .git/modules/ subtree (only if grex's clone used submodule semantics, which is rare).

v1.2.1 --quarantine flag (PLANNED, TBD)

The v1.2.1 release plans an opt-in --quarantine flag on --force-prune and --force-prune-with-ignored that snapshots the entire dest sub-tree to <meta>/.grex/trash/<ISO8601>/<basename>/ BEFORE the rm -rf fires. Failure of the snapshot aborts the prune (no delete). The Lean4 theorem Grex.Walker.quarantine_snapshot_precedes_delete is the gate that lets the Rust implementation land — proof-first per the SSOT rule.

The conceptual feature name is "quarantine"; the on-disk folder is named trash/. Per-meta scope (each meta has its own .grex/trash/ bucket). Not present in v1.2.0; see the v1.2.1 spec for the LOCKED layout decisions and acceptance criteria.

Until --quarantine lands, --force-prune is irreversible. The audit log is the only forensic trail.

Cross-references

  • Walker Phase 2 algorithm + 5-way classifier context: walker
  • Lockfile entry + path keying (the lookup map for prune candidates): lockfile
  • Per-meta manifest fd-lock + audit-append serialisation: concurrency
  • Audit-log envelope + crash-recovery torn-line detection: manifest
  • BoundedDir TOCTOU primitive (the rm -rf itself is dirfd-bound): toctou

toctou

The BoundedDir primitive — how grex closes the path-swap TOCTOU window between canonicalize(dest) and the actual filesystem write. Hybrid cap-std (uniform) plus Linux openat2(RESOLVE_BENEATH) (internal acceleration).

Canonical source: forthcoming .omne/cfg/toctou.md (SSOT, separate grex-inst repo). For now this page derives from .omne/cfg/walker.md §Symlink hardening, .omne/cfg/rust-design-decisions.md §6, .omne/proof/impl-axiom-bridge.md §3 (sync_local_writes), and crates/grex-core/src/fs/boundary.rs (the implementing module).

What is TOCTOU?

TOCTOU = Time-Of-Check / Time-Of-Use: a race-condition class where a program checks a property of a path (e.g. "this resolves to <meta>/code/.git") and then operates on the same path later, while an attacker swaps the path's target in between.

The classic walker race window without BoundedDir:

parent.canonicalize() → resolve(child) → fs::create_dir_all(dest) → clone(dest)
                      ▲                ▲                          ▲
                      └─ race window ──┴─ swap dest for symlink ──┘

An attacker who can write inside the workspace mid-flight could redirect the clone write to an arbitrary location — for example, replace <meta>/code/ with a symlink to $HOME/.ssh/, then watch grex happily clone-over the user's keys.

Path-string-based filesystem APIs (std::fs::create_dir_all(path), std::fs::write(path, ...)) re-walk the path on every call. Each walk is a fresh check; nothing in the standard library ties consecutive calls to the SAME inode the prior call resolved.

Why TOCTOU matters for grex sync

The walker mutates the filesystem in three places — each is a TOCTOU surface if not bound to a kernel-vouched handle:

  1. Phase 1, branch 1 (clone). git_clone(child.url, dest, child.ref) writes to dest. A path-swap attack between the validator's canonicalisation pass and the clone could redirect the write outside <meta>'s subtree.
  2. Phase 1, branch 2 (recover empty dest). Same exposure as branch 1 — the recovery clone re-walks dest's path.
  3. Phase 2 (rm -rf). Deleting the dest after a default-deny prune-safety pass. A swap between the safety check and the unlink could redirect rm -rf to a location outside <meta> (catastrophic — see force-prune §Blast radius for the boundary contract that depends on this NOT happening).

See the 5-way classifier in walker §Phase 1 for context — branches 1, 2, and 5 are the only branches that mutate dest; branch 5 (git fetch) operates against the already-bound dest and is not a fresh path-walk.

BoundedDir — the primitive

BoundedDir is a thin wrapper around a kernel-confirmed directory handle (a "dirfd") obtained for a path provably contained beneath a parent directory. Once constructed, the handle is bound to the inode the kernel resolved at construction time — a subsequent attacker swap of the parent path for a symlink cannot redirect operations performed through the handle.

BoundedDir::open(parent, child_relative)? → handle bound to inode

Downstream operations either go through the dirfd (write confined) or compare against BoundedDir::path (the canonicalised, post-resolve path) — both of which the kernel has already vouched for.

The module lives at crates/grex-core/src/fs/boundary.rs. Visibility is pub(crate) — cap-std types do not leak into the public API surface, so a future implementation swap (e.g. raw openat2 plumbing) does not bump the grex-core SemVer.

Hybrid strategy: cap-std uniform, openat2 internal

Per design decision §6 in .omne/cfg/rust-design-decisions.md:

| Platform | What BoundedDir actually does |
| --- | --- |
| Linux ≥ 5.6 | cap-std internally uses openat2(RESOLVE_BENEATH): single syscall, kernel-enforced bound |
| Linux < 5.6 | cap-std falls back to O_NOFOLLOW-by-component verification |
| macOS / Windows | cap-std uses platform-equivalent capability handles |

The BoundedDir API is uniform across all platforms — callers do not branch. The openat2 acceleration on modern Linux is an internal detail.

Why uniform cap-std rather than per-OS hand-rolling

The trade-off: grex gives up the marginal performance of a hand-rolled single-syscall openat2 path in exchange for three concrete benefits:

  1. No unsafe in grex-core. The crate carries #![deny(unsafe_code)]. Hand-rolled openat2 plumbing requires unsafe for the libc syscall surface.
  2. One code path to test across OSes. Per-OS branches multiply the test matrix and the audit surface.
  3. No kernel-version branching at runtime. cap-std handles the Linux ≥ 5.6 / < 5.6 split internally; grex-core never observes it.

Dirfd-binding model

What "bound to an inode" actually means:

  1. BoundedDir::open(parent, child_relative) opens parent as a cap_std::fs::Dir. The kernel resolves parent to an inode and gives back a file descriptor referring to that specific inode.
  2. The constructor then resolves child_relative through that dirfd — the kernel walks segment-by-segment, verifying each step does not escape the parent capability (no .., no symlink-to-outside, no absolute redirect).
  3. The returned BoundedDir carries the verified child dirfd. All future operations route through this fd, not through the original path string.
  4. If an attacker swaps parent/child_relative for a symlink AFTER step 2, subsequent reads/writes through the BoundedDir still hit the original inode the kernel resolved at step 2 — the attacker's swap is invisible to the bound handle.

Dropping the BoundedDir releases the dirfd; the inode is then eligible for unlinking by other processes (this is fine — by then the walker is done with that dest).
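The idea that a handle stays bound to the inode it was opened on, even after the path is swapped, can be observed with an ordinary file handle. This is an analogy only: it uses `std::fs::File` instead of a cap-std directory handle, it assumes POSIX-style rename semantics, and `write_through_handle_after_swap` is a hypothetical helper written for this illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Opens `target.txt`, swaps the path for a decoy, then writes via the handle.
/// Returns (contents of the original inode, contents of the decoy).
fn write_through_handle_after_swap(dir: &Path) -> io::Result<(String, String)> {
    let original = dir.join("target.txt");
    let moved = dir.join("moved.txt");
    fs::write(&original, "")?;

    // "Bind": the handle now refers to the inode, not to the path string.
    let mut handle = OpenOptions::new().append(true).open(&original)?;

    // "Attack": move the real file away and plant a decoy at the same path.
    fs::rename(&original, &moved)?;
    fs::write(&original, "decoy")?;

    // The write follows the handle, so it lands in the original inode.
    handle.write_all(b"via-handle")?;
    drop(handle);

    Ok((fs::read_to_string(&moved)?, fs::read_to_string(&original)?))
}
```

On a POSIX filesystem the first returned string is `via-handle` and the second is still `decoy`: a path-based write would have hit the decoy, the handle-based write did not.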

Path-swap attack closure

Concretely, the attack the walker now defeats:

T+0   user runs: grex sync .
T+1   walker validates: canonicalize(<meta>/code/) → /home/u/proj/code (real dir)
T+2   walker calls: BoundedDir::open(<meta>, "code") → fd 17 bound to inode 99
T+3   ATTACKER (running concurrently): rm <meta>/code; ln -s $HOME/.ssh <meta>/code
T+4   walker calls: git_clone(url, BoundedDir::path(&fd17)) → writes to inode 99
                    (the original /home/u/proj/code, NOT $HOME/.ssh)

Without BoundedDir, step T+4's git_clone would re-walk the path string <meta>/code/ and follow the attacker's symlink to $HOME/.ssh. With BoundedDir, the write is bound to the kernel-vouched inode from T+2.

The walker also rejects symlinks at the canonicalisation gate (see walker §Symlink hardening), but that gate runs at validation time, BEFORE the bind. BoundedDir is what closes the window between validation and write.

Lean4 axiom: sync_local_writes

The Lean4 mechanisation models this as a bridge axiom in proof/Grex/Bridge.lean:

axiom sync_local_writes
    (parent : Path) (w : World) (q : Path) :
    ¬ descends q parent →
    (sync parent w).tracked q = w.tracked q ∧
    (sync parent w).lock q    = w.lock q    ∧
    (sync parent w).hasGit q  = w.hasGit q

"Bridge 3. sync writes only inside its argument subtree. Every other path's tracked, lock, and hasGit are unchanged. This is the model-level statement of v1.2.0's parent-relative discipline."

The Rust contract that discharges this axiom is precisely the BoundedDir capability handle. Without it, a malicious symlink inside the subtree could cause sync to clobber w.hasGit q for a q that does not descend from parent, falsifying the axiom.

The validate_children_paths gate (rejects .. and absolute segments) is necessary but NOT sufficient on its own; the capability-handle invariant is what closes the symlink-traversal escape window. Any change to the Rust impl that swaps cap-std for raw std::fs MUST re-prove this axiom (or bridge it via an explicit "no-symlink-escape" lemma) — .omne/proof/impl-axiom-bridge.md §3 documents this re-review trigger.
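A lexical gate of the kind described here can be sketched with `std::path::Component`. The function name `is_confined_lexically` is hypothetical and the sketch is not the actual `validate_children_paths` implementation; it only shows why such a check is purely string-level and therefore cannot see symlinks.

```rust
use std::path::{Component, Path};

/// Lexical gate: accepts only relative paths made of plain name segments.
/// It inspects the path string and never touches the filesystem, so a
/// symlink planted at one of the segments passes this check unnoticed.
fn is_confined_lexically(child: &str) -> bool {
    let mut saw_segment = false;
    for component in Path::new(child).components() {
        match component {
            Component::Normal(_) => saw_segment = true,
            // `..`, a root `/`, a Windows prefix, or a leading `.` fail the gate.
            _ => return false,
        }
    }
    // Reject the empty path as well.
    saw_segment
}
```

`code` and `vendor/lib` pass; `../x`, `a/../b`, and `/etc` are rejected. A path such as `code` still passes when `code` is a symlink to somewhere else, which is the gap the capability handle closes.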

What BoundedDir does NOT cover

  • O_NOFOLLOW does not protect against TOCTOU on the parent itself. BoundedDir::open(parent, child) opens parent as the capability root — if the attacker swaps the parent's path to a different inode BEFORE BoundedDir::open is called, the bind happens against the wrong inode. The walker mitigates this by binding the cwd-meta's parent dirfd at sync-startup, before any per-child resolution fires; from then on every recursion's parent is itself a fd, not a path string.
  • Concurrent writers within the bound dirfd. If a non-grex process holds a write descriptor under the bound dirfd, the walker's writes can race with theirs at the inode level. The per-pack .grex-lock (see concurrency §Per-pack PackLock) closes the in-grex case; cross-tool coordination is out of scope for BoundedDir.
  • Filesystem-layer attacks. A malicious filesystem (FUSE, network mount with adversarial server) can violate the kernel's inode-stability guarantee. BoundedDir assumes a non-adversarial filesystem layer.

Cross-references

  • Walker phases that perform the bound writes: walker
  • rm -rf blast-radius contract that depends on bound deletes: force-prune
  • Per-pack .grex-lock mutual exclusion (orthogonal to TOCTOU): concurrency
  • Lockfile atomic rewrite (also dirfd-bound under v1.2.0+): lockfile

cli — v1 frozen verb contract

12 verbs. Freeze is additive-only post-v1: adding verbs, flags, or JSON-output fields is allowed; removing or renaming is a v2 change.

Universal flags

| Flag | Effect |
| --- | --- |
| --json | emit machine-readable JSON to stdout, suppress ANSI |
| --plain | ANSI off, no Unicode, CI/agent-friendly |
| --dry-run | compute plan, print it, do NOT mutate disk or manifest |
| --parallel N | bound scheduler semaphore to N permits (default: num_cpus) |
| --filter <expr> | restrict verb to matching packs (name glob, type, depth) |
| --manifest <path> | override default ./grex.jsonl |
| -v, -vv, -vvv | tracing verbosity |

Output mode precedence: --json > --plain > TTY auto-detect pretty default.
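The precedence rule reads naturally as a small resolver. This is a sketch with hypothetical names (`OutputMode`, `resolve_output_mode`); the fallback to plain output when stdout is not a TTY is an assumption, since the rule above only names the TTY case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    Json,
    Plain,
    Pretty,
}

/// Applies the precedence --json > --plain > TTY auto-detect.
/// Assumption: a non-TTY stdout with neither flag set falls back to Plain.
fn resolve_output_mode(json: bool, plain: bool, stdout_is_tty: bool) -> OutputMode {
    if json {
        OutputMode::Json
    } else if plain {
        OutputMode::Plain
    } else if stdout_is_tty {
        OutputMode::Pretty
    } else {
        OutputMode::Plain
    }
}
```

Passing both flags resolves to `Json`, so a script that always adds `--json` gets stable output even if a wrapper also sets `--plain`.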

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | success |
| 1 | generic error |
| 2 | CLI usage error |
| 3 | manifest integrity failure |
| 4 | pack op failed (fetch/install/sync/teardown) |
| 5 | lock contention / concurrency |
| 6 | MCP protocol error |
| 7 | doctor found drift |
| 8 | plugin / unknown action or pack-type |
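A caller that wraps grex (a CI step, for example) might map the process exit code onto a typed value. The enum below is a sketch for such a consumer; `GrexExit` and its variant names are hypothetical, and only the numeric values and meanings come from the table above.

```rust
/// Consumer-side view of the frozen exit codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GrexExit {
    Success = 0,
    Generic = 1,
    Usage = 2,
    ManifestIntegrity = 3,
    PackOp = 4,
    LockContention = 5,
    McpProtocol = 6,
    DoctorDrift = 7,
    Plugin = 8,
}

impl GrexExit {
    /// Maps a raw process exit code; unknown codes yield None so that
    /// codes added by a later release are not misclassified.
    fn from_code(code: i32) -> Option<GrexExit> {
        match code {
            0 => Some(GrexExit::Success),
            1 => Some(GrexExit::Generic),
            2 => Some(GrexExit::Usage),
            3 => Some(GrexExit::ManifestIntegrity),
            4 => Some(GrexExit::PackOp),
            5 => Some(GrexExit::LockContention),
            6 => Some(GrexExit::McpProtocol),
            7 => Some(GrexExit::DoctorDrift),
            8 => Some(GrexExit::Plugin),
            _ => None,
        }
    }

    /// Short description taken from the exit-code table.
    fn meaning(self) -> &'static str {
        match self {
            GrexExit::Success => "success",
            GrexExit::Generic => "generic error",
            GrexExit::Usage => "CLI usage error",
            GrexExit::ManifestIntegrity => "manifest integrity failure",
            GrexExit::PackOp => "pack op failed",
            GrexExit::LockContention => "lock contention / concurrency",
            GrexExit::McpProtocol => "MCP protocol error",
            GrexExit::DoctorDrift => "doctor found drift",
            GrexExit::Plugin => "plugin / unknown action or pack-type",
        }
    }
}
```

A wrapper could feed `std::process::ExitStatus::code()` into `GrexExit::from_code` and, for instance, retry only on `LockContention`.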

The 12 verbs

grex init

Initialize a grex workspace (creates grex.jsonl, configures hooks, writes .gitignore managed-block markers if missing).

  • Args: none.
  • Flags: --hooks-path <dir> (default .grex/hooks), --no-clone (skip fetch of pre-existing entries).
  • Example: grex init --parallel 4
  • JSON: {"workspace":"<cwd>","created":["grex.jsonl","grex.lock.jsonl"],"hooks":"<path>","cloned":[]}

grex add <url> [path]

Register a pack, clone it, run its install.

  • Args: <url> required; [path] optional bare name, inferred from the URL basename when omitted.
  • Flags: --type <meta|declarative|scripted> (auto-detected from pack.yaml), --ref <branch|tag|sha>, --no-install (clone only).
  • Exit: 2 if path not bare; 4 if fetch or install fails.
  • Example: grex add git@github.com:user/warp-cfg.git warp-cfg
  • JSON: {"id":"warp-cfg","type":"declarative","path":"warp-cfg","sha":"abc123","installed":true}

grex rm <path>

Run teardown, remove pack dir, tombstone in manifest, update .gitignore.

  • Args: <path> required.
  • Flags: --keep-files (tombstone only), --skip-teardown (do not run teardown actions/hooks).
  • JSON: {"id":"...","removed":true,"files_deleted":true,"teardown":"ok"}

grex ls

List registered packs (post-fold).

  • Flags: --type <...>, --long (include SHA + install time + actions_hash), --tree (nested view).
  • JSON: [{"id":"...","type":"...","path":"...","ref":"...","sha":"...","installed_at":"..."}]

grex status

Drift report: manifest vs lockfile vs on-disk.

  • Flags: --stale-after <duration>, --fail-on-drift.
  • JSON: [{"id":"...","on_disk":true,"sha_match":true,"actions_hash_match":true,"drift":"clean|dirty|missing|untracked|stale"}]
  • Exit: 7 if any drift with --fail-on-drift.

grex sync [--recursive]

Git fetch/pull every pack; recurse into children. Install actions are not re-run here (see update).

  • Flags: --recursive (default true), --only <id>, --fail-fast.
  • M4-D flags (additive, freeze-preserving):
    • --ref <REF> — override every pack's declared ref for this sync invocation (branch, tag, or commit SHA). Applied by the walker at each child clone / checkout; the root pack itself is not re-checked-out (operator manages root via grex add / manual git). Empty and whitespace-only values rejected at parse time.
    • --only <GLOB> — restrict sync to packs whose workspace-relative pack path, normalized to forward-slash form (/), matches the glob. Cross-platform consistent: a, b/c, vendor/* evaluate identically on Windows and POSIX. The root pack (whose path lies outside the workspace) falls back to its absolute forward-slash path. Bare pack names do not match unless the name coincides with the workspace-relative path. Repeat the flag to OR-combine multiple patterns. Non-matching packs are skipped entirely (no action execution); their prior lockfile entry is carried forward so a subsequent unfiltered sync still short-circuits on hash. Invalid globs exit 2 (CLI usage error). Caveat — --only does NOT expand to include a pack's depends_on / child dependencies; operator must include them explicitly if dependency-filtered runs are required. Empty and whitespace-only values rejected at parse time.
    • --force — re-execute every pack even when its actions_hash is unchanged from the prior lockfile. Bypasses the M4-B skip-on-hash short-circuit. Caveat — non-idempotent actions (exec without guard, mkdir with mode drift, etc.) may produce duplicate / compounding side effects when --force replays after a mid-run halt; operator responsibility to ensure action idempotency before using --force on a partially-applied workspace.
  • JSON: [{"id":"...","result":"ok|err","sha_before":"...","sha_after":"...","message":""}]
  • Exit: 4 on op failure without --keep-going.

grex update [pack]

Sync + re-run install actions for packs whose lockfile SHA or actions_hash changed.

  • Args: [pack] optional; defaults to all.
  • Flags: --force (re-run install regardless of lock), --only <id>.
  • JSON: [{"id":"...","synced":true,"reinstalled":true,"reason":"sha-changed|hash-changed|forced|none"}]

grex doctor

Integrity + drift + lint.

  • Checks: manifest schema, gitignore managed-block in sync, on-disk pack drift, .grex/pack.yaml schema validity, stale .grex-lock files, orphan entries.
  • Flags: --compact (run manifest compaction), --fix (auto-fix fixable issues).
  • Exit: 7 on drift, 3 on manifest integrity failure, 0 clean.

grex serve --mcp

Launch embedded MCP stdio JSON-RPC 2.0 server.

  • Flags: --mcp (required; reserved for --http in v2).
  • Exit: 6 on protocol error.
  • Details: mcp.md.

grex import

Bring external state into the manifest.

  • Flags:
    • --from-repos-json <path> — ingest legacy flat REPOS.json array.
    • --scan — walk workspace one level deep, register untracked .git dirs.
    • --default-type <...> — pack-type assumed for entries without pack.yaml (default: meta).
  • JSON: {"imported":[...],"skipped":[...],"errors":[...]}

grex run <action> [--filter <expr>]

Invoke a registered action by name across matched packs. Primarily for testing/diagnostic use; production installs go through pack-type lifecycle.

  • Args: <action> required; matches registered plugin name.
  • Flags: --filter <expr>, --parallel N.
  • JSON: [{"pack":"...","action":"...","changed":true,"message":""}]

grex exec <cmd> [-- args...] [--filter <expr>]

Run an arbitrary command inside each matched pack's workdir.

  • Args: <cmd> required.
  • Flags: --filter, --parallel N, --shell (opt-in shell parsing; off by default).
  • Example: grex exec git status
  • JSON: [{"pack":"...","stdout":"","stderr":"","exit":0}]

Verb interactions

  • sync only fetches; update = sync + install re-run on lockfile delta.
  • run operates on actions directly, bypassing pack-type lifecycle; useful for debugging a single action.
  • exec is never filtered through the action plugin registry; it runs arbitrary commands.
  • serve --mcp does not block other verbs; it exposes them over JSON-RPC.

Freeze semantics

A v1.x release may:

  • Add a new verb.
  • Add a new flag to an existing verb.
  • Add a new field to any --json output.
  • Add a new action name, pack-type name, or MCP method (all additive).

A v1.x release may NOT:

  • Remove or rename a verb.
  • Change the meaning of an existing flag.
  • Change the type of an existing JSON output field.
  • Remove an action name or pack-type name.

grex doctor validates pack.yaml against the frozen schema version.
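The "add but never remove or rename" part of the freeze rule can be checked mechanically for any name set (verbs, action names, pack-type names). The helpers below are a hypothetical sketch of such a check, for example in a release test; they do not cover the "meaning of a flag" or "type of a field" rules, which need more than name comparison.

```rust
/// Names present in `old` but missing from `new`.
/// A rename shows up here as a removal of the old name.
fn removed_names<'a>(old: &[&'a str], new: &[&str]) -> Vec<&'a str> {
    old.iter()
        .copied()
        .filter(|name| !new.contains(name))
        .collect()
}

/// True when `new` keeps every name from `old` (additions are allowed).
fn is_additive(old: &[&str], new: &[&str]) -> bool {
    removed_names(old, new).is_empty()
}
```

Comparing the twelve v1 verbs against a candidate list that adds one verb passes; a candidate that renames `rm` to `remove` fails and reports `rm` as removed.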

CLI --json output

Every non-transport verb honours the global --json flag. When present, the verb writes a single JSON document to stdout and suppresses the default human-readable output. The serve verb is excluded — it owns stdio for JSON-RPC framing, so --json is not applicable there.

This chapter is the v1 JSON contract. Field names are stable across PATCH / MINOR releases; new fields may be added (readers must ignore unknown keys per the manifest wire invariant). Breaking changes require a MAJOR bump and a deprecation cycle — see semver.md.

Two envelope families

Every --json payload belongs to exactly one of two families. Callers distinguish them by the presence or absence of a top-level status key:

| discriminant | envelope family | stability |
| --- | --- | --- |
| "status": "unimplemented" | stub envelope | stable shape while the verb remains unimplemented (see below) |
| no status key | verb-specific shape | stable shape per the verb's section below |

A verb transitioning from unimplemented to wired counts as an additive schema change, not a breaking one, even though the stub envelope is dropped and a verb-specific shape takes its place. Consumers MUST branch on the presence of status:

// Pseudocode — pick the right parser per verb
if (payload.status === "unimplemented") {
  // Stub verb. Treat as "no semantic data yet" and proceed.
} else {
  // Verb-specific shape documented below.
}

The two families never co-exist in the same payload. status is reserved for the stub envelope; no verb-specific shape will ever gain a top-level status field.

Stub envelope (unimplemented verbs)

init, rm, status, update, run, exec are still M1 stubs. --json emits:

{"status": "unimplemented", "verb": "init"}

Fields:

  • status — always the literal string "unimplemented".
  • verb — the verb name as typed on the command line.

The stub envelope is a contract for consumers to detect unfinished verbs without parsing the (absent) verb-specific body. When the verb is wired, the stub envelope is removed; the verb now emits its verb-specific shape. Exit codes are unchanged (stubs exit 0).

add

Wired. Emits an add registration report:

{
  "dry_run": false,
  "id": "pack-a",
  "url": "https://example.com/pack-a.git",
  "path": "pack-a",
  "type": "scripted",
  "appended": true
}

Fields:

  • dry_run — bool; mirrors the global --dry-run flag.
  • id — pack id written to the manifest; currently equal to path.
  • url — source URL as provided.
  • path — workspace-relative pack path, explicit or inferred from URL.
  • type — classified pack kind (scripted for git-like URLs, declarative otherwise).
  • appended — bool; false only when dry_run is true.

The MCP add tool emits a byte-identical body.

ls

Wired in v1.1.1. Walks the workspace from a root pack.yaml (or the current directory when no pack_root is given) without cloning, fetching, or executing anything, and emits a structured tree envelope:

{
  "workspace": "/abs/path/to/workspace",
  "tree": [
    {
      "id": 0,
      "name": "rootp",
      "path": "/abs/path/to/workspace",
      "type": "meta",
      "synthetic": false,
      "children": [
        {
          "id": 1,
          "name": "alpha",
          "path": "/abs/path/to/workspace/alpha",
          "type": "scripted",
          "synthetic": true,
          "children": []
        }
      ]
    }
  ]
}

Fields:

  • workspace — absolute path to the resolved workspace (the directory holding the root pack's .grex/, or the pack root itself for the flat-sibling layout).
  • tree[] — root-level nodes. Currently always one entry; the array shape is reserved so future surfaces walking from a workspace dir with multiple sibling packs can extend without a schema break.
  • Per node: id (stable in-walk depth-first counter, root = 0), name, path (absolute), type (one of meta, declarative, scripted), synthetic (bool — see below), children[].

synthetic: true indicates a plain-git child whose pack manifest was synthesised in-memory by the walker (the destination directory carries .git/ but no .grex/pack.yaml). Synthetic nodes always carry type: "scripted" per the v1.1.1 design. See pack-spec.md §"Plain-git children" for the full contract.
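The per-node `id` can be pictured as a counter incremented during a depth-first walk. The sketch below assumes a pre-order walk (a node is numbered before its children, which is what makes the root 0); `Node`, `node`, and `assign_ids` are hypothetical names, not the walker's types.

```rust
/// Minimal tree node for the illustration.
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Convenience constructor.
fn node(name: &str, children: Vec<Node>) -> Node {
    Node { name: name.to_string(), children }
}

/// Numbers nodes in pre-order, depth-first, starting at 0 for the root.
fn assign_ids(root: &Node) -> Vec<(usize, String)> {
    fn walk(n: &Node, next: &mut usize, out: &mut Vec<(usize, String)>) {
        out.push((*next, n.name.clone()));
        *next += 1;
        for child in &n.children {
            walk(child, next, out);
        }
    }
    let mut out = Vec::new();
    let mut next = 0;
    walk(root, &mut next, &mut out);
    out
}
```

For a root with children `alpha` (which has one child `x`) and `beta`, the ids come out as root 0, `alpha` 1, `x` 2, `beta` 3. The ids depend on the shape of the tree at walk time, so they identify nodes within one walk and are not a persistent key.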

Error envelope

{"verb": "ls", "error": {"kind": "tree", "message": "..."}}

kind values: tree (root manifest could not be loaded), usage (invalid pack_root argument). The verb exits 2 on error and 0 on success.

The MCP ls tool emits a byte-identical successful body. The MCP surface does NOT accept a pack_root parameter (workspace-confinement invariant); the walk always starts from the server's pinned workspace.

sync and teardown

These verbs drive the M3 Stage B pipeline. --json emits a SyncReport-shaped document:

{
  "verb": "sync",
  "dry_run": false,
  "steps": [
    {"pack": "a", "action": "file-write", "idx": 0, "result": "performed_change", "details": null},
    {"pack": "b", "action": "shell-run", "idx": 1, "result": "skipped",
     "details": {"pack_path": "b", "actions_hash": "sha256:..."}}
  ],
  "halted": null,
  "event_log_warnings": [],
  "summary": {"total_steps": 2}
}

result values: performed_change, would_perform_change, already_satisfied, noop, skipped, other.

Missing <pack_root> → usage error (exit 2)

sync / teardown without a <pack_root> positional emit a verb-specific error envelope and exit 2 (the frozen usage-error exit code from cli.md):

{
  "verb": "sync",
  "error": {"kind": "usage", "message": "`<pack_root>` is required (directory with `.grex/pack.yaml` or the YAML file)"}
}

This is NOT a stub envelope — no status key. The usage-error branch is distinct from the unimplemented-verb branch so callers can distinguish "tell the user to fix their invocation" (exit 2) from "this verb has no implementation yet" (exit 0).

Error envelope (other failure paths)

Validation / tree / exec / halted paths share the same envelope shape:

{
  "verb": "sync",
  "error": {"kind": "validation", "message": "…"}
}

kind values: validation, tree, exec, usage, other. The halted sub-case emits a dedicated shape:

{"verb": "sync", "halted": {"pack": "a", "action": "shell-run",
 "idx": 0, "error": "…", "recovery_hint": "…"}}

doctor

Wired. Emits a DoctorReport:

{
  "exit_code": 0,
  "worst_severity": "ok",
  "findings": [
    {"check": "manifest-schema", "severity": "ok",
     "pack": null, "detail": "", "auto_fixable": false, "synthetic": false},
    {"check": "synthetic-pack", "severity": "ok",
     "pack": "algo-leet", "detail": "OK (synthetic)",
     "auto_fixable": false, "synthetic": true}
  ]
}

Fields:

  • exit_code — number; the severity-roll-up exit code the CLI also returns from the process.
  • worst_severity — string; one of ok / warning / error. Matches the highest severity in findings.
  • findings[] — array of per-check finding objects.

severity values: ok, warning, error.
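
The roll-up rule for worst_severity can be sketched in a few lines. This is an illustrative sketch, not grex's implementation: the Severity enum, parse, and worst_severity names are hypothetical, and treating an empty findings list as ok is an assumption.

```rust
/// Severity levels from the DoctorReport, declared from best to worst so the
/// derived `Ord` lets `max()` pick the most severe one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Ok,
    Warning,
    Error,
}

impl Severity {
    /// Parse the wire string used in the `severity` / `worst_severity` fields.
    pub fn parse(s: &str) -> Option<Severity> {
        match s {
            "ok" => Some(Severity::Ok),
            "warning" => Some(Severity::Warning),
            "error" => Some(Severity::Error),
            _ => None,
        }
    }
}

/// Highest severity across all findings.
/// Assumption: an empty findings list rolls up to `Ok`.
pub fn worst_severity(findings: &[Severity]) -> Severity {
    findings.iter().copied().max().unwrap_or(Severity::Ok)
}
```

A consumer can recompute the roll-up from findings[] and compare it with the reported worst_severity as a sanity check.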

v1.1.1+ adds synthetic: true to findings for synthetic plain-git children (skipped schema validation; gitignore + drift checks still run). The flag mirrors the synthetic marker on the matching LsTree node and on the lockfile entry, so consumers correlating doctor findings with grex ls output see the same plain-git provenance on both surfaces.

The MCP doctor tool emits a byte-identical body. The MCP surface does NOT accept --fix (read-only inspection only) or --workspace (workspace-confinement invariant). CLI-only users retain grex doctor --fix for interactive gitignore healing.

import

Wired. Emits an ImportPlan:

{
  "dry_run": true,
  "imported": [
    {"path": "pack-a", "url": "https://…", "kind": "declarative",
     "would_dispatch": true}
  ],
  "skipped": [{"path": "pack-b", "reason": "path_collision"}],
  "failed": []
}

Fields:

  • dry_run — bool; mirrors whichever of --dry-run / global --dry-run was in effect.
  • imported[] — entries that will be (or were) added to the manifest.
  • skipped[] — entries excluded; reason is one of path_collision, duplicate_in_input.
  • failed[] — entries that errored during ingest; each carries a human-readable error string.

No summary wrapper — callers derive counts from the three arrays. The MCP import tool emits a byte-identical body. The MCP surface does NOT accept a workspace parameter (workspace-confinement invariant); the fromReposJson path is resolved relative to the server's workspace and rejected if it canonicalises outside it.

Exit codes

--json does not alter exit codes — callers MUST use the process exit code as the source of truth for success / failure, not the presence of an error key. The JSON payload is diagnostic detail, not the wire signal.

actions

7 Tier 1 action primitives. Grounded in observed real-world script patterns (see goals.md grounded-reality table). Each is a native Rust built-in registered as an ActionPlugin at compile time.

Action invocation shape

In pack.yaml:

actions:
  - <action-name>:
      <arg>: <value>
      ...

Or for actions that take a bare argument object:

actions:
  - mkdir: { path: "$HOME/.warp" }

grex parses each entry, looks up the action by key in the registry, and dispatches to its ActionPlugin::execute.

Variable expansion

Action args support env-var interpolation: $HOME, $USER, $APPDATA, $LOCALAPPDATA, ${NAME}. Expansion is done by grex in the PackCtx::env resolver — native-per-platform:

  • POSIX: standard $VAR, case-sensitive.
  • Windows: $VAR works, plus %VAR% for legacy paths. $HOME maps to %USERPROFILE% (fallback applied on VarEnv::from_os / from_map only — NOT on an explicit insert). Lookup is ASCII-case-insensitive via a secondary lowercase-keyed index; $UserProfile and $USERPROFILE resolve to the same value.

Escape syntax

  • POSIX form: a literal $ is written as $$. $${HOME} expands to the literal string ${HOME} (no expansion).
  • Windows form: a literal % is written as %%. %%USERNAME%% expands to the literal string %USERNAME%.
  • Backslash escapes (\$, \%) are not supported.

- env:
    name: GREX_DOC_EXAMPLE
    value: "literal $${HOME} and %%USERNAME%%"   # → literal ${HOME} and %USERNAME%
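
To make the POSIX-form rules concrete, here is a minimal sketch of an expander that handles $VAR, ${NAME}, and the $$ escape. It is not the PackCtx::env resolver: the expand and lookup functions are hypothetical, the Windows %VAR% form is left out, and returning an error for an undefined variable or passing a lone $ through unchanged are assumptions.

```rust
use std::collections::HashMap;

/// Look up one variable. Assumption: an undefined variable is an error.
fn lookup<'a>(vars: &'a HashMap<String, String>, name: &str) -> Result<&'a str, String> {
    vars.get(name)
        .map(String::as_str)
        .ok_or_else(|| format!("undefined variable: {name}"))
}

/// Expand `$VAR` and `${NAME}`; `$$` yields a literal `$`.
pub fn expand(input: &str, vars: &HashMap<String, String>) -> Result<String, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // `$$` is the escape for a literal `$`.
            '$' if chars.get(i + 1) == Some(&'$') => {
                out.push('$');
                i += 2;
            }
            // `${NAME}` form: read up to the closing brace.
            '$' if chars.get(i + 1) == Some(&'{') => {
                let close = chars[i + 2..]
                    .iter()
                    .position(|&c| c == '}')
                    .ok_or_else(|| format!("unterminated ${{ at offset {i}"))?;
                let name: String = chars[i + 2..i + 2 + close].iter().collect();
                out.push_str(lookup(vars, &name)?);
                i += close + 3;
            }
            // `$NAME` form: the name is the longest run of [A-Za-z0-9_].
            '$' => {
                let len = chars[i + 1..]
                    .iter()
                    .take_while(|c| c.is_ascii_alphanumeric() || **c == '_')
                    .count();
                if len == 0 {
                    // Assumption: a lone `$` is passed through unchanged.
                    out.push('$');
                    i += 1;
                } else {
                    let name: String = chars[i + 1..i + 1 + len].iter().collect();
                    out.push_str(lookup(vars, &name)?);
                    i += 1 + len;
                }
            }
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    Ok(out)
}
```

With HOME set to /home/u, expand("$HOME/.warp", ...) yields /home/u/.warp, while "literal $${HOME}" yields the literal string ${HOME}, matching the escape rule above.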

The 7 primitives

1. symlink

Create or update a symlink, with optional backup of any existing dst.

- symlink:
    src: files/config.yaml     # relative to pack workdir
    dst: "$HOME/.warp/config.yaml"
    backup: true               # default false; renames existing dst to <dst>.grex-bak.<ts>
    normalize: true            # default true; absolute-normalizes both paths
    kind: auto                 # auto | file | directory; Windows needs explicit for dir symlinks

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| src | path | required | Resolved relative to pack workdir. |
| dst | path | required | May contain env vars. |
| backup | bool | false | Renames existing dst before creating symlink. |
| normalize | bool | true | Canonicalizes both sides. |
| kind | enum | auto | auto infers from src; directory forced on Windows for dir links. |

Cross-platform: uses std::os::unix::fs::symlink on POSIX, std::os::windows::fs::{symlink_file, symlink_dir} on Windows. On Windows, creating symlinks requires Developer Mode or the SeCreateSymbolicLink privilege; gating the action behind a require with the symlink_ok predicate is recommended.

kind: auto with missing src: if kind is auto and src does not exist at execute time, grex errors with SymlinkAutoKindUnresolvable rather than defaulting to file. A dangling file-symlink where a directory was required is worse than a loud failure.

Idempotency: if dst is already a symlink pointing at src, no-op (changed: false).

Rollback: removes the symlink; if a backup was made, restores it.

Backup + create atomicity: when backup: true is set and dst exists, grex renames dst → <dst>.grex.bak then creates the symlink. If the rename succeeds but the create fails, grex renames the backup back to dst (best-effort restore). If the restore rename also fails, grex surfaces SymlinkCreateAfterBackupFailed — the user is told exactly what is on disk (backup at <dst>.grex.bak, no symlink at dst) so manual recovery is unambiguous.
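
The backup-then-create sequence and its two failure outcomes can be sketched as follows. This is a simplified illustration rather than grex's code: backup_then_create and BackupCreateError are hypothetical names, the symlink creation is passed in as a closure so the sketch stays platform-neutral, and the backup name follows the <dst>.grex.bak form used in the paragraph above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
pub enum BackupCreateError {
    /// The initial backup rename failed; nothing on disk changed.
    BackupFailed(io::Error),
    /// Create failed, but the backup was renamed back to `dst`.
    CreateFailedRestored(io::Error),
    /// Create failed and the restore rename failed too: the backup is still
    /// on disk and there is no symlink at `dst`.
    CreateAfterBackupFailed {
        backup: PathBuf,
        create: io::Error,
        restore: io::Error,
    },
}

/// Rename `dst` to `<dst>.grex.bak`, run `create`, and restore on failure.
pub fn backup_then_create(
    dst: &Path,
    create: impl FnOnce(&Path) -> io::Result<()>,
) -> Result<(), BackupCreateError> {
    let mut name = dst.as_os_str().to_os_string();
    name.push(".grex.bak");
    let backup = PathBuf::from(name);

    fs::rename(dst, &backup).map_err(BackupCreateError::BackupFailed)?;
    match create(dst) {
        Ok(()) => Ok(()),
        // Best-effort restore: put the original back where it was.
        Err(create_err) => match fs::rename(&backup, dst) {
            Ok(()) => Err(BackupCreateError::CreateFailedRestored(create_err)),
            Err(restore_err) => Err(BackupCreateError::CreateAfterBackupFailed {
                backup,
                create: create_err,
                restore: restore_err,
            }),
        },
    }
}
```

The last variant carries the backup path so the caller can report exactly what is left on disk, which is the property the SymlinkCreateAfterBackupFailed error is described as providing.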

Errors: src missing, dst parent missing, privilege denied, SymlinkAutoKindUnresolvable (see above), SymlinkCreateAfterBackupFailed (see above).

Duplicate dst within a pack: two or more symlink actions in the same pack whose resolved dst paths are equal is a plan-phase validation error (ActionArgsInvalid), raised before any action executes. On case-insensitive filesystems (Windows, macOS default APFS) the comparison is ASCII-case-folded so C:\Users\a\x and c:\users\a\X are detected as duplicates. Cross-pack collisions on the same dst are handled separately by workspace-level conflict detection. See pack-spec.md §Validation rules.
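
A minimal sketch of the duplicate check, assuming the dst paths have already been expanded and normalized to strings. The find_duplicate_dsts function and its case_insensitive flag are hypothetical; the real check lives in the plan phase and reports ActionArgsInvalid.

```rust
use std::collections::HashMap;

/// Return `(first_index, duplicate_index)` pairs for resolved `dst` paths
/// that collide. With `case_insensitive` set, paths are compared after
/// ASCII case folding, as on Windows or default macOS filesystems.
pub fn find_duplicate_dsts(dsts: &[&str], case_insensitive: bool) -> Vec<(usize, usize)> {
    let mut seen: HashMap<String, usize> = HashMap::new();
    let mut dups = Vec::new();
    for (idx, dst) in dsts.iter().enumerate() {
        let key = if case_insensitive {
            dst.to_ascii_lowercase()
        } else {
            dst.to_string()
        };
        match seen.get(&key) {
            Some(&first) => dups.push((first, idx)),
            None => {
                seen.insert(key, idx);
            }
        }
    }
    dups
}
```

With folding enabled, C:\Users\a\x and c:\users\a\X are reported as one colliding pair; with folding disabled they are treated as distinct.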

2. env

Set an environment variable.

- env:
    name: WARP_HOME
    value: "$HOME/.warp"
    scope: user                # user | machine | session

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| name | string | required | Variable name. |
| value | string | required | Expanded before setting. |
| scope | enum | user | user persists to shell rc / registry HKCU; machine → HKLM / /etc/environment (requires admin); session → current process only. |

Platform:

  • Windows: user writes HKCU\Environment + broadcasts WM_SETTINGCHANGE. machine writes HKLM\System\CurrentControlSet\Control\Session Manager\Environment.
  • POSIX: user appends managed-block to ~/.bashrc / ~/.zshrc / ~/.config/fish/config.fish. machine writes /etc/environment.
  • session uses std::env::set_var (doesn't persist).

Idempotency: re-read current value; no-op if already set.

Rollback: restores previous value if captured; else unsets.

3. mkdir

Create a directory, including parents.

- mkdir: { path: "$HOME/.warp" }

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| path | path | required | Expanded. |
| mode | string | "755" (POSIX) | Ignored on Windows. |

Idempotency: no-op if already a directory.

Errors: path exists as non-directory.

Rollback: if grex created it, remove it (only if empty).

4. rmdir

Remove a directory, optionally with backup.

- rmdir:
    path: "$HOME/.warp"
    backup: true               # default false; renames to <path>.grex-bak.<ts>
    force: false               # default false; if false, refuses non-empty unless backup

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| path | path | required | Expanded. |
| backup | bool | false | Renames rather than deleting. |
| force | bool | false | Allow recursive delete of non-empty. |

Idempotency: no-op if already absent.

Rollback: restores backup if one was made; else creates empty dir (best-effort).

5. require

Prerequisite / idempotency gate. Evaluates predicates; on failure, aborts or skips per on_fail.

- require:
    all_of:                    # or any_of / none_of
      - cmd_available: git
      - os: windows
      - psversion: ">=5.1"
    on_fail: error             # error | skip | warn

Predicates:

| Predicate | Arg | Meaning |
| --- | --- | --- |
| path_exists | path | Filesystem path present. |
| cmd_available | name | name in PATH. |
| reg_key | hive\path!name | Registry value present (Windows only; off-platform a leaf evaluation yields PredicateNotSupported). Forward-slash separators (HKCU/Software/X) are accepted and normalized to \. ACL-denied or transient registry I/O surfaces as PredicateProbeFailed rather than collapsing to false. |
| os | windows / linux / macos | Current OS matches. |
| psversion | version-spec | PowerShell version constraint (Windows only; off-platform a leaf evaluation yields PredicateNotSupported). Probe is bounded by a 5 s timeout, prefers the absolute %SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe path to resist PATH-hijack, compares the full (major, minor) tuple, and surfaces non-zero exit / timeout / unexpected I/O as PredicateProbeFailed. powershell.exe genuinely missing degrades to false (matches the reg_key NotFound shape). |
| symlink_ok | - | Privilege / dev-mode present to create symlinks. |

Combiners: all_of (AND), any_of (OR), none_of (NOT). Nest freely. Inside these combiners (and inside when's all_of / any_of / none_of lists) a leg that yields PredicateNotSupported is treated as false so other legs still get a chance — this preserves the cross-platform rescue pattern any_of: [{reg_key: ...}, {path_exists: /etc/foo}]. The top-level combiner attached to a require stays strict: a single unsupported leaf under require still bubbles the typed error.
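
The tolerance rule can be modelled with a small evaluator. This sketch encodes one reading of the paragraph above, stated here as an assumption: legs of the top-level combiner under require are evaluated strictly, legs of any nested combiner treat PredicateNotSupported as false, and PredicateProbeFailed always propagates. The Pred and EvalError types and the eval_require function are hypothetical, and leaves carry pre-computed probe results to keep the example self-contained.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum EvalError {
    NotSupported,
    ProbeFailed,
}

pub enum Pred {
    /// A leaf with its (already probed) result, for illustration only.
    Leaf(Result<bool, EvalError>),
    AllOf(Vec<Pred>),
    AnyOf(Vec<Pred>),
    NoneOf(Vec<Pred>),
}

/// Evaluate one leg of a combiner. Anything nested below this leg is always
/// tolerant; `tolerant` decides how this leg's own NotSupported is handled.
fn eval_leg(leg: &Pred, tolerant: bool) -> Result<bool, EvalError> {
    match eval(leg, true) {
        Err(EvalError::NotSupported) if tolerant => Ok(false),
        other => other, // ProbeFailed (and strict NotSupported) bubble up.
    }
}

fn eval(p: &Pred, tolerant_legs: bool) -> Result<bool, EvalError> {
    match p {
        Pred::Leaf(r) => r.clone(),
        Pred::AllOf(legs) => {
            for l in legs {
                if !eval_leg(l, tolerant_legs)? {
                    return Ok(false);
                }
            }
            Ok(true)
        }
        Pred::AnyOf(legs) => {
            for l in legs {
                if eval_leg(l, tolerant_legs)? {
                    return Ok(true);
                }
            }
            Ok(false)
        }
        Pred::NoneOf(legs) => {
            for l in legs {
                if eval_leg(l, tolerant_legs)? {
                    return Ok(false);
                }
            }
            Ok(true)
        }
    }
}

/// Entry point for a `require`: the top-level combiner stays strict.
pub fn eval_require(top: &Pred) -> Result<bool, EvalError> {
    eval(top, false)
}
```

Under this reading, the rescue pattern works when the any_of is nested inside the top-level combiner: an unsupported reg_key leg counts as false and the path_exists leg still decides the outcome.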

on_fail:

  • error → abort pack install with non-zero exit.
  • skip → remaining actions in this pack skipped, lifecycle reports "skipped".
  • warn → log warning, continue.

Observed frequency: 9 uses in the scanned scripts. Highest-leverage primitive.

6. when

Platform / conditional gate wrapping nested actions. Sugar over require for common platform dispatch.

- when:
    os: windows                # or: any_of / all_of / none_of
    actions:
      - mkdir: { path: "$HOME/.warp" }
      - symlink: { src: files/config.yaml, dst: "$HOME/.warp/config.yaml" }

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| os | string | - | Shorthand for require { os: ... }. |
| all_of/any_of/none_of | list | - | Full predicate combiner support. |
| actions | list | required | Nested actions; run only if condition holds. |

On condition false: all nested actions are skipped (not failures). No rollback needed — nothing ran.

Combiner precedence: when os and any of all_of/any_of/none_of appear together, they compose conjunctively (AND). os: is shorthand equivalent to an os: predicate inside an implicit all_of; the explicit combiners are appended to that same all_of. Mixed example:

- when:
    os: windows
    all_of:
      - cmd_available: pwsh
      - psversion: ">=7.0"
    actions:
      - exec: { cmd: ["pwsh", "-NoProfile", "-File", "files/setup.ps1"] }

Both the os: windows shorthand and every predicate under all_of must hold for the nested actions to run.

7. exec

Shell escape. Runs a command. Array form by default (no shell interpretation). Opt into shell parsing explicitly.

- exec:
    cmd: ["rclone", "copy", "gdrive:backup", "$HOME/backup"]
    cwd: "$HOME"               # default: pack workdir
    env:                       # extra env vars for this invocation
      RCLONE_CONFIG: "$HOME/.config/rclone/rclone.conf"
    shell: false               # default false; true = parse via sh -c / cmd /c
    on_fail: error             # error | warn | ignore

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| cmd | list[string] | required (when shell=false) | argv array. |
| cmd_shell | string | required (when shell=true) | Single string passed to shell. |
| cwd | path | pack workdir | Where to run. |
| env | map | {} | Extra env vars. |
| shell | bool | false | Enable shell interpretation. |
| on_fail | enum | error | Error propagation. |

Rule: exec is the last-resort primitive. If you find yourself writing a second exec in the same pack, consider promoting the logic to a purpose-built action (built-in or plugin).

No idempotency guarantee. grex does not know whether the command you ran is repeatable. Pair with require to gate it.

Rollback: none (grex cannot know how to undo arbitrary commands). Pack authors wanting true rollback must pair with a teardown action.

stderr capture on failure: when exec returns a non-zero status (and on_fail: error), grex records the failure as ExecNonZero and attaches a truncated copy of the command's stderr — capped at 2 KiB — to the manifest action_halted event. The cap bounds manifest event size to stay below the fd-lock append atomicity ceiling (see manifest.md §Atomic append). Full stderr is printed to the terminal regardless; only the manifest copy is truncated.
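
The 2 KiB cap can be illustrated with a small truncation helper. This is a sketch under assumptions: truncate_stderr and STDERR_CAP_BYTES are hypothetical names, the captured stderr is assumed to be valid UTF-8, and keeping the head of the output (rather than the tail) is a guess, since the text above does not say which end is kept.

```rust
/// Cap described above for the stderr copy stored in the manifest event.
pub const STDERR_CAP_BYTES: usize = 2 * 1024;

/// Return at most `STDERR_CAP_BYTES` bytes of `stderr`, cutting on a UTF-8
/// character boundary so the result is still valid text.
pub fn truncate_stderr(stderr: &str) -> &str {
    if stderr.len() <= STDERR_CAP_BYTES {
        return stderr;
    }
    let mut end = STDERR_CAP_BYTES;
    // Back off until `end` does not split a multi-byte character.
    while !stderr.is_char_boundary(end) {
        end -= 1;
    }
    &stderr[..end]
}
```

Because the cut lands on a character boundary, the stored copy may be slightly shorter than 2048 bytes when the output contains multi-byte characters.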

Observed-pattern → primitive mapping

From the E:\repos scan (3 PowerShell scripts, 945 LOC):

| Observed pattern | Count | v1 primitive | Notes |
| --- | --- | --- | --- |
| New-Item -ItemType SymbolicLink / ln -s | 8 | symlink | Direct mapping. |
| if (Test-Path …) { … } idempotency guards | 9 | require | path_exists or cmd_available predicate. |
| [Environment]::SetEnvironmentVariable(…, 'User') | 7 | env (scope: user) | Direct. |
| & ./install.ps1 chain scripts | 5 | exec | Temporary; plugin should replace long-term. |
| New-Item -ItemType Directory -Force | 2 | mkdir | Direct. |
| if ($IsWindows) { … } platform gate | 2 | when | Direct. |
| Rename-Item backup then Remove-Item -Recurse | 1 | rmdir (backup: true) | Direct. |

Not observed: package installs (winget, choco), JSON merges, archive extracts, template rendering. Those are real patterns but not in this sample. Deferred to v2 plugin contributions.

Action plugin registration

Built-ins register via the canonical register_builtins(&mut Registry) free function called from Registry::bootstrap() (decision 2026-04-20). inventory::submit! auto-registration is feature-gated behind plugin-inventory (default off) and lands in Stage M4-E. User-facing YAML keys resolve through the registry name-to-plugin map.

Full trait definition, registration details, and v2 external-loading path: plugin-api.md.

Error taxonomy

| Error | Cause | Recovery |
| --- | --- | --- |
| ActionArgsInvalid | Malformed YAML for action. | Fix pack.yaml. |
| ActionPreconditionFailed | require predicate false with on_fail: error. | Fix environment or pack. |
| ActionExecutionFailed | Runtime error during action. | Pack-type rollback invoked. |
| ActionUnknown | Action key not registered. | Plugin missing. Exit 8. |
| PredicateNotSupported | Predicate (reg_key / psversion) is platform-specific and the current platform cannot answer it. Inside all_of / any_of / none_of combiners this is tolerated as false; at the top-level require it is fatal. | Wrap with when: { os: windows } or use any_of with a cross-platform fallback leg. |
| PredicateProbeFailed | The probe ran on the correct platform but itself broke: non-zero powershell.exe exit, 5 s timeout, ACL-denied registry read, or other OS I/O that is not a plain NOT_FOUND. Always fatal. | Investigate the probe error (AV hook, WinRM stall, ACL). Not rescued by combiner tolerance; a broken probe is not a rescue-eligible condition. |

All actions return Result<ExecStep, ExecError> to the pack-type driver (v1 shape, 2026-04-20; see plugin-api.md); the driver aggregates failures and triggers rollback per pack-type policy.

plugin-api

Stable trait contracts for v1 extension points. Post-v1 these are semver-protected: breaking changes require a major bump of grex itself.

Three traits

  1. ActionPlugin — implements one action name (e.g. symlink, env).
  2. PackTypePlugin — implements one pack-type (meta, declarative, scripted).
  3. Fetcher — implements one URL scheme (git in v1).

All three are Send + Sync + 'static async trait objects via async_trait.

Uniform &str across plugin traits (2026-04-20) — enables String-backed plugins in v2 (WASM/dylib); builtins return literals which coerce to 'static-lifetime &str for zero alloc.

ActionPlugin

use async_trait::async_trait;
use serde_json::Value;

#[async_trait]
pub trait ActionPlugin: Send + Sync {
    /// Stable action name, matches the YAML key.
    fn name(&self) -> &str;

    /// Execute the action. Args are the raw YAML sub-tree under the action key.
    async fn execute(
        &self,
        ctx: &ExecCtx<'_>,
        args: &Value,
    ) -> Result<ExecStep, ExecError>;
}

M4-B shipped shape (2026-04-20): the snippet above is the v2-facing target (WASM/dylib plugins consume raw &Value). The in-process v1 trait landed sync and takes the typed &Action instead of &Value:

pub trait ActionPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn execute(&self, action: &Action, ctx: &ExecCtx<'_>)
        -> Result<ExecStep, ExecError>;
}

Rationale: the wet-run executor, planner, and scheduler are all synchronous today; the parse step has already validated shape + invariants so taking the typed &Action is zero-cost at the boundary. The async + &Value form is reserved for external plugin loading (M5+ / v2) where the trait crosses a dylib/WASM ABI boundary. Both shapes return ExecStep — that is stable across v1 and v2.

Return type (v1): ExecStep carries the per-action result envelope — action_name, result (ok/skipped/failed with diagnostics), kind, and related fields. ActionOutcome is superseded by ExecStep in v1 — richer shape carries diagnostics. Original ActionOutcome { changed, message } design retired 2026-04-20.

Rollback is not on the trait surface (decision 2026-04-20, matches openspec/feat-grex/spec.md §1). Rollback semantics remain where the M3 executor kept them (per-action inverse logic in the executor), not in an ExecStep field. A dedicated rollback protocol is deferred to M5+ when pack-type drivers may require it.

ExecCtx (v1 realization of PackCtx)

PackCtx as originally drafted is v1-realized as ExecCtx<'a> in code. Fields present: vars (implements EnvResolver), pack_root, workspace, platform (typed as Os enum). Fields deferred to M5: pack_id, dry_run, explicit logger: &dyn ActionLogger wiring. The ActionLogger and EnvResolver traits are defined in grex-core::{log, env} and available for plugins to use directly; ExecCtx field wiring deferred.

pub struct ExecCtx<'a> {
    pub vars: &'a VarEnv,                // implements EnvResolver
    pub pack_root: &'a std::path::Path,
    pub workspace: &'a std::path::Path,
    pub platform: Os,                    // Windows | Linux | Macos
    // deferred to M5: pack_id, dry_run, logger: &dyn ActionLogger
}

PackTypePlugin

Updated 2026-04-20: M5-1 trait signature aligned with shipped M4 code patterns. The trait mirrors M4 ActionPlugin exactly — same ExecCtx<'_> context, same Result<ExecStep, ExecError> return envelope — so pack-type and action plugins share one result pipeline. The earlier anyhow::Result<()> + bare Pack draft is retired.

pub trait PackTypePlugin: Send + Sync {
    fn name(&self) -> &str;

    async fn install(
        &self,
        ctx: &ExecCtx<'_>,
        pack: &PackManifest,
    ) -> Result<ExecStep, ExecError>;

    async fn update(
        &self,
        ctx: &ExecCtx<'_>,
        pack: &PackManifest,
    ) -> Result<ExecStep, ExecError>;

    async fn teardown(
        &self,
        ctx: &ExecCtx<'_>,
        pack: &PackManifest,
    ) -> Result<ExecStep, ExecError>;

    async fn sync(
        &self,
        ctx: &ExecCtx<'_>,
        pack: &PackManifest,
    ) -> Result<ExecStep, ExecError>;
}

Ground-truth references (M4 shipped, 2026-04-20):

  • M4 ActionPlugin trait: crates/grex-core/src/plugin/mod.rs:49-62 — pattern PackTypePlugin reuses.
  • ExecCtx<'a>: crates/grex-core/src/execute/ctx.rs:96-146 — reused verbatim.
  • PackManifest: crates/grex-core/src/pack/mod.rs:171-197 — canonical name (not Pack).
  • ExecStep / ExecError: crates/grex-core/src/plugin/mod.rs — same envelope as ActionPlugin return.

Async form: uses 2024-edition native async-in-trait; fall back to #[async_trait] only if a toolchain blocker surfaces at M5-1 implementation time.

PackManifest

Parsed .grex/pack.yaml. Ground-truth struct from crates/grex-core/src/pack/mod.rs:171-197:

use std::collections::BTreeMap;

pub struct PackManifest {
    pub schema_version: SchemaVersion,   // literal "1"
    pub name: String,
    pub r#type: PackType,                // enum: Meta | Declarative | Scripted | plugin-name
    pub version: Option<String>,
    pub depends_on: Vec<String>,
    pub children: Vec<ChildRef>,
    pub actions: Vec<Action>,
    pub teardown: Option<Vec<Action>>,   // already parsed; R-M5-09 just reads it
    pub extensions: BTreeMap<String, serde_yaml::Value>,
}

Dispatch at M5 executor boundary: registry.get(pack.r#type.as_str()). The r#type: PackType enum stays in the parsed form; the string view is only consumed at registry lookup.

Lifecycle semantics (required contract)

| Method | Required behavior |
| --- | --- |
| install | Idempotent. Running twice must be equivalent to running once. |
| update | Run only when lockfile sha or actions_hash changed (grex core decides; plugin just does the work when called). |
| teardown | Must attempt to reverse install. May be partial. |
| sync | May recurse into children. May no-op for leaf types. |

Fetcher

use async_trait::async_trait;

#[async_trait]
pub trait Fetcher: Send + Sync {
    /// URL scheme this fetcher handles: "git".
    fn scheme(&self) -> &str;

    async fn clone(
        &self,
        url: &str,
        ref_spec: Option<&str>,
        dst: &std::path::Path,
    ) -> anyhow::Result<FetchReport>;

    async fn pull(
        &self,
        dst: &std::path::Path,
    ) -> anyhow::Result<FetchReport>;

    async fn current_sha(
        &self,
        dst: &std::path::Path,
    ) -> anyhow::Result<String>;
}

pub struct FetchReport {
    pub sha: Option<String>,
    pub branch: Option<String>,
    pub bytes: Option<u64>,
}

v1 ships one implementation (fetchers::git, either gix or git2). v2 may ship rclone, s3, oci, http behind the same trait.

Registry struct

Canonical v1 registry holding the action plugins. Packtypes + fetchers retain their existing maps on Registry; the signature below covers the action surface added in M4:

use std::collections::HashMap;

pub struct Registry {
    actions: HashMap<String, Box<dyn ActionPlugin>>,
    // packtypes, fetchers: see existing fields
}
// Signatures only; bodies elided.
impl Registry {
    pub fn new() -> Self;
    pub fn register<P: ActionPlugin + 'static>(&mut self, plugin: P);
    pub fn get(&self, name: &str) -> Option<&dyn ActionPlugin>;
    pub fn bootstrap() -> Self;  // calls register_builtins internally
}

bootstrap() is the canonical entrypoint: it constructs an empty Registry and hands it to register_builtins for the 7 Tier 1 actions. Executor dispatch goes through Registry::get(name) (an unknown name yields UnknownAction) — the dispatch swap from direct Action enum match to Registry::get lands in M4-B (moved 2026-04-20 from M4-A; see milestone.md Stage order note and openspec/feat-grex/spec.md §4). In M4-A the Registry is shipped as a parallel surface and covered by plugin-layer unit tests while FsExecutor / PlanExecutor keep the existing enum-match dispatch.
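
The bootstrap-then-dispatch flow can be sketched end to end with a cut-down trait. Everything here is illustrative: the ActionPlugin trait is a simplified stand-in (the documented execute takes &Action and &ExecCtx and returns ExecStep), Mkdir is a dummy plugin that performs no filesystem work, and dispatch is a hypothetical helper standing in for the executor.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the documented trait, kept self-contained.
pub trait ActionPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn execute(&self, arg: &str) -> Result<String, String>;
}

/// Dummy built-in: reports what it would do instead of touching the disk.
pub struct Mkdir;

impl ActionPlugin for Mkdir {
    fn name(&self) -> &str {
        "mkdir"
    }
    fn execute(&self, arg: &str) -> Result<String, String> {
        Ok(format!("would create {arg}"))
    }
}

#[derive(Default)]
pub struct Registry {
    actions: HashMap<String, Box<dyn ActionPlugin>>,
}

impl Registry {
    pub fn new() -> Self {
        Self::default()
    }
    /// Key the plugin by its own `name()` so YAML keys resolve to it.
    pub fn register<P: ActionPlugin + 'static>(&mut self, plugin: P) {
        self.actions.insert(plugin.name().to_string(), Box::new(plugin));
    }
    pub fn get(&self, name: &str) -> Option<&dyn ActionPlugin> {
        self.actions.get(name).map(|b| b.as_ref())
    }
    /// Construct an empty registry and hand it to `register_builtins`.
    pub fn bootstrap() -> Self {
        let mut reg = Self::new();
        register_builtins(&mut reg);
        reg
    }
}

pub fn register_builtins(reg: &mut Registry) {
    reg.register(Mkdir);
}

/// Hypothetical executor step: look up by name, fail on unknown actions.
pub fn dispatch(reg: &Registry, name: &str, arg: &str) -> Result<String, String> {
    let plugin = reg.get(name).ok_or_else(|| format!("unknown action: {name}"))?;
    plugin.execute(arg)
}
```

Keying the map by name() keeps the YAML key and the registry key from drifting apart, at the cost of one string allocation per registration.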

register_builtins free function

pub fn register_builtins(reg: &mut Registry);

Populates reg with all 7 Tier 1 plugins (symlink, env, mkdir, rmdir, require, when, exec). This is the canonical registration path in v1 — inventory::submit! auto-registration is optional (see feature flag below).

Builtins crate location (2026-04-20): v1 builtins live in grex-core::plugin (co-located for simplicity). grex-plugins-builtin is reserved for v2 third-party-facing extensions. Physical move deferred to M5+.

Idempotency

ExecResult::Skipped { pack_path: PathBuf, actions_hash: String } is emitted when the lockfile-stored actions_hash for a pack equals the recomputed hash at sync time. Hash scope is canonical JSON of the pack's actions: list plus the resolved commit sha (consistent with the "lockfile actions_hash field name kept" open-question note; variant reserved in PR #14). On a Skipped emission the executor performs no work for that pack and writes no new per-action events for it.

Hash algorithm (2026-04-20): actions_hash = sha256(b"grex-actions-v1\0" || canonical_json(actions) || b"\0" || commit_sha), lowercase hex. Computed per-pack; stored in LockEntry.actions_hash; compared at sync start; match emits ExecResult::Skipped and short-circuits the pack. Implemented in grex-core::lockfile::hash::compute_actions_hash.
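
The byte layout fed to SHA-256 can be shown without a hashing dependency. The sketch below builds only the preimage and the lowercase-hex rendering; the SHA-256 step itself is omitted because it needs an external crate. The function names are hypothetical and are not the contents of compute_actions_hash.

```rust
/// Domain-separation tag from the formula above, including the trailing NUL.
pub const DOMAIN_TAG: &[u8] = b"grex-actions-v1\0";

/// Build the bytes that are hashed:
/// tag || canonical_json(actions) || 0x00 || commit_sha.
pub fn actions_hash_preimage(canonical_actions_json: &str, commit_sha: &str) -> Vec<u8> {
    let mut buf = Vec::with_capacity(
        DOMAIN_TAG.len() + canonical_actions_json.len() + 1 + commit_sha.len(),
    );
    buf.extend_from_slice(DOMAIN_TAG);
    buf.extend_from_slice(canonical_actions_json.as_bytes());
    buf.push(0); // separator between the JSON and the commit sha
    buf.extend_from_slice(commit_sha.as_bytes());
    buf
}

/// Render a digest as lowercase hex, the encoding named in the formula.
pub fn to_lower_hex(digest: &[u8]) -> String {
    digest.iter().map(|b| format!("{b:02x}")).collect()
}
```

The NUL separator keeps the JSON and the sha from running together, so two different (actions, sha) pairs cannot produce the same preimage by shifting bytes across the boundary.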

Feature flag plugin-inventory

Default: off in v1. When on, built-in action modules use inventory::submit! to auto-register and Registry::bootstrap() walks inventory::iter::<BuiltinAction>(). When off, register_builtins is the only path. Keeping inventory optional means grex-core carries no hard dependency on it; linker-based collection is a deployment concern per-consumer.

Registration (v1 in-process)

Canonical path (decision 2026-04-20): explicit register_builtins(reg: &mut Registry). Registry::bootstrap() constructs an empty Registry and hands it to register_builtins, which registers all 7 Tier 1 actions + 3 pack-types + the git fetcher. No inventory dependency is pulled into grex-core on the default path.

fn register_builtins(reg: &mut Registry) {
    reg.register_action(Box::new(actions::Symlink));
    reg.register_action(Box::new(actions::Env));
    // ... remaining 5 Tier 1 actions
    reg.register_pack_type(Box::new(packtypes::Meta));
    reg.register_pack_type(Box::new(packtypes::Declarative));
    reg.register_pack_type(Box::new(packtypes::Scripted));
    reg.register_fetcher(Box::new(fetchers::Git));
}

Alternative: inventory::submit! (feature-gated, M4-E)

Opt-in compile-time auto-registration via the inventory crate, gated behind the plugin-inventory cargo feature (default off; see "Feature flag plugin-inventory" above). Lands in Stage M4-E as a discovery hook; not on the critical path for v1 and not required by any other stage.

pub struct BuiltinAction(pub fn() -> Box<dyn ActionPlugin>);
inventory::collect!(BuiltinAction);

pub struct BuiltinPackType(pub fn() -> Box<dyn PackTypePlugin>);
inventory::collect!(BuiltinPackType);

pub struct BuiltinFetcher(pub fn() -> Box<dyn Fetcher>);
inventory::collect!(BuiltinFetcher);

Each built-in module would then call inventory::submit! at file scope:

// src/actions/symlink.rs
pub struct Symlink;

#[async_trait::async_trait]
impl ActionPlugin for Symlink { /* ... */ }

inventory::submit! {
    crate::plugin::BuiltinAction(|| Box::new(Symlink))
}

When the feature is on, Registry::bootstrap() walks inventory::iter::<BuiltinAction>() (and the pack-type / fetcher collectors) instead of calling register_builtins directly. When the feature is off (default), register_builtins is the only path.

Adding a new built-in plugin in v1

The flow for a v1 contributor wanting to add, say, a pkg-install action:

  1. Create src/actions/pkg_install.rs implementing ActionPlugin.
  2. pub mod pkg_install; in src/actions/mod.rs.
  3. Add an explicit registration call in register_builtins (the default path), or an inventory::submit! block when the plugin-inventory feature is on.
  4. Integration test under tests/actions_pkg_install.rs.
  5. Docs entry in actions.md.

No changes to trait crate; no ABI concerns. Core grex recompile required, but plugin author writes no glue code beyond the trait impl.

v2 external plugin loading

Deferred. Two candidate routes:

Option A: dylib via libloading + abi_stable

  • Host loads libgrex_plugin_foo.{so,dylib,dll}.
  • Plugin crate uses abi_stable for FFI-safe trait objects.
  • Pros: native speed, same language.
  • Cons: ABI versioning is strict; every trait tweak risks SIGSEGV on version skew.

Option B: WASM via wasmtime / extism

  • Host loads foo.wasm.
  • Plugin compiled to wasm32-wasi.
  • Pros: sandboxed, cross-platform binary, forward-compatible ABI.
  • Cons: syscall surface must be bridged; filesystem access needs capability grants.

Decision in v2 alpha. ABI contract versioning strategy:

  • grex-plugin-api crate (extracted in v2) carries its own semver.
  • Plugin manifest declares grex_plugin_api = "1.x".
  • Host refuses load on major mismatch, warns on minor mismatch.
  • Candidate extension: ABI hash baked into plugin binary, checked at load.
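
The load policy in the list above (refuse on major mismatch, warn on minor mismatch) reduces to a small comparison. This is a sketch of that policy only: LoadDecision and check_plugin_api are hypothetical names, versions are modelled as (major, minor) pairs, and the candidate ABI-hash check is not covered.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LoadDecision {
    /// Major and minor both match.
    Load,
    /// Same major, different minor: load, but emit a warning.
    LoadWithWarning,
    /// Different major: refuse to load the plugin.
    Refuse,
}

/// Compare the host's plugin-API version with the one a plugin declares.
/// Both are `(major, minor)` pairs.
pub fn check_plugin_api(host: (u32, u32), plugin: (u32, u32)) -> LoadDecision {
    if host.0 != plugin.0 {
        LoadDecision::Refuse
    } else if host.1 != plugin.1 {
        LoadDecision::LoadWithWarning
    } else {
        LoadDecision::Load
    }
}
```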

Stability guarantees (v1)

Post-v1.0.0 the following are frozen until a v2.0.0:

  • ActionPlugin method signatures.
  • PackTypePlugin method signatures.
  • Fetcher method signatures.
  • ExecCtx field names & types (fields may be added; none removed or retyped).
  • ExecStep, FetchReport struct layouts (additive).
  • PackManifest struct (additive).
  • Registration mechanism.

Breaking changes require a grex major bump; v2 re-extracts the plugin traits into a separately-versioned crate so host and plugin can move independently.

mcp — embedded MCP server

grex serve launches an embedded stdio server speaking MCP 2025-06-18 natively. Every CLI verb except serve is exposed as an MCP tool invoked via tools/call. No custom JSON-RPC dialect, no grex.* methods, no batching.

Goal

Agent-native control surface. MCP tool handlers call the same library entrypoints the CLI dispatcher calls — no subprocess wrapper. Single-process observability, shared tokio runtime, manifest cache persists across requests, scheduler + pack-lock primitives shared verbatim.

Transport

  • Wire: stdio, newline-delimited JSON per MCP 2025-06-18 (one JSON-RPC message per line, LF-terminated). rmcp transport-io default framer.
  • Encoding: UTF-8.
  • Protocol version: 2025-06-18 — returned from initialize, asserted by clients and mcp-protocol-validator.
  • Batching: NOT supported. MCP 2025-06-18 rejects JSON-RPC batch arrays. Server MUST return -32600 Invalid Request if [req, req, …] arrives.
  • Stdout discipline: stdout is reserved exclusively for the JSON-RPC wire. Tracing, logs, and diagnostics go to stderr only. Any accidental stdout write is a server bug.
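
The batching rule can be enforced before the line ever reaches a JSON parser. The sketch below is an assumption about where such a check could sit, not a description of the rmcp framer: reject_batch is a hypothetical helper that only looks at the first non-whitespace character of a line.

```rust
/// JSON-RPC "Invalid Request", returned for batch arrays per the rule above.
pub const INVALID_REQUEST: i32 = -32600;

/// Pre-check one line from stdin. A line whose first non-whitespace
/// character is `[` is a batch array and is rejected; anything else is
/// handed on (trimmed) for normal JSON-RPC parsing.
pub fn reject_batch(line: &str) -> Result<&str, i32> {
    let trimmed = line.trim();
    if trimmed.starts_with('[') {
        Err(INVALID_REQUEST)
    } else {
        Ok(trimmed)
    }
}
```

Any error reply produced from this check still has to go out on stdout as a JSON-RPC message, with diagnostics on stderr, to respect the stdout discipline above.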

Protocol lifecycle

Only MCP-standard methods are accepted.

| Method / notification | Direction | Purpose |
| --- | --- | --- |
| initialize (req) | client → server | Capability negotiation, protocol-version agreement. |
| notifications/initialized | client → server | Client ready to send requests. |
| tools/list (req) | client → server | Return the 11 tools with JSON-Schema. |
| tools/call (req) | client → server | Invoke a tool by name. |
| notifications/cancelled | client → server | Cancel an in-flight tools/call by requestId. |
| notifications/progress | server → client | Optional per-operation progress (deferred). |
| shutdown (req) | client → server | Drain in-flight tasks then exit. |

Handshake

→ {"jsonrpc":"2.0","id":1,"method":"initialize",
   "params":{"protocolVersion":"2025-06-18",
             "clientInfo":{"name":"claude-code","version":"x"},
             "capabilities":{}}}
← {"jsonrpc":"2.0","id":1,
   "result":{"protocolVersion":"2025-06-18",
             "serverInfo":{"name":"grex","version":"<workspace-version>"},
             "capabilities":{"tools":{"listChanged":false}}}}
→ {"jsonrpc":"2.0","method":"notifications/initialized"}

tools/call example

→ {"jsonrpc":"2.0","id":42,"method":"tools/call",
   "params":{"name":"sync","arguments":{"recursive":true,"parallel":8}}}
← {"jsonrpc":"2.0","id":42,
   "result":{"content":[{"type":"text","text":"<json-result>"}],"isError":false}}

Tool catalog (11 tools)

Frozen CLI verb set: init, add, rm, ls, status, sync, update, doctor, serve, import, run, exec (12 verbs).

Exposed as MCP tools: 11. serve is the server itself → not a tool. teardown is a plugin lifecycle hook of rm, not a user-invokable verb → not a tool. The constant VERBS_11_EXPOSED_AS_TOOLS is defined in grex-mcp and drives every len() assertion.

| Tool name | Description (for tools/list) | readOnlyHint | destructiveHint |
| --- | --- | --- | --- |
| init | Initialise a grex workspace. | false | false |
| add | Register and clone a pack. | false | false |
| rm | Unregister a pack (runs teardown unless --skip-teardown). | false | true |
| ls | List registered packs. | true | false |
| status | Report drift + installed state. | true | false |
| sync | Sync all packs recursively. | false | false |
| update | Update one or more packs (re-resolve refs, reinstall). | false | false |
| doctor | Check manifest + gitignore + on-disk drift. | true | false |
| import | Import packs from a REPOS.json meta-repo index. | false | false |
| run | Run a declared action across matching packs. | false | true |
| exec | Execute a command across matching packs. | false | true |

Param and result shapes mirror the --json output of each CLI verb field-for-field. Every *Params struct derives JsonSchema; rmcp auto-publishes schemas in tools/list.

exec --shell is removed from the MCP surface. Arbitrary shell interpolation is a dangerous capability for an agent. The flag remains on the CLI but is absent from the exec tool's param schema. Reintroduction requires an explicit per-session capability opt-in (deferred).

Cancellation

MCP-standard notifications/cancelled with requestId. No custom grex.cancel method.

→ {"jsonrpc":"2.0","method":"notifications/cancelled",
   "params":{"requestId":42,"reason":"user aborted"}}

Server signals the matching request's tokio_util::sync::CancellationToken. Every tool handler propagates the token through:

  • Scheduler::acquire_cancellable(&CancellationToken): tokio::select! between semaphore.acquire_owned() and cancel.cancelled().
  • PackLock::acquire_cancellable(path, &CancellationToken) — same pattern; breaks the backoff loop on cancel.
  • Inner action / pack-type dispatch loop — checks cancel.is_cancelled() between steps.

Cancelled request returns -32800 request cancelled (MCP-standard reserved code).
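
The cooperative check in the dispatch loop can be sketched with the standard library alone. This is a minimal illustration, not grex's handler code: an AtomicBool stands in for tokio_util's CancellationToken, and run_steps and REQUEST_CANCELLED are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// The "request cancelled" code quoted in this section.
pub const REQUEST_CANCELLED: i32 = -32800;

/// Runs `steps` in order and checks the cancel flag before each one.
/// Returns how many steps ran, or the cancellation code if the flag was
/// set. The `AtomicBool` is a stand-in for `CancellationToken::is_cancelled()`.
pub fn run_steps(steps: &[&dyn Fn()], cancel: &AtomicBool) -> Result<usize, i32> {
    let mut done = 0;
    for step in steps {
        // Cancellation is only observed between steps, never mid-step.
        if cancel.load(Ordering::SeqCst) {
            return Err(REQUEST_CANCELLED);
        }
        step();
        done += 1;
    }
    Ok(done)
}
```

A step that is already running finishes before the flag is observed, which is why long-running primitives such as the semaphore and pack-lock waits need their own cancellable variants.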

Progress

notifications/progress is optional and deferred. v1 tool calls return only a final CallToolResult. Progress wiring from sync / update / run / exec handlers (tracing span → progress bridge) lands in a later milestone.

Error codes

Standard JSON-RPC 2.0 codes + MCP-standard -32800 + grex-reserved -32001..-32005 for pack-op failures.

| Code | Source | Meaning |
|------|--------|---------|
| -32600 | JSON-RPC | Invalid Request (malformed envelope; batch array) |
| -32601 | JSON-RPC | Method / tool not found |
| -32602 | JSON-RPC | Invalid params (deserialization failure; disallowed flag) |
| -32603 | JSON-RPC | Internal error (catch-all) |
| -32800 | MCP | Request cancelled |
| -32001 | grex | Manifest integrity failure |
| -32002 | grex | Pack op failed or initialization-state error (see note) |
| -32003 | grex | Lock contention |
| -32004 | grex | Drift detected |
| -32005 | grex | Unknown action / pack-type (plugin missing) |

Dual use of -32002: same code surfaces (a) a user-level pack-op failure returned inside a completed tools/call, and (b) an initialization-state protocol error ("not initialized" / "already initialized") returned from the envelope. Disambiguation is by data.kind: "pack_op" vs "init_state". Splitting into two codes is a future item.
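
A client can branch on the pair (code, data.kind) rather than on the code alone. The sketch below assumes the client has already pulled the numeric code and the optional data.kind string out of the JSON-RPC error object; Failure and classify are hypothetical names, not part of grex-mcp.

```rust
/// Client-side classification of an error response (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Failure {
    /// -32002 with data.kind == "pack_op": the tool ran and the pack op failed.
    PackOp,
    /// -32002 with data.kind == "init_state": protocol-level init error.
    InitState,
    /// -32800: the request was cancelled.
    Cancelled,
    /// Any other code, including -32002 with a missing or unknown kind.
    Other(i32),
}

/// Maps a JSON-RPC error code plus optional `data.kind` to a `Failure`.
pub fn classify(code: i32, kind: Option<&str>) -> Failure {
    match (code, kind) {
        (-32002, Some("pack_op")) => Failure::PackOp,
        (-32002, Some("init_state")) => Failure::InitState,
        (-32800, _) => Failure::Cancelled,
        (other, _) => Failure::Other(other),
    }
}
```

Treating an unrecognised kind as Other keeps the client conservative if the code is later split in two.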

Agent-safety annotations

Every tool in tools/list declares both annotations.readOnlyHint and annotations.destructiveHint. See the catalog table above.

  • Read-only tools (ls, status, doctor) are safe for unattended agent use.
  • Destructive tools (rm, run, exec) carry destructiveHint: true so policy layers (claude-code, IDE clients) can prompt the user or gate them behind approval.
  • The annotations are advisory hints, not enforcement — enforcement is the client's responsibility.
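
Because enforcement sits with the client, a policy layer might look roughly like the following. This is an illustration only: Gate, HINTS, and gate are hypothetical names, and the rule for mutating but non-destructive tools (allowed interactively, prompted when unattended) is an assumption, not something grex or MCP prescribes.

```rust
/// Outcome of a client-side policy check (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate {
    Allow,
    Prompt,
}

/// (tool name, readOnlyHint, destructiveHint), copied from the catalog table.
pub const HINTS: [(&str, bool, bool); 11] = [
    ("init", false, false),
    ("add", false, false),
    ("rm", false, true),
    ("ls", true, false),
    ("status", true, false),
    ("sync", false, false),
    ("update", false, false),
    ("doctor", true, false),
    ("import", false, false),
    ("run", false, true),
    ("exec", false, true),
];

/// Decides whether a tool call may proceed without asking the user.
pub fn gate(tool: &str, unattended: bool) -> Gate {
    match HINTS.iter().find(|(name, _, _)| *name == tool) {
        // Read-only tools are always allowed.
        Some(&(_, true, _)) => Gate::Allow,
        // Destructive tools always require approval.
        Some(&(_, _, true)) => Gate::Prompt,
        // Assumed policy: mutating tools run freely only in interactive use.
        Some(_) => {
            if unattended {
                Gate::Prompt
            } else {
                Gate::Allow
            }
        }
        // Unknown tool names are never auto-approved.
        None => Gate::Prompt,
    }
}
```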

Session model

One grex serve process = one MCP client session. Concurrent multi-client sessions over a single server are a future milestone. Rationale:

  • stdio transport is inherently single-peer.
  • Manifest cache, scheduler permit pool, and pack-lock table are scoped to the process — a second client would need explicit session partitioning.
  • Agent-harness pattern (Claude Code, Cursor, etc.) spawns one server per workspace anyway.

Concurrency integration

MCP tool handlers share one Arc<Scheduler> for the server lifetime — concurrent tools/call invocations respect --parallel exactly like local CLI invocations. Manifest cache is reused across requests. ExecCtx is built fresh per call, borrowing the shared scheduler + registry handles.

5-tier lock ordering invariant (M6). Tool handlers MUST acquire concurrency primitives in the fixed order documented in .omne/cfg/concurrency.md:

  1. workspace-sync lock
  2. scheduler semaphore permit
  3. pack-lock (per pack)
  4. backend (git) lock
  5. manifest lock

No handler may invert this order. Enforced at runtime by acquisition helpers and statically by M6's Lean4 proof (feat-m6-3).
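
A runtime check of this kind can be sketched as a per-handler record of the tiers currently held. The names Tier and HeldTiers are hypothetical and the real acquisition helpers are not shown in this chapter; the sketch only illustrates the "strictly increasing tier" rule.

```rust
/// The five tiers, declared in acquisition order so the derived `Ord`
/// matches the documented ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    WorkspaceSync = 1,
    SchedulerPermit = 2,
    PackLock = 3,
    BackendGit = 4,
    Manifest = 5,
}

/// Tracks which tiers one handler currently holds (hypothetical helper).
#[derive(Debug, Default)]
pub struct HeldTiers {
    stack: Vec<Tier>,
}

impl HeldTiers {
    /// Records an acquisition. Rejects any tier that is not strictly
    /// later than the most recently acquired one.
    pub fn acquire(&mut self, tier: Tier) -> Result<(), String> {
        if let Some(&top) = self.stack.last() {
            if tier <= top {
                return Err(format!("lock order violation: {tier:?} after {top:?}"));
            }
        }
        self.stack.push(tier);
        Ok(())
    }

    /// Releases the most recently acquired tier (LIFO release).
    pub fn release(&mut self) -> Option<Tier> {
        self.stack.pop()
    }
}
```

Skipping a tier is allowed in this sketch (for example a read-only handler that goes straight to the manifest lock); only going backwards is rejected.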

Launch

grex serve — no --mcp flag; the command is the MCP server. Flags:

  • --manifest <path> — override manifest path (captured at launch; clients cannot override mid-session).
  • Inherits global --parallel N from the grex CLI root.

Security posture:

  • stdio only. No network listener.
  • Filesystem ops confined to the workspace root.
  • Session inherits process file permissions; no privilege escalation.

Implementation stack

  • Server framework: rmcp = "1.5" (official Rust MCP SDK). Provides transport framing, initialize negotiation, tools/list schema publication, and notifications/cancelled plumbing out of the box.
  • Schema generation: schemars — every tool's *Params struct derives JsonSchema.
  • Cancellation: tokio_util::sync::CancellationToken threaded through Scheduler and PackLock.
  • Crate layout: crates/grex-mcp/ (server + tool handlers) + crates/grex/src/cli/verbs/serve.rs (thin launch shim).

Testing:

  • crates/grex-mcp/src/** — inline #[cfg(test)] unit tests (routing, schema gen, error mapping).
  • crates/grex-mcp/tests/** — integration tests via tokio::io::duplex.
  • .github/workflows/ci.yml: the mcp-validator job runs mcp-protocol-validator against a release build of grex serve.

Out-of-scope / future

  • Multi-client sessions over a single server process.
  • notifications/progress emission from long-running tool handlers.
  • exec --shell re-exposure via per-session capability opt-in.
  • Splitting -32002 into distinct pack-op vs init-state codes.
  • Remote transports (HTTP/SSE); stdio is the only v1 transport.

Pack template

grex ships a reference pack at examples/pack-template/ in the main repo. At v1.0.0 release time, the in-tree copy is mirrored to a standalone repo (git@github.com:egoisth777/grex-pack-template.git) so users can install it via grex add <URL>; until then, use the in-tree form below.

Trying the template

From a checkout of the main grex repo:

grex init
# Local (in-tree) form — works today, no external repo required:
grex add --from-path examples/pack-template
grex sync
grex doctor

Once grex v1.0.0+ is published, you'll also be able to install via the standalone mirror:

# Available at v1.0.0+ release; until then use the --from-path form above.
grex add git@github.com:egoisth777/grex-pack-template.git

Expected behaviour: the pack creates $HOME/.grex-pack-template/ and a symlink inside it pointing at the pack's files/hello.txt. Re-running grex sync is a no-op — every action is idempotent.

To undo: grex teardown grex-pack-template (or grex rm grex-pack-template to also remove it from the workspace manifest). The directory is backed up under <path>.grex-bak.<ts> before removal.

Walkthrough of the manifest

The template is type: declarative — the simplest of grex's three pack types. Its pack.yaml is structured as:

  1. require — gate the pack. If git is unavailable and the OS is not Windows, the install aborts before any filesystem action runs.
  2. mkdir + symlink — a single pair of actions, portable across linux / macos / windows via $HOME. grex-core's var-expansion synthesises $HOME from %USERPROFILE% on Windows (see crates/grex-core/src/vars/mod.rs), so no per-OS when: fan-out is required.
  3. teardown: — a single rmdir that reverses the install. Without an explicit teardown list, grex would default to reverse-order rollback of actions, which works but is less readable.

Every action is chosen for idempotency on repeat syncs: require is read-only, mkdir no-ops when the path exists, symlink no-ops when dst already points at src.

Structure of the in-tree copy

examples/pack-template/
├── .grex/
│   └── pack.yaml            # manifest (schema_version "1", type declarative)
├── files/
│   └── hello.txt            # payload referenced by the symlink action
├── README.md                # user-facing docs (Install / Structure / Customisation / Testing / Licence)
└── .gitignore               # M6 managed-block: .grex/.state/

The template is type: declarative, so it has no .grex/hooks/ directory. Hooks fire only for type: scripted packs.

Customising the template for your own pack

  1. Fork the tree into a new git repo.
  2. Rename name: in pack.yaml (regex ^[a-z][a-z0-9-]*$).
  3. Replace the actions with your own — see the actions reference for the 7 built-in primitives.
  4. If you need arbitrary shell steps that don't fit the declarative primitives, switch the manifest to type: scripted and add a .grex/hooks/ directory with setup.{sh,ps1} / sync.{sh,ps1} / teardown.{sh,ps1} scripts. Hooks receive GREX_PACK_NAME, GREX_PACK_PATH, GREX_PACK_OS, and GREX_DRY_RUN as env vars.
  5. Update the teardown: list to reverse your actions.
  6. Publish and install with grex add <your-url>.
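
The name rule from step 2 is small enough to check by hand before publishing. The function below is a sketch that mirrors the regex ^[a-z][a-z0-9-]*$ without pulling in a regex crate; is_valid_pack_name is a hypothetical name, not a grex API.

```rust
/// Returns true if `name` matches ^[a-z][a-z0-9-]*$ :
/// one ASCII lowercase letter, then lowercase letters, digits, or '-'.
pub fn is_valid_pack_name(name: &str) -> bool {
    let mut chars = name.chars();
    // First character must be an ASCII lowercase letter.
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    // Remaining characters: lowercase letter, digit, or hyphen.
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}
```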

CI validation

The in-tree copy is the canonical source and is exercised in CI by crates/grex/tests/pack_template_smoke.rs. The smoke test:

  • Parses examples/pack-template/.grex/pack.yaml via grex_core::pack::parse and asserts the top-level shape (name / type / schema_version / first-action is a require gate).
  • Asserts the payload files the README promises (.grex/pack.yaml, files/hello.txt, README.md, .gitignore) are present on disk.
  • Copies the template into a tempdir and runs grex_core::sync::run against it end-to-end, then re-runs sync to verify the second pass is an all-no-op.

If any check fails in CI, the template is broken — fix the in-tree copy before the next release, since the external mirror is regenerated from it (see the appendix below).

Relationship to other M8 stages

  • M8-1 (cargo-dist): the installer scripts referenced in the template's README live on the main grex releases page, not on the template repo.
  • M8-2 (crates.io): the template has no crates.io presence — it is a git-installable reference pack, not a Rust crate.
  • M8-3 (mdBook): this chapter is the authoritative doc for the template's ownership contract.
  • M8-5 (CHANGELOG): every release that changes the template must note it in the main grex CHANGELOG entry, plus re-mirror per the appendix.

Appendix: publishing the external mirror (release-time procedure)

Run these steps once per major or minor grex release (v1.0.0, v1.1.0, v2.0.0, ...):

  1. Create the empty GitHub repo. On github.com: new repo egoisth777/grex-pack-template, public, MIT OR Apache-2.0 licence, empty (no README / .gitignore / licence auto-init — we push our own).

  2. Mirror the in-tree copy into a fresh git history. From the grex repo root (replace v1.0.0 with the actual release tag):

    cp -r examples/pack-template /tmp/grex-pack-template
    cd /tmp/grex-pack-template
    git init -b main
    git add -A
    git commit -m "feat: initial template from grex v1.0.0"
    git remote add origin git@github.com:egoisth777/grex-pack-template.git
    git push -u origin main
    

    On Windows, substitute $env:TEMP for /tmp and use PowerShell-native Copy-Item -Recurse.

  3. Tag the external repo to match the grex release.

    git tag -a v1.0.0 -m "grex v1.0.0"
    git push origin v1.0.0
    
  4. Verify end-to-end. From a fresh workspace:

    grex init
    grex add git@github.com:egoisth777/grex-pack-template.git
    grex sync
    grex doctor
    

    Expected: all four commands exit 0; grex doctor reports the pack as OK.

  5. Record the first-commit SHA in the main grex repo's CHANGELOG.md under the release entry, for traceability.

Ownership & CODEOWNERS

  • In-tree copy (examples/pack-template/) is governed by the main grex CODEOWNERS — same reviewers as the rest of the workspace.
  • External repo (grex-pack-template) has its own CODEOWNERS file, independent of main grex. Day-to-day PRs on the external repo (typo fixes, user-reported issues) land directly; breaking changes to the template shape MUST land in the in-tree copy first, ship with the next grex release, and then be force-pushed over the external repo as a new commit history per step 2 above.
  • Never hand-edit the external repo and the in-tree copy independently. The in-tree copy is canonical; the external repo is regenerated.

migration — from REPOS.json + .scripts/ to grex

Users on the legacy Python .scripts/ meta-repo migrate by running grex import --from-repos-json ./REPOS.json. Both systems can coexist during transition.

Legacy source system

repo/
├── .scripts/
│   ├── init.py  add.py  rm.py  sync.py  track.py  test.py
│   ├── lib/
│   └── hooks/
├── REPOS.json        # [{url, path}, ...]
└── .gitignore        # hand-curated, sub-repo dirs appended

REPOS.json shape:

[
  {"url": "https://github.com/grex-org/grex-tui.git",   "path": "grex-tui"},
  {"url": "https://github.com/grex-org/grex-core.git",  "path": "grex-core"}
]

Legacy native shell scripts (.ps1/.sh) have no role in grex: Rust std::fs plus the built-in actions replace them.

Import command

grex import --from-repos-json ./REPOS.json

Behavior:

  1. Read + parse REPOS.json. Validate url and path (bare name) on every entry.
  2. For each entry not already in grex.jsonl (by path), emit an add event with type: meta (or --default-type <...>).
  3. For each entry already present with matching URL, skip.
  4. For each entry with same path but different url, abort unless --force (then emit update).
  5. Optionally --migrate-gitignore: rewrite .gitignore to use the managed-block format, preserving pre-existing lines outside the managed region.

grex import --from-repos-json ./REPOS.json --migrate-gitignore

Idempotent: re-running is a no-op once imported.

Disk-scan variant

grex import --scan

Walks workspace root one level deep, detects directories with .git/ not yet in grex.jsonl. For each, reads git config --get remote.origin.url, emits an add event. Skips entries without a remote.

Combinable with --from-repos-json: both sources processed, deduplicated by path.

Pack type for imported entries

Legacy REPOS.json carries no pack type info. Default assumption:

  • If the imported dir contains a .grex/pack.yaml, use its declared type.
  • Else use --default-type (flag), which defaults to meta (safe: meta packs have no actions, so no surprise side effects on first install).

User can later convert to declarative or scripted by adding a .grex/pack.yaml in the imported pack's own repo.

From v1.1.1+, plain-git children (no .grex/pack.yaml) walk via synthetic-manifest fallback — grex import --from-repos-json followed by grex sync works end-to-end on the bootstrap pattern (REPOS.json + flat-sibling git repos). See pack-spec.md §"Plain-git children" for the synthesis rule.

Coexistence during transition

Both systems can run against the same workspace if:

  • .scripts/ remains in place unmodified.
  • grex.jsonl is added alongside REPOS.json.
  • .gitignore is in managed-block format and lists every path from BOTH sources.

grex doctor in coexistence mode:

  • Warns (non-fatal) if REPOS.json has entries missing from grex.jsonl.
  • Warns if .scripts/ is still present while grex.jsonl exists.
  • Suggests running grex import --from-repos-json or retiring .scripts/.

Disambiguation rules

Same path in both sources:

| REPOS.json | grex.jsonl | Action |
|------------|------------|--------|
| present | absent | add event emitted |
| present (url A) | present (url A) | no-op |
| present (url A) | present (url B) | error, abort without --force |
| absent | present | no-op |
| present | tombstoned | skip, log info |

--force resolves URL conflicts by emitting update from the REPOS.json value.
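
The rules above reduce to a single match over the two sources. The sketch below is an illustration of the table, not the importer's code; ManifestState, ImportAction, and resolve are hypothetical names.

```rust
/// What grex.jsonl currently says about a path (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ManifestState<'a> {
    Absent,
    Present(&'a str), // recorded url
    Tombstoned,
}

/// Outcome for one path during import (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ImportAction {
    EmitAdd,
    EmitUpdate,
    NoOp,
    SkipTombstoned,
    AbortConflict,
}

/// `repos_url` is Some(url) when the path appears in REPOS.json.
pub fn resolve(repos_url: Option<&str>, manifest: ManifestState<'_>, force: bool) -> ImportAction {
    match (repos_url, manifest) {
        // Path only known to grex.jsonl (or to neither): nothing to import.
        (None, _) => ImportAction::NoOp,
        (Some(_), ManifestState::Absent) => ImportAction::EmitAdd,
        (Some(_), ManifestState::Tombstoned) => ImportAction::SkipTombstoned,
        // Same url on both sides.
        (Some(a), ManifestState::Present(b)) if a == b => ImportAction::NoOp,
        // Url conflict: --force takes the REPOS.json value.
        (Some(_), ManifestState::Present(_)) if force => ImportAction::EmitUpdate,
        (Some(_), ManifestState::Present(_)) => ImportAction::AbortConflict,
    }
}
```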

Path rule transition

Legacy REPOS.json required bare path (no separators). v1 grex preserves this. Nested paths (e.g. packs/foo) are deferred to v1.x; will require path-normalization + collision detection.
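
A bare-name check can be sketched as follows. Only the "no separators" part comes from the rule above; rejecting the empty string, ".", "..", and ':' (drive letters) are extra assumptions of this sketch, and is_bare_path is a hypothetical name.

```rust
/// Returns true if `path` is a single path component with no separators.
/// Assumption: "", ".", ".." and anything containing ':' are also rejected.
pub fn is_bare_path(path: &str) -> bool {
    !path.is_empty()
        && path != "."
        && path != ".."
        && !path.contains(|c: char| matches!(c, '/' | '\\' | ':'))
}
```

Both '/' and '\\' are rejected on every OS, so a manifest written on Windows validates the same way on Linux and macOS.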

Retirement of .scripts/

Post-migration:

  1. Verify grex ls matches expected pack list.
  2. Run grex sync --parallel 8.
  3. Delete .scripts/ via git rm -r .scripts/.
  4. Delete REPOS.json.
  5. git config core.hooksPath .grex/hooks (grex installs these on init).
  6. Commit.

grex doctor after retirement should exit 0 on clean workspace.

Rollback

Nothing in grex import mutates .scripts/ or REPOS.json. Rollback = delete grex.jsonl + grex.lock.jsonl + revert .gitignore (if --migrate-gitignore used). No data loss path.

engineering

Cargo workspace setup, feature flags, CI matrix, release pipeline, versioning policy.

Cargo workspace

Single crate grex (lib + bin). No sub-crates in v1. grex-plugin-api splits out in v2 for ABI-stable plugin authoring.

Cargo.toml (root):

[workspace]
members = ["grex"]
resolver = "2"

[workspace.package]
edition      = "2021"         # edition 2024 needs Rust 1.85+; the MSRV pinned here is 1.82
rust-version = "1.82"
license      = "Apache-2.0 OR MIT"
repository   = "https://github.com/grex-org/grex"

grex/Cargo.toml:

[package]
name         = "grex"         # fallback: "grex-cli" if crates.io taken
version      = "0.1.0"
description  = "Cross-platform dev-environment orchestrator. Pack-based, agent-native, Rust-fast."
readme       = "README.md"
keywords     = ["dev-env", "pack", "meta-repo", "mcp", "cli"]
categories   = ["command-line-utilities", "development-tools"]

[[bin]]
name = "grex"
path = "src/main.rs"

[features]
default           = ["git-backend-gix"]
git-backend-gix   = ["dep:gix"]
git-backend-git2  = ["dep:git2"]
simd-json         = ["dep:simd-json"]
tui               = ["dep:ratatui", "dep:crossterm"]     # v2
sqlite            = ["dep:rusqlite"]                     # v2
lean4             = []                                    # marker; CI-only proof job

[dependencies]
tokio              = { version = "1", features = ["full"] }
clap               = { version = "4", features = ["derive"] }
serde              = { version = "1", features = ["derive"] }
serde_json         = "1"
serde_yaml         = "0.9"
anyhow             = "1"
thiserror          = "2"
tracing            = "0.1"
tracing-subscriber = "0.3"
comfy-table        = "7"
owo-colors         = "4"
fd-lock            = "4"
async-trait        = "0.1"
num_cpus           = "1"
inventory          = "0.3"
gix                = { version = "0.66", optional = true }
git2               = { version = "0.19", optional = true }
simd-json          = { version = "0.14", optional = true }
ratatui            = { version = "0.28", optional = true }
crossterm          = { version = "0.28", optional = true }
rusqlite           = { version = "0.32", optional = true, features = ["bundled"] }

[dev-dependencies]
proptest           = "1"
tempfile           = "3"
assert_cmd         = "2"
predicates         = "3"
criterion          = "0.5"

Versions pinned at scaffold (M1); refreshed at release (M8).

Build

| Command | Purpose |
|---------|---------|
| cargo build | dev |
| cargo build --release | optimized |
| cargo build --all-features | exercise optional features |
| LTO: [profile.release] lto = "thin", codegen-units = 1 | release speed |

Test

| Command | Scope |
|---------|-------|
| cargo test | unit + integration default features |
| cargo test --all-features --workspace | full matrix |
| cargo test -p grex --test crash_recovery | single integration file |
| cargo bench | criterion (M2 onward) |

Lint

cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
typos
cargo deny check

Details in linter.md.

CI matrix (.github/workflows/ci.yml)

name: ci
on: [push, pull_request]
jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        toolchain: [stable, beta]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@master
        with:
          toolchain: ${{ matrix.toolchain }}
          components: rustfmt, clippy
      - uses: Swatinem/rust-cache@v2
      - run: cargo fmt --all -- --check
      - run: cargo clippy --all-targets --all-features -- -D warnings
      - run: cargo test --all-features --workspace
  lean:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: leanprover/lean-action@v1
      - run: cd proof && lake build
  deny:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: EmbarkStudios/cargo-deny-action@v1

Release pipeline

Tool: cargo-dist for cross-compiled release binaries.

Targets:

  • x86_64-unknown-linux-gnu
  • aarch64-unknown-linux-gnu
  • x86_64-apple-darwin
  • aarch64-apple-darwin
  • x86_64-pc-windows-msvc
  • aarch64-pc-windows-msvc

Flow:

  1. Bump version in Cargo.toml.
  2. Update CHANGELOG.md.
  3. git tag vX.Y.Z + push tag.
  4. release.yml (cargo-dist-generated) builds artifacts + creates GitHub Release.
  5. cargo publish -p grex to crates.io.
  6. Verify cargo install grex clean install on all three OSes (smoke).

Versioning policy

  • Crate semver MAJOR.MINOR.PATCH. Post-v1:
    • PATCH: bug fix, no API change.
    • MINOR: additive (new verb, flag, action, pack-type, MCP method).
    • MAJOR: any removal, rename, or semantic change of the 8 stable APIs.
  • Manifest schema (grex.jsonl schema_version field) — versioned independently. Breaking bump → reader rejects with actionable error pointing to grex upgrade-schema.
  • Lockfile schema — versioned independently (separate cadence from intent log).
  • pack.yaml schema_version — independent. v1 packs must remain readable by any v1.x.
  • MCP method catalog — tied to CLI verb surface; additions emit notifications/methods_changed.

Toolchain pin

rust-toolchain.toml:

[toolchain]
channel    = "1.82"
components = ["rustfmt", "clippy"]

External tooling required

  • git CLI (fallback).
  • lake + lean (CI-only, for proof job).
  • cargo-dist (release-pipeline only).
  • typos + cargo-deny (CI).

Security

  • cargo deny check enforces license + advisory gates.
  • #![forbid(unsafe_code)] at crate root. Note that forbid rejects any nested #[allow(unsafe_code)], so a narrow per-module exception (fd-lock integration, Windows symlink APIs) would require relaxing the root attribute to #![deny(unsafe_code)].
  • Supply-chain: consider cargo vet in v1.x once stable.
  • No shell invocation outside the actions::exec module.

Observability

  • tracing throughout.
  • tracing-subscriber wired at binary entry; CLI -v/-vv/-vvv controls filter.
  • Structured fields: pack_path, action, op, duration_ms, result.
  • v1.x may add on-disk JSON log sink for grex doctor retrospection.

License

Decision locked at M7. Current preference: dual MIT OR Apache-2.0 (Rust-community convention). Alternative single-license choice acceptable if legal reviewer prefers.

test-plan

Pyramid from unit through CI cross-platform + Lean4 proof compilation + pack-protocol contract tests.

Pyramid

        ┌──────────────────────────────┐
        │  Cross-plat CI matrix        │  few, slow
        ├──────────────────────────────┤
        │  Pack-protocol contract      │  fixture packs end-to-end
        ├──────────────────────────────┤
        │  MCP roundtrip               │  JSON-RPC scripted
        ├──────────────────────────────┤
        │  Crash injection             │  SIGKILL / TerminateProcess
        ├──────────────────────────────┤
        │  Integration                 │  real git, temp dirs
        ├──────────────────────────────┤
        │  Property (proptest)         │  manifest CRUD algebra
        ├──────────────────────────────┤
        │  Unit                        │  fast, exhaustive, in-process
        └──────────────────────────────┘

Unit tests

In-module #[cfg(test)]. Fast, no IO except via tempfile.

Coverage targets:

  • manifest::event — every event variant, schema bump rejection, malformed line behavior.
  • manifest::fold — ordering, tombstone precedence, update idempotence.
  • manifest::lock — last-write-wins per id.
  • pack::schema — full pack.yaml schema validation, rejects + accepts.
  • gitignore — managed-block insert, update, preserve-user-lines, idempotent-sync.
  • cli::output — JSON / plain / pretty modes against golden strings.
  • concurrency::scheduler — semaphore acquisition order with mocked PackLock.
  • actions::* — each of 7 primitives has targeted unit tests (args parsing, dry-run, idempotency check).
  • packtypes::* — each lifecycle method dispatches correctly.
  • fetchers::git — URL parsing, ref-spec resolution.

Integration tests (tests/)

Each spins a temp dir via tempfile::TempDir, invokes compiled binary via assert_cmd or library entrypoints directly.

| File | Scenario |
|------|----------|
| integration_add.rs | grex add against local bare-repo fixture → event appended, dir cloned, .gitignore updated, pack.yaml auto-detected |
| integration_rm.rs | add → rm → manifest tombstoned, dir gone, teardown ran |
| sync_recursive.rs | meta-pack with nested children syncs 3 levels deep |
| sync_parallel.rs | 8 local fixture packs, grex sync --parallel 4, all succeed, wall time sub-linear |
| gitignore_preserves_user_lines.rs | pre-populated .gitignore with user content outside managed block → round-trip preserves byte-for-byte |
| crash_recovery.rs | spawn child, SIGKILL (Win: TerminateProcess) mid-append, grex ls recovers via torn-line detection |
| mcp_stdio.rs | spawn grex serve, scripted JSON-RPC session, assert responses |
| import_legacy.rs | seed REPOS.json + .gitignore, run grex import --from-repos-json, verify manifest + gitignore coexistence |
| doctor_drift.rs | corrupt manifest / delete workdir, grex doctor --fix restores invariants |
| pack_types_end_to_end.rs | one fixture of each of 3 pack-types: install + sync + teardown full round-trip on all OSes |
| bench_manifest.rs | 10k events fold < 1s, 100k events < 10s (criterion; non-blocking) |

Git fixtures: bare .git local repos under tests/fixtures/, served via file:// URLs. No network in CI tests.

Property tests (proptest)

tests/property_manifest.rs:

  • Generate arbitrary sequences of add / rm / update / sync events.
  • Invariants under fold:
    • Tombstoned id never in state map.
    • Compaction idempotent: compact(compact(m)) == compact(m).
    • Fold-equivalence: fold(m) == fold(compact(m)).
    • Update last-writer-wins per id.
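
The fold and compaction invariants can be made concrete with a toy model. The sketch below is a simplification under stated assumptions: Event carries only an id and a url, a later Add may revive a removed id, and fold / compact are hypothetical stand-ins for the real manifest::fold code.

```rust
use std::collections::BTreeMap;

/// Simplified manifest event (hypothetical shape; real rows carry more fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Add { id: String, url: String },
    Update { id: String, url: String },
    Rm { id: String },
}

/// Folds the log into id -> url. The last write per id wins and
/// `Rm` removes the id, so an id whose latest event is `Rm` is absent.
pub fn fold(events: &[Event]) -> BTreeMap<String, String> {
    let mut state = BTreeMap::new();
    for ev in events {
        match ev {
            Event::Add { id, url } | Event::Update { id, url } => {
                state.insert(id.clone(), url.clone());
            }
            Event::Rm { id } => {
                state.remove(id);
            }
        }
    }
    state
}

/// Rewrites the log as exactly one `Add` per live id, in id order.
pub fn compact(events: &[Event]) -> Vec<Event> {
    fold(events)
        .into_iter()
        .map(|(id, url)| Event::Add { id, url })
        .collect()
}
```

A proptest strategy would generate arbitrary Vec<Event> values and assert fold(&compact(&m)) == fold(&m) and compact(&compact(&m)) == compact(&m) for each.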

tests/property_gitignore.rs:

  • Random pre-existing .gitignore + random sequences of add/rm.
  • Invariants:
    • User lines outside managed block unchanged byte-for-byte.
    • Two consecutive syncs produce identical output.
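
Both gitignore invariants follow from treating the managed block as the only region grex may rewrite. The sketch below assumes hypothetical marker strings and entry format (the real ones are defined elsewhere in the docs) and always re-emits the block at the end of the file, which is a simplification.

```rust
/// Hypothetical block markers; the real marker text may differ.
pub const BEGIN: &str = "# >>> grex managed >>>";
pub const END: &str = "# <<< grex managed <<<";

/// Rebuilds .gitignore content: user lines are copied through untouched,
/// any previous managed block is dropped, and a fresh block is appended.
pub fn sync_managed_block(existing: &str, paths: &[&str]) -> String {
    let mut out = String::new();
    let mut in_block = false;
    for line in existing.split_inclusive('\n') {
        // Compare without the line ending so CRLF files still match.
        let trimmed = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if trimmed == BEGIN {
            in_block = true;
        } else if trimmed == END {
            in_block = false;
        } else if !in_block {
            out.push_str(line); // user line, original bytes preserved
        }
    }
    if !out.is_empty() && !out.ends_with('\n') {
        out.push('\n');
    }
    // Sorted, de-duplicated entries make the output deterministic.
    let mut sorted: Vec<&str> = paths.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    out.push_str(BEGIN);
    out.push('\n');
    for p in sorted {
        out.push_str(p);
        out.push_str("/\n");
    }
    out.push_str(END);
    out.push('\n');
    out
}
```

Sorting the entries is what makes two consecutive syncs byte-identical regardless of the order in which packs were added.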

tests/property_actions.rs:

  • Each action primitive: running twice in sequence is equivalent to running once (idempotency).
  • rollback(execute(x)) == starting state (for actions that support rollback).

Crash injection

tests/crash_recovery.rs:

  • Spawn helper binary (crash-helper, built alongside the test) that appends to grex.jsonl then panics mid-write (partial bytes, no newline, exits).
  • Parent opens the manifest, runs fold, expects success + one truncated-tail warning in tracing output.

Windows variant uses TerminateProcess via raw handle (#[cfg(windows)]).
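
Torn-line detection rests on one observation: a complete JSONL row always ends in a newline, so anything after the last newline is a partial write. A minimal sketch, with split_torn_tail as a hypothetical helper name rather than the real reader API:

```rust
/// Splits raw manifest bytes into (complete rows, torn tail).
/// The first slice ends with the last b'\n' in the input (or is empty);
/// the second holds any bytes written after it, which the reader
/// should discard with a warning instead of parsing.
pub fn split_torn_tail(raw: &[u8]) -> (&[u8], &[u8]) {
    match raw.iter().rposition(|&b| b == b'\n') {
        Some(idx) => raw.split_at(idx + 1),
        None => (&raw[..0], raw),
    }
}
```

Working on bytes rather than &str matters here, because a crash can cut a row in the middle of a multi-byte UTF-8 sequence.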

MCP roundtrip

tests/mcp_stdio.rs:

  1. assert_cmd spawns grex serve --manifest <tempdir>/grex.jsonl (the serve verb is itself the MCP server, so no --mcp flag is passed).
  2. Pipe JSON-RPC frames to stdin, read stdout.
  3. Sequence: initialize → grex.add → grex.ls → grex.sync → grex.status → grex.rm → grex.ls.
  4. Assert each response matches expected JSON shape (via serde_json::Value equality + predicates).
  5. Assert clean shutdown on stdin close.

Cross-plat CI matrix

All integration + property + crash + MCP tests run on:

  • ubuntu-latest
  • macos-latest
  • windows-latest

Fixtures avoid platform-specific paths — all tests use tempfile + PathBuf.

Lean4 proof verification

.github/workflows/lean.yml:

- uses: leanprover/lean-action@v1
- run: cd proof && lake build

Job succeeds only if proof/Grex/Scheduler.lean compiles to .olean with zero sorry. Any unresolved axiom outside the single pack_lock_exclusive model-bridge axiom (resolved to theorem by M5-exit) fails CI.

Lean type-checking is the guarantee; CI does not attempt to verify proof content beyond compilation.

Pack-protocol contract tests

tests/pack_types_end_to_end.rs + fixture pack repos under tests/fixtures/packs/:

  • meta-basic/ — meta pack with 2 nested declarative children.
  • declarative-basic/ — declarative pack exercising all 7 action types.
  • scripted-basic/ — scripted pack with setup.sh + setup.ps1 + teardown.{sh,ps1}.

Contract assertions:

  • Install + sync + teardown round-trip leaves the workspace in the pre-install state.
  • Install followed by install (no changes) = idempotent.
  • Teardown followed by teardown = idempotent (second is no-op).
  • Lockfile entry matches expected sha + actions_hash after install.

Fixtures double as living documentation — they're the canonical "what does a v1 pack look like" examples.

Coverage

cargo-llvm-cov weekly on main. Target: 80% line coverage on manifest, pack, plugin, actions, packtypes, gitignore, concurrency. CLI + MCP exercised via integration tests, not measured for line coverage.

Smoke test (pre-release, manual)

Before tagging a release:

  1. cargo install --path grex
  2. cd <tempdir> && grex init && grex add git@github.com:grex-org/grex-inst.git && grex ls --long && grex sync && grex doctor.
  3. grex serve → send initialize manually, verify response.
  4. grex doctor → exit 0.
  5. Repeat on macOS and Windows.

linter

Rules enforced on every PR. CI fails on any violation.

Standard Rust tooling

| Tool | Command | Gate |
|------|---------|------|
| rustfmt | cargo fmt --all -- --check | fail on any diff |
| clippy | cargo clippy --all-targets --all-features -- -D warnings | fail on any warning |
| typos | typos | fail on misspellings |
| cargo-deny | cargo deny check | license + advisory + source gates |

Clippy configuration

clippy.toml:

avoid-breaking-exported-api = false
msrv = "1.82"

Lint levels in src/lib.rs:

#![forbid(unsafe_code)]
#![deny(
    clippy::unwrap_used,
    clippy::expect_used,
    clippy::panic,
    clippy::dbg_macro,
    clippy::print_stdout,
    clippy::print_stderr,
    clippy::todo,
    clippy::unimplemented,
)]
#![warn(
    clippy::pedantic,
    clippy::nursery,
    missing_docs,
)]

Tests and benches relax via #![allow(clippy::unwrap_used, clippy::expect_used)] at crate root for test binaries.

Custom rules

Output centralization

  • No println! / eprintln! / print! / eprint! outside src/cli/output.rs.
  • Enforced by clippy::print_stdout + clippy::print_stderr = deny.
  • All output goes through the formatter which honors --json / --plain / TTY detection.

Error handling discipline

  • Library modules (src/manifest, src/pack, src/plugin, src/actions, src/packtypes, src/fetchers, src/concurrency): use thiserror typed errors. anyhow banned here.
  • Binary modules (src/cli, src/main.rs, src/mcp): may use anyhow.
  • No unwrap() / expect() in production paths. Startup-only paths may expect() with a human-meaningful message if the invariant is unrecoverable (e.g. inventory registry empty = developer bug).

No direct shell-spawning outside actions/exec

  • tokio::process::Command and std::process::Command allowed ONLY in src/actions/exec.rs, src/packtypes/scripted.rs, and src/fetchers/git.rs (for CLI fallback).
  • Any other file invoking Command fails lint.
  • Enforced by CI grep rule:
if grep -rn 'process::Command' src/ --include='*.rs' \
    | grep -vE '^src/(actions/exec|packtypes/scripted|fetchers/git)\.rs'; then
  echo "shell invocation outside allowed modules"; exit 1
fi

Path rules (ported from legacy .scripts/test.py)

  • No hardcoded absolute paths in source, config, or embedded strings.

    • Banned: C:\, D:\, E:\, /home/, /Users/, /mnt/, /opt/.

    • CI grep:

      if grep -rn -E '([A-Z]:\\|/home/|/Users/|/mnt/|/opt/)' src/ --include='*.rs'; then
        echo "hardcoded path detected"; exit 1
      fi
      
  • No ~ in source strings. Home expansion lives in a PackCtx::env helper using dirs::home_dir().

  • No string concatenation with path separators. Use std::path::PathBuf + push()/join(). Clippy's path_buf_push_overwrite helps.

Manifest rules (runtime + lint)

  • pack.yaml children[].path MUST be bare name. Enforced at parse by pack::schema::validate() and at doctor-time by grex doctor.
  • grex.jsonl event path field likewise bare. No drive letters anywhere in manifest.

Plugin trait discipline

  • Every module under src/actions/ MUST contain exactly one impl ActionPlugin.
  • Every module under src/packtypes/ MUST contain exactly one impl PackTypePlugin.
  • Every module under src/fetchers/ MUST contain exactly one impl Fetcher.
  • Enforced by code review + presence of inventory::submit! block.

Shim rules — N/A

Legacy .scripts/ had Python-specific shim rules (no shutil.rmtree, no subprocess.run(shell=True), etc.). Rust has no direct analogue:

  • std::fs::remove_dir_all is cross-platform — no native-script indirection needed.
  • Shell invocation is already gated by the "no shell-spawning outside allowed modules" rule above.
  • Symlinks use std::os::{unix,windows}::fs directly.

CI job

.github/workflows/lint.yml (or a job in ci.yml):

lint:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@stable
      with: { components: rustfmt, clippy }
    - run: cargo fmt --all -- --check
    - run: cargo clippy --all-targets --all-features -- -D warnings
    - uses: crate-ci/typos@master
    - uses: EmbarkStudios/cargo-deny-action@v1
    - name: hardcoded paths
      run: |
        ! grep -rn -E '([A-Z]:\\|/home/|/Users/|/mnt/|/opt/)' src/ --include='*.rs'
    - name: shell invocation scope
      run: |
        ! grep -rn 'process::Command' src/ --include='*.rs' \
          | grep -vE '^src/(actions/exec|packtypes/scripted|fetchers/git)\.rs'

Pre-commit hook

.grex/hooks/pre-commit:

#!/usr/bin/env bash
set -e
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings

Activated by grex init via git config core.hooksPath .grex/hooks.

Man pages

grex ships a full set of Unix man pages — one root page plus one per CLI verb. They are a passive projection of the clap::Command tree defined in crates/grex/src/cli/args.rs; never edit the .1 files by hand.

What ships

14 files under man/ at the repo root:

Page | Covers
grex.1 | Top-level binary + global flags (--json, --plain, --dry-run, --filter)
grex-init.1 | grex init
grex-add.1 | grex add <url> [path]
grex-rm.1 | grex rm <path>
grex-ls.1 | grex ls
grex-status.1 | grex status
grex-sync.1 | grex sync (parallel + --only + --ref)
grex-update.1 | grex update [pack]
grex-doctor.1 | grex doctor --fix --lint-config
grex-serve.1 | grex serve (MCP stdio)
grex-import.1 | grex import --from-repos-json
grex-run.1 | grex run <action>
grex-exec.1 | grex exec <cmd> …
grex-teardown.1 | grex teardown

Generating

The generator lives in crates/xtask/ and is invoked via the cargo xtask alias configured in .cargo/config.toml:

cargo xtask gen-man                    # write to <workspace>/man/
cargo xtask gen-man --out-dir /tmp/m   # write elsewhere

Internally the binary calls clap_mangen::Man::new(cmd).render(&mut buf) once for the root Command and once per subcommand. The subcommand name is prefixed with grex- so the .TH header reads grex-sync(1) instead of sync(1).

CI drift check

CI runs a man-drift job on every PR (see .github/workflows/ci.yml):

  1. cargo run -p xtask -- gen-man
  2. git diff --exit-code -- man/ — fails if the generated output differs from the committed files.

If you touch crates/grex/src/cli/args.rs (add a verb, rename a flag, edit a /// help doc comment) you must re-run cargo xtask gen-man and commit the regenerated .1 files or CI will reject the PR.

Release artifact inclusion

man/ is listed in [workspace.metadata.dist].include in the root Cargo.toml, so every cargo-dist-built release tarball ships the full man-page set alongside README.md, CHANGELOG.md, and the licenses.

Installing

See the README "Man pages" section for the one-line install -Dm644 incantation. The shell / PowerShell installer one-liners do not install pages into the system man path — manual copy is required for now.

roadmap

Content scope by release. Timeline is ordering + dependencies, not dates.

v1 — Pack-based orchestrator, stable core

Ships all 7 philosophy principles (see goals.md).

Core always compiled

  • Manifest (JSONL intent log).
  • Lockfile (JSONL resolved state).
  • Scheduler (tokio + bounded semaphore + per-pack .grex-lock + fd-lock).
  • Sync engine (git clone/pull, recursion).
  • Gitignore automation (managed-block markers).
  • MCP stdio JSON-RPC server.
  • Pack discovery (.grex/pack.yaml).
  • Action plugin registry + 7 built-in actions.
  • Pack-type plugin registry + 3 built-in pack-types.
  • Atomic file writes (temp + rename).
  • Lean4 proof Grex.Scheduler.no_double_lock.
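
The "atomic file writes (temp + rename)" item can be sketched with the standard library alone. This is an illustration under stated assumptions, not grex's implementation: the name write_atomic and the fixed ".tmp" suffix are hypothetical, and the sketch assumes a single writer at a time (in grex that role is played by the fd-lock manifest guard).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dest` so that readers see either the old file or the
/// complete new one, never a half-written file.
fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "dest has no file name")
    })?;

    // Keep the temp file next to `dest`: a rename is only atomic when both
    // paths are on the same filesystem.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp = dest.with_file_name(tmp_name);

    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush data to disk before the rename publishes it
    drop(f);

    // Replaces `dest` if it already exists.
    fs::rename(&tmp, dest)
}
```

A crash before the rename leaves only a stray temp file behind; the committed file is untouched, which is what makes the manifest recoverable.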

Frozen public APIs

  1. .grex/pack.yaml schema (v1).
  2. grex.jsonl event schema.
  3. grex.lock.jsonl schema.
  4. ActionPlugin trait.
  5. PackTypePlugin trait.
  6. Fetcher trait.
  7. CLI verb surface (12 verbs).
  8. MCP method surface (1:1 with CLI).

Explicitly NOT in v1

  • External plugin loading.
  • TUI.
  • Non-git fetchers.
  • Additional pack-types / actions beyond the built-ins.
  • Pack registry.
  • Self-update.

Exit criteria: all success criteria in the feature spec PASS in CI matrix; crates.io publish successful; reference pack repo installs cleanly.

v2 — Extensibility & aesthetics

Opens third-party extension; adds TUI + non-git fetchers.

External plugin loading

Two candidate routes evaluated in v2 alpha:

  • Dylib (libloading + abi_stable): native speed, strict ABI versioning.
  • WASM (wasmtime / extism): sandboxed, forward-compatible, syscall bridging required.

Decision in v2 alpha; both may ship (host selects by file extension).

Retro-futurist TUI

ratatui-based dashboard, feature-flagged --features tui. Live pack tree, per-pack sync stream, lock inspector, CRT glyph aesthetic. Falls back to plain ANSI when --plain or non-TTY.

Additional pack-types (via plugin)

  • software-list — iterates package installs (winget/brew/apt).
  • env-bundle — manages a coherent group of env vars + PATH entries.
  • dotfiles — dotfile-manager style: iterate + symlink.

Additional actions (via plugin)

pkg-install, url-download, archive-extract, file-append, patch, json-merge, template, path-add, shell-rc-inject.

Additional Lean4 proofs

  • I2: manifest append serialization under fd-lock.
  • I3: .gitignore managed-block idempotence.
  • I4: compaction fold-equivalence.
  • Commutativity of disjoint-path events.

SQLite optional backend

Feature flag sqlite. Same Manifest API. For users with >100k events.

Self-update

grex upgrade pulls latest release from GitHub.

Embedded scripting

Lua or Rhai in-process scripting — middle ground between declarative YAML and full shell escape. Candidate for a pack-type plugin in v2.

Non-git fetchers

rclone, s3, oci, http — all implement the Fetcher trait. grex add accepts --scheme <rclone|s3|...> or auto-detects from URL.

v3+ — Scale & federation

Exploratory. No commitments yet.

  • Pack registry (grex.dev) — hosted index of discoverable packs.
  • Rules engine: .rules.yaml per pack, enforced on add/sync (modeled after metarepo's rules plugin).
  • Org-level federation — multiple top-level workspaces referencing each other.
  • Interactive HTTP dashboard: grex serve --http with web UI.
  • Distributed locking — optional consul/etcd for multi-host deployments.
  • p2p fetchers — IPFS, BitTorrent.
  • Supply-chain signing — pack signatures; registry-enforced integrity.

Non-roadmap (never)

  • Cross-VCS support (hg, svn, fossil, perforce).
  • Monorepo conversion tooling.
  • Git replacement.
  • Generic CI runner.
  • Full .gitmodules semantic replacement.

Dependency ordering (cross-release)

v1 (frozen APIs)
  └─► v2 external plugin loading
        └─► v2 additional pack-types + actions (as plugins)
        └─► v2 non-git fetchers (via Fetcher trait impls)
  └─► v2 TUI (independent)
  └─► v2 SQLite backend (independent)
  └─► v3 pack registry (needs plugin signing story)

m3-review-findings

Master finding list from the M3-close review series, plus mapping to the fix PRs that landed on main.

  • Date: 2026-04-20
  • Baseline: main at d160c7c (M3 Stage B close).
  • Final state: main at 7ce186e (5 fix PRs merged).
  • Test count: 316 → 344.

Methodology

Eight parallel reviews were run:

  • Codex adversarial passes (4) — semver hygiene, data-integrity, concurrency, cross-platform. Prompted to find breakage, not to polish.
  • Analytical subagent passes (4) — docs / rustdoc coverage, perf / allocations, recovery / crash-resume, security audit.

7 of 8 returned usable synthesis. The security audit stalled at synthesis twice (codex truncated mid-report on both retries). Security retry is filed under open carry-forwards rather than being treated as a clean pass — do not assume the review was completed.

Each review produced a file:line-cited report; the master list below is the synthesized view of those reports, grouped by severity, as it stood at review close.

Master finding list

Severity legend: CRITICAL (correctness / data loss) · HIGH (wrong result under realistic input) · MEDIUM (bad UX / minor correctness) · LOW (cosmetic / edge) · NIT (style).

CRITICAL

# | Finding | Evidence
C1 | Concurrent grex sync against the same workspace could interleave manifest appends because no workspace-level lock existed | crates/grex-core/src/sync/mod.rs (pre-#16); concurrency review report
C2 | Manifest could record a successful Sync for an action that panicked mid-side-effect; readers had no way to detect partial apply | crates/grex-core/src/sync/emit.rs (pre-#15)
C3 | Symlink backup path: after rename(dst → .grex.bak) succeeded, a failed symlink() left the user with no original file and no new symlink | crates/grex-core/src/execute/fs/symlink.rs (pre-#18)
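
Finding C3 is at heart a missing rollback step. The sketch below shows the backup-then-create-then-rename-back shape in isolation. It is illustrative only: replace_with_backup, ReplaceError, and its variants are hypothetical names, and the create step is passed in as a closure so the example does not depend on platform symlink APIs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Ways the replace can fail. Names are illustrative.
#[derive(Debug)]
enum ReplaceError {
    /// The initial backup rename failed; nothing was changed.
    Backup(io::Error),
    /// `create` failed, but the original was renamed back into place.
    CreateFailedRolledBack(io::Error),
    /// `create` failed and so did the rename-back; the original content
    /// now exists only at `backup`.
    CreateFailedBackupStranded { create: io::Error, rollback: io::Error, backup: PathBuf },
}

fn replace_with_backup<F>(dst: &Path, create: F) -> Result<(), ReplaceError>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let file_name = match dst.file_name() {
        Some(n) => n,
        None => {
            let e = io::Error::new(io::ErrorKind::InvalidInput, "dst has no file name");
            return Err(ReplaceError::Backup(e));
        }
    };
    let mut backup_name = file_name.to_os_string();
    backup_name.push(".grex.bak");
    let backup = dst.with_file_name(backup_name);

    // Step 1: move the original out of the way.
    fs::rename(dst, &backup).map_err(ReplaceError::Backup)?;

    // Step 2: create the replacement; on failure, try to restore step 1.
    match create(dst) {
        Ok(()) => Ok(()),
        Err(create_err) => match fs::rename(&backup, dst) {
            Ok(()) => Err(ReplaceError::CreateFailedRolledBack(create_err)),
            Err(rollback) => Err(ReplaceError::CreateFailedBackupStranded {
                create: create_err,
                rollback,
                backup,
            }),
        },
    }
}
```

The third variant matters: when the rollback itself fails, the error still carries the backup path so the caller can tell the user where the original content ended up.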

HIGH

# | Finding | Evidence
H1 | VarEnv was case-sensitive on Windows → $USERPROFILE vs $UserProfile resolved differently | crates/grex-core/src/vars/env.rs (pre-#17)
H2 | DupSymlinkValidator compared dst paths byte-for-byte → duplicates that differ only in case passed validation on case-insensitive FSes | crates/grex-core/src/pack/validate/dup_symlink.rs (pre-#17)
H3 | kind: auto silently defaulted to file when src was missing, creating a dangling file-symlink where a directory symlink was required | crates/grex-core/src/execute/fs/symlink.rs (pre-#17)
H4 | Concurrent sync on the same clone dest could race the bare fetch vs the checkout | crates/grex-core/src/git/backend/gix.rs (pre-#16)
H5 | Public enums / arg structs lacked #[non_exhaustive] → adding a variant in M4 would be a SemVer major | crates/grex-core/src/** (pre-#14)
H6 | ExecNonZero carried the full stderr → event size could exceed fd-lock append atomicity ceiling | crates/grex-core/src/execute/fs/exec.rs (pre-#18)
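
For H1, a case-insensitive lookup that still remembers the caller's original spelling can be built from two maps. The sketch below is an assumption-laden illustration: CaseFoldEnv is a hypothetical stand-in for VarEnv, and it folds case unconditionally, whereas a real implementation would presumably do so only on Windows.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `VarEnv`: `inner` keeps names exactly as
/// inserted, `lookup_index` maps the ASCII-lowercased name to that key.
#[derive(Default)]
struct CaseFoldEnv {
    inner: HashMap<String, String>,
    lookup_index: HashMap<String, String>,
}

impl CaseFoldEnv {
    fn insert(&mut self, name: &str, value: &str) {
        let folded = name.to_ascii_lowercase();
        // A re-insert under different casing replaces the earlier entry
        // instead of leaving two spellings of the same variable behind.
        if let Some(old) = self.lookup_index.insert(folded, name.to_string()) {
            self.inner.remove(&old);
        }
        self.inner.insert(name.to_string(), value.to_string());
    }

    fn get(&self, name: &str) -> Option<&str> {
        let key = self.lookup_index.get(&name.to_ascii_lowercase())?;
        self.inner.get(key).map(String::as_str)
    }
}
```

With this layout, $USERPROFILE and $UserProfile resolve to the same entry, while iteration over inner still yields names in the casing the user last wrote.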

MEDIUM

# | Finding | Evidence
M1 | Action name was &'static str → plugin-provided names (heap-allocated) could not register | crates/grex-core/src/pack/action.rs (pre-#14)
M2 | No pre-run scan for stale locks / orphaned .grex.bak files → surfaced only on next hit | recovery review report
M3 | Dirty-check ran before lock acquire → TOCTOU window between check and materialise_tree | crates/grex-core/src/sync/mod.rs (pre-#16)
M4 | HOME → USERPROFILE fallback also fired in insert → user-explicit HOME insert was silently retargeted | crates/grex-core/src/vars/env.rs (pre-#17)
M5 | No ExecResult::Skipped variant → M4 idempotency skip would force a non-additive enum change | crates/grex-core/src/execute/result.rs (pre-#14)
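
M1 comes down to a type choice: a &'static str field can only hold names known at compile time. The sketch below shows how Cow<'static, str> accepts both kinds of name. ActionName and its constructors are hypothetical and are not the real ActionPlugin trait.

```rust
use std::borrow::Cow;

/// Illustrative registry key; not the real `ActionPlugin` trait.
struct ActionName(Cow<'static, str>);

impl ActionName {
    /// Built-ins borrow a string literal, so no allocation happens.
    const fn builtin(name: &'static str) -> Self {
        ActionName(Cow::Borrowed(name))
    }

    /// Plugin-provided names are only known at runtime, so they are owned.
    fn from_plugin(name: String) -> Self {
        ActionName(Cow::Owned(name))
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

Callers that only read the name go through as_str and never need to know which of the two cases they are holding.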

LOW

# | Finding | Evidence
L1 | Unicode NFC/NFD path equality not handled (macOS) | cross-platform review
L2 | Windows MAX_PATH: no \\?\ prefix for long paths | cross-platform review
L3 | POSIX mode on Windows mkdir silently ignored, no warning | cross-platform review
L4 | README status line claims "M1"; stale vs actual M3-complete | docs review
L5 | CONTRIBUTING.md missing | docs review
L6 | PR template missing | docs review
L7 | ~39% rustdoc gap concentrated in grex CLI crate | docs review
L8 | Only 1 file has rustdoc code examples | docs review
L9 | Arc<PackManifest> would eliminate multiple per-action clones | perf review
L10 | Batched manifest appends under single lock acquire | perf review
L11 | Predicate cache on ExecCtx (repeated cmd_available probes) | perf review
L12 | Cow<str> hot path in vars::expand | perf review
L13 | gix shallow-clone option exposed via SyncOptions | perf review

NIT

# | Finding
N1 | Inconsistent tracing span names across sync path
N2 | Several test names begin with test_ (clippy items_after_statements style)

Mapping: finding → PR → resolution

Fix PRs on main:

  • A = PR #14 — semver hygiene
  • B = PR #15 — data integrity (event brackets + halt context)
  • C = PR #16 — concurrency (workspace + repo fd-locks, TOCTOU closure)
  • D = PR #17 — cross-platform (VarEnv, case-folding, kind:auto)
  • E = PR #18 — recovery (backup rollback, recovery scan, stderr cap)

# | Finding (short) | PR | Resolution
C1 | workspace-concurrent sync | C (#16) | resolved: <workspace>/.grex.sync.lock fail-fast
C2 | partial-apply undetectable | B (#15) | resolved: ActionStarted/Completed/Halted + SyncError::Halted(Box<HaltedContext>)
C3 | backup-then-create atomicity | E (#18) | resolved: rename-back on create failure; SymlinkCreateAfterBackupFailed if rollback fails
H1 | Win case-sensitive VarEnv | D (#17) | resolved: two-map (inner + ASCII-lowercase lookup_index)
H2 | DupSymlink case-sensitive | D (#17) | resolved: ASCII case-fold on Windows/macOS
H3 | kind: auto silent default | D (#17) | resolved: ExecError::SymlinkAutoKindUnresolvable
H4 | repo-concurrent race | C (#16) | resolved: <dest>.grex-backend.lock sibling file
H5 | missing #[non_exhaustive] | A (#14) | resolved: applied workspace-wide (list in PR description)
H6 | unbounded stderr in events | E (#18) | resolved: 2 KB truncation cap
M1 | plugin name heap-alloc | A (#14) | resolved: Cow<'static, str>
M2 | no startup recovery scan | E (#18) | resolved: informational scan (auto-cleanup deferred to grex doctor M4+)
M3 | dirty-check TOCTOU | C (#16) | resolved: revalidated after lock + immediately before materialise_tree
M4 | HOME→USERPROFILE in insert | D (#17) | resolved: fallback only in from_os / from_map
M5 | no Skipped variant | A (#14) | reserved: variant added, emission deferred to M4 lockfile idempotency
L1 | NFC/NFD equality | - | deferred (carry-forward)
L2 | MAX_PATH \\?\ | - | deferred (carry-forward)
L3 | POSIX mode on Win warn | - | deferred (carry-forward)
L4 | README stale | - | deferred (docs carry-forward)
L5 | CONTRIBUTING.md | - | deferred (docs carry-forward)
L6 | PR template | - | deferred (docs carry-forward)
L7 | rustdoc gap | - | deferred (docs carry-forward)
L8 | no rustdoc examples | - | deferred (docs carry-forward)
L9–L13 | perf items | - | deferred (perf carry-forward; not on M4 critical path)
N1–N2 | nits | - | punted (no ticket)
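
The H6 resolution (a 2 KB cap on stderr carried in events) has one subtlety worth showing: a byte cap must not cut a UTF-8 character in half. The sketch below is an assumption, not the shipped code: the name cap_stderr, the exact value 2048, and the choice to keep the head rather than the tail of the output are all illustrative.

```rust
/// Cap taken from the H6 resolution (2 KB); the exact value is assumed.
const STDERR_CAP_BYTES: usize = 2048;

/// Truncate `stderr` to at most `cap` bytes without splitting a UTF-8
/// character. Returns the kept prefix and whether anything was dropped.
fn cap_stderr(stderr: &str, cap: usize) -> (&str, bool) {
    if stderr.len() <= cap {
        return (stderr, false);
    }
    // Walk back to the nearest character boundary; index 0 always is one,
    // so the loop terminates.
    let mut end = cap;
    while !stderr.is_char_boundary(end) {
        end -= 1;
    }
    (&stderr[..end], true)
}
```

Returning the "was truncated" flag lets the event record that output was dropped, so a reader does not mistake a capped message for the full one.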

Deferred findings (remain open)

Grouped for triage when M4 planning starts:

Security

  • Security review retry — codex synthesis stalled twice. Re-run with a smaller scope or a different synthesizer before claiming a clean security pass.

Docs

  • README status line (M1 → M3).
  • Add CONTRIBUTING.md.
  • Add PR template.
  • Close the 39% rustdoc gap (primary offender: grex CLI crate).
  • Add rustdoc code examples to at least the public grex-core surface.

Perf

  • Arc<PackManifest> to eliminate clones across the sync pipeline.
  • Batched manifest appends under a single fd-lock acquire.
  • Predicate cache on ExecCtx (repeated cmd_available etc.).
  • Cow<str> on the vars::expand hot path.
  • Expose gix shallow-clone option via SyncOptions.

Platform edges (LOW)

  • Unicode NFC/NFD path equality (macOS).
  • Windows \\?\ long-path prefix for MAX_PATH.
  • POSIX-only mode field on mkdir should warn on Windows.

Cross-refs

  • progress.md — "Decisions locked during M3 review series" mirrors the decisions captured in the PR descriptions.
  • .omne/cfg/concurrency.md — updated to document workspace + repo fd-lock contract.
  • .omne/cfg/manifest.md — updated to document ActionStarted / ActionCompleted / ActionHalted event brackets.
  • .omne/cfg/actions.md — updated to document symlink backup-rollback, kind: auto missing-src error, and exec stderr truncation.

Release process

How to cut a grex release. Covers the GitHub Release (binaries via cargo-dist) and the crates.io publish steps. Rollback procedure at the end.

Audience: maintainers. Users should install per README.md §Install.

Prerequisites

  • Push access to main and tag-push rights on egoisth777/grex.
  • A crates.io API token with publish rights on grex-cli, grex-core, grex-mcp, grex-plugins-builtin (cargo login on your workstation).
  • cargo-dist installed locally at the pinned version matching [workspace.metadata.dist].cargo-dist-version (currently 0.31.0) — only required if you want to re-run dist plan before tagging.
  • Clean git status; working tree must match the exact commit you are releasing. No uncommitted changes.

1. Prepare the CHANGELOG

In CHANGELOG.md:

  1. Rename the [Unreleased - 1.0.0] heading to [1.0.0] - YYYY-MM-DD using today's UTC date.
  2. Open a new empty [Unreleased] section above it with empty Added / Changed / Fixed / Removed subsections.
  3. Ensure every Added bullet references the PR that introduced it.
  4. Commit: git commit -am "chore(release): prepare v1.0.0".
  5. Push to main via the normal PR flow. Do NOT tag yet.

2. Tag and push

Once the chore(release): prepare v1.0.0 commit is on main:

git switch main
git pull --ff-only
git tag -a v1.0.0 -m "grex v1.0.0"
git push origin v1.0.0

The tag push triggers .github/workflows/release.yml:

  • plan — validates dist-manifest.json against the 5 targets.
  • build-local-artifacts × 5 — builds grex for each target, signs artefacts via GitHub's native attestations (actions/attest-build-provenance).
  • build-global-artifacts — produces the installer.sh + installer.ps1 scripts and SHA-256 sums.
  • host + announce — creates the GitHub Release and uploads all artefacts (.tar.xz / .zip / *.sha256 / installers / source.tar.gz).

The GitHub Release body is auto-extracted from the [1.0.0] section of CHANGELOG.md (cargo-dist convention).

3. Publish to crates.io (manual)

cargo-dist does NOT publish to crates.io. Do this manually from a checkout of the tagged commit, in strict topological order. No hand-timed sleep is needed between crates: since cargo 1.66, cargo publish itself waits until the new version is available in the index before it exits, so the next crate can resolve its dependency as soon as the previous command returns:

git switch --detach v1.0.0

cargo publish -p grex-core
cargo publish -p grex-plugins-builtin
cargo publish -p grex-mcp
cargo publish -p grex-cli

Order rationale: grex-plugins-builtin and grex-mcp both depend on grex-core; grex-cli depends on all three. See openspec/changes/feat-m8-release/crates-io-names.md §2 for the dep graph.
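
The publish order is a topological sort of the dependency edges stated above, and the yank order in the Rollback section is its reverse. A minimal sketch using Kahn's algorithm follows; publish_order and grex_graph are hypothetical helper names, and the edge list is transcribed from the sentence above rather than read from Cargo.toml.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kahn's algorithm over a crate -> dependencies map. Returns crates in an
/// order where every dependency precedes its dependents, or `None` if the
/// graph has a cycle or names a crate that is not in the map.
/// Ties are broken alphabetically.
fn publish_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = deps
        .iter()
        .map(|(k, v)| (*k, v.iter().copied().collect()))
        .collect();
    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Pick the first crate whose dependencies are all published.
        let next = remaining
            .iter()
            .find(|(_, d)| d.is_empty())
            .map(|(k, _)| *k)?;
        remaining.remove(next);
        for d in remaining.values_mut() {
            d.remove(next);
        }
        order.push(next.to_string());
    }
    Some(order)
}

/// Edges as described in the order rationale (assumed complete).
fn grex_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("grex-core", vec![]),
        ("grex-plugins-builtin", vec!["grex-core"]),
        ("grex-mcp", vec!["grex-core"]),
        ("grex-cli", vec!["grex-core", "grex-plugins-builtin", "grex-mcp"]),
    ])
}
```

With these edges grex-core always comes first and grex-cli last. The two middle crates do not depend on each other, so either order between them is valid, including the one used in the commands above.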

Smoke test post-publish:

cargo install grex-cli --locked
grex --version   # must print 1.0.0

4. Installer smoke tests

From a fresh shell session:

# Linux / macOS
curl -LsSf https://github.com/egoisth777/grex/releases/latest/download/grex-cli-installer.sh | sh
grex --version

# Windows
powershell -c "irm https://github.com/egoisth777/grex/releases/latest/download/grex-cli-installer.ps1 | iex"
grex --version

Every artefact is signed via GitHub's native build provenance (actions/attest-build-provenance). Users can verify the binary matches the commit + workflow that produced it before trusting it:

# Download + verify attestation (requires gh CLI >= 2.49)
gh release download v1.0.0 --repo egoisth777/grex --pattern '*.tar.xz'
gh attestation verify grex-cli-x86_64-unknown-linux-gnu.tar.xz --repo egoisth777/grex
tar xf grex-cli-x86_64-unknown-linux-gnu.tar.xz
sudo mv grex-cli*/grex /usr/local/bin/
grex --version

The curl | sh / irm | iex one-liners above are a convenience path and do NOT verify attestations.

Supported platforms

Pre-built binaries ship for these five triples (see [workspace.metadata.dist].targets in root Cargo.toml):

Triple | Runner
x86_64-unknown-linux-gnu | ubuntu-22.04
aarch64-unknown-linux-gnu | ubuntu-22.04-arm
x86_64-apple-darwin | macos-13
aarch64-apple-darwin | macos-14
x86_64-pc-windows-msvc | windows-2022

Everything else (32-bit, musl, FreeBSD, aarch64-windows, etc.) falls back to building from source:

cargo install grex-cli --locked

Rollback

Yank a bad crates.io release

cargo yank hides the version from the resolver without deleting it. Yank in reverse-dependency order (bin first, so dependents cannot keep pulling it in):

cargo yank --version 1.0.0 grex-cli
cargo yank --version 1.0.0 grex-mcp
cargo yank --version 1.0.0 grex-plugins-builtin
cargo yank --version 1.0.0 grex-core

Yanking is reversible: cargo yank --version 1.0.0 --undo <crate> if you decide to keep the release after all.

Mark the GitHub Release as pre-release

gh release edit v1.0.0 --prerelease

This hides it from the "latest" installer URL without deleting the artefacts. Users on the installer one-liner will stop picking up the bad release automatically.

Ship a fix

Cut a fresh patch release (v1.0.1) with the fix — do not re-tag v1.0.0. Re-tagging breaks provenance and every cached copy of the installer script.

On the limits of rollback

  • cargo yank is not cargo delete. The crate file stays on crates.io forever; yanking only excludes it from new resolves. Projects whose Cargo.lock already records 1.0.0 keep building. There is no delete API.
  • Sigstore attestations are immutable. A released artefact whose build provenance is on the Sigstore transparency log cannot be revoked — gh attestation verify will keep returning OK even after you mark the release pre-release.
  • Compromised-binary rollback MUST use a patch bump (v1.0.1) that supersedes the bad version. Yank v1.0.0, mark its GitHub Release pre-release, and push v1.0.1 through the same release pipeline. Do not attempt to re-tag or delete artefacts.

Pinning updates

To update the pinned cargo-dist version:

  1. Bump [workspace.metadata.dist].cargo-dist-version in root Cargo.toml.
  2. Run cargo install cargo-dist --locked --version <new> locally.
  3. Run dist generate to regenerate .github/workflows/release.yml.
  4. Commit both files together. CI's release-plan job verifies the manifest still parses.

SemVer policy for grex

grex follows Semantic Versioning 2.0.0. This document pins down what "breaking", "additive", and "fix" mean concretely for grex, because the public surface spans four distinct contracts that users and agents depend on:

  1. Manifest schema: grex.jsonl + grex.lock.jsonl row shapes, keyed on the per-row schema_version field.
  2. CLI surface — verb names, flag names, exit codes, and the --json / --plain stdout formats.
  3. MCP tool surface — JSON-RPC tool names, input/output JSON schemas, and tool annotations exposed by grex serve --mcp.
  4. pack.yaml schema — pack-type plugin names, action names, and action field shapes consumed by the pack parser.

A release is MAJOR, MINOR, or PATCH based on the worst change across all four surfaces. A MAJOR change on any one surface forces a MAJOR release, even if the other three are additive-only.
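
The "worst change wins" rule is a maximum over an ordered enum. The sketch below is only an illustration of that rule; Bump, SurfaceBumps, and release_bump are hypothetical names, not part of any grex crate.

```rust
/// Declared in ascending severity: derive(Ord) follows declaration order,
/// so `Patch < Minor < Major`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Bump {
    Patch,
    Minor,
    Major,
}

/// Per-surface classification of one release's changes.
struct SurfaceBumps {
    manifest: Bump,
    cli: Bump,
    mcp: Bump,
    pack_yaml: Bump,
}

/// The release bump is the worst bump across all four surfaces.
fn release_bump(s: &SurfaceBumps) -> Bump {
    [s.manifest, s.cli, s.mcp, s.pack_yaml]
        .iter()
        .copied()
        .max()
        .unwrap_or(Bump::Patch)
}
```

A release that is additive on three surfaces but renames an MCP tool therefore evaluates to Major.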

The short version

Bump | What it means
MAJOR | Existing workspaces / agents / packs may stop working after upgrade; migration may be needed.
MINOR | Everything that worked before still works; new capabilities are available to opt into.
PATCH | Behaviour identical from the user's perspective; bugs fixed, perf improved, docs clarified.

The manifest-wire invariant

The single load-bearing invariant across all four surfaces is the JSONL wire format of grex.jsonl and grex.lock.jsonl:

  • Every row carries a schema_version integer field (since M2 / PR #2).
  • Writers never emit rows at a schema version older than the one their binary understands. A newer grex writes newer rows; an older grex never downgrades.
  • Readers treat unknown future fields on a known-version row as skip-don't-error — extra keys are ignored, not rejected.
  • Bumping schema_version past the max a reader supports is a MAJOR event for that row kind; readers older than the new major will refuse the row with a structured error and instruct the user to upgrade.

This is the one rule that survives any SemVer ambiguity below: if you cannot round-trip a manifest through an older compatible grex without silent data loss, the change is MAJOR.
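
The reader side of the invariant can be sketched as follows. To stay dependency-free, the example models a row that has already been parsed from JSON. Row, ReadError, read_url, and MAX_SUPPORTED are hypothetical names, and the "url" field is borrowed from the rename example in this document rather than from the real schema.

```rust
use std::collections::BTreeMap;

/// Highest schema version this (hypothetical) reader understands.
const MAX_SUPPORTED: u32 = 1;

/// A row after JSON parsing; values kept as strings for brevity.
struct Row {
    schema_version: u32,
    fields: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    /// Structured error telling the user to upgrade grex.
    UpgradeRequired { found: u32, max_supported: u32 },
}

/// Extract the one field this reader knows about and ignore the rest.
fn read_url(row: &Row) -> Result<Option<&str>, ReadError> {
    if row.schema_version > MAX_SUPPORTED {
        return Err(ReadError::UpgradeRequired {
            found: row.schema_version,
            max_supported: MAX_SUPPORTED,
        });
    }
    // Unknown keys are never inspected, so they cannot cause an error.
    Ok(row.fields.get("url").map(String::as_str))
}
```

The two branches mirror the two reader rules: extra keys on a known version are skipped silently, while a version beyond the supported maximum is refused with an error the caller can act on.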

Per-surface rules

1. Manifest schema (grex.jsonl / grex.lock.jsonl)

Change | Bump
Remove a row kind (e.g. drop RegisterPack rows) | MAJOR
Rename a required field on an existing row (e.g. url → repo_url) | MAJOR
Change the type of an existing field (e.g. parallel: int → parallel: str) | MAJOR
Tighten a constraint (e.g. a previously free-form string becomes enum-only) | MAJOR
Bump schema_version past what older readers support | MAJOR
Add a new row kind that older readers skip cleanly (unknown kind = skip) | MINOR
Add a new optional field to an existing row (readers ignore unknown fields) | MINOR
Widen a constraint (e.g. enum gains a new variant; older readers skip row) | MINOR
Fix a writer bug that emitted malformed rows (readers already tolerant) | PATCH
Improve compaction perf; rewrite internals without format change | PATCH

2. CLI surface (verbs, flags, exit codes, stdout format)

Change | Bump
Rename or remove a verb (grex add → grex register) | MAJOR
Change a verb's positional-argument shape (<url> [path] → <path> <url>) | MAJOR
Remap an existing exit code's meaning (e.g. 2 previously = parse error, now 2 = lock contention) | MAJOR
Change the shape of --json stdout for an existing verb | MAJOR
Remove a flag (even a short alias) that was stable in the previous MINOR | MAJOR
Add a new verb | MINOR
Add a new flag with a safe default that preserves prior behaviour | MINOR
Add a new field to an existing --json payload (consumers ignore unknowns) | MINOR
Improve an error message; reword --help text; fix tab alignment | PATCH
Fix a buggy exit code that never returned its documented value | PATCH

Caveat on exit-code fixes: a PATCH-class exit-code correction is still visible to scripts that pinned against the buggy value. The CHANGELOG entry must call it out under Fixed in bold so operators notice before upgrading.

3. MCP tool surface (grex serve --mcp)

Change | Bump
Rename or remove a tool (pack_add → register_pack) | MAJOR
Remove a required field from a tool's input schema | MAJOR
Add a new required field to a tool's input schema | MAJOR
Change a tool's output schema field type | MAJOR
Change or remove a tool annotation an existing client depends on | MAJOR
Add a new tool | MINOR
Add an optional input field with a safe default | MINOR
Add a new output field (clients ignore unknowns per MCP spec) | MINOR
Add a new annotation | MINOR
Fix a handler bug where the tool returned success on partial failure | PATCH
Improve tool description strings; tighten input-validation error messages | PATCH

The MCP conformance suite (PR #28) pins the 2025-06-18 MCP spec revision. Bumping to a later MCP spec revision is itself MAJOR if the newer spec has breaking changes the grex surface propagates; otherwise MINOR.

4. pack.yaml schema (pack-type + action plugins)

Change | Bump
Rename a built-in pack-type (declarative → static) | MAJOR
Rename a built-in action (file-write → write-file) | MAJOR
Remove a field from an existing action's input shape | MAJOR
Change an action's default behaviour for an existing field | MAJOR
Remove a built-in pack-type or action | MAJOR
Add a new built-in pack-type | MINOR
Add a new built-in action | MINOR
Add a new optional field to an existing action | MINOR
Loosen a validation rule (previously rejected input now accepted with warning) | MINOR
Fix a parser bug; improve error locations; clarify validation messages | PATCH
Improve action-execution perf; refactor executor internals | PATCH

External plugin ABI stability is deferred to the v2 plugin spec; v1.0.0 has no external plugin surface.

Deprecation policy

  • When grex needs to remove a verb, flag, tool, annotation, pack-type, action, or manifest field, it first deprecates the surface in a MINOR release.
  • A deprecated surface continues to work for at least one full MINOR cycle before removal in the next MAJOR.
  • grex doctor surfaces deprecation warnings in its output when a workspace's manifest or pack tree uses a deprecated surface. A clean grex doctor run before a MAJOR upgrade means no deprecated usage remains.
  • MCP clients receive deprecation notices via the tool-annotation mechanism; a deprecated tool's annotation gets a deprecated: true marker and a human-readable message pointing at its replacement.
  • Deprecation entries go under ### Deprecated in CHANGELOG.md on the MINOR release that introduces them and under ### Removed in the MAJOR release that retires them, with a back-reference to the deprecation entry.

What is not covered by SemVer

grex's SemVer contract covers the four public surfaces above. The following are explicitly out-of-scope:

  • Internal module layout (grex-core internals, private items). Reshuffled without bumping — consumers should not depend on private crate APIs.
  • Log / trace / stderr formatting (not --json and not --plain). Free to evolve at any point.
  • Build artefact names and installer script URLs — these follow the cargo-dist release pipeline's conventions, not grex SemVer.
  • docs/ content, design notes, and milestone.md. Documentation is maintained for correctness but not versioned.
  • CI matrix composition (adding or dropping platforms from the build matrix). Platform-support drops will be called out in the CHANGELOG but follow their own platform-support policy.
  • Minimum supported Rust version (MSRV). An MSRV bump is not treated as a breaking change: it ships in a MINOR release and is called out in the CHANGELOG.

See also

MCP protocol conformance (CI gate)

Shipped in feat-m7-3. See openspec/changes/feat-m7-3-mcp-ci-conformance/proposal.md.

Purpose

mcp-conformance is the L6 external-oracle gate: an independent implementation (mcp-validator, Janix-ai) drives grex serve and asserts wire-protocol conformance at MCP protocol version 2025-06-18. Pairs with the in-process L2-L5 harness from feat-m7-2 — the harness checks our own rmcp-typed client, this job checks a third-party one.

Pin

Field | Value
Upstream | github.com/Janix-ai/mcp-validator
CI source | github.com/egoisth777/mcp-validator (org-controlled mirror, same SHA)
Tag | v0.3.1
Commit SHA | d766d3ee94076b13d0b73253e5221bbc76b9edb2
Released | 2025-07-08T13:55:45Z
Install path | actions/checkout the pinned SHA from the mirror into .mcp-validator/, then pip install -r .mcp-validator/requirements.txt, then run python -m mcp_testing.stdio.cli with PYTHONPATH=.mcp-validator.
PyPI status | mcp-validator==0.3.1 is NOT published on PyPI (only 0.1.1 is).
pip install git+URL | NOT supported at this SHA. The upstream repo at tag v0.3.1 ships neither setup.py nor pyproject.toml, so pip refuses with "does not appear to be a Python project". Clone-and-run is the only supported path until upstream adds a packaging file.
Protocol | 2025-06-18 (matches .omne/cfg/mcp.md SSOT)
Pin verified | 2026-04-22 via gh api repos/Janix-ai/mcp-validator/git/refs/tags/v0.3.1

Bump policy

Any bump MUST update tag AND SHA together. Re-run:

gh api repos/Janix-ai/mcp-validator/releases/latest --jq '.tag_name,.published_at'
gh api repos/Janix-ai/mcp-validator/git/refs/tags/v<NEW> --jq '.object.sha'

Drift between the two is a merge blocker.

Supply-chain hardening

The CI job checks out from egoisth777/mcp-validator (an org-controlled mirror / fork of Janix-ai/mcp-validator) rather than upstream directly. Rationale: actions/checkout of an external repo that we then pip install hands that external maintainer arbitrary code execution under the CI token if upstream is compromised, rewritten, or replaced. Mirroring the pin into a repo we control closes that window — the SHA is byte-identical to upstream, but the host cannot be tampered with by third parties.

Mirror refresh procedure (run once per validator bump):

# 1. Confirm new upstream tag + SHA.
gh api repos/Janix-ai/mcp-validator/releases/latest --jq '.tag_name,.published_at'
NEW_SHA=$(gh api repos/Janix-ai/mcp-validator/git/refs/tags/v<NEW> --jq '.object.sha')

# 2. Sync mirror's default branch with upstream (one-time, if not already
#    tracking). The fork created via `gh repo fork Janix-ai/mcp-validator`
#    already has all history; subsequent refreshes via:
gh api -X POST repos/egoisth777/mcp-validator/merge-upstream \
  -f branch=main

# 3. Verify the new SHA is reachable from the mirror.
gh api repos/egoisth777/mcp-validator/commits/$NEW_SHA --jq '.sha'

# 4. Update `ref:` in `.github/workflows/ci.yml` mcp-conformance job.

CLI invocation

There is no mcp-validator console entry point at this SHA. The canonical invocation (matches upstream ref_gh_actions/stdio-validation.yml at tag v0.3.1) is a python -m call against the checked-out module:

PYTHONPATH=/abs/path/to/mcp-validator-checkout \
python -m mcp_testing.stdio.cli \
  "$GITHUB_WORKSPACE/target/release/grex serve" \
  --protocol-version 2025-06-18 \
  --output-dir reports \
  --report-format json

Verified --help output at tag v0.3.1:

usage: cli.py [-h] [--args ARGS [ARGS ...]] [--debug]
              [--protocol-version {2024-11-05,2025-03-26,2025-06-18}]
              [--output-dir OUTPUT_DIR] [--report-format {text,json,html}]
              server_command

Notes:

  • Server command is positional, NOT --server-command (the earlier spec draft had this wrong; corrected here and in ci.yml).
  • No --timeout flag exists at this SHA. Upstream's own ref_gh_actions/stdio-validation.yml template lists --timeout 30 but the template drifted from the code; the CLI rejects it with unrecognized arguments: --timeout 30. Omitted.
  • --protocol-version is passed to the validator, NOT to grex serve (which does not accept that flag).
  • PYTHONPATH is required because the upstream repo at this SHA is not pip-installable (no setup.py / pyproject.toml).
  • The job uploads reports/ as a workflow artefact (mcp-conformance-reports, 14-day retention) regardless of pass/fail so failed runs are debuggable.
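
The notes above can be folded into a small wrapper so the invocation is written once. This is only a sketch: the function name `run_validator` and the `PYTHON` and `REPORT_DIR` override variables are assumptions made for illustration and do not exist in grex or mcp-validator. The flags themselves are the ones listed in the `--help` output above.

```shell
# Hypothetical wrapper; run_validator, PYTHON and REPORT_DIR are
# illustrative names, not part of grex or mcp-validator.
run_validator() {
  checkout=$1      # absolute path to the mcp-validator checkout
  server_cmd=$2    # e.g. "/abs/path/target/release/grex serve"
  report_dir=${REPORT_DIR:-reports}
  mkdir -p "$report_dir"
  # The server command is one positional argument, so it stays quoted.
  # No --timeout is passed: the CLI at this SHA rejects it.
  # --protocol-version goes to the validator, not to grex serve.
  PYTHONPATH=$checkout "${PYTHON:-python}" -m mcp_testing.stdio.cli \
    "$server_cmd" \
    --protocol-version 2025-06-18 \
    --output-dir "$report_dir" \
    --report-format json
}
```

Calling `run_validator "$(pwd)/.mcp-validator" "$(pwd)/target/release/grex serve"` would then mirror the canonical invocation; the function returns whatever exit code the validator produced.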

Local repro

From repo root on any supported OS (Python 3.12):

cargo build --release -p grex
# Use the org mirror (same SHA) so local repro matches CI's supply chain.
git clone https://github.com/egoisth777/mcp-validator .mcp-validator
git -C .mcp-validator checkout d766d3ee94076b13d0b73253e5221bbc76b9edb2
python -m pip install --upgrade pip
pip install -r .mcp-validator/requirements.txt
mkdir -p reports
PYTHONPATH="$(pwd)/.mcp-validator" \
python -m mcp_testing.stdio.cli \
  "$(pwd)/target/release/grex serve" \
  --protocol-version 2025-06-18 \
  --output-dir reports \
  --report-format json

Exit code 0 = conformant. Non-zero = protocol drift (inspect reports/*.json for the failing test cases).
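
To make the exit-code rule easy to act on, the validator's status can be turned into a short summary that lists the report files to open. This is a minimal sketch; `summarize_run` is a hypothetical name, and it deliberately does not parse the JSON because the report schema is not described here.

```shell
# Hypothetical helper: map the validator's exit code to a verdict and
# list the JSON reports to inspect. It does not parse the reports.
summarize_run() {
  status=$1        # exit code of the validator run
  report_dir=$2    # directory passed as --output-dir
  if [ "$status" -eq 0 ]; then
    echo "conformant"
    return 0
  fi
  echo "protocol drift (validator exit code $status); inspect:"
  for f in "$report_dir"/*.json; do
    # Skip the unexpanded glob when no report was written.
    [ -e "$f" ] && echo "  $f"
  done
  return 1
}
```

After a local run, `summarize_run $? reports` (issued immediately after the validator command) prints either `conformant` or the list of files under `reports/`.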

Deliberate-regression smoke

To confirm the gate actually gates (not just green-by-accident):

  1. On a throwaway branch matching the CI trigger glob (feat/**), break the MCP surface. Not every break trips validator v0.3.1's stdio suite: renaming a tool or downgrading protocolVersion both PASS, because the validator accepts any initialize and does not assert the tool inventory. The reliable break is making grex serve exit non-zero before it accepts any stdio frames, e.g. by replacing the body of serve::run with anyhow::bail!("smoke").
  2. Push and wait for CI — the MCP protocol conformance (2025-06-18) job MUST exit red.
  3. Delete the throwaway branch locally and on origin; do NOT merge.
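
Steps 1 and 3 can be sketched as two small shell helpers: one checks that a branch name would match the feat/** trigger, the other removes the throwaway branch locally and on origin. Both function names are hypothetical, and the sketch assumes the remote is called `origin` as in the steps above.

```shell
# Hypothetical helpers; names are illustrative and not part of grex.

# Succeed when the branch name would match the CI trigger glob feat/**.
matches_ci_trigger() {
  case "$1" in
    feat/?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Step 3: delete the throwaway branch locally and on origin.
# Run this from a different branch, since git refuses to delete the
# branch that is currently checked out.
delete_smoke_branch() {
  branch=$1
  matches_ci_trigger "$branch" || {
    echo "refusing to delete non-feat branch: $branch" >&2
    return 1
  }
  git branch -D "$branch" && git push origin --delete "$branch"
}
```

The guard in `delete_smoke_branch` is a design choice for this sketch: it keeps a mistyped argument such as `main` from ever reaching `git push --delete`.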

Proof of red run

| Date | Branch | Head SHA | Run URL | Result |
| --- | --- | --- | --- | --- |
| 2026-04-22 | feat/m7-3-smoke-regression-proof (since deleted) | 357b3b77dbbfabe7a5fec1f20f70a906cabfcd38 | https://github.com/egoisth777/grex/actions/runs/24808713997/job/72608705067 | mcp-conformance = failure |

Revert: sub-branch deleted from both local and origin after capture; PR #28's feat/m7-3-mcp-ci-conformance HEAD never contained the smoke commit. Proof commit (357b3b7) is no longer reachable via any branch — only this URL preserves the evidence.

Bypass procedure

Adversarial case: the validator itself regresses and blocks all PRs. To remove the gate temporarily:

  1. Maintainer: GitHub → Settings → Branches → main branch protection → "Required status checks" → untick MCP protocol conformance (2025-06-18).
  2. Save. The gate is now advisory.
  3. File a tracking issue linking the validator upstream regression.
  4. Once the upstream pin is fixed (or reverted), re-tick the required check and close the issue.

The pin is explicit, so a validator regression is always reproducible locally via the Local repro commands above. Fixes are single-line PRs that bump the tag + SHA together.

Upstream disappearance

If Janix-ai/mcp-validator is deleted, renamed, or has its history rewritten, CI continues to work unchanged because the job reads from the org mirror egoisth777/mcp-validator, which retains the pinned SHA independently. Remediation in that scenario:

  1. Confirm the mirror still holds the pinned SHA:
    gh api repos/egoisth777/mcp-validator/commits/d766d3ee94076b13d0b73253e5221bbc76b9edb2 --jq '.sha'
    
  2. File a tracking issue noting upstream loss so future bumps either (a) find a replacement validator or (b) vendor the validator source under .mcp-validator-vendored/ in-repo and drop the external checkout step.
  3. No CI changes required in the meantime — the mirror IS the durable source of truth for the pinned build.
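
The check in step 1 prints a SHA but leaves the comparison to the reader. A minimal sketch that turns it into a pass/fail result is shown below; the function name `mirror_holds_pin` is hypothetical, and the query is the same read-only `gh api` call used in step 1.

```shell
# Hypothetical helper: succeed only when the org mirror still serves
# the pinned commit. Uses the same read-only query as step 1.
mirror_holds_pin() {
  pinned=$1
  # Any API error (404, auth, network) leaves `found` empty.
  found=$(gh api "repos/egoisth777/mcp-validator/commits/$pinned" \
            --jq '.sha' 2>/dev/null) || found=
  [ "$found" = "$pinned" ]
}
```

For example, `mirror_holds_pin d766d3ee94076b13d0b73253e5221bbc76b9edb2 || echo "mirror lost the pin" >&2` gives a single line that could sit in a periodic check.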

Pointing ref: back at upstream is only appropriate if upstream is restored AND has been re-audited.

CI job layout

See .github/workflows/ci.yml job mcp-conformance. Notes:

  • No needs: dependency. The job owns its own cargo build --release -p grex step and runs in parallel with the debug build matrix. Adding needs: [build] would stall on the 3-OS matrix (~5 min p95) with no artefact payoff. Budget: ~3.5 min cold / ~1.5 min warm.
  • Release cache key is distinct (key: release) so the release target profile does not thrash the debug cache used by build.
  • Python 3.12 matches upstream's template.