# Repo Context Scan

## Overview

Every workbench registers one or more source repos under repos/<name>/. Agents working in the workbench do better when they can read a short, curated summary of each repo — what it does, what types and terms matter, what decisions are baked in — instead of grepping the source. The repo-context-scan feature builds that summary automatically.

It runs in two places, with no user action required:

- `init.wb`: after every registered repo has been cloned, devkit scans each one and writes `context/<name>/CONTEXT.md` into the workbench.
- `join.wb`: when a joiner registers extra repos, the same scan fires for the new repos only.

For manual refresh, drift recovery, or rerun-on-failure, use wb.rescan.

## What gets scanned

For each registered source repo, the vendored repo-context-scan skill performs an LLM-driven semantic read of the working tree, surface-level docs, and git history. It produces a single CONTEXT.md describing the repo’s domain: core types, entities, vocabulary, and clearly-deliberate decisions worth seeding as ADRs.

For multi-context repos (e.g. a monorepo holding both an API and a UI), the skill emits a CONTEXT-MAP.md at the root plus one CONTEXT.md per sub-context (api/CONTEXT.md, ui/CONTEXT.md, etc.). Devkit harvests whichever shape the skill produces.

The skill itself lives upstream in the skills repo. Devkit vendors a pinned copy at ${DEVKIT_DIR}/skills/repo-context-scan/ with an .upstream file that records the source SHA. See Maintainer: re-sync the vendored skill.

## Where outputs land

All scan outputs are wb-owned. Source repos under repos/<name>/ are never mutated — devkit reads them through a throwaway git worktree sandbox.

${WB_DIR}/
  context/
    README.md                          # aggregate index, generated by lib
    payments-svc/
      CONTEXT.md                       # devkit frontmatter + skill body
      docs/adr/0001-*.md               # seeded ADRs for clear decisions
    billing-svc/
      CONTEXT.md                       # may be a stub if the scan failed
    e2e-tests/
      CONTEXT-MAP.md                   # multi-context repo variant
      api/CONTEXT.md
      ui/CONTEXT.md
  .context-scan/                       # worktree staging — gitignored, transient
  repos/                               # source repo clones — never written to
    payments-svc/
    billing-svc/
    e2e-tests/

.context-scan/ is gitignored by the ai-workbench template. The aggregate context/README.md is regenerated on every scan batch.

## Worktree sandbox model

The decoupling between “read from source” and “write to wb” is what keeps source repos pristine. The wrapper library ${DEVKIT_DIR}/lib/wb-context-scan.zsh exposes three subcommands that fence the work:

| Subcommand | Responsibility |
|------------|----------------|
| `setup <WB_DIR> <name>` | Defensive prune, mkdir, `git worktree add --detach` from `repos/<name>` into `.context-scan/<name>`, pre-wipe any prior CONTEXT files inside the worktree, acquire the lock, print `SCAN_DIR=<path>`. |
| `finalize <WB_DIR> <name> [--fail-reason "..."]` | If the worktree produced `CONTEXT.md` / `CONTEXT-MAP.md` / `docs/adr`, harvest into `context/<name>/` and prepend devkit frontmatter. Otherwise (or when `--fail-reason` is given) write a failure stub. Remove the worktree. Release the lock. Always idempotent. |
| `aggregate <WB_DIR>` | Walk `context/*/CONTEXT.md` and `CONTEXT-MAP.md`, cross-reference `project.conf` REPOS, regenerate `context/README.md`. |

Between setup and finalize an external agent (sub-agent dispatched from Claude, inline invocation from Devin, or subprocess from wb.rescan) executes /repo-context-scan against SCAN_DIR. The lib itself is pure deterministic shell with no LLM dependency.
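
The sandbox lifecycle that setup and finalize fence can be sketched with plain git commands. This is a minimal illustration, not the lib's code: the function names `sandbox_open` and `sandbox_close` are hypothetical, and locking, pre-wiping, and harvesting are left out. Only the `git worktree` subcommands themselves are real.

```sh
# Hypothetical helpers; the real wb-context-scan.zsh also locks, pre-wipes,
# and harvests. Pass absolute paths, since git -C changes directory first.

# sandbox_open <source-repo> <staging-dir>
# Checks out a detached copy of HEAD so the scan never touches the source tree.
sandbox_open() {
  git -C "$1" worktree prune                 # defensive: drop stale entries
  mkdir -p "$(dirname "$2")"
  git -C "$1" worktree add --detach "$2" >/dev/null 2>&1 || return 1
  printf 'SCAN_DIR=%s\n' "$2"
}

# sandbox_close <source-repo> <staging-dir>
# Removes the worktree even if the scan left untracked files behind.
sandbox_close() {
  git -C "$1" worktree remove --force "$2" >/dev/null 2>&1 || true
  git -C "$1" worktree prune
}
```

Anything the agent writes lands in the staging directory, so the clone under `repos/<name>` stays clean after `sandbox_close` runs.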

Concurrency is gated by a mkdir-as-lock on .context-scan/.lock (chosen over flock because it survives the process boundary between setup and finalize and works portably on macOS without extra dependencies). A second concurrent run fails fast with lock contended.
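
A minimal sketch of the mkdir-as-lock idea, assuming nothing about the lib beyond what the paragraph above states. The helper names `lock_acquire` and `lock_release` are hypothetical.

```sh
# Hypothetical helpers illustrating mkdir-as-lock; names are not from the lib.

# lock_acquire <lock-dir>
# Succeeds only for the first caller: mkdir (without -p) fails atomically
# when the directory already exists.
lock_acquire() {
  if mkdir "$1" 2>/dev/null; then
    return 0
  fi
  echo "lock contended: $1" >&2
  return 1
}

# lock_release <lock-dir>
# Safe to call even if the lock is already gone.
lock_release() {
  rmdir "$1" 2>/dev/null || true
}
```

Because the lock is a directory rather than an open file descriptor, it outlives the process that created it, which is what lets one process take it in setup and another release it in finalize.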

## CONTEXT.md frontmatter

Every CONTEXT.md (success or stub) carries the same devkit-stamped frontmatter block. Schema:

---
generated_by:   ai-devkit/repo-context-scan
generated_at:   2026-05-13T14:22:18Z
source_repo:    payments-svc
source_url:     https://github.com/foo-org/payments-svc
source_commit:  a1b2c3d4e5f6...
devkit_version: 1.1.0
skill_version:  7f8a1c2
status:         scanned                  # scanned | scan-failed
# failure-only:
fail_reason:    "no HEAD in source repo"
failed_at:      2026-05-13T14:22:18Z
retry_with:     "wb.rescan payments-svc"
---

devkit_version is read from ${DEVKIT_DIR}/version.json; skill_version from skills/repo-context-scan/.upstream. Both are stamped at scan time and overwritten on every re-run.

User-authored keys survive re-runs. When finalize re-stamps the frontmatter, it merges defensively: devkit-owned keys are overwritten with fresh values, but any extra keys you add (e.g. owner: alice, last_reviewed: 2026-05-01, domain: payments) are preserved. The markdown body below the frontmatter is the skill’s output — to discard your local edits and accept fresh skill output, pass --force to wb.rescan (see below).
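
One way the defensive merge could work, sketched with awk. This is an assumption about the approach, not the lib's actual code: `merge_frontmatter` and `DEVKIT_KEYS` are hypothetical names, and the owned-key list is copied from the schema above.

```sh
# Hypothetical sketch; key list taken from the frontmatter schema above.
DEVKIT_KEYS="generated_by generated_at source_repo source_url source_commit devkit_version skill_version status fail_reason failed_at retry_with"

# merge_frontmatter <existing-CONTEXT.md> <fresh-keys-file>
# Prints a frontmatter block: the fresh devkit lines first, then every
# non-devkit key found in the existing file's frontmatter.
merge_frontmatter() {
  awk -v owned_keys="$DEVKIT_KEYS" '
    BEGIN {
      n = split(owned_keys, k, " ")
      for (i = 1; i <= n; i++) owned[k[i]] = 1
      print "---"
    }
    # First file: fresh devkit-stamped lines, printed unchanged.
    FILENAME == ARGV[1] { print; next }
    # Second file: the existing CONTEXT.md; only its frontmatter is read.
    FNR == 1 && $0 == "---" { infm = 1; next }
    infm && $0 == "---"     { exit }
    infm && match($0, /^[A-Za-z_][A-Za-z0-9_]*:/) {
      name = substr($0, 1, RLENGTH - 1)
      if (!(name in owned)) print
    }
    END { print "---" }
  ' "$2" "$1"
}
```

Listing the owned keys explicitly means a stale `fail_reason` from an earlier stub is dropped on a successful rescan, while keys such as `owner` pass through untouched.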

source_commit is captured at scan time even on failure, so a stub still records which commit was being scanned when things broke.

## The aggregate index

${WB_DIR}/context/README.md is regenerated by the aggregate subcommand on every scan batch. It cross-references project.conf REPOS against the actual context/ directory contents:

# Workbench Context Index

| Repo | Role | Status | Top concepts |
|------|------|--------|--------------|
| [payments-svc](./payments-svc/CONTEXT.md) | service | scanned | Payment, Invoice, Refund |
| [billing-svc](./billing-svc/CONTEXT.md) | service | scan-failed | — |
| [e2e-tests](./e2e-tests/CONTEXT-MAP.md) | automation-tests | scanned | Customer, Order, Cart |
| [shared-lib](./shared-lib/CONTEXT.md) | shared-lib | orphan | Money, CustomerId |
| dropped-svc | (none) | missing | — |

Status values:

| Status | Meaning |
|--------|---------|
| `scanned` | CONTEXT produced normally (success frontmatter). |
| `scan-failed` | Stub on disk. Retry with `wb.rescan <repo>`. |
| `orphan` | Context dir exists but the repo is not in `project.conf`. Either re-add it to `project.conf` or `rm -rf context/<name>`. |
| `missing` | Repo is in `project.conf` but no context dir exists. Run `wb.rescan <repo>` (or `--all`). |
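
The four statuses follow from two facts per repo: whether it is listed in `project.conf` and what sits under `context/<name>/`. A rough sketch of that decision, with a hypothetical `repo_status` helper; membership in `project.conf` is passed in as an argument because the REPOS format is not shown here.

```sh
# Hypothetical helper; not the aggregate subcommand itself.

# repo_status <wb-dir> <name> <listed-in-project-conf: yes|no>
repo_status() {
  ctx="$1/context/$2"
  if [ ! -d "$ctx" ]; then
    # No context dir: only a row if project.conf still lists the repo.
    [ "$3" = yes ] || return 1
    echo missing
    return 0
  fi
  if [ "$3" != yes ]; then
    echo orphan
    return 0
  fi
  # Frontmatter values are column-aligned, so allow any run of spaces.
  # Simplification: a repo that only has CONTEXT-MAP.md counts as scanned.
  if grep -Eq '^status:[[:space:]]*scan-failed' "$ctx/CONTEXT.md" 2>/dev/null; then
    echo scan-failed
  else
    echo scanned
  fi
}
```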

Top concepts are the first three bolded terms in the CONTEXT body — extracted by regex, no LLM. The richer LLM-driven cross-repo synthesis is a deferred feature.
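
A sketch of what such a regex extraction might look like. The `top_concepts` name is hypothetical, and whether the real aggregate deduplicates terms is not stated here, so this version does not.

```sh
# Hypothetical helper; assumes bold terms are written as **Term**.

# top_concepts <CONTEXT.md>
# Prints the first three bold terms as a comma-separated list.
top_concepts() {
  grep -o '\*\*[^*][^*]*\*\*' "$1" \
    | head -n 3 \
    | sed 's/\*\*//g' \
    | awk 'NR > 1 { printf ", " } { printf "%s", $0 } END { print "" }'
}
```

For a body containing `**Payment**`, `**Invoice**`, `**Refund**`, and `**Ledger**` in that order, this prints `Payment, Invoice, Refund`.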

## Failure modes and recovery

Scans can fail for many reasons — the source repo has no HEAD, the LLM times out or crashes, the lock is contended, the agent’s output is malformed. In every case, devkit’s policy is the same:

- Init / join never abort on a scan failure. The workbench gets created, the source repos get cloned, the manifests get written, the final commit happens.
- A stub `CONTEXT.md` is written with `status: scan-failed`, `fail_reason`, `failed_at`, and `retry_with: "wb.rescan <name>"`. The markdown body explains the situation in plain text.
- Recovery is one command:

wb.rescan <repo>              # rescan one repo (auto-wipes the stub)
wb.rescan --all               # rescan everything
wb.rescan --aggregate-only    # refresh context/README.md only
wb.rescan --force <repo>      # discard user-authored prose, re-scan from scratch
wb.rescan --agent devin <repo>  # override engine for this run
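
For illustration, a failure stub of the kind described above could be written along these lines. The `write_stub` helper is hypothetical, the frontmatter is trimmed to a subset of the schema, and the body wording is invented for the example.

```sh
# Hypothetical helper; emits a subset of the frontmatter schema.
# Assumes the fail reason contains no double quotes.

# write_stub <wb-dir> <name> <fail-reason> <source-commit>
write_stub() {
  out_dir="$1/context/$2"
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  mkdir -p "$out_dir"
  cat > "$out_dir/CONTEXT.md" <<EOF
---
generated_by:   ai-devkit/repo-context-scan
generated_at:   $now
source_repo:    $2
source_commit:  $4
status:         scan-failed
fail_reason:    "$3"
failed_at:      $now
retry_with:     "wb.rescan $2"
---

# $2

The context scan for this repo did not complete: $3.
Run \`wb.rescan $2\` to try again.
EOF
}
```

Usage: `write_stub "$WB_DIR" billing-svc "no HEAD in source repo" a1b2c3d` leaves a stub at `context/billing-svc/CONTEXT.md` that the aggregate can report as `scan-failed`.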

`wb.rescan` self-commits its results with a `chore: rescan context for` message but **never pushes**; you review the diff and push when ready.

If a failure is structural (the repo isn't appropriate for scanning at all), just delete its context dir: `rm -rf context/`. The aggregate will reflect this on the next scan batch and the repo will show with `status: missing` (or you can drop it from `project.conf` entirely).

## Doctor integration

`devkit doctor` extends to cover the scan feature in two scopes.

**Global** (always checked):

| Check | What it verifies | `--fix` |
|-------|------------------|---------|
| `repo-context-scan vendored` | `${DEVKIT_DIR}/skills/repo-context-scan/SKILL.md` exists. | Runs `scripts/sync-skill.zsh`. |
| `engine symlinks intact` | All five symlinks resolve: `~/.claude/skills/`, `~/.devin/skills/`, `~/.agents/skills/`, and the two devkit-internal mirrors. | Re-runs `install.zsh`. |
| `wb-context-scan lib` | `${DEVKIT_DIR}/lib/wb-context-scan.zsh` exists and is executable. | `chmod +x` |
| `DEVKIT_DEFAULT_ENGINE set` | Env var is present (set by `install.zsh`). WARN-only. | Re-run `install.zsh`, then `source ~/.zprofile`. |
| `engine available` | `command -v $DEVKIT_DEFAULT_ENGINE` succeeds. WARN-only. | Install the engine yourself. |

**WB-scope** (only when run from inside a stamped workbench):

| Check | What it verifies |
|-------|------------------|
| `context/ exists` | The directory is present. |
| `context/ matches project.conf` | No orphan dirs, no missing rows. |
| `no stale stubs` | No `CONTEXT.md` has `status: scan-failed`. |
| `aggregate README current` | `context/README.md` mtime is at least as new as the newest per-repo CONTEXT file. |
| `.context-scan/ worktree clean` | No leftover sandbox directories from a crashed scan. |

Per the locked Q12 decision, `--fix` repairs deterministic **global** items only. WB-scope failures print suggested `wb.rescan` commands as advisory output; they never auto-invoke. This avoids surprising the user with unexpected scan churn inside their workbench.
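
Two of the WB-scope checks reduce to simple filesystem tests. The sketch below shows one plausible way to express them; `readme_is_current` and `sandbox_is_clean` are hypothetical names, not the functions `devkit doctor` actually calls.

```sh
# Hypothetical helpers; advisory only, they never trigger a rescan.

# readme_is_current <wb-dir>
# Succeeds when no CONTEXT file under context/ is newer than context/README.md.
readme_is_current() {
  readme="$1/context/README.md"
  [ -f "$readme" ] || return 1
  newer=$(find "$1/context" -type f \
    \( -name CONTEXT.md -o -name CONTEXT-MAP.md \) -newer "$readme" | head -n 1)
  [ -z "$newer" ]
}

# sandbox_is_clean <wb-dir>
# Succeeds when .context-scan/ is absent or holds no leftover entries.
sandbox_is_clean() {
  [ ! -d "$1/.context-scan" ] || [ -z "$(ls -A "$1/.context-scan")" ]
}
```

A caller might print `wb.rescan --aggregate-only` as a suggestion when `readme_is_current` fails, in keeping with the advisory-only policy above.
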

```
$ devkit doctor
TOOL    LOCAL  LATEST  STATUS
devkit  1.1.0  1.1.0   current
ralph   1.0.0  1.0.0   current

CHECK                        STATUS  DETAIL
repo-context-scan vendored   OK      sha 7f8a1c2d3e4f
engine symlinks intact       OK      5/5 resolve
wb-context-scan lib          OK      /Users/.../lib/wb-context-scan.zsh
DEVKIT_DEFAULT_ENGINE set    OK      devin
engine available             OK      devin -> /usr/local/bin/devin

Vendored skill: repo-context-scan @ 7f8a1c2d3e4f (synced 2026-05-13T14:00:00Z)
```

## Engine selection

The scan dispatches an LLM agent: Devin by default if `devin` is on PATH, otherwise Claude. The selection is materialized as an env var:

```zsh
export DEVKIT_DEFAULT_ENGINE="devin"   # written to ~/.zprofile by install.zsh
```

`install.zsh` detects which engine is present and writes the var idempotently. All callers respect it. Override per invocation with `--agent`:

```
wb.rescan --agent claude payments-svc
wb.rescan --agent devin --all
```

`init.wb` and `join.wb` accept the same `--agent` flag. See [Commands](/ai-devkit/commands.html#agent-selection) for the full selection precedence.

## Maintainer: re-sync the vendored skill

The skill source-of-truth lives in the [`skills`](https://github.com/amit-t/skills) repo. Devkit ships a vendored copy because we don't want every init/join run to clone or fetch upstream at runtime.

To refresh the vendored copy after an upstream change:

```zsh
export AT_SKILLS_DIR="/path/to/your/at-skills/clone"   # default: sibling of devkit
zsh ${DEVKIT_DIR}/scripts/sync-skill.zsh
```

The script rsyncs each known skill into `${DEVKIT_DIR}/skills//` with `--delete --exclude='.git' --exclude='.upstream'` (so removed upstream files vanish), then writes a fresh `.upstream` pin file:

```yaml
upstream_repo:
upstream_sha: 7f8a1c2d3e4f...
synced_at: 2026-05-13T14:00:00Z
synced_by:
```

Commit the result. This is maintainer-only; end users never need to run it. End users on a fresh install pick up whatever was committed last.

If `AT_SKILLS_DIR` is missing or doesn't point at a real clone, the script fails loud with clone instructions rather than silently producing an empty mirror.

## Deferred features

The following are explicitly out of scope for v1.1 and tracked as follow-up issues:

- **Parallel scan dispatch.** Init currently scans repos sequentially. A `--parallel N` flag and parallel Task dispatch will land in a future release. The placeholder `# v1.1` marker is already in `init.prompt.md` to flag the seam.
- **`wb.context.synthesize`.** LLM-driven cross-repo `CONTEXT-MAP.md` generation using per-repo `CONTEXT.md` as input. Today's aggregate is a static regex table; the LLM synthesis variant is queued.
- **`wb.context.staleness` doctor check.** Compare the `source_commit` frontmatter value against the current source-repo HEAD and flag drift. Today, drift is invisible until you run `wb.rescan` by hand.
- **Env-gated real-LLM CI smoke.** The `tests/MANUAL.md` recipe will eventually be wired into CI behind `WB_SCAN_REAL_SMOKE=1`. For now it is run by hand before each release.
- **Per-engine prompt customization.** Devin uses the same generic `/repo-context-scan` invocation as Claude. A Devin-flavored template may land if dispatch quality diverges meaningfully.

## Related

- [Commands](/ai-devkit/commands.html#wbrescan): full `wb.rescan` CLI reference.
- [Getting started](/ai-devkit/getting-started.html#after-init-explore-context): what to read first after `init.wb` finishes.
- [Versioning + upgrades](/ai-devkit/versioning.html): `devkit.upgrade`, `ralph.upgrade`, `wb.upgrade`, `devkit doctor`.
- [`ai-workbench`](https://amit-t.github.io/ai-workbench/): the template that ships `scripts/wb-rescan.sh` and the `.gitignore` entry for `.context-scan/`.