Overview
Every workbench registers one or more source repos under repos/<name>/.
Agents working in the workbench do better when they can read a short, curated
summary of each repo — what it does, what types and terms matter, what
decisions are baked in — instead of grepping the source. The
repo-context-scan feature builds that summary automatically.
It runs in two places, with no user action required:
- init.wb: after every registered repo has been cloned, devkit scans each one and writes context/<name>/CONTEXT.md into the workbench.
- join.wb: when a joiner registers extra repos, the same scan fires for the new repos only.
For manual refresh, drift recovery, or rerun-on-failure, use
wb.rescan.
What gets scanned
For each registered source repo, the vendored repo-context-scan skill
performs an LLM-driven semantic read of the working tree, surface-level
docs, and git history. It produces a single CONTEXT.md describing the
repo’s domain: core types, entities, vocabulary, and clearly-deliberate
decisions worth seeding as ADRs.
For multi-context repos (e.g. a monorepo holding both an API and a UI),
the skill emits a CONTEXT-MAP.md at the root plus one CONTEXT.md per
sub-context (api/CONTEXT.md, ui/CONTEXT.md, etc.). Devkit harvests
whichever shape the skill produces.
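A minimal sketch of how a harvest step might tell the two shapes apart, assuming the skill writes its files at the root of the scanned tree. The function name detect_context_shape and the labels it prints are hypothetical and not part of devkit.

```shell
# Hypothetical helper: report which output shape a scan left in a directory.
# Assumes CONTEXT-MAP.md or CONTEXT.md sits at the root of the scanned tree.
detect_context_shape() {
  scan_dir=$1
  if [ -f "$scan_dir/CONTEXT-MAP.md" ]; then
    echo multi        # map at the root, one CONTEXT.md per sub-context
  elif [ -f "$scan_dir/CONTEXT.md" ]; then
    echo single       # one CONTEXT.md for the whole repo
  else
    echo none         # nothing to harvest
  fi
}

# Usage: build a throwaway tree that mimics a multi-context result.
demo=$(mktemp -d)
mkdir -p "$demo/api" "$demo/ui"
touch "$demo/CONTEXT-MAP.md" "$demo/api/CONTEXT.md" "$demo/ui/CONTEXT.md"
detect_context_shape "$demo"   # prints: multi
rm -rf "$demo"
```

Checking for the map first matters in this sketch because a multi-context tree also contains CONTEXT.md files, only deeper in the hierarchy.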
The skill itself lives upstream in the skills
repo. Devkit vendors a pinned copy at
${DEVKIT_DIR}/skills/repo-context-scan/ with an .upstream file that
records the source SHA. See
Maintainer: re-sync the vendored skill.
Where outputs land
All scan outputs are wb-owned. Source repos under repos/<name>/ are
never mutated — devkit reads them through a throwaway git worktree
sandbox.
${WB_DIR}/
  context/
    README.md                # aggregate index, generated by lib
    payments-svc/
      CONTEXT.md             # devkit frontmatter + skill body
      docs/adr/0001-*.md     # seeded ADRs for clear decisions
    billing-svc/
      CONTEXT.md             # may be a stub if the scan failed
    e2e-tests/
      CONTEXT-MAP.md         # multi-context repo variant
      api/CONTEXT.md
      ui/CONTEXT.md
  .context-scan/             # worktree staging, gitignored, transient
  repos/                     # source repo clones, never written to
    payments-svc/
    billing-svc/
    e2e-tests/
.context-scan/ is gitignored by the ai-workbench template. The
aggregate context/README.md is regenerated on every scan batch.
Worktree sandbox model
The decoupling between “read from source” and “write to wb” is what
keeps source repos pristine. The wrapper library
${DEVKIT_DIR}/lib/wb-context-scan.zsh exposes three subcommands that
fence the work:
| Subcommand | Responsibility |
|---|---|
| setup <WB_DIR> <name> | Defensive prune, mkdir, git worktree add --detach from repos/<name> into .context-scan/<name>, pre-wipe any prior CONTEXT files inside the worktree, acquire the lock, print SCAN_DIR=<path>. |
| finalize <WB_DIR> <name> [--fail-reason "..."] | If the worktree produced CONTEXT.md / CONTEXT-MAP.md / docs/adr, harvest into context/<name>/ and prepend devkit frontmatter. Otherwise (or when --fail-reason is given) write a failure stub. Remove the worktree. Release the lock. Always idempotent. |
| aggregate <WB_DIR> | Walk context/*/CONTEXT.md and CONTEXT-MAP.md, cross-reference project.conf REPOS, regenerate context/README.md. |
Between setup and finalize an external agent (sub-agent dispatched
from Claude, inline invocation from Devin, or subprocess from
wb.rescan) executes /repo-context-scan against SCAN_DIR. The lib
itself is pure deterministic shell with no LLM dependency.
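The setup / agent / finalize sequence can be sketched as a small driver. This is an illustration under assumptions, not devkit's own code: WB_LIB is an assumed way of invoking the wrapper as a command, RUN_AGENT stands in for whatever executes /repo-context-scan, and parse_scan_dir and scan_one are hypothetical names. Only the subcommand names, the SCAN_DIR=<path> line, and the --fail-reason flag come from the table above.

```shell
# Extract the value of the last SCAN_DIR=<path> line from setup's output.
parse_scan_dir() {
  sed -n 's/^SCAN_DIR=//p' | tail -n 1
}

# scan_one WB_DIR NAME
# Runs setup, hands the worktree path to the agent, then always finalizes.
# WB_LIB and RUN_AGENT are assumptions: commands supplied by the caller.
scan_one() {
  wb_dir=$1
  name=$2
  # If setup fails (for example on a contended lock), stop without finalizing
  # so that another run's worktree and lock are left alone.
  setup_out=$("$WB_LIB" setup "$wb_dir" "$name") || return 1
  scan_dir=$(printf '%s\n' "$setup_out" | parse_scan_dir)
  if "$RUN_AGENT" "$scan_dir"; then
    "$WB_LIB" finalize "$wb_dir" "$name"
  else
    "$WB_LIB" finalize "$wb_dir" "$name" --fail-reason "agent exited non-zero"
  fi
}

# Usage: parsing a sample of what setup might print.
printf 'pruned stale worktrees\nSCAN_DIR=/tmp/wb/.context-scan/payments-svc\n' |
  parse_scan_dir   # prints: /tmp/wb/.context-scan/payments-svc
```

Because finalize is described as idempotent and is the step that removes the worktree and releases the lock, the sketch calls it on both the success and the failure path once setup has succeeded.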
Concurrency is gated by a mkdir-as-lock on .context-scan/.lock (chosen
over flock because it survives the process boundary between setup
and finalize and works portably on macOS without extra dependencies).
A second concurrent run fails fast with lock contended.
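A minimal sketch of the mkdir-as-lock idea, assuming the parent directory already exists. The function names acquire_lock and release_lock are hypothetical; only the technique and the "lock contended" message come from the text above.

```shell
# mkdir either creates the directory or fails, atomically, so exactly one
# caller can win. No open file descriptor is needed to keep holding the lock.
acquire_lock() {
  if mkdir "$1" 2>/dev/null; then
    return 0
  fi
  echo "lock contended" >&2
  return 1
}

# Releasing is just removing the directory; ignore the error if it is gone.
release_lock() {
  rmdir "$1" 2>/dev/null || true
}

# Usage: the second acquire is refused until the first holder releases.
lock_root=$(mktemp -d)
lock="$lock_root/.lock"
acquire_lock "$lock" && echo "first run: acquired"
acquire_lock "$lock" || echo "second run: refused"
release_lock "$lock"
acquire_lock "$lock" && echo "after release: acquired"
release_lock "$lock"
rmdir "$lock_root"
```

The lock is plain filesystem state, so it outlives the process that took it. That is what lets one process acquire it in setup and a different one release it in finalize, and it also means a crashed run leaves the directory behind until something removes it.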
CONTEXT.md frontmatter
Every CONTEXT.md (success or stub) carries the same devkit-stamped
frontmatter block. Schema:
---
generated_by: ai-devkit/repo-context-scan
generated_at: 2026-05-13T14:22:18Z
source_repo: payments-svc
source_url: https://github.com/foo-org/payments-svc
source_commit: a1b2c3d4e5f6...
devkit_version: 1.1.0
skill_version: 7f8a1c2
status: scanned # scanned | scan-failed
# failure-only:
fail_reason: "no HEAD in source repo"
failed_at: 2026-05-13T14:22:18Z
retry_with: "wb.rescan payments-svc"
---
devkit_version is read from ${DEVKIT_DIR}/version.json;
skill_version from skills/repo-context-scan/.upstream. Both are
stamped at scan time and overwritten on every re-run.
User-authored keys survive re-runs. When finalize re-stamps the
frontmatter, it merges defensively: devkit-owned keys are overwritten
with fresh values, but any extra keys you add (e.g. owner: alice,
last_reviewed: 2026-05-01, domain: payments) are preserved. The
markdown body below the frontmatter is the skill’s output — to discard
your local edits and accept fresh skill output, pass --force to
wb.rescan (see below).
source_commit is captured at scan time even on failure, so a stub
still records which commit was being scanned when things broke.
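The defensive merge can be illustrated with a small sketch. It assumes flat "key: value" lines with the --- fences already stripped, treats every key in the schema above as devkit-owned, and emits fresh devkit keys first; the function name merge_frontmatter, the DEVKIT_KEYS variable, and the output ordering are all assumptions rather than devkit's actual behavior.

```shell
# Keys this sketch treats as devkit-owned (taken from the schema above).
DEVKIT_KEYS="generated_by generated_at source_repo source_url source_commit devkit_version skill_version status fail_reason failed_at retry_with"

# merge_frontmatter EXISTING FRESH
# Both arguments are newline-separated "key: value" lines.
merge_frontmatter() {
  printf '%s\n' "$2"                     # fresh devkit values always win
  printf '%s\n' "$1" | while IFS= read -r line; do
    key=${line%%:*}                      # text before the first colon
    case " $DEVKIT_KEYS " in
      *" $key "*) ;;                     # stale devkit-owned key: drop it
      *) if [ -n "$line" ]; then printf '%s\n' "$line"; fi ;;
    esac
  done
}

# Usage: a previously failed scan that a user annotated, re-stamped on success.
existing='status: scan-failed
fail_reason: "no HEAD in source repo"
owner: alice
last_reviewed: 2026-05-01'
fresh='status: scanned
devkit_version: 1.1.0'
merge_frontmatter "$existing" "$fresh"
```

In this run the stale status and fail_reason lines are dropped, while owner and last_reviewed pass through untouched.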
The aggregate index
${WB_DIR}/context/README.md is regenerated by the aggregate
subcommand on every scan batch. It cross-references project.conf
REPOS against the actual context/ directory contents:
# Workbench Context Index
| Repo | Role | Status | Top concepts |
|------|------|--------|--------------|
| [payments-svc](./payments-svc/CONTEXT.md) | service | scanned | Payment, Invoice, Refund |
| [billing-svc](./billing-svc/CONTEXT.md) | service | scan-failed | — |
| [e2e-tests](./e2e-tests/CONTEXT-MAP.md) | automation-tests | scanned | Customer, Order, Cart |
| [shared-lib](./shared-lib/CONTEXT.md) | shared-lib | orphan | Money, CustomerId |
| dropped-svc | (none) | missing | — |
Status values:
| Status | Meaning |
|---|---|
| scanned | CONTEXT produced normally (success frontmatter). |
| scan-failed | Stub on disk. Retry with wb.rescan <repo>. |
| orphan | Context dir exists but the repo is not in project.conf. Either re-add it to project.conf or rm -rf context/<name>. |
| missing | Repo is in project.conf but no context dir exists. Run wb.rescan <repo> (or --all). |
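The four status values follow from two checks: whether the repo is registered and whether a context dir exists. A rough sketch of that decision, where repo_status is a hypothetical name and the third argument stands in for the REPOS list (how devkit actually parses project.conf is not shown here):

```shell
# repo_status WB_DIR NAME "REPO1 REPO2 ..."
repo_status() {
  wb_dir=$1
  name=$2
  dir="$wb_dir/context/$name"
  case " $3 " in
    *" $name "*) registered=yes ;;
    *) registered=no ;;
  esac

  if [ ! -d "$dir" ]; then
    if [ "$registered" = yes ]; then echo missing; fi
    return 0
  fi
  if [ "$registered" = no ]; then
    echo orphan
    return 0
  fi
  # Registered and present: read the status key from the frontmatter.
  for f in "$dir/CONTEXT.md" "$dir/CONTEXT-MAP.md"; do
    if [ -f "$f" ]; then
      sed -n 's/^status:[[:space:]]*\([a-z-]*\).*/\1/p' "$f" | sed -n '1p'
      return 0
    fi
  done
  echo missing   # assumption: an empty context dir counts as missing
}

# Usage: one scanned repo, one orphan, one missing.
wb=$(mktemp -d)
mkdir -p "$wb/context/payments-svc" "$wb/context/shared-lib"
printf '%s\n' '---' 'status: scanned' '---' > "$wb/context/payments-svc/CONTEXT.md"
for repo in payments-svc shared-lib dropped-svc; do
  printf '%s\t%s\n' "$repo" "$(repo_status "$wb" "$repo" "payments-svc dropped-svc")"
done
rm -rf "$wb"
```

The usage loop prints scanned, orphan, and missing for the three repos, mirroring the rows of the example index.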
Top concepts are the first three bolded terms in the CONTEXT body — extracted by regex, no LLM. The richer LLM-driven cross-repo synthesis is a deferred feature.
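As an illustration of the regex-only extraction, the following sketch pulls the first three bolded terms out of a markdown file. The exact pattern devkit uses is not documented here, so the expression and the top_concepts name are guesses. It relies on grep -o, which is widely available but not part of POSIX.

```shell
# top_concepts FILE
# Prints the first three **bold** terms in FILE, comma-separated.
top_concepts() {
  grep -o '\*\*[^*][^*]*\*\*' "$1" |
    sed 's/^\*\*//; s/\*\*$//' |
    awk 'NR <= 3 { printf "%s%s", (NR > 1 ? ", " : ""), $0 }
         END     { if (NR > 0) print "" }'
}

# Usage: four bolded terms in the body, only the first three are kept.
body=$(mktemp)
cat > "$body" <<'EOF'
A **Payment** settles an **Invoice**. A **Refund** reverses a payment
and is recorded in the **Ledger**.
EOF
top_concepts "$body"   # prints: Payment, Invoice, Refund
rm -f "$body"
```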
Failure modes and recovery
Scans can fail for many reasons — the source repo has no HEAD, the LLM times out or crashes, the lock is contended, the agent’s output is malformed. In every case, devkit’s policy is the same:
- Init / join never abort on a scan failure. The workbench gets created, the source repos get cloned, the manifests get written, the final commit happens.
- A stub CONTEXT.md is written with status: scan-failed, fail_reason, failed_at, and retry_with: "wb.rescan <name>". The markdown body explains the situation in plain text.
- Recovery is one command.
wb.rescan <repo> # rescan one repo (auto-wipes the stub)
wb.rescan --all # rescan everything
wb.rescan --aggregate-only # refresh context/README.md only
wb.rescan --force <repo> # discard user-authored prose, re-scan from scratch
wb.rescan --agent devin <repo> # override engine for this run
wb.rescan self-commits its results with a commit message that begins
`chore: rescan context for` but **never pushes**: you review the diff
and push when ready.
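To see which repos need a retry after an init or join, the stubs can be found by their status key. A small sketch, assuming single-context repos with a CONTEXT.md directly under context/<name>/; the failed_scans name is hypothetical, and the command it prints is the same one the stub records in retry_with.

```shell
# failed_scans WB_DIR
# Prints one "wb.rescan <name>" line per repo whose CONTEXT.md is a failure stub.
failed_scans() {
  for f in "$1"/context/*/CONTEXT.md; do
    [ -f "$f" ] || continue            # the glob matched nothing
    if grep -q '^status:[[:space:]]*scan-failed' "$f"; then
      name=$(basename "$(dirname "$f")")
      echo "wb.rescan $name"
    fi
  done
}

# Usage: one healthy repo and one stub.
wb=$(mktemp -d)
mkdir -p "$wb/context/payments-svc" "$wb/context/billing-svc"
printf '%s\n' '---' 'status: scanned' '---' > "$wb/context/payments-svc/CONTEXT.md"
printf '%s\n' '---' 'status: scan-failed' '---' > "$wb/context/billing-svc/CONTEXT.md"
failed_scans "$wb"   # prints: wb.rescan billing-svc
rm -rf "$wb"
```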
If a failure is structural (the repo isn't appropriate for scanning at
all), just delete its context dir: `rm -rf context/