How we ship — android-llm-bridge dev team

A review team that ships in the open.

alb's .claude/ tree carries 7 review agents, 4 slash commands, and a 5-file knowledge base. Every non-trivial commit goes through the team, every accepted suggestion writes a regression note, and every reversed decision lands a fresh ADR. Below: the workflow, the numbers, and what the team caught that humans missed.

reviews shipped

suggestions raised

84%

acceptance rate

lessons learned

ADRs recorded

debts closed

7 specialised reviewers, 4 read-only.

Each agent owns one slice of quality. Four agents have no Write tool at all: their findings come back as a report, and the main conversation decides what to land. Three agents (performance, ui-fluency, visual) can write, but only to .claude/reports/<agent>-<ts>.md. No agent ever edits product code.
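
The write restriction above can be pictured as a path guard. The following is a minimal sketch, not the repository's mechanism: the function name `may_write`, the timestamp format, and the idea of enforcing the rule in Python are all assumptions made for illustration. The agent identifiers are the ones whose cards carry the reports/ tag.

```python
import re
from pathlib import PurePosixPath

# Agents whose cards are tagged "reports/" on this page.
REPORT_WRITERS = {"performance-auditor", "ui-fluency-auditor", "visual-audit-runner"}

# Assumed shape of <agent>-<ts>.md; the timestamp format is hypothetical.
REPORT_NAME = re.compile(r"^(?P<agent>[a-z-]+)-(?P<ts>\d{8}T\d{6})\.md$")
REPORTS_DIR = PurePosixPath(".claude/reports")


def may_write(agent: str, path: str) -> bool:
    """Return True only for an agent's own report file under .claude/reports/."""
    p = PurePosixPath(path)
    # Reject absolute paths and any attempt to climb out of the directory.
    if p.is_absolute() or ".." in p.parts:
        return False
    if p.parent != REPORTS_DIR:
        return False
    match = REPORT_NAME.match(p.name)
    if match is None:
        return False
    # The file must belong to the agent asking, and that agent must be a report writer.
    return agent in REPORT_WRITERS and match.group("agent") == agent
```

Under these assumptions a read-only agent such as code-reviewer is refused everywhere, and a report writer is refused for anything outside its own report files.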

code reviewer

code-reviewer

Resource lifetime, error propagation, race conditions, test coverage, API contracts. Five dimensions, at most two findings each.

read-only

architecture reviewer

architecture-reviewer

Module boundaries, dependency direction, debt accrual, ADR consistency. Required to flag any change that reverses an alternative a prior ADR rejected.

read-only

performance auditor

performance-auditor

Bundle size, render cost, memory footprint, network chatter, long-task wall-clock time. Files one report per run.

reports/

ui fluency auditor

ui-fluency-auditor

Animation cadence, layout shift (CLS), three-state UI (loading / error / data), keyboard navigation, a11y. Captures screenshots as evidence when a smell shows up.

reports/

mockup baseline checker

mockup-baseline-checker

Compares React render output against the source HTML mockup (docs/webui-preview-v*.html). Catches drift before it ships.

read-only

visual audit runner

visual-audit-runner

Wraps the sky-skills three-gate Playwright + WCAG scripts. Files per-route screenshots into the timestamped reports folder.

reports/

security & neutrality

security-and-neutrality-auditor

OWASP basics, XSS surfaces, credential exposure, plus the project-specific neutrality rule (no employer brand names leaking into open-source code).

read-only

main conversation

(you, the operator)

Spawns reviewers, integrates feedback, asks the user to accept or reject each suggestion, writes accepted ones into the knowledge files, and commits only after the user has signed off on the commit message.

writes knowledge

Six steps from sketch to ship.

1. Write code, run the three gates. 645+ pytest tests pass, the sensitive-word scan reports 0 hits, and the offline-purity check is clean. Tests must be green before any reviewer is spawned.

2. Stage and spawn reviewers in parallel. Code-only changes get code-reviewer + security in parallel; new modules also get architecture-reviewer; UI changes go through ui-check (mockup → fluency → visual, run sequentially).

3. Integrate findings, ask the user. The main conversation summarises every finding, recommends accept / reject / partial, and waits for the user to confirm. No silent decisions.

4. Apply changes, verify end to end. Re-run the gates. For new data paths or deployment surfaces, run a real browser hit (lesson L-017): type-checking and unit tests do not catch wiring bugs.

5. Write knowledge before commit. Accepted suggestions become diff comments; rejected ones become review-feedback.md entries. New rules become lessons. Reversed decisions become new ADRs.

6. User reviews the commit message, then push. Two confirmations, two checkpoints: the commit message is confirmed before git commit, and push readiness is confirmed before git push.
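
Steps 2 and 5 are essentially lookup rules, and can be sketched as two small functions. This is an illustrative sketch only: the `Change` enum, the function names, and the outcome labels are hypothetical, and the assumption that code-reviewer + security always run as a baseline (even for UI changes) is mine, not something the page states.

```python
from enum import Enum
from typing import Optional


class Change(Enum):
    CODE = "code"
    NEW_MODULE = "new_module"
    UI = "ui"


def select_reviewers(kinds: set[Change]) -> dict[str, list[str]]:
    """Map the kinds of change in a staged diff to the reviewers to spawn."""
    # Assumed baseline: every reviewed change gets code review + security.
    parallel = ["code-reviewer", "security-and-neutrality-auditor"]
    if Change.NEW_MODULE in kinds:
        parallel.append("architecture-reviewer")
    # ui-check runs its three stages one after another, in this order.
    sequential: list[str] = []
    if Change.UI in kinds:
        sequential = [
            "mockup-baseline-checker",
            "ui-fluency-auditor",
            "visual-audit-runner",
        ]
    return {"parallel": parallel, "sequential": sequential}


# Step 5: where each outcome is recorded. Accepted suggestions live in the
# diff itself, so they have no knowledge file.
KNOWLEDGE_TARGET = {
    "rejected": "review-feedback.md",
    "new_rule": "lessons.md",
    "reversed_decision": "decisions.md",
}


def knowledge_target(outcome: str) -> Optional[str]:
    """Return the knowledge file for an outcome, or None for diff-only outcomes."""
    return KNOWLEDGE_TARGET.get(outcome)
```

For example, `select_reviewers({Change.NEW_MODULE, Change.UI})` would queue three parallel reviewers followed by the three ui-check stages.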

Eight reviews across the F-dock milestone.

F-dock connected the live token-throughput stream end to end: backend sampler, dual WS, reducer, KPI tile. Every step had a review, and every review left a paper trail in .claude/knowledge/review-feedback.md.

step	topic	suggestions	acceptance	key output
F.1	TokenSampler + tps_sample bus	18	65%	ADR-021, L-013 bus event split. Caught a 2–3× precision bug that would have shipped silently.
F.3	`GET /metrics/summary`	10	80%	Defensive type guards, DEBT-008 cache stub.
F.4	`GET /tools`	8	100%	L-014: MCP tool docstrings are public API.
F.5	Dual WS instances	12	92%	ADR-022 reverses ADR-018 alt c. L-015 meta-rule: reversed decisions need a fresh ADR, period.
F.6	tps_sample reducer	15	87%	L-016 view-aware scaling, plus L-017: end-to-end testing discovered a 4-month-old projection bug that unit tests missed.
F.7	KpiStrip + dual buffer	9	78%	Closed DEBT-003/004/011, escalated DEBT-008 low → mid (the first stable consumer changed the calculus).
DEBT-014	alb-api SPA fallback	10	90%	Subclassed StaticFiles, kept asset 404s honest.
DEBT-015	GH Pages SPA fallback	12	92%	ADR-023 cross-surface heterogeneity, L-018 inline-recovery rule. Prod verify exposed an unrelated 6-day-old vite-base bug (DEBT-016).
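
The headline 84% acceptance rate can be cross-checked against the table. The sketch below is a back-of-the-envelope calculation, assuming the per-review percentages are rounded from whole accepted counts; the accepted counts it derives are therefore estimates, not figures taken from the review log.

```python
# (step, suggestions raised, acceptance % as printed in the table)
REVIEWS = [
    ("F.1", 18, 65),
    ("F.3", 10, 80),
    ("F.4", 8, 100),
    ("F.5", 12, 92),
    ("F.6", 15, 87),
    ("F.7", 9, 78),
    ("DEBT-014", 10, 90),
    ("DEBT-015", 12, 92),
]


def overall_acceptance(reviews):
    """Return (total suggestions, estimated accepted, overall rate)."""
    total = sum(n for _, n, _ in reviews)
    # Recover the nearest whole number of accepted suggestions per review.
    accepted = sum(round(n * pct / 100) for _, n, pct in reviews)
    return total, accepted, accepted / total


total, accepted, rate = overall_acceptance(REVIEWS)
print(f"{len(REVIEWS)} reviews, {total} suggestions, ~{accepted} accepted, {rate:.0%}")
```

Under that rounding assumption the eight reviews sum to 94 suggestions with roughly 79 accepted, which matches the 84% shown at the top of the page.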

Highlight: the F.5 reviewer caught a doc debt no human flagged

F.5 implemented "alt c" of ADR-018 (a separate WS per kind), but ADR-018 had explicitly rejected that alternative. The architecture-reviewer dug through the historical ADR, noticed the reversal, and demanded a new ADR rather than letting the contradiction ship as a sketch comment. ADR-022 was born, and meta-rule L-015 was promoted: reversed decisions never sneak through commit messages again.

Highlight: the F.6 end-to-end run found a 4-month-old wiring bug

Unit tests, type-check, sensitive-word scan, offline-purity check: all green. Then the F.6 closing run booted alb-api, hit a real chat session, and watched the LiveSession spark stay flat at zero. Trace: audit_route._project() had been silently dropping the data field on every event since C.1 (commit 36537d5, 4 months prior). The reducer's defensive data ?? {} fallback masked it; only end-to-end data flow exposed it.

Lesson L-017: deployment-layer wiring (SPA fallback, projection, CDN cache rules) is a data path too. If you mount it, hit it with a real browser before you ship.
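
The masking effect described above is easy to reproduce in miniature. The following is a simplified Python analogue, not the project's code: the event shape, the `tps` field, and every function name here are hypothetical. It shows how a projection that drops a field and a consumer with a defensive default can both pass their own unit tests while the combined path produces only zeros.

```python
def project_buggy(event: dict) -> dict:
    """Projection with an incomplete whitelist: 'data' is silently dropped."""
    return {k: event[k] for k in ("kind", "ts") if k in event}


def project_fixed(event: dict) -> dict:
    """Same projection with 'data' included in the whitelist."""
    return {k: event[k] for k in ("kind", "ts", "data") if k in event}


def reduce_tps(samples: list, event: dict) -> list:
    """Append the event's throughput sample to the series."""
    data = event.get("data")
    if data is None:
        data = {}  # Python analogue of the defensive `data ?? {}` fallback
    return samples + [float(data.get("tps", 0.0))]


def end_to_end(project, raw_events: list) -> list:
    """Run raw events through the projection and then the reducer."""
    samples: list = []
    for raw in raw_events:
        samples = reduce_tps(samples, project(raw))
    return samples


raw = [{"kind": "tps_sample", "ts": 1, "data": {"tps": 42.5}}]
print(end_to_end(project_buggy, raw))  # flat at zero, and no exception raised
print(end_to_end(project_fixed, raw))
```

Neither function raises in the buggy path, which is why only a check on the values that arrive at the end of the pipeline can notice the loss.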

Five files agents read before reviewing.

Only the main conversation writes here; agents read but never touch. This keeps the team's shared memory from being polluted by any single run's hallucination.

architecture.md

Module map & invariants

Where things live, who depends on whom, and the cross-cutting rules that must hold (e.g. SPA route path segments cannot contain a dot).

writer: main conversation
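
The "no dot in SPA route paths" invariant is what lets a fallback handler tell a client-side route from a missing asset. The sketch below shows one way that decision could look; it is an assumption-laden illustration, not the logic of the project's StaticFiles subclass, and the function name, return labels, and the choice to inspect only the last path segment are all hypothetical.

```python
from pathlib import PurePosixPath


def spa_fallback_action(request_path: str, file_exists: bool) -> str:
    """Decide how a static server with SPA fallback could answer a request."""
    if file_exists:
        return "serve_file"
    last_segment = PurePosixPath(request_path).name
    # A dot suggests a real asset (app.js, logo.png). Answer with a true 404
    # rather than rewriting it to index.html, which would render a blank page.
    if "." in last_segment:
        return "not_found"
    # Dot-free paths are treated as client-side routes handled by the SPA.
    return "serve_index"
```

A route such as /devices/abc would fall through to index.html, while a missing /assets/app.js would stay an honest 404. The rule only works as long as no route segment ever contains a dot, which is why it is recorded as an invariant.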

decisions.md

ADRs (4 active)

Each major decision: context, decision, alternatives, trade-off, when to revisit. ADR-022 reverses ADR-018; ADR-023 documents cross-surface heterogeneity.

writer: main conversation

debts.md

Tracked technical debt

Each debt: severity, where it lives, why we deferred it, what triggers repayment. Six closed, one upgraded, three new in this cycle.

writer: main conversation

lessons.md

Lessons (L-013…L-018)

Negative-pattern catalogue with root cause, rule, and "where agents apply this". L-017 (end-to-end verification catches real bugs) has its own positive case file.

writer: main conversation

review-feedback.md

Per-review log: what was rejected and why

Mandatory reading for every reviewer. A kind of suggestion rejected three times in a row is a signal to tune the agent prompt; one accepted three times graduates into a lesson.

writer: main conversation · readers: every agent
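
The three-strikes rule for review-feedback.md can be sketched as a small classifier over the decision history of one kind of suggestion. This is a hypothetical illustration: the function name, the outcome labels, and the precedence between the two rules are assumptions, and the page does not say whether the three acceptances must be consecutive (this sketch counts them in total).

```python
def classify_history(history: list, threshold: int = 3) -> str:
    """Classify the chronological 'accepted' / 'rejected' decisions for one
    recurring kind of suggestion."""
    tail = history[-threshold:]
    # Three rejections in a row: the agent keeps raising something unwanted.
    if len(tail) == threshold and all(h == "rejected" for h in tail):
        return "tune-agent-prompt"
    # Assumed: acceptances are counted in total, not as a streak.
    if history.count("accepted") >= threshold:
        return "promote-to-lesson"
    return "keep-watching"
```

Checking the rejection streak first is a design choice in this sketch: a suggestion that was once popular but is now rejected repeatedly would trigger prompt tuning rather than promotion.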

Any clone gets the team for free.

Agent definitions live under .claude/agents/, slash commands under .claude/commands/, and the knowledge base under .claude/knowledge/. They're all in-tree. Clone the repo, open Claude Code, and the team is there. Type /review on staged changes and the code-reviewer and security-and-neutrality-auditor agents run in parallel; /preflight runs all seven before a milestone.

No CI dependency. No third-party service. No proprietary plugin. The whole workflow is plain Markdown plus a few Python and Node scripts. Fork it, adapt the agent prompts to your domain, and ship.