android-llm-bridge
Vision · M4+

Pairing alb with an on-device agent.

Design note · 2026-04-21 · open for comments

This page is a design note, not a shipped feature. It describes how we want alb to evolve past the host-side agent loop: by putting a small agent on the device itself and keeping both sides talking.

Why host-only is not enough.

Today alb runs entirely on your Linux (or Mac) host. An LLM reasons on the host, pushes tool calls through adb / ssh / UART, and observes the device from the outside. This works — and M2 just made the loop streaming and cache-friendly — but it hits four real walls.

  1. Latency. adb round-trips are 50–200 ms each. For a trace that walks 30 thread states, that is 1.5–6 seconds of pure transport, often longer than the phenomenon you are trying to observe.
  2. Blind spots. Kernel panics, boot hangs, ramdump windows happen before adb exists. ANR triggers fire in milliseconds; by the time adb says "not responding", you have lost the signal.
  3. Cost. Piping long logcats to a frontier model is expensive and noisy. Most of the tokens are boilerplate the host LLM has already seen a thousand times.
  4. Autonomy. When the device is at a customer site with no dev reachable, nothing on the host can help. The problem reproduces, and no one is there.
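
Using the per-round-trip range quoted in item 1, the transport cost of a 30-step trace can be worked out directly. This is only arithmetic on the figures above; the function name is illustrative.

```python
def transport_seconds(round_trips: int, rtt_ms: float) -> float:
    """Time spent purely on transport, in seconds."""
    return round_trips * rtt_ms / 1000.0


# 30 thread states, one adb round-trip each, at both ends of the 50-200 ms range.
low = transport_seconds(30, 50)
high = transport_seconds(30, 200)
print(f"{low:.1f}-{high:.1f} s of pure transport")
```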

A small agent on the device closes every one of these gaps — without replacing the host.

Host brain, device reflex.

Two cooperating agents, each doing what it is good at. A shared WebSocket keeps them in lockstep.

[Architecture diagram]
Host (brain), alb host: Reasoning (Claude · GPT · Gemma), AgentLoop (ReAct · tool call · stream), Memory (sessions · artifacts), Device bridge (WS server · RPC · events).
Device (reflex), alb-mobile on Android: Gemini Nano (AICore · LiteRT · Gemma 4), Probes (perfetto · /proc · dumpsys), Triage (log dedup · stack cluster), Host bridge (WS client · offline cache).
Links: a bidirectional WebSocket carrying RPC and directives one way and events, traces and logs the other; adb / ssh / UART as fallback for first boot and recovery.
Underneath: Android device hardware (kernel · framework · apps · sensors · GPU · thermal).
Primary path: WebSocket agent channel. Fallback: existing adb / ssh / UART transports always available.

Who does what.

A rough division of labor. Subject to change as we prototype, but the shape holds.

Task | Runs on | Why
Threshold alerts (OOM, ANR, thermal) | Device | Zero-latency, adb cannot see the transient
logcat stream dedupe, stack cluster | Device (small LLM) | Saves host tokens, filters noise at source
perfetto / systrace capture | Device | Direct /sys access, no adb chunk overhead
Complex reasoning, code edits, user dialogue | Host (frontier LLM) | Capability + cross-file context
Cross-session memory, artifact archive | Host | Device storage is limited and volatile
On-site offline debug + later sync | Device (autonomy) | Customer field, no dev reachable
Workflow orchestration (multi-step plan) | Host | Planning + memory is host's strong suit
First-response "what just happened?" | Device | Sub-second triage before the event ages out

The channel.

One WebSocket, six message types. alb host already speaks WebSocket (/chat/ws), so we extend the same FastAPI app with a /device/channel endpoint and let devices connect as clients. Authentication reuses alb's config.toml token.
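
As a rough illustration of the authentication step, here is a standard-library sketch of the check a /device/channel handler might run before accepting a device. The design only says the token comes from alb's config.toml; the Bearer-header convention and the function names below are assumptions made for this example, and the FastAPI wiring is left out.

```python
import hmac
from typing import Optional


def extract_bearer(header_value: Optional[str]) -> Optional[str]:
    """Return the token from a 'Bearer <token>' header value, or None.

    Assumption: devices present the shared token as a Bearer credential.
    """
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def device_is_authorized(header_value: Optional[str], expected_token: str) -> bool:
    """Compare the presented token with the configured one in constant time."""
    presented = extract_bearer(header_value)
    if presented is None or not expected_token:
        return False
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

Using hmac.compare_digest rather than == avoids leaking how many leading characters matched through timing differences.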

Message types

  • HEARTBEAT — device → host. Liveness + battery + thermal + freeform health field.
  • EVENT — device → host, unsolicited. OOM, ANR trigger, crash, thermal warn, custom threshold. Includes a short device-side triage summary.
  • REQUEST — device → host. "I think this needs a bigger brain, here is what I saw." Escalates to host LLM.
  • RPC_CALL — host → device. "Run perfetto for 5 s on PID 1234", "dump heap of app X", "capture method profile". Returns structured data.
  • RPC_REPLY — device → host. Result of an RPC_CALL, with {ok, data, error, artifacts} shape — same as the rest of alb.
  • LOG_STREAM — device → host. Pre-filtered, pre-clustered logcat. Replaces the adb logcat pipe with something that has already paid its tokens for filtering.
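
The six message types and their directions could be modelled roughly as follows. This is a sketch, not the wire format: the JSON envelope with "type" and "payload" keys, and the names Envelope and allowed_from, are assumptions for illustration. Only the type names and directions come from the list above.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class MsgType(str, Enum):
    HEARTBEAT = "HEARTBEAT"
    EVENT = "EVENT"
    REQUEST = "REQUEST"
    RPC_CALL = "RPC_CALL"
    RPC_REPLY = "RPC_REPLY"
    LOG_STREAM = "LOG_STREAM"


# Per the list above, only RPC_CALL flows host -> device.
HOST_TO_DEVICE = {MsgType.RPC_CALL}


@dataclass
class Envelope:
    type: MsgType
    payload: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps({"type": self.type.value, "payload": self.payload})

    @staticmethod
    def from_json(raw: str) -> "Envelope":
        obj = json.loads(raw)
        # MsgType(...) raises ValueError for an unknown type name.
        return Envelope(MsgType(obj["type"]), obj.get("payload", {}))


def allowed_from(sender: str, msg: Envelope) -> bool:
    """Check that a message travels in its documented direction.

    sender is "host" or "device"; anything else is rejected.
    """
    if sender == "host":
        return msg.type in HOST_TO_DEVICE
    if sender == "device":
        return msg.type not in HOST_TO_DEVICE
    return False
```

Rejecting messages that arrive from the wrong side is one cheap way to keep a misbehaving peer from, say, sending an RPC_CALL to the host.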

Four phases to ship.

We get value at every phase — we do not need phase 4 for phase 1 to be useful.

Phase 1: Skeleton

alb-mobile Android app: WebSocket client + logcat streamer + system metrics. No LLM yet. The win: replace adb logcat with a filtered, structured stream.
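
To make "filtered, structured stream" concrete, here is a small sketch of log dedupe over logcat's threadtime layout. It is written in Python to match the host codebase; the device app would need an equivalent in its own language. The record fields and the choice to mask digits when grouping are assumptions of this example, not part of the design.

```python
import re
from typing import Dict, Iterable, List

# logcat "threadtime" layout: date time pid tid level tag: message
LINE_RE = re.compile(
    r"^(?P<date>\d\d-\d\d)\s+(?P<time>[\d:.]+)\s+(?P<pid>\d+)\s+(?P<tid>\d+)\s+"
    r"(?P<level>[VDIWEF])\s+(?P<tag>[^:]*?)\s*:\s(?P<msg>.*)$"
)


def dedupe(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collapse repeated log lines into one record each, with a count.

    Lines are grouped by (level, tag, message with digit runs masked), so
    "Slow operation: 52ms" and "Slow operation: 60ms" land in one group.
    """
    groups: Dict[tuple, Dict[str, object]] = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not threadtime format, e.g. "--------- beginning of main"
        key = (m["level"], m["tag"], re.sub(r"\d+", "#", m["msg"]))
        if key not in groups:
            groups[key] = {
                "level": m["level"],
                "tag": m["tag"],
                "first_seen": m["time"],
                "sample": m["msg"],
                "count": 0,
            }
        groups[key]["count"] += 1
    # dicts preserve insertion order, so records come out in first-seen order
    return list(groups.values())
```

Each record could then become the payload of one LOG_STREAM message instead of forwarding every raw line.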

Phase 2: Local brain

Integrate AICore / Gemini Nano on Android 14+, or LiteRT + Gemma 4. Device does first-pass triage: summarize, cluster, decide whether to escalate.
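
The "cluster, decide whether to escalate" step can be approximated without any model at all, which is useful as a baseline. The sketch below is a rule-based stand-in, not the on-device LLM: it fingerprints a Java-style stack trace by its top frames with line numbers stripped and escalates only the first time a fingerprint appears. The class and function names, and the top-three-frames rule, are assumptions for illustration.

```python
import hashlib
import re
from typing import Set


def stack_signature(trace: str, top_frames: int = 3) -> str:
    """Fingerprint a stack trace by its first few 'at ...' frames.

    Line numbers are stripped so that two builds of the same code path
    still produce the same signature.
    """
    frames = [ln.strip() for ln in trace.splitlines() if ln.strip().startswith("at ")]
    normalized = [re.sub(r":\d+\)", ")", f) for f in frames[:top_frames]]
    return hashlib.sha1("\n".join(normalized).encode("utf-8")).hexdigest()[:12]


class Triage:
    """Escalate a crash to the host only the first time its cluster is seen."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def should_escalate(self, trace: str) -> bool:
        sig = stack_signature(trace)
        if sig in self.seen:
            return False
        self.seen.add(sig)
        return True
```

A local model would presumably replace or refine the escalation rule, but a deterministic fingerprint like this keeps repeat crashes from reaching the host either way.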

Phase 3: Bidirectional RPC

Host can dispatch device-local tools: perfetto traces, heap dumps, custom probes. No adb round-trip. Tool specs published back to host MCP automatically.
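
A device-side dispatcher for RPC_CALL might look roughly like this. The {ok, data, error, artifacts} reply shape is taken from the message-type list; everything else (the registry, the "tool" and "args" keys, the echo_metrics probe) is hypothetical and exists only to make the example runnable. The tool_specs listing is a plain summary, not MCP's actual schema.

```python
import inspect
from typing import Any, Callable, Dict, List

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a device-local tool under a name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("echo_metrics")  # hypothetical probe, stands in for perfetto / heap dump
def echo_metrics(pid: int) -> Dict[str, Any]:
    return {"pid": pid, "note": "placeholder probe"}


def handle_rpc_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested tool and wrap the outcome in the RPC_REPLY shape."""
    reply: Dict[str, Any] = {"ok": False, "data": None, "error": None, "artifacts": []}
    fn = TOOLS.get(call.get("tool"))
    if fn is None:
        reply["error"] = f"unknown tool: {call.get('tool')!r}"
        return reply
    try:
        reply["data"] = fn(**call.get("args", {}))
        reply["ok"] = True
    except Exception as exc:  # report the failure instead of dropping the channel
        reply["error"] = f"{type(exc).__name__}: {exc}"
    return reply


def tool_specs() -> List[Dict[str, Any]]:
    """List registered tools and their parameter names for the host."""
    return [
        {"name": name, "params": list(inspect.signature(fn).parameters)}
        for name, fn in sorted(TOOLS.items())
    ]
```

Catching exceptions inside the dispatcher means a broken probe produces an error reply rather than a dead connection.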

Phase 4: Offline autonomy

Device runs standalone — captures reproducers, diagnoses locally, packages artifacts. Syncs to host when network returns. Field debugging without anyone on the other end.
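
The "sync when network returns" behaviour implies some kind of bounded offline buffer. One possible shape, sketched under the assumption that the oldest messages are the ones to sacrifice when storage runs out (the design does not say); the class name and the send callback are illustrative.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict


class OfflineBuffer:
    """Hold outgoing messages while the host is unreachable.

    Bounded: once full, the oldest message is dropped and counted.
    """

    def __init__(self, max_items: int = 1000) -> None:
        self._queue: Deque[Dict[str, Any]] = deque(maxlen=max_items)
        self.dropped = 0

    def __len__(self) -> int:
        return len(self._queue)

    def record(self, message: Dict[str, Any]) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest on append
        self._queue.append(message)

    def flush(self, send: Callable[[Dict[str, Any]], bool]) -> int:
        """Send queued messages in order; stop at the first failure.

        A message is removed only after send() returns True, so a failed
        send leaves it queued for the next attempt.
        """
        sent = 0
        while self._queue:
            if not send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent
```

Reporting the dropped count to the host on reconnect would at least make gaps in the record visible.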

Aligned with where Android is going.

Google's public direction already points here. AICore ships a system-level LLM runtime on Pixel 8+ and is expanding; Gemini Nano is the first model delivered through it. LiteRT (formerly TFLite) gives anyone a path to ship their own quantized model on-device. And Google's own "Gemma 4 for Android" push calls out the on-device agent as a flagship use case.

alb does not compete with any of that — it composes. alb-mobile will use whatever the device provides: AICore if available, LiteRT if not, Gemma 4 quantized as a fallback. The interesting work is not the model — it's the host ↔ device protocol and the division of labor. That's what we own.

Open questions.

Where we do not have a final answer. Help welcome.

How does alb-mobile ship — sideload or Play Store?

Developer builds will be sideload (debug APK from Releases). A Play Store track is on the table for field/autonomy use but needs review and a clear safety story.

Is this Android-only?

Yes for v1. The WebSocket channel is generic; porting to HarmonyOS or embedded Linux boards (buildroot + a Gemma runner) is feasible but not on the M4 plan.

What about non-rooted devices?

The plan assumes unrooted devices. perfetto, dumpsys and logcat all work without root at adb shell privilege, and AICore / LiteRT run as normal Android libraries. Probes that need shell-level privileges an ordinary app cannot hold stay on the host side and go through adb.

Security and permissions?

The WebSocket connection uses a shared token (dev environment) or mTLS (production). The device never initiates destructive actions without a host-approved RPC_CALL. All device-local LLM inputs are logged for audit.

What about iOS?

Not in scope. iOS's debugging model is fundamentally different, and Apple does not expose anything comparable to AICore, or an adb equivalent, at this layer. A reduced host-side alb for iOS is conceivable, but it is not on this roadmap.

Want to shape this?

Open an issue with the label on-device-agent. This is a design in motion — the earlier we hear, the more we can fold in.
