Remote-Controlling AI Coding Agents From Your Phone: What's Real, What It Costs, and Where It's Going
For fifty years, controlling a computer meant being at the computer. Hands on the keyboard, eyes on the terminal, one person driving one process.
Last night I broke that without thinking about it. I kicked off work across two AI coding agents on my laptop, closed it, and finished steering both of them from my phone on the couch. Same files, same context, no setup ceremony. The laptop was a control tower I happened to walk away from.
This is a guide to doing that on purpose. What's real today, what it costs, and the line between "neat trick" and "actually how you'll work."
The shift: operator to conductor
An operator executes. Hands on keys, one thing at a time. A conductor doesn't play an instrument during the performance. They cue entrances and keep twelve players coherent.
Remote control moves you from operator toward conductor. The work keeps running where it started, and the controls follow you to wherever you are. The keyboard becomes a control tower you can walk away from instead of a cockpit you sit in.
That's the actual architecture, not just a figure of speech, and the rest of this walks through how the pieces fit.
Step 1: remote-control one session
Start small. In any active Claude Code session, run:
/remote-control
It also has a shorthand if you'd rather type less. You get a QR code. Scan it with the Claude mobile app and the session opens on your phone, with the same files, same MCP servers, same project context. You're not looking at a stripped-down view. You're driving the exact session running on your machine.
I tested this live before I trusted it. It works, and the shift in how it feels is bigger than the feature sounds. The work stops being tied to the chair.
Worth knowing before you start: you'll need Claude Code 2.1.51 or later, a Pro or Max plan, and a claude.ai login. API keys won't work for this, and on Team or Enterprise an admin flips the Remote Control toggle on first. You also have to trust the workspace once.
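The version requirement is easy to check from a script before you rely on Remote Control. What follows is a minimal sketch, assuming claude --version prints a dotted version number somewhere in its output. The helper names are hypothetical, and the comparison relies on sort -V being available.

```shell
# version_at_least CURRENT REQUIRED: succeeds when CURRENT >= REQUIRED.
# Uses version sort: if REQUIRED sorts first (or ties), CURRENT is new enough.
version_at_least() {
  local current="$1" required="$2" lowest
  lowest=$(printf '%s\n%s\n' "$current" "$required" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}

# installed_claude_version: print the first x.y.z found in `claude --version`.
# Assumption: the output contains a dotted version number.
installed_claude_version() {
  claude --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# check_remote_control_ready: compare the installed version with the 2.1.51 minimum.
check_remote_control_ready() {
  local have
  have=$(installed_claude_version)
  if [ -z "$have" ]; then
    echo "claude not found or version unreadable"
    return 2
  fi
  if version_at_least "$have" "2.1.51"; then
    echo "ok: $have"
  else
    echo "too old: $have (need 2.1.51+)"
    return 1
  fi
}
```

Calling check_remote_control_ready covers only the version. The plan, login, and admin toggle still have to be confirmed by hand.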
Step 2: run several, and switch between them
One agent on your phone is convenient. The unlock is several at once.
Run /remote-control in each session and the app lists them. Each interactive session registers one remote session, so a few terminals means a few entries in the list. (If you want many from a single process, there is a server mode that hosts several at once.) You jump between agents like a manager doing rounds instead of babysitting one terminal. I verified it with two running in parallel. Both showed up. Both controllable.
Underneath, the same thing works headlessly, which is what makes it scriptable. You can kick off a task non-interactively and capture its session so you can come back to that specific agent later:
# fire off a task without sitting in the session
claude -p "Refactor the auth module and add tests" --allowedTools "Read,Edit,Bash"
# capture a session id, then resume THAT agent on its own thread
session_id=$(claude -p "Start the portal refactor" --output-format json | jq -r '.session_id')
claude -p "Now wire up the tests" --resume "$session_id"
Once a task returns structured output, it becomes a building block for everything else:
claude -p "Summarize what changed" --output-format json | jq -r '.result'
That is the seam where a phone tap on the couch and a CI job on a push become the same primitive.
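The capture-and-resume pattern above extends to several agents if you record which session belongs to which task. This is a rough sketch, not a prescribed workflow: the function names, the task names, and the SESSIONS_FILE bookkeeping file are all hypothetical. Only claude -p, --output-format json, --resume, and the .session_id and .result fields come from the commands shown above.

```shell
# Hypothetical bookkeeping file: one "NAME<TAB>SESSION_ID" line per agent.
SESSIONS_FILE="${SESSIONS_FILE:-./agent-sessions.tsv}"

# start_agent NAME PROMPT: launch a headless task and remember its session id.
start_agent() {
  local name="$1" prompt="$2" session_id
  session_id=$(claude -p "$prompt" --output-format json | jq -r '.session_id') || return 1
  # jq prints "null" when the field is missing, so treat that as a failure.
  [ -n "$session_id" ] && [ "$session_id" != "null" ] || return 1
  printf '%s\t%s\n' "$name" "$session_id" >> "$SESSIONS_FILE"
}

# resume_agent NAME PROMPT: continue the named agent on its own thread
# and print only the result text.
resume_agent() {
  local name="$1" prompt="$2" session_id
  # Take the most recent entry recorded under this name.
  session_id=$(awk -F '\t' -v n="$name" '$1 == n { id = $2 } END { print id }' "$SESSIONS_FILE")
  if [ -z "$session_id" ]; then
    echo "no session recorded for $name" >&2
    return 1
  fi
  claude -p "$prompt" --resume "$session_id" --output-format json | jq -r '.result'
}

# Example usage (not run here):
#   start_agent auth   "Refactor the auth module and add tests"
#   start_agent portal "Start the portal refactor"
#   resume_agent portal "Now wire up the tests"
```

Keeping the mapping in a plain file means a later shell, a cron job, or a CI step can pick up the same agent by name.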
Step 3: know what it costs, and what it doesn't
Two things people get wrong about the money.
The remote control itself is free of tokens. /remote-control is a relay. It mirrors the session to your phone. Pairing and viewing aren't model calls, so they don't burn anything.
The work costs normal tokens. Same session, same meter, whether you type on the desktop or the phone. It's one session's usage, not doubled. Driving from your phone does not bill you twice.
And you can toggle desktop and phone freely. It's one session with two control surfaces, not two copies. Type on the couch, finish at the desk, pick the phone back up. Same context the whole way through.
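For headless runs, the meter can be read back from the JSON output. This is a minimal sketch under two assumptions that the text above does not state: that each run's JSON was saved to a file, and that the JSON includes a numeric total_cost_usd field. Inspect your own output before relying on that field name. The total_cost helper is hypothetical.

```shell
# total_cost FILE...: sum the cost field across saved JSON results.
# Assumption: each file holds one JSON object with a numeric total_cost_usd.
total_cost() {
  # With no files there is nothing to add up.
  if [ "$#" -eq 0 ]; then
    echo 0
    return 0
  fi
  # -s slurps all objects into one array; a missing field counts as 0.
  cat "$@" | jq -s 'map(.total_cost_usd // 0) | add // 0'
}

# Example usage (not run here):
#   claude -p "Summarize what changed" --output-format json > run1.json
#   total_cost run1.json
```

The session itself has one meter, so the total only grows when you start more runs, not when you switch devices.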
Step 4: local vs cloud, the distinction that actually matters
/remote-control keeps the session local. It runs on your Mac, and the phone is just a remote control. It's more resilient than you'd expect: if your laptop sleeps or your network blips, the session reconnects automatically when the machine comes back. What actually ends it is quitting the process, a network outage longer than about ten minutes while the machine is awake, or starting an ultraplan session. The real constraint is that the process has to keep running, so the machine can't be fully shut down.
For work that should survive a closed laptop, you want cloud sessions instead. Connect a repo and the tasks run on managed cloud infrastructure, in parallel, each in its own isolated sandbox. There's even a flag to pull a cloud run's files back down to your local environment, so the two modes are ends of one continuum, not separate worlds.
| | Remote Control | Cloud session |
|---|---|---|
| Runs on | your machine | the cloud |
| Machine can be off | no | yes |
| Best for | grabbing a session on the go | a fleet you run from anywhere |
| Setup | one command | connect a repo |
The short version: /remote-control is for grabbing one session while you're out. Cloud is the substrate for actually conducting a fleet.
Step 5: where this goes
Stack those pieces and the picture stops being "control one agent" and becomes "conduct several."
You open your tracker on your phone, assign a handful of issues, agents pick them up in isolated worktrees, and you merge the pull requests from a coffee line. That's not science fiction. Every piece in this post already exists. What's left is the plumbing that makes a phone-launched agent as capable as your local one, and the coordination layer that keeps a fleet from tripping over itself.
That coordination is the honest constraint. This is leverage, not magic, and the friction just moves:
- Parallel agents hit a practical ceiling around three to five before merge integration becomes the bottleneck. The limit isn't whether they can run. It's whether you can land their branches without conflicts.
- Cloud agents start context-blind unless you ship your config, conventions, and memory to them.
- More agents means more cost and rate limits. Ten agents is ten meters running.
- Async means nobody watched it. The loop only closes if verification is automated. "Done" you didn't verify isn't done.
More agents was never the hard part. The real leverage is the substrate that lets coordination keep up with parallelism. Get that right and three chaotic agents become eight coherent ones, so build the plumbing before you grow the fleet.
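A small piece of that plumbing can be sketched locally: one isolated git worktree per issue, one headless agent per worktree, and an automated check before anything counts as done. Treat this as an illustration under assumptions. The run_issue function, the agent/ISSUE branch naming, WORKTREE_ROOT, and VERIFY_CMD are all hypothetical, and make test stands in for whatever automated check your project has.

```shell
# Hypothetical settings: the automated check, and where worktrees are created.
VERIFY_CMD="${VERIFY_CMD:-make test}"

# run_issue REPO ISSUE PROMPT: worktree, agent run, then verification gate.
run_issue() {
  local repo issue="$2" prompt="$3" root tree
  # Resolve to absolute paths so git and cd agree on locations.
  repo="$(cd "$1" && pwd)" || return 1
  root="${WORKTREE_ROOT:-$repo/../worktrees}"
  mkdir -p "$root" && root="$(cd "$root" && pwd)" || return 1
  tree="$root/$issue"

  # Each agent gets its own branch and directory, so agents never share files on disk.
  git -C "$repo" worktree add -b "agent/$issue" "$tree" >&2 || return 1

  # Run the agent inside its worktree and keep its output for review.
  ( cd "$tree" && claude -p "$prompt" --allowedTools "Read,Edit,Bash" ) > "$tree.log" 2>&1

  # "Done" only counts if the automated check passes in that worktree.
  if ( cd "$tree" && sh -c "$VERIFY_CMD" ) >> "$tree.log" 2>&1; then
    echo "$issue verified"
  else
    echo "$issue UNVERIFIED"
    return 1
  fi
}

# Example usage (not run here):
#   run_issue . 101 "Fix issue 101: login redirect loops"
#   run_issue . 102 "Fix issue 102: add retry to the upload client"
```

The gate only reports status. Landing the branches without conflicts is still the part that sets the ceiling.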
What I'd tell you if you're starting
- Run /remote-control on one session today. Feel the work come off the chair before you think about scale.
- Local is for going out, cloud is for going big. Pick by whether your laptop needs to be on.
- The remote control is free. The work is the work. One session, one meter, two surfaces.
- Coordination is the ceiling, not raw capability. Solve it like managing a team that can't see each other's screens.
- You're aiming to conduct, not operate. The goal isn't typing faster; it's directing work you no longer hold in your head.
Start by driving one agent from your phone. That's the whole shift, scaled down to where you can feel it.
Resources
Official documentation, each verified live:
- Remote Control: continue local sessions from any device
- Run Claude Code programmatically (headless, the Agent SDK)
- Claude Code on the web (cloud sandboxes, parallel tasks)
- Claude Code in CI (GitHub Actions)
- Linear: building agents you can assign work to
- The control-plane idea this borrows from (Kubernetes components)