Docker Sandboxes: The New Standard for Running AI Agents Safely
The way developers write code has changed dramatically. AI coding agents like Claude Code, Gemini CLI, GitHub Copilot CLI, and Codex are no longer novelties; they're production tools. Over a quarter of all production code is now AI-authored, and developers using these agents merge roughly 60% more pull requests than those who don't. But this productivity comes with a problem that until recently had no clean answer: how do you let an agent run autonomously without putting your machine, your credentials, and your data at risk?
Docker Sandboxes, launched in experimental preview in late 2025 and upgraded to microVM-based isolation in early 2026, is Docker's answer to that question. This article explains what they are, how they work, why they're built the way they are, and what their limitations reveal about the broader challenge of securing AI agents.
The Core Problem: Why Sandboxing Agents Is Harder Than It Looks
To understand Docker Sandboxes, you first need to understand why agent isolation is genuinely a hard problem, not just a matter of wrapping a process in a container.
A coding agent isn't a web server or a batch job. It's an autonomous program that, by design, does things you didn't explicitly script. It installs packages. It executes shell commands. It builds and runs Docker containers. Crucially, to do useful work, it needs meaningful access to your project's codebase, which often sits on the same filesystem as your SSH keys, environment files, and AWS credentials.
Before Docker Sandboxes, developers faced three unsatisfying options:
OS-level sandboxing (macOS's sandbox-exec, Linux seccomp) confines only the agent process itself. Every time it needs to install a package or access a dependency, it hits the host system and triggers a permission prompt. This creates what Docker aptly calls "approval fatigue": repeated interruptions that either slow everything down or train developers to click through without looking.
Containers seem like the natural answer. Docker containers already isolate processes using Linux namespaces and cgroups. But there's a structural problem: coding agents need to run Docker themselves to build images, spin up services, and test containerized code. Giving a container access to Docker means mounting the host Docker socket (/var/run/docker.sock), which is effectively a root-level backdoor to the entire host system. The isolation you set up immediately collapses.
Full virtual machines work, but they're slow to start, tedious to configure, and difficult to reuse across projects. They also don't integrate naturally into the cd my-project && run-agent workflow that developers expect.
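To make the socket risk concrete, here is a small Python sketch, purely illustrative and never actually sent anywhere, of the Docker Engine API request that any process with access to the host socket could issue: creating a privileged container that bind-mounts the host's entire root filesystem. The image name and command are arbitrary examples.

```python
import json

def host_takeover_request():
    """Build (but do not send) a Docker Engine API call that any process
    with access to /var/run/docker.sock could make: create a privileged
    container with the host's root filesystem bind-mounted inside it."""
    body = {
        "Image": "alpine",                   # any small image will do
        "Cmd": ["cat", "/host/etc/shadow"],  # read data only root should see
        "HostConfig": {
            "Binds": ["/:/host"],            # mount the host filesystem at /host
            "Privileged": True,              # disable the remaining isolation
        },
    }
    # The Engine API endpoint for this is POST /containers/create,
    # spoken directly over the unix socket.
    return "POST", "/containers/create", json.dumps(body)

method, path, payload = host_takeover_request()
```

Nothing here is agent-specific: any compromised process with the socket mounted can do this, which is why "just share the host daemon" is not a viable isolation strategy.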
Docker Sandboxes were built specifically to close these gaps.
What Docker Sandboxes Actually Are
A Docker Sandbox is a disposable, isolated execution environment for AI coding agents. Each sandbox is a lightweight virtual machine, a microVM, running its own Linux kernel, its own Docker Engine, its own network stack, and a view into your project workspace. The agent lives entirely inside this VM. Your host machine is untouched.
The key facts:
Each sandbox has its own Docker daemon. When the agent runs docker build or docker compose up, those commands execute against the sandbox's private Docker Engine, not yours. The agent has no path to your host daemon.
Only your workspace is shared. The directory you're working in is mounted into the VM with read-write access. Everything else (the rest of your filesystem, your credentials, your other projects) is invisible to the agent.
Network access is proxied and deny-by-default. All HTTP/HTTPS traffic goes through a host-side proxy. Non-HTTP protocols are blocked entirely. You can define allow and deny lists.
Credentials never enter the VM. API keys are injected into outbound HTTP headers by the host-side proxy. The raw values never touch the microVM.
Sandboxes are disposable. Run sbx rm and the VM and everything inside it is gone.
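The proxy and credential model described above can be sketched in a few lines of Python. This is not Docker's implementation or configuration format, only an illustration of the deny-by-default decision and host-side header injection; the hostnames and key are made up.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; Docker Sandboxes' real policy format may differ.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "api.anthropic.com"}

# Secrets live only on the host side of the proxy; the microVM never sees them.
HOST_SIDE_SECRETS = {"api.anthropic.com": "example-api-key"}

def forward(url, headers):
    """Return the outbound headers for a sandbox request, or raise if the
    destination is not explicitly allowed (deny-by-default)."""
    host = urlsplit(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"blocked: {host}")
    out = dict(headers)
    if host in HOST_SIDE_SECRETS:
        # Credential injection happens here, outside the VM boundary.
        out["Authorization"] = f"Bearer {HOST_SIDE_SECRETS[host]}"
    return out
```

The point of the design: even if the agent printed every header it set on its own request, the Authorization value would not appear, because injection happens only after traffic leaves the VM.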
The quick-start experience is intentionally minimal:
sbx run claude-code
That command creates a new microVM, mounts your current directory, and launches Claude Code inside it with --dangerously-skip-permissions enabled by default. Because of the sandbox boundary, the "dangerous" mode is actually safe: the agent can do whatever it wants inside the VM, and none of it reaches your host.
Developer Experience: The Commands You Actually Use
Docker Sandboxes ships with a dedicated CLI, sbx, separate from the main docker command.
Create and run a sandbox:
sbx run claude-code # Launch Claude Code in a new sandbox
sbx run gemini # Launch Gemini CLI
sbx run codex # Launch Codex
Pass arguments to the agent:
sbx run claude-code -- --model claude-opus-4-5
List active sandboxes:
sbx ls
Destroy a sandbox:
sbx rm <sandbox-name>
On Linux specifically, the install path differs from Docker Desktop:
curl -fsSL https://get.docker.com | sudo REPO_ONLY=1 sh
sudo apt-get install docker-sbx
sudo usermod -aG kvm $USER
sbx login
The sandbox automatically mounts your current working directory. Filesystem changes the agent makes inside the sandbox appear on your host in real time (with the default direct mount). Installed packages, Docker images, and other VM state persist across restarts: the environment survives a reboot; it just can't escape the sandbox.
Conclusion
Docker Sandboxes represents a genuine architectural advance for developer-side agent isolation. By running each agent in a microVM with its own kernel, its own Docker Engine, and proxied network access, it closes the three main gaps that made earlier approaches unworkable: OS-level sandboxing's constant interruptions, containers' inability to safely host Docker-in-Docker, and traditional VMs' friction and slowness.
It's not yet a complete answer to agent security; nothing is. Sandboxes protect your local machine; they don't govern what agents do through the tools and APIs you've given them access to. That layer of the problem remains unsolved and is where much of the industry's attention is turning next.
But as a foundation, a reliable, fast, cross-platform way to let agents run with genuine autonomy without risking your host system, Docker Sandboxes is the most practical option currently available. The experimental label should be taken seriously, but the core architecture is sound. This is likely the direction the ecosystem is heading.