Every coding agent I use — Claude Code, Codex, even PI — leans on the same tool: /bin/bash. PI in particular runs almost entirely through bash, no sandbox in sight. There’s a reason for that. Bash is one of the most heavily represented languages in any pre-training corpus on the planet, and LLMs write it fluently. If you give a model a file to manipulate, a folder to inspect, or a one-shot pipeline to assemble, the answer that falls out is almost always a few lines of shell.
The downside is the friction. Unless you live in YOLO mode, you spend half your day clicking Allow on find, grep, sed, and cat prompts. Codex in the cloud sidesteps this by spinning up a fresh container per task. On my Mac, both Codex and Claude Code happily edit my actual files — and even with git worktrees, I’ve ended up with stray uncommitted changes on main more than once.
So I started wondering: bash isn’t really that complicated a language. What if I just had Opus write me a bash interpreter — in Swift?
A weekend with the 1M context window
Over the last day or so I had Opus on Extra High fill up the 1M context window a couple of times over. I gave it Vercel’s just-bash for inspiration and bashlex as a reference for how a real bash parser is structured, and let it cook.
The constraints I cared about:
- Pure modern Swift. No Process, no fork, no exec. Has to drop into a Mac, iOS, or Linux app without dragging libc shell-out habits into a sandboxed binary.
- Everything an LLM would actually write. ls, cat, grep, sed, find, awk, jq, tar, curl, bc, xargs, mktemp, the lot.
- Real sandboxing. Either a cordoned-off temp folder that looks like a real filesystem to the script, or a pure in-memory tree that never touches the disk at all.
That last one was the whole point. Codex’s cloud sandboxes are nice precisely because they’re disposable. I wanted the same property locally — and on iOS, where you can’t fork anything anyway.
What it looks like
The library is split into three products plus a CLI. The smallest useful program is this:
import BashInterpreter
import BashCommandKit

let shell = Shell()                // sandbox-by-default identity
shell.registerStandardCommands()   // ls, cat, grep, sed, find, …

try await shell.run("""
    for f in *.txt; do
      echo "$(basename "$f" .txt): $(wc -l < "$f") lines"
    done | sort -k2 -n
    """)
Every command is a registered Swift type. Pipelines are AsyncStream channels. The filesystem is a FileSystem protocol — and there are three implementations to choose from:
- RealFileSystem — the host’s FileManager, for trusted scripts.
- SandboxedOverlayFileSystem — confines the script to one host directory plus an in-memory /tmp. Symlink escapes are blocked, every path passes through realpath(3), and error messages reference virtual paths only — host paths never leak.
- InMemoryFileSystem — pure in-memory tree. Nothing ever hits the disk.
A freshly constructed Shell() already leaks nothing about the host:
$ echo 'whoami; hostname; ls /Users; cat /etc/passwd' \
    | swift-bash exec --sandbox /tmp/work /dev/stdin
user
sandbox
ls: /Users: No such file or directory
cat: /etc/passwd: No such file or directory
The four virtualisation axes — filesystem, network, processes, identity — are all independent. You opt into each one. Want the script to be able to call your API but nothing else?
shell.networkConfig = NetworkConfig(
allowedURLPrefixes: ["https://api.example.com/v1/"],
allowedMethods: ["GET", "POST"],
denyPrivateIPs: true // block 127.0.0.1, 10/8, 192.168/16, …
)
That’s it. curl reads from Shell.networkConfig and refuses everything else with exit status 7.
Bash 4, not bash 3.2
One small surprise from this project: macOS still ships /bin/bash 3.2 from 2007, thanks to a GPL licensing thing. Modern Linux, Homebrew, and basically everyone else are on bash 4 or 5. So when LLMs generate bash, they generate bash 4 — associative arrays, ${var^^} case conversion, ${arr[-1]} negative indexing, mapfile, coproc. SwiftBash targets bash 4.x semantics for everything it implements, which means scripts an LLM writes usually just work — no “bad substitution” surprises.
declare -A counts
for word in $(cat words.txt); do
  counts[$word]=$(( ${counts[$word]:-0} + 1 ))
done
for k in "${!counts[@]}"; do
  echo "$k: ${counts[$k]}"
done | sort -k2 -rn
That runs in SwiftBash. It doesn’t run in /bin/bash on a stock Mac.
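For a taste of the other bash-4 features on that list, here is a hypothetical snippet of the kind LLMs emit constantly: case conversion, mapfile, and negative array indexing (the values are made up):

```shell
# All bash 4+; none of these parse in macOS's stock /bin/bash 3.2.
name="swiftbash"
echo "${name^^}"                        # case conversion: SWIFTBASH

mapfile -t lines < <(printf 'alpha\nbeta\ngamma\n')
echo "${#lines[@]} lines read"          # array length: 3 lines read
echo "last: ${lines[-1]}"               # negative index (bash 4.3+): last: gamma
```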
The hard ones, properly done
The thing I’m most pleased about — and honestly a bit surprised by — is how complete the implementations of the staple commands ended up being. These aren’t shims that handle the three flags an LLM happens to use most often. They’re proper implementations of what are, in many cases, full programming languages in their own right.
The biggest ones, ranked by the lines of Swift it took to implement them:
| Command | Swift LOC | What it actually is |
|---|---|---|
| jq | ~4,500 | JSON query language: lexer, parser, evaluator, ~80 builtins |
| awk | ~3,000 | Pattern-action language: lexer, parser, expression tree, builtins |
| sed | ~1,600 | Stream-editor mini-language: address ranges, s/// with backrefs, b/t branches, hold space |
| find | ~900 | Expression tree with -and/-or/-not, -exec … {} +, time/size predicates |
| curl | ~600 | HTTP client with the allow-list and SSRF defenses bolted in |
| bc | ~400 | Expression calculator with -l math library (Double precision) |
jq, awk, and sed in particular each needed their own parser and evaluator — they’re real languages. The fact that all three came out coherent, with associative arrays and user-defined functions in awk, with hold-space and labels in sed, with path expressions and reduce/foreach in jq, is the part I keep being a little amazed by. These are the commands that make bash actually useful for data manipulation, and they’re the ones I’d most miss if they were stubbed out.
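As a taste of the hold-space machinery, here is the classic sed one-liner that reverses its input line by line, the equivalent of tac; it only works when the hold space is implemented for real:

```shell
# 1!G: on every line but the first, append the hold space to the pattern space;
# h: copy the pattern space into the hold space; $p: print everything at the end.
printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p'
# prints:
#   three
#   two
#   one
```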
Beyond that tier there’s solid coverage of grep, rg (ripgrep), sort, tar, gzip/gunzip, diff/patch, yq, tr, cut, paste, join, comm, xargs, and the rest of the textbook unix toolkit.
Cover the majority, fail honestly on the rest
The design rule I kept coming back to: handle the majority of real-world usage, and when you hit a limitation, fail in a way the model can read and route around.
LLMs are remarkably good at recovery if you give them an honest error. They’re terrible if you silently produce wrong output. So every command emits the same kind of error a real GNU/BSD tool would — prefixed with the command name, written to stderr, with a non-zero exit status:
$ swift-bash exec script.sh
column: unknown option: --table-columns
awk: function `gensub' not implemented
ps: -L not supported in sandbox
When an agent sees awk: function 'gensub' not implemented, it does the obvious thing: it rewrites the line as a sed substitution or an awk gsub, and moves on. That recovery loop is the whole reason this works as an LLM tool. A silent failure or a wrong answer would poison the rest of the session; a loud, specific error is just another data point the model handles in stride.
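That rewrite in practice: the same substitution expressed with portable awk gsub and with sed, either of which an agent can fall back to (the sample string is mine):

```shell
input="status=error code=500"

# GNU-awk only: gensub(/error/, "warn", "g") -- the call that triggers the error above.
# Portable awk gsub does the same in-place replacement:
echo "$input" | awk '{ gsub(/error/, "warn"); print }'   # status=warn code=500

# ...or the sed equivalent:
echo "$input" | sed 's/error/warn/g'                     # status=warn code=500
```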
The corollary: I’d much rather ship a command with 80% coverage and crisp error messages on the missing 20% than a command with 95% coverage and undefined behavior at the edges. If the post-mortem on a failed agent run is “it tried comm -12 --check-order and SwiftBash quietly ignored the flag,” I’ve made the wrong tradeoff.
Math, because of course you need math
LLM-generated bash loves bc for arithmetic. SwiftBash ships a bc that’s “good enough” — it’s Double precision rather than arbitrary precision, but for the kinds of expressions an agent actually writes it’s indistinguishable from the real thing:
$ echo "scale=6; 22/7" | bc
3.142857
$ echo "s(1.5707963)" | bc -l     # sine, with the math library
.999999999999
$ echo "sqrt(2) * 100" | bc -l
141.42135623730950488

# sum a column of numbers
$ awk '{print $2}' sales.tsv | paste -sd+ - | bc
18420.50
Combined with awk, paste, and the usual $(( … )) arithmetic expansion, that covers basically every “do a quick calculation” thing an agent reaches for.
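A sketch of that division of labor, with made-up numbers: awk for summing streams, $(( … )) for integer math, bc only when you need a fractional scale:

```shell
# Stream sums: awk keeps a running total across input lines.
printf '10\n20\n30\n' | awk '{ s += $1 } END { print s }'    # 60

# Integer math: arithmetic expansion, no external command at all.
echo $(( (60 * 100) / 7 ))                                   # 857, truncated

# Fractions: bc with an explicit scale.
echo "scale=2; 60 / 7" | bc                                  # 8.57
```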
A few real scripts
Just to give you a sense of what runs unmodified — these are the kind of one-liners and small pipelines that LLMs produce constantly, and they all go through the in-process interpreter without spawning a single subprocess.
# Find the ten largest source files in a tree.
find . -name '*.swift' -type f -print0 \
  | xargs -0 wc -l \
  | sort -rn \
  | head -11 \
  | tail -10    # head -11 | tail -10 drops the "total" line wc emits
# Count TODO/FIXME comments by author, using grep + awk.
grep -rn -E 'TODO|FIXME' Sources/ \
  | awk -F: '{ print $1 }' \
  | xargs -I{} git log -1 --format="%an" -- {} \
  | sort | uniq -c | sort -rn
# Rewrite a config file in place: bump every version: x.y.z by one patch.
# (sed can't do arithmetic, so awk does the increment; the .bak copy keeps a backup)
cp config.yaml config.yaml.bak
awk '/^version:/ { split($2, v, "."); $2 = v[1] "." v[2] "." (v[3] + 1) } { print }' \
  config.yaml.bak > config.yaml
# Tally HTTP status codes from an access log.
awk '{ print $9 }' access.log \
  | sort | uniq -c | sort -rn \
  | head
None of these need /bin/bash, none need Process. They run inside the same Swift process that hosts your app.
The CLI
There’s a swift-bash binary that mirrors the embedded interpreter — same parser, same commands, same sandbox flags. You can use it as a safer bash for scripts you don’t fully trust:
# AI-generated script, no host access at all.
echo "$llm_output" | swift-bash exec --sandbox /tmp/work /dev/stdin

# Sandboxed run with read-only access to one specific API.
swift-bash exec --sandbox ~/Documents/scratch \
  --allow-url https://api.github.com/repos/example/ \
  analyze.sh
It also has a parse subcommand that prints the AST, which is useful when you’re trying to understand why some weird quoting edge case isn’t doing what you expected.
What it’s actually for
The vision is an iPad coding-agent app that embeds this thing as its bash tool. OpenAI gives you code_interpreter over the wire, and it’s great — but if I have a perfectly serviceable interpreter that runs in-process on the device, why pay a round trip to run wc -l? Light agentic exploration, summarising a folder of CSVs the user dropped into the sandbox, basic data wrangling — it all stays local, and it all stays inside the sandbox the host app handed the script.
To be clear: SwiftBash only manipulates files inside the sandbox you give it. It doesn’t reach into the user’s Photos library or read arbitrary files from the Files app. But the sandbox is an ordinary Swift FileSystem, which means an embedding app can plug in whatever extra commands it wants. I can imagine pulling in a few of my SwiftText routines — Markdown-to-HTML, HTML-to-PDF, that sort of thing — and registering them as bash commands. Then you can have an LLM produce a report in Markdown inside the sandbox and get a polished HTML or PDF out of the same script.
It also turns out to be a useful CLI in its own right. I now reach for swift-bash exec --sandbox whenever an LLM hands me a script and I haven’t yet read the whole thing.
And one more thing
I asked Opus to summarise the lessons we learned building the bash interpreter — what the abstractions ended up being, where the parser and the executor split, how AsyncStream pipelines actually want to be wired. Then I handed that summary to another Opus and asked it to start a Swift interpreter on the same architecture.
It’s already further along than I expected. Most arithmetic, control flow, and function definitions work. I’ll probably wire it into SwiftBash itself as a stand-in for swiftc so that #!/usr/bin/env swift scripts can run inside the same sandbox as everything else.
Same trick, different language — and the same reason it works. The training data is already there. We just have to give it somewhere safe to run.
Why open source?
Honestly? Because I don’t know how complete or correct this is yet. Bash is a sprawling, decades-old language with all sorts of corners (job control, brace-expansion edge cases, the seventeen different ways [[ … ]] differs from [ … ]), and I’ve covered the parts that LLM-generated scripts actually exercise — but “actually exercise” is a moving target. Every model I throw at it finds another quoting wrinkle.
So I’m putting it on GitHub. If you read this and think that’s a fun idea, but you forgot about X — please tell me. If you have a use case I haven’t thought of — embedding it in a Shortcuts action, wiring it up to a local model, using it as a teaching sandbox for a bash class — I’d love to hear that too. The repo is the conversation; I’ll meet you there.