
Stop Asking AI to Check Things Scripts Can Check

How I extended my hobbyist AI budget by moving deterministic compliance checks out of Claude sessions and into shared shell scripts across all four of my iOS projects.

  • ai
  • workflow
  • tokens
  • indie dev
  • ios
  • verification

I'm building iOS apps as a hobbyist, spending £37 a month across Claude Pro and ChatGPT Plus, and that budget has a real ceiling. Every session I start with Claude burns context just re-establishing what the model already found last time, and if I'm not careful, I hit the token limit before the interesting work even begins. So over the last few sessions I've been deliberately thinking about which parts of my workflow actually need AI intelligence and which parts just need to run consistently and tell me something is wrong.

The distinction matters more than it sounds.

This idea started with Kanora

A few months ago, when I was deep in work on Kanora, I wrote about how I'd started using a verify.sh script to enforce architectural rules without relying on the model to check them every session — you can read that piece here.

The core insight was that SwiftLint custom rules and a build gate could handle a lot of the mechanical compliance work, freeing the model up for the parts that actually require reasoning. Kanora had the most mature implementation, with custom SwiftLint rules covering service protocols, singleton usage, and business logic in views.

What I've done recently is take that same idea and extend it properly across all four of my active projects — Kanora, Photobooth, Murmr, and TetherPad — with a shared, parameterised check library that any of them can call. The concept was already proven; this was about making it consistent and filling the gaps.

What AI is genuinely good at (and what it isn't)

AI is excellent at reasoning about architecture, suggesting the right pattern for a situation it hasn't seen before, synthesising documentation from source code, and catching subtle design violations that require understanding context — the kind of thing where you need to explain why something is wrong, not just that it matches a bad regex.

It's much less suited to being a deterministic linter you invoke at the start of every task to confirm that nobody has called print() in a SwiftUI view since last Tuesday. That's just a grep. Asking Claude to do it costs tokens. Running rg '\bprint\s*\(' Sources/ costs nothing.

All four projects follow the same 83OS architectural standards: no business logic in views, no raw Color() values (use design tokens), no direct IO from the feature layer, every #Preview applying .designSystem(), and all user-facing strings going through a typed L10n enum. At the start of every agent session, tokens were being spent re-establishing whether those rules were still being followed. That's wasted budget, and it accumulates fast across four codebases.

The fix: a shared check library

The approach we landed on was a small library of parameterised shell scripts sitting one level above all the project repos at _kb/scripts/checks/:

_kb/scripts/checks/
  check_no_print.sh
  check_design_tokens.sh
  check_no_view_io.sh
  check_preview_designsystem.sh
  check_localisation.sh

Each script takes source directory paths as arguments and exits 0 on pass or 1 on violations. They're standalone, deterministic, and project-agnostic — the caller provides the right directories for their structure. Photobooth passes Photobooth/Features/, TetherPad passes iPad/Features/ macOS/Features/, Murmr passes Apps/. The check logic itself lives in one place, so fixing a false positive or tightening a pattern fixes it everywhere.
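
To make that concrete, here is roughly what check_no_print.sh boils down to. The argument handling and exit codes are the real contract described above; the exact messages and ripgrep flags are illustrative rather than lifted from the actual script.

  #!/usr/bin/env bash
  # check_no_print.sh: flag print() calls in the given source directories.
  # Usage: check_no_print.sh <source-dir> [<source-dir>...]
  set -euo pipefail

  if [ "$#" -eq 0 ]; then
    echo "usage: $(basename "$0") <source-dir> [<source-dir>...]" >&2
    exit 2
  fi

  # ripgrep exits 0 when it finds matches, which for us means a violation.
  if rg --type swift -n '\bprint\s*\(' "$@"; then
    echo "FAIL: print() calls found in source" >&2
    exit 1
  fi

  echo "PASS: no print() calls"
  exit 0

The rest follow the same shape, just with different patterns.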

Every project then has a verify.sh that calls these shared checks, builds the app, and runs whatever tests exist. The check-only (fast) mode finishes in under a minute, and the whole thing is something an AI agent can invoke at the end of a task as part of a defined Definition of Done, rather than something that needs AI attention to execute.
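
The per-project verify.sh is mostly orchestration. Sketched here for Photobooth, with the scheme name, simulator, and relative paths as stand-ins for whatever the real script uses:

  #!/usr/bin/env bash
  # verify.sh: per-project Definition of Done gate (Photobooth sketch).
  # Usage: ./scripts/verify.sh [fast]
  set -euo pipefail

  CHECKS="../_kb/scripts/checks"   # shared check library, one level above the repos
  SRC="Photobooth/Features/"       # this project's feature sources

  # set -e means any failing check aborts the run with a nonzero exit.
  "$CHECKS/check_no_print.sh"             "$SRC"
  "$CHECKS/check_design_tokens.sh"        "$SRC"
  "$CHECKS/check_no_view_io.sh"           "$SRC"
  "$CHECKS/check_preview_designsystem.sh" "$SRC"
  "$CHECKS/check_localisation.sh"         "$SRC"

  # Fast mode stops after the cheap checks; the full run builds and tests too.
  if [ "${1:-}" = "fast" ]; then
    echo "PASS: fast checks complete"
    exit 0
  fi

  # Scheme and destination are placeholders; substitute whatever the project uses.
  xcodebuild -scheme Photobooth \
    -destination 'platform=iOS Simulator,name=iPhone 16' \
    build test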

The one check that required a bit of thought was the localisation one. The pattern flags Text("multi word string") — string literals containing a space, with no interpolation — in view files outside of tests and previews. It's a heuristic rather than a proof: a string containing a space is more likely to be user-facing text that should go through L10n than a short identifier or a formatted value. Single-word strings, interpolated strings, and strings in preview files are all excluded. It will have false positives occasionally, but the point isn't to be exhaustive — it's to catch the obvious cases without asking AI to read every file.
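
The heuristic itself is small enough to show. The regex and the glob excludes below are my shorthand for the idea rather than the exact pattern in the shared script:

  #!/usr/bin/env bash
  # check_localisation.sh: heuristic check for hardcoded user-facing strings.
  # Flags Text("two or more words") literals with no interpolation, outside
  # tests and previews. A heuristic, not a proof.
  # Usage: check_localisation.sh <source-dir> [<source-dir>...]
  set -euo pipefail

  # A quoted literal containing a space and no backslash, so escapes and
  # \(interpolation) are excluded along with single-word strings.
  PATTERN='Text\("[^"\\]+ [^"\\]+"\)'

  if rg --type swift -n "$PATTERN" \
      -g '!*Tests*' -g '!*Preview*' "$@"; then
    echo "FAIL: possible hardcoded user-facing strings (route them through L10n)" >&2
    exit 1
  fi

  echo "PASS: no obvious hardcoded user-facing strings"
  exit 0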

How the skills wire into this

The three main Claude skills in my workflow — ios-implement, ios-code-review, and ios-code-review-implement — all treat verify.sh as a hard Definition of Done gate. The skill tells the agent to run the script; the script delegates compliance checking to the shared library; the result is pass or fail, with no tokens spent on the checking itself.
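
Inside each skill the wiring is nothing more exotic than a short Definition of Done section. Paraphrased rather than quoted verbatim, it amounts to:

  Definition of Done:
  - Run ./scripts/verify.sh fast from the repo root.
  - The task is not complete until the script exits 0.
  - Do not re-implement the checks yourself; trust the script's result.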

[Mermaid diagram: the three skills in a box above a boundary line; verify.sh, the shared checks, regexes, and exit codes below it.]

The key boundary is the one between the skills box and everything below it. Above the line: Claude, reasoning, context budget. Below the line: shell scripts, regexes, exit codes. Once verify.sh exists and the skills know to call it, the compliance layer runs completely outside of the AI session.

What this saves in practice

The real saving isn't just tokens-per-check, it's session structure. When I start a Claude session to implement something, I want the model spending its context budget on the interesting problem — understanding the architecture, reasoning about the right service boundaries, writing the actual implementation. I don't want it spending three exchanges confirming that the existing codebase doesn't have print() calls in it before we get started.

With verify.sh as a defined gate in every project, the instruction to agents becomes simple: run ./scripts/verify.sh fast before declaring a task done. The agent doesn't need to understand what the checks are — it just needs to know the script exists and what a passing run looks like. That's a much smaller context footprint than "please check for print statements, raw Color usage, hardcoded string literals in views, preview compliance, and direct IO in the feature layer."

It also means the checks run whether AI is involved or not. If I make a quick edit myself, I can run the script. If a session ends abruptly, the next one can verify the state without needing to re-derive anything. The checks are not in anyone's memory — they're in version control, which is exactly where they should be.

The broader principle

The mistake I was making — before Kanora showed me a better way — was treating AI as a general-purpose verification layer. It's not, or rather it's a wildly expensive one for checks that don't require intelligence.

The useful framing I've settled on is that anything expressible as a regex or a structural file check should be a script, and anything requiring reasoning about why a pattern is wrong in context, or whether an exception applies, is where AI earns its budget.

Scripts handle the mechanical compliance layer so that AI can focus on the architectural reasoning layer. They're not in competition — they're doing different jobs, and sorting out which job belongs where is what keeps a £37/month hobby budget from evaporating on grep operations.

Anyway, I'm going to leave it there — it's sunny outside, the BBQ is on, and there's a glass of wine with my name on it. Time to let the AI tools do their thing, while I do mine.