Most people use AI like a smarter Google. Ask a question, skim an answer, maybe copy a snippet, move on. That's fine for lookup, but it's limiting if you're trying to build something that has to hold together over months: product decisions, architecture, naming (the hardest bit of any app), boundaries, consistency, all the boring bits that turn "a hobby project" into "a real project".
The Shift
The shift for me was treating AI like a collaborator in the workflow, not a widget. It’s still software, it still hallucinates, and it absolutely does not get to be in charge, but it can be useful in the same way a good colleague is useful: you bring direction, context, and constraints, and it helps you move faster inside those boundaries.
That means I don’t mainly use it for answers. I use it to poke holes in what I’m doing. The one thing I have done with all my agents and GPTs is ask them to challenge me—do not just assume I know best (I really don’t)—and to sometimes explain things to me like I’m a junior.
- Stress-test ideas: “If I do this, what breaks later?”
- Expand rough thoughts: “What am I missing here? What are the decisions that should be made next?”
- Identify gaps: “Where is this underspecified? What does the code need to know?”
The rule is simple: it can generate ideas all day, but it doesn’t get to make decisions.
The Workflow
This is roughly how I run projects like Kanora, Murmr, and smaller tools like ShotKit. It's not a productivity system; it's a way of keeping AI useful without letting the whole thing turn into a slot machine.
1. Idea Exploration (chatting, not prompting)
I’ll use my phone and iPad for this. I’m not chaining myself to a desk in the office and locking myself away of an evening. I’ll do the regular evening routine with the kids and once they are down for the night, I’ll make sure the house looks less like a hurricane has hit it, then sit down and second-screen the TV for a bit—not for doom scrolling, but to have a chat with ChatGPT about ideas for a new feature or product.
I’ll usually start with the awkward questions:
- What problem is this actually solving?
- What’s the simplest version that proves it matters?
- Where does it sit in the system? What boundaries does it touch?
- What are the failure modes? What does “bad” look like?
If I can’t answer those, I’m not ready to build it, and AI won’t fix that for me.
2. Spec & Prompt Generation (making constraints explicit)
I’ll throw stuff around, throw most things out, but whatever does stick, I’ll get ChatGPT to crystallise into an Obsidian-friendly note that I can paste into my vault for that particular project. That then feeds into the next AI coding session I have.
Not a huge document, but enough that both a human and a model can act on it without mind-reading.
That typically includes:
- The goal in one sentence
- What’s in scope / out of scope
- The user-facing behaviour (and what happens when it fails)
- Key architectural constraints (local-first, privacy-first, no unnecessary accounts)
- Design rules (DesignKit tokens/components rather than random UI)
- “Definition of done” checks (tests, linting, basic performance sanity)
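As a concrete sketch, here's the shape one of those notes might take. Everything in it, the feature, the names, the numbers, is invented for illustration rather than pulled from a real project:

```
# Feature: offline play-history queue

Goal: play history survives the app being offline, with no account required.

In scope: local queue, flush on reconnect.
Out of scope: multi-device sync, conflict resolution.

Behaviour: plays are recorded locally straight away; failed writes queue up
and flush when the connection comes back. If the queue fills, oldest out first.

Constraints: local-first, privacy-first, no new accounts or endpoints.
Design: DesignKit tokens/components only, no ad-hoc UI.

Done when: queue-ordering tests pass, lint is clean, and a 1k-entry flush
doesn't block the UI.
```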
Then I turn that into a prompt that is intentionally boring. The prompt is not “build me an app”. It’s “make this specific change, in this shape, under these rules”.
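To make "boring" concrete, a prompt in that shape might read something like this (same invented feature, purely illustrative):

```
Add an offline queue to the play-history recorder. Keep it behind the
existing recording interface; no new screens, no new endpoints. Follow
the constraints in the pasted spec note (local-first, DesignKit only).
Done means: queue-ordering tests pass and lint is clean.
```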
3. Execution (AI writes the first draft, I steer)
This is where tools like Codex / Cursor / Claude earn their keep. I give them a single unit of work: one feature, one refactor, one screen, one service boundary. I’m not trying to get a full product in one shot because the output is always better when the task is well framed.
The important part is that I stay in the loop:
- I review diffs like I would on a normal PR
- I sanity-check architecture, naming, and boundaries
- I reject changes that “work” but quietly add long-term mess
If I wouldn’t accept it from a human teammate, I don’t accept it from a model.
4. Automated Review (gatekeeping, not vibes)
AI makes it easy to generate code quickly, which also means it’s easy to generate subtle breakage quickly. I don’t rely on confidence. I rely on verification.
At minimum, I want a single command that represents “is this safe to merge?” and I run it constantly. Tests, linting, formatting, quick sanity checks. If it’s not green, the work isn’t done.
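In practice that gate can be as small as one script. Here's a minimal sketch, assuming a POSIX shell; the individual commands are placeholders you'd swap for your project's real test, lint, and format tools:

```shell
#!/usr/bin/env sh
# check.sh: the single "is this safe to merge?" command.

check() {
  set -e            # stop at the first failure: not green means not done

  echo "tests";  true   # placeholder: e.g. swift test, npm test, pytest
  echo "lint";   true   # placeholder: e.g. swiftlint, eslint .
  echo "format"; true   # placeholder: e.g. swiftformat --lint .

  echo "green: safe to merge"
}

check
```

Because of `set -e`, any failing step stops the script with a non-zero exit code, which makes it trivial to wire into a pre-commit hook or CI as well as to run by hand.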
5. Manual Review (taste and accountability)
This is the part that doesn’t scale, and it’s the part that matters most.
- Does the change match the intent, or did it drift?
- Is it consistent with the system (DesignKit, architecture standards, doc patterns)?
- Is the trade-off worth it, or did we just add complexity because it was easy?
AI accelerates execution, but I still own the decisions and the consequences.
What This Enables
It enables speed, but more importantly it enables a different style of iteration. I can keep more things moving without context-switching into “blank page mode” every time.
- Faster iteration, because the first draft is cheap
- Better ideas, because rough thinking gets challenged early
- Less friction, because docs and design constraints reduce repeated explanation
I am not looking for perfection here. I am looking for tidy, readable code that follows an architecture I understand. I never wrote perfect code, but I wrote shippable code, and it's no different with AI.
The AI code is probably better structured than mine would be, but that's mostly down to how quickly I can refactor when I don't like the look of it. Done by hand, that kind of rework is exactly what usually killed off my side projects in the past.
Once you realise you’ve got a big refactor ahead on a hobby project because you’ve gone down the wrong path, it stops being fun and starts feeling like work. It ends up rotting on GitHub. You might come back to it one day, but it’s like picking up a game after a few months—you remember roughly what was going on, but not how you got there or what to do next, so you just turn it off and play something else.
Or in the case of development, start another project. Rinse and repeat.
Not so with AI tools, and that’s why I love them.
What It Doesn’t Do
It doesn’t replace thinking, and it doesn’t remove responsibility.
If anything, the danger is that it makes it too easy to stay in exploration mode. You can always generate another feature, another approach, another refactor. The workflow only works if you have a clear line between exploration (messy is fine) and execution (messy is debt).
The paradigm shift for developers is real. People who want to ship apps and build things will embrace the new tools. People who love the craft of coding will be hesitant, but they can (and should) find places for these tools in their workflows.
I am definitely a builder of things. I like the code, but I like the design and results better. The best metaphor I can think of is doing long division by hand versus using a calculator. You know what you need to input, you know what you will get as output, but one method lets you move much faster.