Most conversations about AI workflows get stuck at one of two extremes.
At one end, you have prompting: ad-hoc instructions, restated every session, with behaviour that shifts depending on how well you phrase the request. At the other, you have multi-agent systems: networks of agents coordinating with each other, which sound powerful but tend to introduce more complexity than they remove.
What’s been more useful in practice sits in the middle: a single lead agent operating within clear constraints, using a set of reusable skills to do real work.
The shift for me wasn’t adding more agents; it was moving from prompts to skills.
The Problem Skills Solve
Prompting works until it doesn’t.
You can get good results, but they’re inconsistent. You repeat yourself. Small wording changes produce different outcomes. Over time, you spend more effort maintaining the prompt than doing the work.
The underlying issue is that prompts are stateless and disposable, while the work you’re doing is not.
What you actually need is something closer to a reusable unit of behaviour.
That’s what a skill is.
What a Skill Actually Is
A skill isn’t a better prompt. It’s a structured definition of behaviour.
The important part isn’t the wording, it’s the shape. A good skill makes it obvious:
- when it applies
- how the work should be done
- what the output must look like
That structure is what makes it reusable.
From Prompts to a System
Once you introduce skills, the workflow changes shape.
AGENTS.md acts as the always-present baseline: architecture rules, design constraints, workflow expectations.
Skills are layered on top only when relevant. That keeps context focused while still enforcing consistent behaviour.
The lead agent remains responsible for reasoning and orchestration, but it’s no longer improvising from scratch. It’s selecting from predefined ways of working.
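For a sense of scale, that baseline can be very short. The contents below are illustrative only, not a recommendation:

```markdown
# AGENTS.md — always-present baseline (illustrative)

## Architecture rules
- MVVM; no business logic in views.

## Design constraints
- Design-system components only; no hard-coded colours or spacing.

## Workflow expectations
- Small, incremental changes.
- Use the matching skill (implement, review) rather than improvising.
```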
One Skill Library, Multiple Agents
One practical detail that made this setup hold together is that the skills don’t live inside a specific tool.
They live in a single ~/AI directory, and both Claude and Codex access them via symlinks.
```
~/AI/skills/...

# Claude
~/.claude/skills -> ~/AI/skills

# Codex / Cursor
~/.codex/skills -> ~/AI/skills
```
That means both environments operate against the same skill library, the same structure, and the same expectations.
There’s no duplication, and more importantly, no drift.
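Setting that up is nothing more than two symlinks. A minimal sketch, assuming the paths above:

```shell
# Shared skill library (the single source of truth).
mkdir -p ~/AI/skills

# Make sure the tool config directories exist.
mkdir -p ~/.claude ~/.codex

# Point both tools at the same library.
# -s symbolic, -f replace an existing link, -n treat an existing symlink as a file
ln -sfn ~/AI/skills ~/.claude/skills
ln -sfn ~/AI/skills ~/.codex/skills
```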
Because the skills are just structured markdown, there’s nothing tool-specific about them. The “system” isn’t Claude or Codex. It’s the constraints and workflows defined in those files.
The models become interchangeable execution layers.
What a Skill Looks Like (In Practice)
At a file level, a skill is deliberately simple.
It’s just a folder containing structured markdown, sometimes with supporting files. The complexity comes from how it’s written and how consistently it’s applied, not from the format itself.
A typical setup looks something like this:
```
~/AI/
└── skills/
    ├── ios-code-review/
    │   ├── SKILL.md
    │   └── examples.md
    │
    ├── ios-code-review-implement/
    │   ├── SKILL.md
    │   └── templates/
    │       └── pr-template.md
    │
    ├── ios-implement/
    │   └── SKILL.md
    │
    └── ios-localization/
        └── SKILL.md
```
Each skill lives in its own directory, with a primary SKILL.md file that defines behaviour. Additional files are optional, but useful for things like:
- templates (PR descriptions, commit formats)
- examples (good vs bad outputs)
- reference material (architecture notes, patterns)
The important part is that every skill is self-contained.
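For instance, the pr-template.md in the tree above might be nothing more than this (contents hypothetical):

```markdown
## Summary
One sentence on what changed and why.

## Changes
- Bullet list of concrete changes.

## Testing
How the change was verified.
```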
If you want to see how this idea is formalised on the Claude side, their documentation is worth a read: https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
At this point, the interesting question isn’t how many skills you have, it’s how they interact.
In practice, most of the system reduces down to a few core skills working together in a predictable loop.
The Anatomy of a Skill
A skill only works if it’s explicit enough that the model doesn’t have to guess.
In practice, that means every skill follows roughly the same shape:
- trigger conditions
- intent
- constraints
- steps
- output format
Without structure, the model improvises. With structure, it executes.
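As a sketch of that shape (the name, rules, and steps here are hypothetical, but the frontmatter-plus-sections layout follows Claude’s skill format):

```markdown
---
name: ios-code-review
description: Review iOS changes for architecture and design-system compliance.
---

## When to use
After any implementation pass that touches Swift code.

## Intent
Verify the change meets project standards before it merges.

## Constraints
- Review only the diff; do not rewrite unrelated code.
- Flag every design-system violation, even minor ones.

## Steps
1. Check architecture consistency against AGENTS.md.
2. Check pattern alignment and design-system usage.
3. List edge cases and risks.

## Output format
A findings list: severity, file, issue, suggested fix.
```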
Two Core Skills: Implementation and Review
Most of the value comes from pairing two skills that mirror real development work.
Implementation (Constrained Execution)
The ios-implement skill takes vague or structured input and forces it through a defined process:
- apply architecture rules
- enforce design system usage
- shape output into incremental changes
- avoid shortcuts that introduce drift
It turns fuzzy intent into constrained execution.
Code Review (Structured Verification)
The ios-code-review skill runs after implementation.
It verifies that the output meets standards:
- architecture consistency
- pattern alignment
- design system usage
- edge cases and risks
The output is structured and actionable, not conversational.
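To make “structured and actionable” concrete, the output format might pin down something like this (example contents hypothetical):

```markdown
## Review: Add settings screen

### Blocking
- [Architecture] SettingsView holds networking logic; move it to the view model.

### Non-blocking
- [Design system] Hard-coded 12pt spacing; use the spacing tokens.

### Risks
- No empty-state handling when preferences fail to load.
```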
Optional: Coordinator Agent
You can introduce a coordinator to enforce ordering: implementation first, then review, so the two skills always run as one predictable loop.
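One way to express that (again, a hypothetical file) is a small coordinator skill that does nothing but sequence the other two:

```markdown
---
name: ios-feature-loop
description: Run implementation, then review, in order.
---

## Steps
1. Run ios-implement on the request.
2. Run ios-code-review on the resulting diff.
3. If the review raises blocking findings, loop back to step 1.
4. Stop when the review passes with no blocking findings.
```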
Be Selective With Skills
Once you start using this approach, it’s tempting to collect skills.
There are already plenty floating around, and it’s easy to end up with a large library that quietly works against you.
More skills don’t mean more capability; they often just mean more noise.
Some tools (Claude in particular) load skill metadata into context at the start of a session unless you control it. That means every extra skill consumes context and tokens before you’ve even started.
The approach that holds up better is to stay deliberately small:
- keep the skill set focused
- prefer a few strong workflows
- remove overlap
You’re not collecting capabilities. You’re defining reliable paths.
What Actually Scales
What scales isn’t the number of agents. It’s the quality of the skills.
As the skill library improves, the system becomes more predictable. The model matters less because behaviour is already defined.
Takeaway
Skills are the missing layer between prompting and systems.
They turn one-off instructions into reusable workflows, enforce constraints where they matter, and make outputs predictable enough to build on.
The gains don’t come from adding more agents. They come from tightening how work is defined and making those definitions reusable.