
The WAT Framework: How I Structure Every AI Project

The Problem With AI Projects That Don't Ship

Most AI projects start with a great conversation. "What if we used AI to do X?" Someone pulls out a laptop. A prototype gets built in 20 minutes. The room is excited.

And then nothing happens. The prototype lives forever in a folder called "demos." The conversation is never followed up on. The problem that inspired the idea is still unsolved six months later.

I've seen this pattern in enterprises, in startups, and in my own work. The gap isn't a technology problem. There's no shortage of capable AI models. The gap is structural: there's no clear path from "the AI said something impressive" to "the AI is reliably solving a business problem."

The WAT framework is my answer to that gap.

Workflows: The Foundation

A Workflow is a Markdown file. It's a standard operating procedure: the SOP for a task that the AI should be able to execute repeatedly and consistently.

A good workflow answers three questions:

  • What is the objective? What does success look like for this task?
  • What are the inputs and tools? What does the agent need to start, and what tools can it call?
  • What are the edge cases? What should happen when things go wrong or when the inputs are ambiguous?

Workflows are written before the agent is built. This is the most important constraint. If you can't write the workflow, you don't understand the task well enough to automate it. The writing process forces clarity, and clarity is the most underrated step in AI project planning.
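
To illustrate, here's what a minimal workflow file answering those three questions might look like. The task, tool names, and thresholds are invented for the example, not prescribed by the framework:

```markdown
# Workflow: Summarize Inbound Support Tickets

## Objective
Produce a daily summary of new support tickets, grouped by severity.
Success: one structured Markdown summary, posted before 9:00 AM.

## Inputs and Tools
- Input: tickets created in the last 24 hours
- Tool: tickets_tool.py (reads the ticket queue)
- Tool: slack_tool.py (posts the summary)

## Edge Cases
- No new tickets: post "No new tickets" rather than an empty summary.
- A ticket is missing a severity field: flag it for human review instead of guessing.
```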

Agents: The Connector

An Agent reads the relevant workflow and executes it. In my stack, that's Claude, but the principle applies to any capable LLM. The agent's job is to connect intent to execution.

This is where most people underestimate what's needed. A good agent isn't just "call the LLM with a prompt." A good agent:

  • Reads the workflow to understand the task
  • Identifies which tools are needed and in what sequence
  • Calls those tools, handles the outputs, and makes decisions when outputs are ambiguous
  • Escalates to a human when the situation is outside its defined scope

The agent doesn't guess. It follows the workflow. When the workflow is unclear, the agent's failures tell you exactly where the workflow needs to be improved. This is a feature, not a bug.
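
To make that loop concrete, here's a minimal sketch in Python. The call_llm stub, the tool paths, and the prompt wording are all assumptions for illustration, not the framework's required interface; in practice you'd wire call_llm to your LLM provider's SDK:

```python
import subprocess
from pathlib import Path

def call_llm(prompt: str) -> str:
    # Hypothetical wrapper: swap in your provider's SDK (Claude, GPT, etc.).
    raise NotImplementedError("Wire this to your LLM provider's API.")

# The tool registry: each tool is a standalone script the agent may run.
TOOLS = {
    "google_calendar": "tools/google_calendar.py",
    "notion": "tools/notion_tool.py",
}

def run_tool(name: str) -> str:
    """Execute one deterministic tool and return its stdout."""
    result = subprocess.run(
        ["python", TOOLS[name]], capture_output=True, text=True
    )
    if result.returncode != 0:
        # A tool failure is surfaced immediately, never papered over.
        raise RuntimeError(f"Tool {name} failed: {result.stderr}")
    return result.stdout

def run_agent(workflow_path: str) -> str:
    # The agent reads the workflow first; the SOP, not the model, defines the task.
    workflow = Path(workflow_path).read_text()
    plan = call_llm(
        f"Read this workflow and list the tools to call, in order:\n{workflow}"
    )
    # Crude but explicit: only names present in the registry are ever executed.
    tool_outputs = {name: run_tool(name) for name in plan.split() if name in TOOLS}
    # The model reasons over tool outputs; escalation is part of the prompt contract.
    return call_llm(
        f"Workflow:\n{workflow}\n\nTool outputs:\n{tool_outputs}\n\n"
        "Execute the workflow. If anything is outside scope, say ESCALATE and stop."
    )
```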

Tools: The Execution Layer

A Tool is a Python script. It does one deterministic thing. It calls an API, reads a database, writes a file, sends a message. It doesn't think. The agent thinks. The tool just executes.

The separation matters. When you mix reasoning and execution in the same layer, debugging becomes a nightmare. When they're separated, a tool failure is obvious and fixable. An agent reasoning failure is also obvious and points you back to the workflow.

All credentials for tools live in .env. The tools themselves are checked into version control. This separation of secrets from logic keeps the system auditable and secure.
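
As an illustration, a minimal tool in this style might look like the following sketch, assuming the python-dotenv and requests packages and a hypothetical SLACK_WEBHOOK_URL secret in .env:

```python
# send_message.py: one deterministic action, no reasoning.
import os
import sys

import requests
from dotenv import load_dotenv  # pip install python-dotenv requests

load_dotenv()  # secrets come from .env, which is never checked in

def send_message(text: str) -> None:
    """Post a message to a Slack incoming webhook."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # fails loudly if missing
    response = requests.post(webhook_url, json={"text": text}, timeout=10)
    response.raise_for_status()  # a tool failure is obvious, not swallowed

if __name__ == "__main__":
    send_message(sys.argv[1])
```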

A Real Example: Daily Planning

Here's how WAT looks in practice for a task I run every morning: daily planning.

Workflow (Markdown SOP): Pull today's Google Calendar events. Review the active project list. Flag any deadlines in the next 48 hours. Produce a structured daily plan with priorities sorted by urgency and impact.
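
Written out as the Markdown file the agent actually reads, that SOP might look something like this (a paraphrase of the steps above, not the verbatim file):

```markdown
# Workflow: Daily Planning

## Objective
A structured daily plan, with priorities sorted by urgency and impact.

## Inputs and Tools
- google_calendar.py: today's Google Calendar events
- notion_tool.py: the active project list from Notion

## Edge Cases
- A deadline falls within the next 48 hours: flag it at the top of the plan.
- A tool fails: surface the error and stop; do not plan from partial data.
```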

Agent (Claude): Reads the workflow. Calls the calendar tool. Calls the Notion tool to get the project list. Identifies deadlines. Formats the output as a structured briefing.

Tools (Python scripts):

  • google_calendar.py: reads today's events from the Google Calendar API (sketched below)
  • notion_tool.py: queries the project database in Notion
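
Here's a rough sketch of what a tool like google_calendar.py could look like, assuming a Google service account, the google-api-python-client and google-auth packages, and a hypothetical GOOGLE_CREDENTIALS_PATH entry in .env; the real script may differ:

```python
# google_calendar.py: reads today's events. Deterministic; no reasoning.
import datetime
import os

from dotenv import load_dotenv
from google.oauth2 import service_account
from googleapiclient.discovery import build

load_dotenv()

def todays_events() -> list[dict]:
    creds = service_account.Credentials.from_service_account_file(
        os.environ["GOOGLE_CREDENTIALS_PATH"],
        scopes=["https://www.googleapis.com/auth/calendar.readonly"],
    )
    service = build("calendar", "v3", credentials=creds)
    now = datetime.datetime.now(datetime.timezone.utc)
    end = now + datetime.timedelta(days=1)
    result = service.events().list(
        calendarId="primary",
        timeMin=now.isoformat(),
        timeMax=end.isoformat(),
        singleEvents=True,
        orderBy="startTime",
    ).execute()
    return result.get("items", [])

if __name__ == "__main__":
    for event in todays_events():
        print(event.get("summary", "(no title)"), event["start"])
```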

The whole system runs in under 30 seconds. The output is always structured the same way. If either tool fails, the error is surfaced immediately. If the agent produces a bad plan, I know to look at the workflow, not the tools, not the LLM.

How to Start Your Own WAT Stack

Start smaller than you think you need to. The instinct is to build a comprehensive system, one that handles all your tasks, integrates all your tools, and covers all your edge cases. That instinct will kill the project before it ships.

Week 1: Pick one task you do repeatedly. Write a workflow for it. Just a Markdown file with the objective, inputs, and what done looks like. Don't build anything yet.

Week 2: Write one tool for the most important external action in that workflow. If the workflow requires reading a calendar, write a script that reads a calendar. Test it in isolation.

Week 3: Connect the agent. Give it the workflow and access to the tool. Run it. See what breaks. Improve the workflow, not the model.

The framework is not precious. Adapt it. If your tools need to be TypeScript instead of Python, use TypeScript. If your agent needs to be GPT instead of Claude, use GPT. The structure is what matters, not the specific implementations.

The goal is a system that runs reliably without you babysitting it. That's the WAT promise: structured, auditable, improvable AI that actually ships.


Want to build your own WAT stack?

I use this framework for every AI system I build. If you want help designing or building a WAT-based system for your specific use case, let's talk.
