Case 02 — Flow

Personal finance, actually personal.

A budget app for two people who wanted to know where the money went — without feeding their bank statements into anyone else’s servers. Designed and built end to end with AI, live as an app on their phones, in daily use by both of its users.

Hero: Flow’s home screen on a phone, dark theme, with Budget Bars, month summary, FAB. Populate with sample amounts and merchant names.

In short.

Domain
Personal finance — a private budgeting app for a two-person household.
Scale
Seven years of spending history imported and analyzed; Budget Bars against personal averages, savings goals, an emergency-fund tracker, and a recurring-transaction engine — in daily use by both users.
Role
Product, design, and direction — sole designer. Claude as implementer and data analyst, working under written rules it couldn’t bend.
Stack
React + TypeScript web app on Supabase; live in production.
Methodology
Spec-driven build — 22 design specs paired to 23 implementation plans, guarded by 365 automated tests. The same research-to-build loop as the enterprise work, pointed at a shipped product.
Constraints
No bank connection, by design — statements redacted before import, even from the AI. The data never leaves home.
Outcome
Live and in daily use; surfaced recurring subscriptions nobody remembered buying; built end to end in roughly six weeks.
0
Third parties touching
the household’s data
7 yr
Of spending history
imported and analyzed
2
Daily users —
the whole household
5
Core surfaces: budgets, insights,
goals, emergency fund, recurring

What was built

A budget app for a household of two.

Flow is a mobile-first web app my wife and I open every day: Budget Bars referenced against our own seven-year averages, spending insights written in sentences instead of charts, savings goals, an emergency-fund tracker, and a recurring system that keeps the data alive without a bank connection. It was never meant for an app store — it was meant to answer one household’s question precisely.

One fact belongs this early: I designed and directed; Claude implemented — every screen, the schema, the historical import — under written rules it could not bend. The full method, pipeline and all, closes this study.

Fig. 1 — The home screen’s grammar: Budget Bars against historical averages, one tap to add, nothing requiring a manual.

The problem

The price of every budgeting app is the data.

We couldn’t say where our money was going, which is the usual reason people reach for a budgeting app like Mint or YNAB. But nearly every one of those services starts with the same ask: connect your bank accounts, or upload your statements. A bank statement is about as complete a record of a life as exists: every merchant, every habit, every place you’ve been. I wasn’t willing to feed that to a third party to find out we eat out too much.

So the product decision came before the product: build our own, and treat the data boundary as a feature. The discipline went all the way down — even the AI that built the app worked from statements I had redacted first.

Even the AI that built it never saw an unredacted statement.

Not a privacy detail — the design philosophy

Fig. 2 — The data stays home: no aggregators, no credentials handed over, redaction before analysis.
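
A redaction pass of this kind could be sketched as below. This is a minimal illustration, not Flow's actual gate: the source does not say which fields were scrubbed, so the rule used here (mask any run of eight or more digits, keeping the last four) and the names `redactLine` and `passesGate` are assumptions.

```typescript
// Hypothetical rule: treat any run of 8+ digits as an account or card number.
// Amounts such as "42.50" and short reference codes are left untouched.
const LONG_DIGIT_RUN = /\d{8,}/;

// Mask every long digit run in one statement line, keeping the last four digits.
function redactLine(line: string): string {
  return line.replace(/\d{8,}/g, (digits: string) => "****" + digits.slice(-4));
}

// The gate: text passes only when no long digit run remains.
function passesGate(text: string): boolean {
  return !LONG_DIGIT_RUN.test(text);
}

// Redact a whole statement, then refuse to hand it on if anything slipped through.
function redactStatement(lines: string[]): string[] {
  const cleaned = lines.map(redactLine);
  for (const line of cleaned) {
    if (!passesGate(line)) {
      throw new Error("Redaction gate failed; statement not released.");
    }
  }
  return cleaned;
}
```

The point of the separate `passesGate` check is that the scrubbing step and the verification step fail independently: analysis only ever receives text that the gate has approved.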

The founding act

Born with seven years of memory.

Most budget apps fail the same way: an empty screen, a month of dutiful data entry, then abandonment. Flow’s first feature was the opposite — before the interface existed, a one-time Claude-driven import read seven years of redacted statements and reconstructed our entire history: every transaction categorized, ownership assigned, card payments excluded so transfers never counted as spending.

The first screen Flow ever rendered already knew our habits. Budgets weren’t guesses typed into boxes — they were seeded from what we had actually spent, category by category, and every Budget Bar carries its seven-year average as a reference line. Import before interface is the structural decision everything else in the app stands on.
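
Seeding budgets from history can be sketched as a per-category monthly average that skips card payments. The `Txn` shape, the `isCardPayment` flag, and the `seedBudgets` function are hypothetical names for illustration; Flow's real schema is not shown in this study.

```typescript
// Hypothetical transaction shape; not Flow's actual schema.
interface Txn {
  date: string;           // ISO date, e.g. "2024-03-15"
  category: string;
  amount: number;         // positive = money spent
  isCardPayment: boolean; // paying off a card is a transfer, not spending
}

// Average monthly spend per category across every month that appears in the
// history. Card payments are skipped so a purchase is not counted a second
// time when the card bill is paid.
function seedBudgets(txns: Txn[]): Record<string, number> {
  const totals: Record<string, number> = {};
  const months: Record<string, true> = {};

  for (const t of txns) {
    months[t.date.slice(0, 7)] = true; // "YYYY-MM"
    if (t.isCardPayment) continue;
    totals[t.category] = (totals[t.category] ?? 0) + t.amount;
  }

  const monthCount = Math.max(Object.keys(months).length, 1);
  const budgets: Record<string, number> = {};
  for (const category of Object.keys(totals)) {
    const average = (totals[category] ?? 0) / monthCount;
    budgets[category] = Math.round(average * 100) / 100; // round to cents
  }
  return budgets;
}
```

The same averages can serve two purposes under this assumption: the starting budget for each category, and the reference line drawn on each Budget Bar.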

Fig. 3 — 9,925 transactions across 9 accounts, 2019–2026, converging into one seeded database. Run once, never part of the app.

The interface

Real screens, calm by design.

These are real screens from the live app — the one my wife and I open every day — shown with sample data so the design does the talking, not our grocery bills.

The interface earned its calm the hard way: three complete visual identities in ten days — it started warm and editorial, passed through an enterprise-density experiment, and finally settled on the dark system you see now, specced as calm financial confidence. An app you open every day about money must feel like neither a spreadsheet nor a casino.

Add-transaction bottom sheet: amount, merchant autocomplete, category pills, owner pills. Sample merchants and amounts.
Insights drill-down: a category or tag view with an implication-first insight line above its chart. Sample amounts and merchants.
Category drill-down: necessity split or sub-tag breakdown view. Sample amounts and merchants.

Fig. 4 — Real screens from the live app, shown with sample data.

From data to meaning

Insights that read like sentences, not stats.

Seven years of data is worthless as a wall of numbers. The insights layer was rebuilt around a single rule, written into its spec:

Implication-first. Every line leads with what the number means, not the number itself. The tone is honest and direct — not preachy, not scolding, not cheerleading.

Flow insight spec, April 2026

So a drill-down doesn’t say “$1,039 / 52%” — it says a single merchant absorbs more than half of everything in the category. Tag and sub-tag breakdowns, merchant correction, and category drill-downs all answer to the same sentence-first standard.

The interpretation itself is designed, not copywritten.

Every insight line in the spec carries display conditions: a merchant above half its category absorbs; below half, it merely leads; a peak month is named only when one exists. Most budgeting apps render numbers — Flow’s insight layer renders judgment with thresholds, which is why the lines read as if someone looked at the data rather than queried it.
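
The display conditions described above can be sketched as small functions that return a sentence or nothing. This is an illustration under stated assumptions, not the spec itself: the names `merchantInsight` and `peakMonth` are hypothetical, an exact 50% share is treated as "leads", and the rule for when a peak month "exists" (at least 1.5 times the monthly average) is invented for the example.

```typescript
interface MerchantSpend {
  merchant: string;
  amount: number;
}

// Picks the verb from the top merchant's share of its category:
// above half it "absorbs", otherwise it merely "leads".
function merchantInsight(category: string, rows: MerchantSpend[]): string | null {
  let total = 0;
  let top: MerchantSpend | null = null;
  for (const r of rows) {
    total += r.amount;
    if (top === null || r.amount > top.amount) top = r;
  }
  if (top === null || total <= 0) return null; // nothing to say

  const share = top.amount / total;
  if (share > 0.5) {
    return `${top.merchant} absorbs more than half of everything in ${category}.`;
  }
  return `${top.merchant} leads ${category}.`;
}

// Returns the peak month key only when one exists. Assumed rule: the largest
// month must be at least `peakFactor` times the average month.
function peakMonth(monthly: Record<string, number>, peakFactor = 1.5): string | null {
  const keys = Object.keys(monthly);
  if (keys.length < 2) return null;

  let sum = 0;
  let best: string | null = null;
  let bestValue = 0;
  for (const k of keys) {
    const v = monthly[k] ?? 0;
    sum += v;
    if (v > bestValue) {
      best = k;
      bestValue = v;
    }
  }
  const mean = sum / keys.length;
  if (best === null || mean <= 0 || bestValue < peakFactor * mean) return null;
  return best;
}
```

Returning `null` rather than a weaker sentence is the design choice worth noting: a line that fails its condition is simply not rendered.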

Insight pair: two category/tag views, each with its implication-first sentence above the chart. Sample amounts and merchant names.

Fig. 5 — The engine thinking: a raw row crosses a threshold and comes out a sentence. Numbers in; judgment out.

What seeing it changed

The subscriptions nobody remembered buying.

The first honest outcome arrived within weeks: a set of subscriptions we had been paying without noticing — visible the moment seven years of recurring charges sat in one place — was cancelled. Not because an app nagged us, but because the data finally made the question impossible to avoid. We are simply more conscious of where the money goes now; that was the entire point.

Two features turned that awareness into structure. The necessity split tags every transaction essential or discretionary, with a review queue for the ambiguous middle. And the emergency fund view answers the question underneath all budgeting, computed from our real essential spend and mapped onto the actual savings balance:

If both salaries stopped tomorrow, how long could we run the house?

Emergency fund spec — the question the feature answers
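
The arithmetic behind that question can be sketched as savings divided by average monthly essential spend. The `TaggedTxn` shape, the `"review"` tag value, and `runwayMonths` are assumptions made for illustration; the source states only that the figure is computed from real essential spend and the actual savings balance.

```typescript
// Hypothetical shape: each transaction carries its necessity tag.
interface TaggedTxn {
  date: string; // ISO date, e.g. "2024-03-15"
  amount: number;
  necessity: "essential" | "discretionary" | "review";
}

// Months the household could run on savings alone, based on the average of
// monthly essential spend. Returns null when there is no essential history.
function runwayMonths(savingsBalance: number, txns: TaggedTxn[]): number | null {
  const essentialByMonth: Record<string, number> = {};
  for (const t of txns) {
    if (t.necessity !== "essential") continue; // discretionary and unreviewed are excluded
    const month = t.date.slice(0, 7);
    essentialByMonth[month] = (essentialByMonth[month] ?? 0) + t.amount;
  }

  const months = Object.keys(essentialByMonth);
  if (months.length === 0) return null;

  let total = 0;
  for (const m of months) total += essentialByMonth[m] ?? 0;
  const averageEssential = total / months.length;
  if (averageEssential <= 0) return null;

  return Math.round((savingsBalance / averageEssential) * 10) / 10; // one decimal
}
```

For example, a balance of 12,000 against an average essential spend of 2,000 a month gives a runway of 6 months.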

Subscriptions pair: the forgotten recurring charges surfaced, and the keep/cancel decision beside them. Sample merchants and amounts.

Fig. 6 — Subscriptions pile up in silence into a mountain; flattening it back down is on you. What survives is chosen.

The hardest design problem

No pipe to the bank — by design.

No bank connection meant the defining tension: statements can’t be imported weekly, but stale data kills a budget app. How does the ledger stay current without the pipe every competitor depends on?

The answer is the recurring system, built on a distinction most apps flatten: rent is constant, utilities are not. Constants post themselves as templates; variables ask one question — a push notification for one number, one tap to confirm. The constraint that looked like a weakness became Flow’s most original engineering.

Constants post themselves; variables ask one question.

The recurring system, in one line

Underneath is machinery, not a reminder list: each cycle, a scheduled job walks the templates; constants confirm themselves, and variables raise one notification. Thirteen templates run live; twenty-two sit paused. The monthly import stays the backstop, so the ledger is never more than one tap from true. That is the quiet center that keeps a pipe-less app honest.
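
One pass of such a scheduled job can be sketched as a loop over templates. This is a minimal sketch under assumptions: the `Template` union, the `paused` flag, and `runCycle` are hypothetical names, and the real job's scheduling, persistence, and push delivery are not shown.

```typescript
// Hypothetical template shapes: a constant knows its amount, a variable does not.
type Template =
  | { kind: "constant"; name: string; amount: number; paused: boolean }
  | { kind: "variable"; name: string; paused: boolean };

interface CycleResult {
  posted: { name: string; amount: number }[]; // entries written straight to the ledger
  prompts: string[];                          // one notification per variable template
}

// Walk the templates once: constants post themselves, variables ask one question,
// and paused templates are skipped entirely.
function runCycle(templates: Template[]): CycleResult {
  const result: CycleResult = { posted: [], prompts: [] };
  for (const t of templates) {
    if (t.paused) continue;
    if (t.kind === "constant") {
      result.posted.push({ name: t.name, amount: t.amount });
    } else {
      result.prompts.push(`How much was ${t.name} this cycle?`);
    }
  }
  return result;
}
```

Modeling the two kinds as a discriminated union means a variable template cannot carry a stale amount by accident: the number only exists once the user supplies it.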

Recurring pair: income grouped by employer, and the active/paused expense templates beside it. Sample amounts, merchant and employer names.

Fig. 7 — The invention, at full size: constants post themselves into the ledger; variables raise one question and take one tap. 13 live templates carry the ledger between imports.

The method, second time

The same loop, pointed at a product.

Flow used the same workflow that built this site: every feature began as a dated design spec the AI couldn’t re-litigate, with two design-rule documents fixing type, color, and interaction before any UI was generated.

The chain: ideas in Cowork, specs and build in Claude Code, verification in a headless browser, data on Supabase, deploys through Vercel — fronted by a redaction gate no statement crosses unscrubbed.

The stack was a product decision, not a fashion one: React + TypeScript (the dialect AI implements most reliably), an installable web app (a URL on both phones, no app store), and Supabase (Postgres with the ceremony removed).

Fig. 8 — How AI is wired into the work. Data is redacted before entry; one stage runs on no AI at all — the gate. When judgment says no, the light goes back to Build.

22
design specs written and built, Apr–May 2026
23
implementation plans paired to those specs
6
weeks from first spec to the recurring system
365
automated tests guarding the build (286 green — honestly counted)

The iteration story is written in the repo’s own filenames: the home screen was specced and rebuilt three times in two days (April 15–16); the insights surface took five specs in forty-eight hours before the sentence-first version survived; and the entire visual identity changed twice — a warm editorial system, then an enterprise-density experiment — before calm financial confidence stuck. Cheap regeneration is what made that standard affordable; the dated specs are what kept every rejection on the record.

The honesty extends to the worst day: a corrupted git operation once produced a commit deleting 70,508 lines — nearly the whole repository. The recovery was methodical (rewind, re-stage only the real change, verify the build), and the lesson was written into the project’s incident log so no future session repeats it. Working with AI at speed doesn’t remove the need for discipline; it raises the price of not having it.

Honest outcome

What holds, what’s open.

Holds: live in production, opened daily by both users — a market of two with full adoption and zero churn. Forgotten subscriptions cancelled, budgets grounded in seven years of real behavior, the data still home. It outlived the phase where side projects die: the novelty wore off; the daily use didn’t.

Open, and named: categorization is imperfect — some transactions still tag wrong, and drill-downs need better correction tools. 79 of 365 tests fail — inherited debt, logged, never release blockers; the 24 guarding the newest system are green. A budget app is never finished; this one is honest about where it isn’t.

Why this case is here: the site you’re reading proves the workflow; Flow is that workflow shipping a real product, with a backend, a database, tests, and two daily users whose behavior changed. The workflow is not portfolio-specific. That was the claim; this is the evidence.

Fig. 9 — The loop, closed: seven years flow in, daily use flows out — subscriptions cut, spending conscious of its average, and the data still home.