EP 01

Pepper Is Not a Chatbot


May 2026 · 6 min read · #concept #architecture #state-machine

Calling Pepper an "AI assistant app" is half right and half wrong.

To explain the difference, you have to start with how Pepper actually knows our family.


An AI That Knows Our Family — Family Graph and Family Vault

With a regular chatbot, the conversation resets when it ends. The next session, it has no idea who you are or how your household works. Every session starts from zero.

Pepper always knows two things.

The Family Graph is the relational and permissions structure of our family. Pepper knows that I can send Eunsoo a notification, but Eunsoo can only set reminders for Eunje — not the other way around. It doesn't just process messages. It operates within a context of who → can do what → for whom.
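One way to picture the Family Graph is as a set of directed permission edges: who can perform which action for whom. Here's a minimal sketch of that idea; the member names, action names, and set-of-tuples representation are all illustrative, not Pepper's actual schema:

```python
# Toy Family Graph: directed (actor, action, target) permission edges.
# All names and actions are illustrative placeholders.
PERMISSIONS = {
    ("dad", "send_notification", "eunsoo"),
    ("dad", "set_reminder", "eunsoo"),
    ("eunsoo", "set_reminder", "eunje"),
    # Note: there is no ("eunje", "set_reminder", "eunsoo") edge.
    # Permissions are directional, not symmetric.
}

def can(actor: str, action: str, target: str) -> bool:
    """Return True if `actor` may perform `action` for `target`."""
    return (actor, action, target) in PERMISSIONS

print(can("eunsoo", "set_reminder", "eunje"))  # True
print(can("eunje", "set_reminder", "eunsoo"))  # False
```

The point of the sketch is the asymmetry: the graph encodes direction, so "Eunsoo can set reminders for Eunje" never implies the reverse.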

The Family Vault is where everything about our family lives. Financial records, legal documents, Eunsoo's drawings, family photos, our favorite restaurants, recipes. The more the Vault accumulates, the more deeply Pepper knows us. Say "I changed my hair salon — it's OO Hair now" and the next booking is automatically updated. No need to explain it again.

That's the fundamental difference from a regular chatbot. The goal is to stop opening Gemini or Claude in daily life, and to be taken care of without having to ask.


The Two Things Pepper Has to Do

Within that context, there are two core things Pepper needs to do.

One: Proactively look out for us. It reads my email, calendar, and portfolio on its own and reports back in the morning before I ask — "Eunsoo's tutoring pickup is today," "Your portfolio dropped 2% yesterday," "You have 3 unanswered emails." And it filters. Of those 3 emails, which ones actually require a reply and an action?

Two: Do what I ask. "Send Eunsoo a reminder to eat her snack at 3," "Put together the family schedule for this week," "Confirm the tuition payment." Natural language in — done.

That all sounds ideal. And that's where the reality sets in.


I Can't Keep Building Forever

I'm not a developer. I don't know every use case right now. As the family uses it, new requests will keep coming. Eunsoo might say "Pepper, can you post my drawing to my portfolio site?" Soyeon might say "Can you parse this notice from the tutoring center?"

I can't develop and deploy each of those one by one. I don't have the skills or the time.

So Pepper has to evolve on its own. A self-developing system. That's the core of it. We're in a world where Claude Code can build almost anything from an instruction. Pepper should work the same way.


A Decision Layer — STATE A / B / C

And that's where a question came up: When someone sends a message — how does the system decide whether to just answer, to execute something, or to build something entirely new?

I broke it into three situations.

First, things that just need an answer. "What's the weather today?" — done.

Second, things that have an existing function that can be run. "Send Eunsoo a notification" — Pepper already has that capability, so it just runs it.

Third, things that don't exist yet but could be built. Or things that require a human to step in.

There needs to be a decision layer that distinguishes between these three. Because what happens next is completely different depending on the category.

And here, another practical problem surfaced: cost.

AI APIs charge per call. This is a system four family members use every day. Running an expensive model on every request isn't sustainable. The kids might even mess around with it constantly. There's no reason to fire the top-tier model at "What's the weather today?"

So the decision layer and the cost structure had to be designed together. Light decisions go to cheap models; the expensive model only kicks in when real intelligence is genuinely needed. A mix of Gemini and Claude, deployed based on what each situation actually requires. This structure came out of planning sessions with Claude — there's no way I would have thought of it on my own.

That thinking eventually crystallized into STATE A / B / C.

STATE A — Function already exists. When a request comes in, it executes immediately. A lightweight model identifies intent, and the registered function runs. Fast and cheap.

STATE B — Build the missing function. The registered function doesn't exist, but one can be built. Pepper generates the code itself, validates it in an isolated environment, and automatically registers it into the system. Pepper expands itself without me developing anything. This is the only state where the expensive model (Claude Sonnet) is used — because this is the moment that actually requires real intelligence.

STATE C — A human is needed. Cases Pepper can't handle autonomously. But just saying "I can't do that" isn't acceptable — from the family's perspective, that breaks trust.

Say Eunsoo asks, "Pepper, can you make a reservation at a malatang place on Naver?" and that function doesn't exist yet:

  • Eunsoo gets an immediate response: "I can't do that right now. I'll let you know once your dad sorts it out."
  • A card appears in my chat: what was attempted, what the blocker was, what's needed to resolve it.
  • A GitHub issue is automatically created.
  • Once I resolve it and deploy — Eunsoo hears: "That thing I couldn't do before? I can do it now. Want to try again?"

Failure becomes the input for the next iteration. STATE C is not giving up — it's a stage of growth.
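The STATE C flow above can be sketched as one handler: answer the requester immediately, file a card for me, and record a tracking issue. All of this is illustrative; the helper functions are stubs, and a real version would post the card into the family chat and create the issue via the GitHub API:

```python
from dataclasses import dataclass

def reply(to: str, text: str) -> None:
    """Stub: send a chat message to a family member."""
    print(f"[to {to}] {text}")

def post_card(admin: str, attempted: str, blocker: str) -> None:
    """Stub: drop a card in the admin's chat (what / why / what's needed)."""
    print(f"[card for {admin}] attempted={attempted!r} blocker={blocker!r}")

@dataclass
class Escalation:
    """One STATE C failure, kept as input for the next iteration."""
    requester: str
    attempted: str
    blocker: str
    issue_title: str = ""

def handle_state_c(requester: str, attempted: str, blocker: str) -> Escalation:
    # 1. Immediate, honest reply -- never a bare "I can't do that".
    reply(requester, "I can't do that right now. "
                     "I'll let you know once your dad sorts it out.")
    # 2. Card in the admin chat: what was attempted, what blocked it.
    post_card(admin="dad", attempted=attempted, blocker=blocker)
    # 3. Tracking record so the failure isn't lost (stand-in for a
    #    real GitHub issue created through the API).
    return Escalation(requester, attempted, blocker,
                      issue_title=f"[STATE C] {attempted}")

esc = handle_state_c("eunsoo",
                     "reserve a malatang place on Naver",
                     "no Naver reservation function yet")
print(esc.issue_title)  # [STATE C] reserve a malatang place on Naver
```

The `Escalation` object is the part that matters: once the fix ships, it holds everything needed to close the loop and tell Eunsoo, "That thing I couldn't do before? I can do it now."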


The One Question of Phase 0

Phase 0 is not about building features. It's about validating one thing: Does Pepper's brain actually work?

Gmail integration, calendar, portfolio monitoring — these aren't features. They're a test bed. They're how we verify that STATE A / B / C actually function correctly in practice.

STATE B — the moment Pepper writes its own code — is the real finish line for Phase 0.