EP 08

From 2480ms to 815ms

May 2026 · 5 min read · #performance #latency #ux

Let's start with the numbers.

Time from user message to Pepper's response. It started at 2480ms. Now it's 815ms. A 67% reduction.

But the real subject of this episode isn't the number.

The bottleneck wasn't the LLM

When it felt slow, I assumed it was the LLM — tried different models, trimmed prompts.

Then I actually measured.

DB INSERT:   241ms  (start logging)
LLM call:    481ms  (actual AI generation)
DB UPDATE:   741ms  (finish logging)
─────────────────
Total:      1463ms

The LLM was 481ms. The database writes were 982ms. Logging cost twice as much as thinking.

The issue was the structure. Wait for INSERT, run the LLM, wait for UPDATE — everything queued in sequence.
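To get a per-step breakdown like this, each awaited step can be wrapped in its own timer. The following is a minimal sketch, not Pepper's actual instrumentation. `timed`, `sleep`, and `handleMessage` are hypothetical names, and the sleeps only stand in for the real INSERT, LLM call, and UPDATE, reusing the latencies above.

```typescript
// Hypothetical helper: runs one async step and records its duration in ms.
async function timed<T>(
  label: string,
  timings: Record<string, number>,
  step: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await step();
  } finally {
    // Recorded even if the step throws, so slow failures still show up.
    timings[label] = Date.now() - start;
  }
}

// Stand-in for real I/O: resolves after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The sequential handler, with each step timed.
// The sleeps reuse the measured latencies; they are not real DB or LLM calls.
async function handleMessage(): Promise<Record<string, number>> {
  const timings: Record<string, number> = {};
  const start = Date.now();
  await timed("db_insert", timings, () => sleep(241));
  await timed("llm_call", timings, () => sleep(481));
  await timed("db_update", timings, () => sleep(741));
  timings.total = Date.now() - start;
  return timings;
}

handleMessage().then((timings) => console.log(timings));
```

Because every step is awaited in sequence, `total` can never be less than the sum of the three steps. That is the structural problem in miniature.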

Parallel + fire-and-forget

```typescript
// Before: DB writes block both ends of the LLM call
const { data } = await db.insert(...)   // 241ms wait
const result = await llm.call()         // 481ms wait
await db.update(data.id, ...)           // 741ms wait
// total: ~1463ms

// After: parallel + fire-and-forget
const insertPromise = db.insert(...)    // no await: runs alongside the LLM call
const result = await llm.call()         // 481ms (the real bottleneck)
insertPromise
  .then(({ data }) => db.update(data.id, ...)) // background: the response doesn't wait for it
  .catch(() => null)                           // a failed INSERT or UPDATE never reaches the user
// total: ~481ms
```

INSERT starts at the same time as the LLM call, with no await. UPDATE runs in the background, and the response is sent without it. The log finishes a little later, and that's fine: the user doesn't need to wait for it.

Logging isn't the critical path. But it was blocking the critical path.
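To see why the restructure takes the DB time out of the wait, here is a self-contained simulation. It is a sketch under assumptions. `db` and `llm` are fake stand-ins that just sleep for the measured latencies, and `timeToReply` is a hypothetical helper that measures how long the caller waits before it has a reply to send.

```typescript
// Simulated latencies from the breakdown above (no real I/O happens here).
const INSERT_MS = 241;
const LLM_MS = 481;
const UPDATE_MS = 741;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-ins for the logging DB and the LLM client.
const db = {
  insert: async () => { await sleep(INSERT_MS); return { data: { id: 1 } }; },
  update: async (_id: number) => { await sleep(UPDATE_MS); },
};
const llm = { call: async () => { await sleep(LLM_MS); return "reply"; } };

// Before: every step is awaited, so the reply waits on both DB writes.
async function sequential(): Promise<string> {
  const { data } = await db.insert();
  const result = await llm.call();
  await db.update(data.id);
  return result;
}

// After: INSERT overlaps the LLM call; UPDATE is started but never awaited.
async function parallel(): Promise<string> {
  const insertPromise = db.insert();
  const result = await llm.call();
  insertPromise
    .then(({ data }) => db.update(data.id))
    .catch(() => null); // logging failures are swallowed, not surfaced
  return result;
}

// How long the caller waits before it has a reply in hand.
async function timeToReply(handler: () => Promise<string>): Promise<number> {
  const start = Date.now();
  await handler();
  return Date.now() - start;
}

async function main(): Promise<{ before: number; after: number }> {
  const before = await timeToReply(sequential);
  const after = await timeToReply(parallel);
  console.log({ before, after }); // roughly 1463 vs 481, plus timer jitter
  return { before, after };
}

main();
```

The background UPDATE still takes its 741ms. It just no longer sits between the user and the reply. Putting `.catch` at the end of the chain means a failed INSERT or UPDATE is dropped quietly instead of becoming an unhandled rejection.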

That alone wasn't enough

815ms. Still felt slow.

The reason 0.8 seconds feels long: there's no feedback. Press send and the screen doesn't move — even 300ms feels like something is broken.

Two things changed that.

One: the message appears on screen the moment you send it. No waiting for the server. A clock icon while it's in flight, then a checkmark on confirmation.
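This send-side change is usually called an optimistic update. The following is a minimal sketch, assuming a hypothetical `send` function for the server call and a `render` callback for the UI. The message is drawn as `pending` (clock icon) before any network call, then switched to `sent` (checkmark). The `failed` state is an assumption added here for when the call throws; the post doesn't describe one.

```typescript
// Hypothetical client-side model for an optimistically sent message.
type Status = "pending" | "sent" | "failed";

interface ChatMessage {
  localId: string;
  text: string;
  status: Status; // "pending" shows the clock icon, "sent" the checkmark
}

let nextLocalId = 0;

// Shows the message immediately, then updates its status once the server call settles.
async function sendOptimistic(
  messages: ChatMessage[],
  text: string,
  send: (text: string) => Promise<void>,
  render: (messages: ChatMessage[]) => void,
): Promise<ChatMessage[]> {
  const localId = `local-${nextLocalId++}`;
  let current: ChatMessage[] = [...messages, { localId, text, status: "pending" }];
  render(current); // drawn before any network round trip

  let status: Status;
  try {
    await send(text);
    status = "sent";
  } catch {
    status = "failed"; // assumption: keep the text on screen so the user can retry
  }

  current = current.map((m) => (m.localId === localId ? { ...m, status } : m));
  render(current);
  return current;
}

// Usage: record which statuses each render shows.
const frames: Status[][] = [];
sendOptimistic([], "hi Pepper", async () => {}, (ms) => {
  frames.push(ms.map((m) => m.status));
}).then(() => console.log(frames)); // frames: [["pending"], ["sent"]]
```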

Two: while Pepper is thinking, a typing indicator appears immediately. Pepper seems to be "typing" before the response actually exists.

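```typescript
// Insert a typing row before LLM generation even starts.
// .select().single() returns the inserted row, so its id can be reused below.
const { data: typingRow } = await db
  .from('chat_messages')
  .insert({
    is_typing: true,
    content: null,
    // ...
  })
  .select()
  .single()

// After the LLM completes, UPDATE the same row with the actual content
```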

The mobile app subscribes to this row via Realtime. When `is_typing` flips to false and content appears, the dots are replaced with the real message.
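On the app side, each Realtime event only needs to be turned into UI state. The following is a small sketch of that step, assuming every INSERT/UPDATE payload carries the full row (`id`, `is_typing`, `content`). The subscription itself (in supabase-js, a channel listening for `postgres_changes`) is left out, and `MessageRow`, `Bubble`, `toBubble`, and `applyRow` are hypothetical names.

```typescript
// Assumed shape of a chat_messages row as delivered by a Realtime event
// (only the columns used here; `id` is an assumption).
interface MessageRow {
  id: string;
  is_typing: boolean;
  content: string | null;
}

// What the chat list actually draws for each row.
type Bubble = { id: string; kind: "typing" } | { id: string; kind: "text"; text: string };

function toBubble(row: MessageRow): Bubble {
  // While is_typing is true (or content hasn't arrived), show the dots.
  if (row.is_typing || row.content === null) return { id: row.id, kind: "typing" };
  return { id: row.id, kind: "text", text: row.content };
}

// Apply an INSERT or UPDATE payload: replace the bubble with the same id, or append it.
function applyRow(bubbles: Bubble[], row: MessageRow): Bubble[] {
  const next = toBubble(row);
  const index = bubbles.findIndex((b) => b.id === row.id);
  if (index === -1) return [...bubbles, next];
  return bubbles.map((b, i) => (i === index ? next : b));
}

// Usage: the typing row arrives first, then the UPDATE with real content.
let view: Bubble[] = [];
view = applyRow(view, { id: "m1", is_typing: true, content: null });   // dots appear
view = applyRow(view, { id: "m1", is_typing: false, content: "Hi!" }); // dots replaced in place
console.log(view);
```

Keying on the row id is what lets the dots turn into the message in place, instead of a second bubble appearing below them.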

The actual speed is still 0.8 seconds. But the experience feels different. When something is visibly happening, waiting is tolerable.

Being fast and feeling fast are different things — and sometimes feeling fast matters more.

The real bottleneck now is LLM call time. That's irreducible for the moment. But what users see while they wait is something I can control.

Next → EP 09 · How to Give Pepper a Memory