// COMING 2026

Prompt Recovery

A novel about building AI systems that actually work. Think The Goal and The Phoenix Project — but for teams shipping large language models into production.

by Michael John Peña

Get notified at launch · Read the premise
Prompt Recovery book cover

⚠ Cover art is a work-in-progress placeholder

ssh sarah@autoscale — agentos-prod — tmux
sarah@autoscale:~$ kubectl get pods -n agentos-prod
NAME                   READY  STATUS            RESTARTS
agentos-router-7b4f9   0/1    CrashLoopBackOff  47
agentos-llm-worker-3   0/1    OOMKilled         12
agentos-eval-runner    1/1    Running (lying)
agentos-gateway-a2c1   1/1    Running           0

sarah@autoscale:~$ cat /var/log/billing-alert.log | tail -3
[CRIT] OpenAI spend: $47,231.89 / 24h (budget: $2,000)
[WARN] Token burn rate: 4.2M tok/min — 3x normal
[INFO] Cost anomaly detection triggered at 11:47 PM

sarah@autoscale:~$ ./recover.sh --plan --agents=all
▸ Loading recovery playbook...
▸ 25 chapters. 90 days. One shot.
sarah@autoscale:~$
$ agentctl status --watch
┌─ Agent Fleet ─────────────┐
│ router     ✗ crashed (47x) │
│ planner    ⚠ looping       │
│ retriever  ✓ healthy       │
│ executor   ⚠ throttled     │
│ evaluator  ✗ lying         │
└────────────────────────────┘
↻ refreshing in 5s...
$ stern -n agentos-prod --since 5m
02:44:12 planner  → "Retrying prompt… attempt 94"
02:44:13 router   → panic: nil pointer dereference
02:45:01 executor → rate limit hit (429)
02:46:58 eval     → assert failed: "accuracy" > 0.7
02:47:02 gateway  → Sarah connected from 10.0.1.42
02:47:03 gateway  → "Let's fix this."
incident 0:ops-triage* 1:agents 2:eval 3:logs sarah@autoscale   02:47 AM
// The Premise

Day one.
Everything is already on fire.

Sarah Chen is a seasoned engineering leader who has just been hired to run the AI platform at AutoScale — a fast-growing startup whose crown jewel, AgentOS, is held together by one exhausted engineer, duct tape, and good intentions.

Sarah's Slack notification sounded at 11:47 PM. Then again at 11:48. By 11:50, her phone was buzzing with the intensity of a trapped bee. She knew what that meant: production was on fire, and she was about to do something reckless about it.

She untangled herself from the couch, dislodging Kernel from her lap and earning a look of betrayal that only a cat could deliver with such precision.

— Chapter 1: Into the Fire

She has 90 days before the board pulls the plug. What follows is a crash course in building AI systems that survive contact with reality — told through the lens of one team's fight to turn chaos into something they can be proud of.

// What You'll Learn

Real engineering. Real consequences.

Every chapter embeds production-grade AI engineering concepts inside a story you can't put down.

🪟

Context Window Architecture

Why your prompts break at scale and how to design context as a contract, not an afterthought.

🛡️

Guardrails & Safety

Prompt injection, jailbreaks, and the layered defense patterns that keep AI systems from going off the rails.

📊

Evaluation That Doesn't Lie

Moving beyond vibes-based testing. Building eval frameworks that catch failures before your customers do.

🔁

Agent Orchestration

Multi-agent systems, cascade failures, circuit breakers, and the patterns that make AI agents reliable.

👁️

Observability & Cost Control

Tracing LLM calls, spotting the $47,000 Tuesday before it happens, and building dashboards that matter.

⚖️

AI Ethics in Practice

Not theory — the messy, real-world moments when technically correct recommendations have devastating human consequences.

// The Structure

Three acts. Twenty-five chapters.

A ninety-day journey from inherited chaos to production confidence.

Act I — Days 1–30

Inheriting Chaos

Sarah discovers what's broken: runaway costs, a single point of failure, shadow agents nobody owns, evaluations that lie, and a team on the edge of burnout.

Chapters 1–8
Act II — Days 31–72

Building the Foundation

With the clock ticking, the team rebuilds around three principles: treat context as a contract, build reliability through orchestration, and deploy with humility.

Chapters 9–18
Act III — Days 73–90

Trial by Fire

The enterprise demo. A crisis of values. A walkout. And the beginning of everything that comes after.

Chapters 19–25

// Who It's For

If you've ever been paged at 2 AM over an AI system, this book is for you.

// If You Liked

The DNA of this book.

Prompt Recovery lives at the intersection of these five books.

📕
The Goal
Eliyahu Goldratt
The theory of constraints — applied to AI pipelines
📕
The Phoenix Project
Gene Kim et al.
DevOps through narrative — our format inspiration
🤖
AI Engineering
Chip Huyen
Production ML systems — the textbook behind the story
⚙️
Release It!
Michael Nygard
Stability patterns — circuit breakers and bulkheads in practice
🧠
The Alignment Problem
Brian Christian
AI ethics and safety — the human cost of getting it wrong

If any of these are on your shelf, Prompt Recovery was written for you.

// About the Author

Michael John Peña

Michael is a software engineer, author, and Microsoft MVP who has spent over a decade shipping AI and cloud systems across startups and enterprises in Australia, the Philippines, and beyond. He has worked on everything from LLM orchestration to IoT platforms — and now writes fiction about the messy, human side of deploying AI at scale.

Website → LinkedIn → GitHub →

// Stay in the Loop

Get notified when it launches.

No spam. Just a single email on launch day — plus an optional early chapter preview for subscribers.