Brooks’s Law in Reverse: How Four Engineers and an AI Built a Million-User App in 28 Days
“Adding people to a late software project makes it later.”
Fred Brooks wrote this in 1975, and it became one of the iron laws of software development. When you’re trying to ship something complex under deadline, throwing more engineers at the problem usually backfires. Communication overhead multiplies. Tasks fragment. Integration gets messy. Nine women can’t have a baby in one month.
For fifty years, this has been gospel in software engineering. And then OpenAI shipped Sora for Android in 28 days with four engineers.
Number one in Google Play on launch day. 99.9% crash-free rate. Over a million downloads in the first weeks. Built from scratch to production by a team you could fit in a sedan.
The secret? They didn’t break Brooks’s Law. They understood it better than most people who quote it.
The Experiment Nobody Expected
When Sora exploded on iOS, the pressure to ship Android was immediate. Users were pre-registering on Google Play by the thousands. The company had only a small internal prototype. The timeline was aggressive: ship in a month or miss the momentum.
Traditional approach: assemble a larger team, divide the work, coordinate it, and hope the integration doesn’t kill you at the end. That’s what Brooks’s Law warns against. More people means more communication paths (they grow with the square of team size), more coordination overhead, more opportunities for things to go wrong.
OpenAI went the other direction. Four engineers. But each engineer got something that fundamentally changed the equation: Codex.
They consumed roughly 5 billion tokens over those 28 days. At current pricing, that’s somewhere between $15,000 and $75,000 in API costs. For a production app that hit number one in its category, that’s absurdly cheap. But the real story is what those four engineers were actually doing.
The First Failed Experiment
Here’s a detail most summaries skip: they tried the obvious thing first. They pointed Codex at the iOS codebase and told it to build the Android version autonomously. Let it run for twelve hours straight.
The result? Patrick Hum, one of the engineers, said it delivered something that “certainly wasn’t anything that we could show anybody.”
This is important. The fully autonomous approach didn’t work. The AI alone, even with access to the entire iOS codebase, couldn’t figure out what to build or how to structure it properly.
So they spent the first week doing something that sounds inefficient but turned out to be critical: writing code by hand. Not the app code, the architecture code. The patterns. The conventions. The examples of how things should be done.
They created what they called a “context-rich environment.” Text files documenting best practices. Exemplar features showing the right way to structure components. AGENT.md files that Codex could read to understand team standards.
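To make that concrete, here’s the kind of thing an AGENT.md might contain. This is my own sketch, not OpenAI’s actual file, and the rules and paths in it are made up:

```markdown
# AGENT.md (hypothetical excerpt)

## Architecture
- Composables render state; they contain no business logic.
- Each feature gets a ViewModel that owns its UI state and talks to a repository.
- Network and persistence stay behind repository interfaces; never call them from the UI layer.

## Conventions
- Use features/feed/ as the exemplar when scaffolding a new feature.
- UI state is an immutable data class exposed as a StateFlow.
- Every change ships with tests; match the existing naming and package structure.
```

The point isn’t the specific rules. It’s that the standards exist in a form the model can read before it writes a single line.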
This is the part that fascinates me. They treated Codex like what it actually is: a newly hired senior engineer who’s technically skilled but knows nothing about your specific project, your architecture preferences, or your product vision.
From Lines of Code to Systems of Code
Most teams today use AI for what I’d call “incremental assistance.” GitHub Copilot suggests the next line. Speed increases by maybe 20-30%. It’s helpful but not transformative.
OpenAI did something different. They used Codex for entire features, entire subsystems, entire architectural layers. Four large, intensive sessions where they’d lay out the architecture, explain the product logic, and then let Codex generate whole blocks of the application.
Codex wrote approximately 85% of the code. But what were the humans doing?
They were setting architectural boundaries. Explaining product requirements. Checking edge cases. Making judgment calls about tradeoffs. Reviewing what Codex produced. Deciding what to build next.
The role changed from “person who types code” to “person who designs systems and evaluates implementations.”
This is exactly what I keep seeing across industries. The valuable work shifts up the abstraction ladder. In manufacturing, you stop worrying about whether the weld is at the right temperature and start worrying about whether you’re welding the right thing in the right place. In software engineering, you stop worrying about syntax and start worrying about architecture.
The “Not Crazy Enough” Problem
Physicist Niels Bohr supposedly said: “Your theory is crazy, but not crazy enough to be true.”
Most companies using AI today aren’t crazy enough. They’re using it safely, incrementally, in ways that don’t fundamentally challenge how they work. Copilot for autocompletion. ChatGPT for documentation. AI as assistant, not as collaborator.
This is rational risk management. It’s also why they’re not seeing transformative results.
OpenAI went genuinely crazy with it. They let AI write the system: actually generate the bulk of the implementation, not just help write it or suggest improvements. That’s a level of trust that most organizations won’t accept, and most engineering teams aren’t prepared for.
But here’s what made it work: they were crazy in a structured way. They didn’t just turn Codex loose and hope. They built guardrails. They documented patterns. They created feedback loops. They ran multiple Codex sessions in parallel and coordinated the outputs.
Patrick Hum said they essentially ran four engineers like sixteen. Each engineer managed multiple Codex instances simultaneously, working on different features in parallel. And those sixteen “virtual engineers” were arguably more effective than sixteen humans would have been: there was no need to align sixteen people around a shared vision, because every instance read from the same documented architecture.
This is Brooks’s Law in reverse. Instead of adding people and multiplying coordination costs, they multiplied the force each person could exert while keeping the coordination overhead of a four-person team.
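Mechanically, “one engineer running several Codex instances” is fan-out and fan-in: dispatch independent feature briefs against the same documented context, then spend your time on review. Here’s a minimal sketch in Kotlin coroutines; runCodexTask is a purely hypothetical stand-in for a session, not a real API, and the feature names and file paths are placeholders:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for "send one Codex session the architecture context
// plus a feature brief, wait for a draft change set" -- not a real OpenAI API.
suspend fun runCodexTask(feature: String, contextFiles: List<String>): String =
    "draft changes for $feature (guided by ${contextFiles.size} context files)"

fun main() = runBlocking {
    val context = listOf("AGENT.md", "docs/architecture.md", "features/feed/")

    // One engineer, four sessions in flight at once. The human's job is the
    // review-and-integrate step at the end, not the typing in the middle.
    val drafts = listOf("upload flow", "notifications", "profile", "offline cache")
        .map { feature -> async { runCodexTask(feature, context) } }
        .awaitAll()

    drafts.forEach { draft -> println("Review and integrate: $draft") }
}
```

The scarce resource in that loop is the final step: human review.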
What Actually Broke
Let’s talk about what didn’t work, because this is where you learn the real lessons.
Codex, left unguided, would drift on architecture. It would leak logic into the UI layer. It would solve immediate problems in ways that created long-term technical debt. Its instinct, as the team put it, is “to get something working, not to prioritize long-term cleanliness.”
Sound familiar? That’s exactly what junior engineers do. And it’s why you need senior engineers to review their work and maintain architectural discipline.
The solution wasn’t to stop using Codex. It was robust patterns, exemplar features, and constant review. Same as you’d do with human junior engineers, except Codex works 24/7 and doesn’t get offended when you reject its PRs.
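To make the drift concrete, here’s a hypothetical before-and-after of “leaking logic into the UI layer.” The feature and types are invented for illustration, not taken from the Sora codebase:

```kotlin
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable

// Hypothetical types, for illustration only.
data class Video(val durationSeconds: Int, val isPrivate: Boolean)
data class RemixUiState(val canRemix: Boolean)

// The drift an unguided agent tends to produce: product rules decided inside the UI layer.
@Composable
fun RemixButtonDrifted(video: Video, onRemix: () -> Unit) {
    val canRemix = video.durationSeconds <= 60 && !video.isPrivate // business logic in a composable
    Button(onClick = onRemix, enabled = canRemix) { Text("Remix") }
}

// The exemplar pattern: the ViewModel decides, the composable only renders state.
@Composable
fun RemixButton(state: RemixUiState, onRemix: () -> Unit) {
    Button(onClick = onRemix, enabled = state.canRemix) { Text("Remix") }
}
```

Both versions work. Only one of them still makes sense by the fortieth feature.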
The bottleneck shifted. Instead of “how fast can we write code,” it became “how fast can we make decisions, give feedback, and integrate changes.” The constraint moved from execution to judgment.
This is the pattern I keep seeing. When you properly leverage AI, the bottleneck moves upstream. It moves from doing to deciding. From implementation to design. From execution to strategy.
The Real Cost of This Approach
Here’s what this model demands that traditional development doesn’t:
Clarity of thinking. You can’t hand Codex a vague idea and get something useful. You need to articulate exactly what you want and why. This is hard. Most people don’t actually know what they want until they see what they don’t want.
Architectural discipline. When humans write code slowly, architectural mistakes reveal themselves gradually. When AI generates code quickly, bad architecture creates massive technical debt almost immediately. You need stronger upfront design.
Constant review. You can’t just merge what Codex produces. You have to read it, understand it, and verify it does what you intended. The volume is higher, so this is actually harder than reviewing human code.
Systemic thinking. You’re not managing people who can push back on bad ideas. You’re managing agents that will implement whatever you tell them to. If your system design is flawed, you’ll build the wrong thing very quickly.
Why This Isn’t Happening Everywhere
If four engineers can do this, why isn’t every company working this way?
First, organizational inertia. Most companies have processes built around traditional development. Code review workflows designed for human PRs. QA processes that assume slower change velocity. Management structures that equate headcount with capacity.
Second, skill mismatch. The engineers who succeed in this model aren’t necessarily the same engineers who succeed in traditional development. You need people who are good at architecture, good at articulation, good at review. People who can work at a higher abstraction level. That’s a different skill set than “good at implementing features.”
Third, risk tolerance. Most organizations won’t accept a model where AI generates 85% of their production code. The liability concerns alone would kill it in legal review. Never mind the cultural resistance from engineering teams who see this as threatening their expertise.
Fourth, and this is the one nobody talks about: it requires you to actually know what you’re building. When development is slow, you can figure it out as you go. When development is fast, unclear requirements become obvious immediately. A lot of organizations don’t actually have clear product vision, and fast development would expose that.
What I’m Doing With This
For me, this isn’t theoretical. I already run two separate servers, each running multiple Claude Code sessions simultaneously. That’s the only way I can maintain the pace and complexity I need for my experiments.
It’s not about coding faster. It’s about testing ideas in parallel. I can spin up one session to explore approach A, another for approach B, a third to gather supporting data, and a fourth to analyze results. By the end of the day, I know which direction works. That used to take a week.
The workflow is genuinely different. I’m not writing code linearly anymore. I’m orchestrating multiple parallel exploration threads, synthesizing results, making decisions about which paths to pursue.
The Uncomfortable Question
If four engineers with AI can do what used to require sixteen engineers without it, what happens to the other twelve?
The optimistic answer: they work on different problems. The constraint shifts from “how much engineering capacity do we have” to “how many valuable problems have we identified.” If you have four engineers who can execute like sixteen, you should be finding more problems for them to solve, not firing twelve engineers.
The realistic answer: most organizations will try to do the same work with fewer people. That’s what always happens when productivity tools improve. Some of those displaced engineers will find work on new problems. Some won’t.
This is why I keep saying: the main question is how to use tools in ways that create value rather than just cutting costs. Companies that use AI purely for headcount reduction will find themselves competing against companies that use AI to expand what’s possible.
What Success Looks Like
The OpenAI example shows what’s possible when you commit fully to a new model rather than bolting AI onto an old one.
Four engineers. 28 days. A production app that hit number one in its category and maintained a 99.9% crash-free rate.
But notice what made it work: architectural discipline, clear documentation, constant review, and engineers who could work at a higher level of abstraction than traditional development requires.
The main idea is that engineering work itself fundamentally changes. Less time typing, more time thinking. Less time implementing, more time architecting. Less time debugging syntax, more time designing systems.
If your approach to AI feels comfortable and doesn’t create any internal resistance, it’s probably already outdated. If it feels too bold, maybe uncomfortable, you might be looking at the future.