# Principle #2: Vertical Spike Before Framework — Case Studies

> Validate architecture through working code, not docs.


## Case Study 1: Google's Protocol Buffers — Spike to Standard (2001-2008)

Protocol Buffers (protobuf) were not designed as a universal serialization framework. They started as an internal tool at Google around 2001, built to solve a specific problem: efficient serialization for Google's internal RPC system. The team built a working spike for their own use case (search infrastructure), iterated on it through real production traffic, and only extracted it as a general-purpose framework years later. Google open-sourced protobuf in 2008 — seven years after the initial spike. By that time, protobuf had been battle-tested across virtually every Google service.

**When followed:** Protobuf's design reflects years of real-world usage patterns (backward compatibility, schema evolution, compact wire format). These features weren't theorized — they were discovered through production incidents and evolving requirements. The framework was extracted from proven code, not designed in advance.
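
The schema-evolution property falls out of protobuf's tag-based wire format: every field is encoded as a field number plus a wire type, so a decoder can simply skip field numbers it does not recognize. Here is a minimal Python sketch of that idea (the helper names are invented, only wire type 0 is handled, and this is not the real protobuf library):

```python
# Minimal sketch of a protobuf-style tag/varint encoding (illustrative, not
# the real protobuf library). Each field is written as
# (field_number << 3 | wire_type) followed by its payload, so an old decoder
# can skip field numbers it doesn't know about.

def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

VARINT = 0  # wire type 0: varint-encoded scalar

def encode_field(field_number: int, value: int) -> bytes:
    tag = (field_number << 3) | VARINT
    return encode_varint(tag) + encode_varint(value)

def decode_known_fields(buf: bytes, known: set[int]) -> dict[int, int]:
    """Decode only the fields we know; silently skip the rest (schema evolution)."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number = tag >> 3
        value, pos = decode_varint(buf, pos)  # wire type 0 assumed throughout
        if field_number in known:
            fields[field_number] = value
    return fields

# A "v2" writer adds field 2; a "v1" reader that only knows field 1 still works.
msg = encode_field(1, 150) + encode_field(2, 42)
assert decode_known_fields(msg, known={1}) == {1: 150}
```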

**When violated:** Apache Thrift, Facebook's equivalent, was open-sourced in 2007 with a broader scope (multiple languages, multiple serialization formats, a full RPC framework). The wider scope led to a more complex system with a less cohesive design. Thrift works, but its adoption never matched protobuf's — partly because it tried to be a framework from day one instead of growing organically from a spike.

## Case Study 2: Ruby on Rails — Extracted from Basecamp (2004)

David Heinemeier Hansson (DHH) built Basecamp (a project management tool) first, then extracted Ruby on Rails from the working application. Rails was not designed as a web framework and then used to build Basecamp — it was the other way around. The framework emerged from patterns that proved useful in a real product. DHH has repeatedly stated this was the key design decision: "Frameworks are extracted, not designed."

**When followed:** Rails' conventions (convention over configuration, RESTful routing, ActiveRecord) reflect patterns that actually worked in Basecamp's codebase. This gave Rails an opinionated but coherent design that developers could learn quickly. By 2006, Rails had become one of the most popular web frameworks — because its design decisions were validated by production use.
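
As a rough illustration of what convention over configuration buys, here is a toy Python sketch in the ActiveRecord spirit (the `Model` base class and its naive pluralization rule are invented for this example; Rails itself is Ruby and far richer):

```python
# Toy sketch of "convention over configuration" in the ActiveRecord style,
# in Python rather than Ruby: the table name and primary key are derived
# from the class by convention, not declared in configuration files.

class Model:
    @classmethod
    def table_name(cls) -> str:
        # Convention: class name, lowercased and naively pluralized.
        return cls.__name__.lower() + "s"

    @classmethod
    def find_sql(cls, record_id: int) -> str:
        # Convention: every table has an integer primary key named "id".
        return f"SELECT * FROM {cls.table_name()} WHERE id = {record_id}"

class Project(Model):
    pass  # no mapping configuration needed: the conventions supply it

assert Project.table_name() == "projects"
assert Project.find_sql(7) == "SELECT * FROM projects WHERE id = 7"
```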

**When violated:** Java's Enterprise JavaBeans (EJB) specification was designed as a framework by committee before widespread production use. EJB 1.0 and 2.0 were notoriously complex, requiring multiple boilerplate interfaces and XML deployment descriptors for simple operations. The spec was driven by anticipated needs rather than observed patterns. It took until EJB 3.0 (2006) — after years of community backlash and competition from Spring — to simplify the framework to something practical.

## Case Study 3: Meta's LLaMA Release Strategy (2023-2024)

Meta's approach to open-source LLM release followed a spike-before-framework pattern. LLaMA 1 (Feb 2023) was released as a research artifact — a set of model weights with minimal tooling, no API, and a restrictive license. It was a spike to test the hypothesis that open-weight models could compete with proprietary ones. The community's response (massive adoption, fine-tuning ecosystem) validated the approach. Only then did Meta invest in LLaMA 2 (July 2023) with a proper commercial license, safety fine-tuning (RLHF), and enterprise-ready tooling. LLaMA 3 (2024) further expanded with multimodal capabilities and a full framework (torchtune, llama-stack).

**When followed:** Each LLaMA release incorporated lessons from the previous one's real-world usage. LLaMA 2's safety training was informed by LLaMA 1's observed misuse patterns. LLaMA 3's tooling reflected the community's actual needs (quantization, fine-tuning, deployment) rather than guesses about what developers might want.

**When violated:** Google's initial Bard launch (March 2023) shipped a full product (chat interface, integration with Google services, consumer-facing UX) without first validating the underlying model's capabilities through a research spike. The result was a product that made factual errors in its launch demo and spent months playing catch-up on reliability.


## Industry Cross-References

| Pattern | Who Uses It | Reference |
| --- | --- | --- |
| Spike and stabilize | Extreme Programming (XP), Lean Startup | "Build the simplest thing that could work, then iterate" |
| Extracted frameworks | Rails (from Basecamp), Django (from Lawrence Journal-World), Flask (from an April Fools' joke) | Most successful frameworks were extracted, not designed |
| Walking skeleton | Alistair Cockburn (2004) | "A tiny implementation of the system that performs a small end-to-end function" (sketched below) |
| Steel thread | ThoughtWorks | End-to-end implementation of one feature through all layers |
| Tracer bullet development | The Pragmatic Programmer (Hunt & Thomas, 1999) | "A single tracer through all layers of the system" |
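
To make the "walking skeleton" and "tracer bullet" rows concrete, here is a minimal Python sketch. The layers and names (`InMemoryStore`, `NoteService`, `handle_request`) are invented for illustration, not taken from any of the cited sources:

```python
# A minimal "walking skeleton": one feature wired end to end through every
# layer, so the architecture is exercised by working code before any
# framework exists. All names here are hypothetical.

class InMemoryStore:  # persistence layer (stub)
    def __init__(self) -> None:
        self.rows: dict[int, str] = {}

    def save(self, key: int, value: str) -> None:
        self.rows[key] = value

class NoteService:  # domain layer
    def __init__(self, store: InMemoryStore) -> None:
        self.store = store

    def create_note(self, note_id: int, text: str) -> str:
        self.store.save(note_id, text)
        return f"note {note_id} saved"

def handle_request(service: NoteService, path: str, body: str) -> str:
    # Transport layer (stub): one route is enough for the skeleton.
    if path == "/notes/1":
        return service.create_note(1, body)
    return "404"

# The end-to-end check is the point: it proves the layers actually connect.
assert handle_request(NoteService(InMemoryStore()), "/notes/1", "hi") == "note 1 saved"
```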

## Key Insight

The pattern is remarkably consistent: successful frameworks and platforms start as solutions to specific, concrete problems — and only become generalized after proving themselves in production. Teams that start with the framework (designing for flexibility, anticipating needs, building abstractions) tend to produce systems that are either over-engineered for the real use case or miss it entirely. The principle isn't anti-planning — it's anti-speculation. Build something real first, then extract the patterns.