Moving from a startup codebase to a team codebase

Why do startup codebases look terrible?

Every startup codebase that I work on starts out as a thing of beauty. It comes together from boilerplate that I've refined over the years. After the first few hours, the beauty fades. I suspect that startup-codebase beauty and product-market fit are inversely proportional.

Generally speaking, code quality matters. But for a fledgling tech company, customer delight matters more. When software products fail, poor code quality is rarely to blame. That said, staying motivated while hacking on bad code is hard. You need to adopt the attitude that "there's no point doing something well that doesn't need doing at all."

Products that achieve product-market fit have gone through many build-measure-learn cycles. Once your product achieves product-market fit, life gets harder still. Hardships compound: customers are hungry for features and bug fixes, and your servers burn through money because some subroutines are inefficient. After several iterations, pivots, or both, diagnosing logic issues becomes fiendishly complex, and fixing them without introducing new problems is difficult. "Why," you'll ask, "did I not build using beautiful patterns, and refactor each step of the way as Fowler wanted me to!?"

Onboarding new hires is hard

The pain of a startup codebase intensifies when you hire people to help keep up with customer needs. It takes energy to find excellent engineers. When you do find one, you pray they don't run off to a FAANG company as soon as they see the litany of TODOs. Hopefully they'll look past your embarrassment and help write the hacks and patches needed to get things shipped. They'll help make customers happier, and the codebase will get worse.

Unless you break something catastrophically, you'll eventually need to hire more engineers. Once three engineers are working on the same codebase, the wheels start to come off. I'm not sure why... maybe three's a crowd? At that point you start to wish you'd kept ADRs (architecture decision records). At least then there would be half-sensible answers to why creating objects through your REST API requires a PUT...

Why is everyone talking about a rewrite?

Inevitably, engineers will want to refactor; they've read Fowler too. The problem is that testing has likely been an afterthought. Perhaps you wrote some unit tests at the start, but they went stale after your first pivot? Now refactoring is dangerous: each time something changes, there's a fair chance something else will break. Without tests, you won't always notice the breakage until it's shipped. At best, your production logs will alert you; at worst, a customer will. You'll long for the days when deployments didn't give you an adrenaline kick.

Before long you'll start to hear whispers of "rewrite". They usually come from the less experienced but opinionated engineers. Part of you wants to rewrite too... start again with that beautiful, trusted, elegant boilerplate. But you know the cost of a rewrite is too high. It'll ruin your growth, kill investor confidence, and probably destroy your startup.

Under the gun

Over time the code gets more complicated. When an engineer cries because of that complexity, it's time to move to a team codebase. Try to get to this stage before anyone else weeps in frustration. You must start this process before there's a revolt.

"In a startup codebase," you explain, "you need to understand the whole codebase to contribute to any part." Heads nod. "In a team codebase, anyone can contribute to any part without understanding the whole. We need to move to this. It's going to be hard."

Hopefully you've hired an SDET (software development engineer in test). If not, recruit one fast. Your SDET must spend time interrogating the team to understand the critical paths, because you need an integration test for each one. Without those integration tests, a careless refactor could kill your company.
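What a critical-path integration test looks like depends entirely on your stack. As a minimal sketch, assume a hypothetical `App` whose critical path is "a new user signs up and pays". The class, its methods, and the flow are illustrative only. The point is that the test drives the whole path through the application and asserts on the outcome, rather than checking one function in isolation.

```python
import unittest


class App:
    """Hypothetical in-memory stand-in for a real application.

    In a real codebase, an integration test would drive your actual
    services, database, and HTTP layer instead of this toy class.
    """

    def __init__(self):
        self.users = {}
        self.orders = []

    def sign_up(self, email):
        if email in self.users:
            raise ValueError("email already registered")
        self.users[email] = {"email": email, "paid": False}
        return self.users[email]

    def checkout(self, email, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        # Raises KeyError if the user never signed up.
        user = self.users[email]
        order = {"email": email, "amount_cents": amount_cents}
        self.orders.append(order)
        user["paid"] = True
        return order


class SignUpToCheckoutTest(unittest.TestCase):
    """One test per critical path: here, 'a new user signs up and pays'."""

    def test_new_user_can_sign_up_and_pay(self):
        app = App()
        app.sign_up("ada@example.com")
        order = app.checkout("ada@example.com", 4999)

        # Check the end-to-end outcome of the whole path.
        self.assertEqual(order["amount_cents"], 4999)
        self.assertTrue(app.users["ada@example.com"]["paid"])
        self.assertEqual(len(app.orders), 1)
```

Saved as a `test_*.py` file, a test like this is picked up by `python -m unittest` discovery. Naming each test after the business journey it protects makes it obvious which critical path a red build is about.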

Why are integration tests so flaky?

The SDET's job is a tough one. Every third-party dependency can cause an unexpected problem. "Why is this build breaking? It's got nothing to do with what I've worked on!" You'll see this on Slack more often than you'd like. The SDET will feel frustrated as they work around the daily anomalies.
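One common way to reduce this kind of flakiness (not necessarily the route your SDET will take) is to put each third-party dependency behind a small interface and substitute a deterministic fake in CI. Below is a minimal sketch assuming a hypothetical payment provider. `PaymentGateway`, `FakePaymentGateway`, and `complete_purchase` are illustrative names, not any real library's API.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical boundary around a third-party payment provider."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


class FakePaymentGateway:
    """Deterministic stand-in used in CI instead of the real vendor API."""

    def __init__(self):
        self.charges = []  # records (customer_id, amount_cents) pairs

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        # Predictable IDs keep assertions stable across runs.
        return f"fake-charge-{len(self.charges)}"


def complete_purchase(gateway: PaymentGateway, customer_id: str, amount_cents: int) -> str:
    # Business logic depends only on the PaymentGateway protocol, so tests
    # can pass in the fake and never touch the network.
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(customer_id, amount_cents)
```

With the fake, `complete_purchase(FakePaymentGateway(), "cus_1", 4999)` returns `"fake-charge-1"` every time, whatever the vendor's status page says. Many teams keep a smaller suite that calls the real vendor and run it separately or on a schedule, so genuine integration breakages still surface without blocking every build.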

In time the flakiness subsides and the team gains confidence in the critical-path tests. It's time to start refactoring.

Refactor Fridays

The team can't spend all their time refactoring; the business needs to keep moving. Twenty percent of the team's time, one day a week, is what you can convince the other co-founders is sensible. It's a high cost, but "nowhere near the cost of development stalling in three months," you say.

Some things will break with each round of refactoring. You contain the risk by releasing refactors on your lowest-traffic days. If that falls on a weekend, you'll be the one picking up the pieces; you need your team well-rested.

What does this code do?

As the refactoring gets underway, people will ask questions. So many questions. Of code written months or years ago: "What does it do?" You have no idea. "Take it out and see what happens." It's okay; you've got those integration tests... and the weekend wasn't going to be much fun anyway. It'll be more fun once the outage adrenaline kicks in 🥳.

Team codebase ahoy

Eventually there will be fewer questions about history. People will be working on code reuse, not breaking apart giant if/else blocks. You'll notice non-critical-path integration tests sneaking in. Questions will turn to performance and architecture: questions you enjoy pondering and planning out with the people you work alongside. People you enjoy building the future with, using a codebase they are proud to contribute to.