Exploration and exploitation in technical standards

In engineering organizations, we live in constant tension between Exploration and Exploitation.

If you’re not familiar with Explore/Exploit algorithms, let me give a restaurant analogy.

Exploration is trying a new restaurant.
Exploitation is going to your favorite restaurant.

In all aspects of our lives, we’re balancing two impulses: Exploration, trying new things in the hope of discovering something wonderful, and Exploitation, taking advantage of the things we’ve already discovered.
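To make the analogy concrete, here is a minimal sketch of one common explore/exploit strategy, usually called epsilon-greedy. It is an illustration rather than anything prescribed here: the restaurant names, the ratings, and the `choose_restaurant` helper are all hypothetical.

```python
import random


def choose_restaurant(avg_ratings, epsilon=0.1, rng=random):
    """Pick a restaurant from avg_ratings (name -> average rating so far).

    With probability epsilon we explore (pick any restaurant at random);
    otherwise we exploit (return the best-rated restaurant known so far).
    """
    if rng.random() < epsilon:
        # Explore: try any restaurant, including ones we barely know.
        return rng.choice(sorted(avg_ratings))
    # Exploit: go back to the current favorite.
    return max(avg_ratings, key=avg_ratings.get)


# Hypothetical data: two places we have tried, one we have not.
ratings = {"favorite_diner": 4.6, "new_taqueria": 3.9, "untested_bistro": 0.0}
print(choose_restaurant(ratings, epsilon=0.1))
```

A small `epsilon` means you mostly return to your favorite and only occasionally try something new. Setting it to 0 never explores, and setting it to 1 never exploits.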

Successful engineering organizations:

are disciplined about both Exploration and Exploitation.
engage in a low level of constant Exploration, but are ruthless about cutting off experiments that aren’t successful.
bias towards standardization, because introducing something new immediately introduces debt into your whole tech stack. To be worth exploiting, a discovery has to be valuable enough to justify going through and updating everything to take advantage of it.

Standards reduce organizational complexity

A standard means you require a conversation to do things in a non-standard way.

Many engineers resist standards. They view them as limits on their freedom. Standards make it harder to do things in non-standard ways.

Many startups put off standards because they are fighting to survive and these seem like costly distractions.

But there can be a huge payoff to establishing lightweight standards in your engineering organization:

Fewer patterns in your code.
Less tech debt to navigate.
More deliberate conversations about tradeoffs with new approaches.
A lower cognitive load required to understand your codebase and make changes to it.
Lower onboarding requirements for new team members.
Faster velocity in releasing new features, due to less time spent navigating complexity.
Teams that can maintain and operate their code.
More internal mobility within the company, as engineers aren’t stuck with their proprietary systems that nobody else can understand.

The same people who resist standards are also the people who complain about the results when you lack standards: a messy codebase that is hard to reason about.

Startups should put in place minimal standards and roles around technical decision-making. Standards can ensure you have high-quality conversations about new technologies or patterns.

You avoid problems down the road by establishing patterns around which conversations to have when you want to do something new.

System-level benefits

From a systems perspective, standards help reduce organizational complexity along many different axes. Instead of N different ways of handling state within codebases, you’ll have 1 or 2. Instead of 8 different programming languages in use, you might have 2 or 3.

As Coda Hale points out in his marvellous article on organizational complexity, organizations can achieve great benefits if they invest in internal tooling and “force multipliers”. You make that type of work easier when there are fewer targets for it to work against.

Standards can also help distributed teams benefit from what local groups have learned. If one team finds out that a particular coding pattern leads to disastrous results, standards (and linting against those standards, where possible) can help the entire organization avoid relearning the same lesson over and over again.

Implementing standards

I generally like standards to come out of a process that everyone can participate in. An RFC process can be a good way to approach it: you keep standards in a central place, anyone can propose a new one, and each proposal goes through a period of public comment. If you have a Chief Architect or a Technical Leadership Group, you can sometimes route approval of standards through that group.

An anti-pattern for standards

Standards can dictate a lot of work, so beware of the danger of unfunded mandates. You’ll see groups fall into this trap in two ways:

They don’t take on the burden of fixing things themselves, but ask teams to update to the new standard whenever they touch something. This just spreads the cost over time.
They mandate a new pattern that takes a lot of work to adopt.

If you have an architecture group or technical leadership group, it’s good to have product management or engineering management involved, so they can remind the group that it can advocate for projects, but can’t will them into being.

The intention is to reduce complexity over time, both cognitive complexity (choosing between options) and code complexity (a proliferation of approaches in the codebase). Look for opportunities to automatically enforce new standards: code linting, observability tooling, and tests.
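As one illustration of automatic enforcement, here is a minimal sketch of a custom lint check that flags imports an organization has agreed to retire. The banned modules, their suggested replacements, and the `find_violations` helper are all hypothetical; in practice you would more likely configure an existing linter than write your own.

```python
import ast

# Hypothetical standard: modules to avoid, mapped to the agreed replacement.
BANNED_IMPORTS = {"urllib2": "requests", "pickle": "json"}


def find_violations(source, filename="<string>"):
    """Return sorted (line, banned_module, replacement) tuples for source code."""
    violations = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "pickle.foo" counts as "pickle"
            if root in BANNED_IMPORTS:
                violations.append((node.lineno, root, BANNED_IMPORTS[root]))
    return sorted(violations)


if __name__ == "__main__":
    sample = "import os\nimport pickle\n"
    for line, module, replacement in find_violations(sample):
        print(f"line {line}: avoid '{module}', the standard is '{replacement}'")
```

Running a check like this in CI turns a written standard into feedback at the moment someone is about to diverge from it, which is where the conversation about tradeoffs should start.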

Examples of standards

New Relic’s approach to standards

Credit to the book Algorithms to Live By for inspiring some of the thinking on exploration and exploitation.

Image by Julius Silver from Pixabay