Sunday, February 18, 2024

The Genius of FB's Motto

Why "Move Fast and Break Things" is insightful, and how many companies still don't get it

Note: I have never worked for FB/Meta (got an offer once, but ended up going to Amazon instead), so I don't have any specific insight. I'm sure there are books, interviews, etc., but the following is my take. I like to think I might have some indirect insight, since the mantra was purportedly based on observing what made startups successful, and I've had some experience with that. See: https://en.wikipedia.org/wiki/Meta_Platforms#History

If you look inside a lot of larger companies, you'll find a lot of process, a lot of meetings, substantial overhead with getting anything off the ground, and a general top-down organizational directive to "not break anything", and "do everything possible to make sure nothing has bugs". I think this stems from how typical management addresses problems in general: if something breaks, it's seen as a failure or deficiency in the process [of producing products and services], and it can and should be addressed by improving the "process". This philosophy leads to the above, but that's not the only factor. For example, over time critical people move on, and that can lead to systems which everyone is afraid to touch, for fear of "breaking something" (which, in the organizational directives, is the worst thing you can do). These factors create an environment of fear, where your protection is carefully following "the process", which is an individual's shield against blame when something goes wrong. After all, deficiencies in the process are not anyone's fault, and as long as the process is continually improved, the products will continue to get better and have fewer deficiencies over time. That aggregate way of thinking is really what leads to the state described.

I describe that not to be overly critical: for many people in those organizations, this is an unequivocal good thing. Managers love process: it's measurable, it has metrics and dashboards, you can do schedule-based product planning with regular releases, you can objectively measure success against KPIs, etc. It can also be good for ICs, especially those who aspire to have a steady and predictable job, where they follow and optimize their work product for the process (which is usually much harder than optimizing for actual product success in a market, for example). Executives love metrics and predictable schedules, managers love process, and it's far easier to hire and retain "line workers" than creatives, and especially passionate ones. As long as the theory holds (ie: that optimal process leads to optimal business results), this strategy is perceived as optimal for many larger organizations.

It's also, incidentally, why smaller companies can crush larger established companies in markets. The tech boom proved this out, and some people noticed. Hence, Facebook's so-called hacker mentality was enshrined.

"Move fast" is generally more straightforward for people to grasp: the idea is to bias to action, rather than talking about something, particularly when the cost of trying and failing is low (this is related to the "fail fast" mantra). For software development, this tends to mean there's significantly less value in doing a complex design than a prototype: the former takes a lot of work and can diverge significantly from the finished product, while the latter provides real knowledge and lessons, with less overall inefficiency. "Move fast" also encapsulates the idea that you want engineers to be empowered to fix things directly, and not go through layers of approvals and process (eg: Jira), so as to get to a better incremental product state sooner. Most companies have some corporate value which aligns with this concept.

"Break things" is more controversial; here's my take. This is a direct rebuke of the "put process and gating in place to prevent bugs" philosophy, which otherwise negates the ability to "move fast". Moreover, though, this is also an open invitation to risk product instability in the name of general improvement. It is an acknowledgement that development velocity is fundamentally more valuable to an organization than the pursuit of "perfection". It is also an acknowledgement of the fundamental business risk of having product infrastructure which nobody is willing to touch (for fear of breaking it), and it gives people "cover" to try to make it better, even at the expense of stability. It is the knowing acceptance that to create something better, it can be necessary to rebuild that thing, and in the process new bugs might be introduced, and that's okay.

It's genius to put that in writing, even though it might be obvious in terms of the end goal: it's basically an insight and acknowledgement that developer velocity wins, and then a codification of the principles which are fundamentally necessary to optimize for developer velocity. It's hard to overstate how valuable that insight was and continues to be in the industry.

Why the mantra evolved to add "with stable infrastructure"

I think this evolution makes sense, as an acknowledgement of a few additional things in particular, which are all very relevant to a larger company (ie: one which has grown past the "build to survive" phase, and into the "also maintain your products" phase):

  • You need your products to continue to function in the market, at least in terms of "core" functionality
  • You need your internal platforms to function, otherwise you cannot maintain internal velocity
  • You want stable foundations upon which to build, to sustain (or continue to increase) velocity as you expand product scope

I think the first two are obvious, so let me just focus on the third point, as it pertains to development. Scaling development resources linearly with code size doesn't work well, because there is overhead in product maintenance and in communication between people. Generally you want to raise the level of abstraction involved in producing and maintaining functionality, such that you can "do more with less". However, this is not generally possible unless you have reliable "infrastructure" (at the code level) which you can build on top of, with high confidence that the resulting product code will be robust (at least insofar as it relies on the infrastructure). This, fundamentally, allows scaling the development resources linearly with product functionality (not code size), which is a much more attainable goal.

Most successful companies get to this point in their evolution (ie: where they would otherwise get diminishing returns from internal resource scaling based on overhead). The smart ones recognize the problem, and shift to building stable infrastructure as a priority (while still moving fast and breaking things, generally), so as to be able to continue to scale product value efficiently. The ones with less insightful leadership end up churning with rewrites and/or lack of code reusability, scramble to fix compounding bugs, struggle with code duplication and legacy tech debt, etc. This is something which continues to be a challenge to even many otherwise good companies, and the genius of FB/Meta (imho) is recognizing this and trying to enshrine the right approach into their culture.

That's my take, anyway, fwiw.
