Monday, March 25, 2024

The problem with process

Note: This post might more accurately be titled "one problem with process", but I thought the singular had more impact, so there's a little literary license taken. Also, while this post is somewhat inspired by some of my work experiences, it does not reflect any particular person or company, but rather a hypothetical generalized amalgamation.

There's an adage within management, which states that process is a tool which makes results repeatable. The unpacking of that sentiment is that if you achieve success once, it might be a fluke, dependent on timing or the environment, dependent on specific people, etc., but if you have a process which works, you can repeat it mechanically, and achieve success repeatedly and predictably. This is the mental framework within which managers add process to every facet of business over time, hoping to "automate success".

Sometimes it works, sometimes it doesn't. Often process is also used to automate against failure, by automating processes which avoid perceived and/or historical breakdown points. This, more often than not, is where there be landmines.

Imagine a hypothetical: you're a manager, grappling with a typical problem of quality and execution efficiency. You want to increase the former, without sacrificing the latter (and ideally, with increasing the latter as well). Quality problems, as you know, come from rushing things into production without enough checks and sign-offs; process can fill that gap easily. But you also know that with enough well-defined process, people become more interchangeable in their work product, and can seamlessly transition between projects, allowing you to optimally allocate resources (in man-months), and increase overall execution efficiency.

So you add process: standard workflows for processing bugs, fields in the tracking system for all the metrics you want to measure, a detailed workflow that captures every state of every work item that is being worked, a formalized review process for every change, sign-offs at multiple levels, etc. You ensure that there is enough information populated in your systems such that any person can take over any issue at any time, and you'll have full visibility into the state of your org's work at all times. Then you measure your metrics, but something is wrong: efficiency hasn't increased (which was expected, since it will take time for people to adjust to the new workflows and input all the required data into the systems), but quality hasn't increased either. Clearly something is still amiss.

So you add more process: more stringent and comprehensive testing requirements, automated and manual, at least two developers and one manager reviewing every change which goes into the code repository, formalized test plans which must be submitted and attested to along with change requests, more fields to indicate responsible parties at each stage, more automated static analysis tools, etc. To ensure that the processes are followed, you demand accountability, tying sign-off for various stages to performance metrics for responsible employees. Then you sit back and watch, sure that this new process is sufficient to guarantee positive results.

And yet... still no measurable improvement in overall perceived product quality. Worse, morale is declining: many employees feel stifled by the new requirements (as they should; those employees were probably writing the bugs before), they are spending large amounts of time populating the process data, and it's taking longer to get fixes out. This, in turn, is affecting customer satisfaction; you try to assure them that the increased quality will compensate for the longer lead times, but privately your metrics do not actually support this either. The increased execution efficiency is still fleeting as well: all the data is there to move people between projects seamlessly, but for some reason people still suffer a productivity hit when transitioned.

Clearly what you need is more training and expertise, so you hire a Scrum master, and contract for some Scrum training classes. Unsure where everyone's time is actually going, you insist that people document their work time down to 10-minute intervals, associating each block of time with the applicable ticket, so that time can be tracked and optimized in the metrics. You create tickets for everything: breaks, docs, context switches, the works. You tell your underling managers to scrutinize the time records, and find out where you are losing efficiency, and where you need more process. You scour the metrics, hoping that the next required field will be the one which identifies the elusive missing link between the process and the still-lacking quality improvements.

This cycle continues, until something breaks: the people, the company, or the process. Usually it's one of the first two.

In the aftermath, someone asks what happened. Process, metrics, KPIs: these were the panaceas which were supposed to lead to the nirvana of efficient execution and high quality, but paradoxically, the more that were added, the more those goals seemed to suffer. Why?

Aside: If you know the answer, you're probably smarter than almost all managers in most large companies, as the above pattern is what I've seen (to some degree) everywhere. Below I'll give my take, but it is by no means "the answer", just an opinion.

The core problem with the above, imho, is that there is a misunderstanding of what leads to quality and efficiency. Quality, as it turns out, comes from good patterns and practices, not gating and process. Good patterns and practices can come from socializing that information (from people who have the knowledge), but more often than not come from practice, and learned lessons. The quantity of practice and learned lessons comes from velocity, which is the missing link above.

Process is overhead: it slows velocity, and decreases your ability to improve. Some process can be good, but only when the value to the implementers exceeds the cost. This is the second major problem in the above hypothetical: adding process for the value of the overseers is rarely if ever beneficial. If the people doing the work don't think the process has value to them, then it almost certainly has net negative value to the organization. Overseers are overhead; their value is only realized if they can increase the velocity of the people doing the work, and adding process rarely does this.

Velocity has another benefit too: it also increases perceived quality and efficiency. The former happens because all software has bugs, but what customers perceive is how many bugs escape to production, and how quickly they are fixed. By increasing velocity, you can achieve pattern improvement (aka: continuous improvement) in the code quality itself. This decreases the number of overall issues as a side-effect of the continuous improvement process (both in code, and in culture), with a net benefit which generally exceeds any level of gating, without any related overhead. If you have enough velocity, you can even increase test coverage automation, for "free".

You're also creating an environment of learning and improvement, lower overhead, fewer restrictions, and more drive to build good products among your employees who build things. That tends to increase morale and retention, so when you have an issue, you are more likely to still have the requisite tribal knowledge to quickly address it. This is, of course, a facet of the well-documented problem with considering skill/knowledge workers in terms of interchangeable resource units.

Velocity is the missing link: being quick, with low overhead, and easily pivoting to what is important without trying to formalize and/or add process to everything. There was even a movement a while ago which captured at least some of the ideals fairly well, I thought: it was called Agile Development. It seems like a forgotten ideal in the environments of KPIs, metrics, and top-heavy process, but it's still around, at least in some corners of the professional world. If only it didn't virtually always get lost with "scale", formalization, and adding "required" process on top of it.

Anyway, all that is a bit of rambling, after which I hope to leave the reader with this: if you find yourself in a position where you have an issue with quality and/or efficiency, and you feel inclined to add more process to improve those outcomes, consider carefully if that will be the likely actual outcome (and as necessary, phone a friend). Your org might thank you eventually.

 
