Monday, March 25, 2024

The problem with process

Note: This post might more accurately be titled "one problem with process", but I thought the singular had more impact, so there's a little literary license taken. Also, while this post is somewhat inspired by some of my work experiences, it does not reflect any particular person or company, but rather a hypothetical generalized amalgamation.

There's an adage in management which states that process is a tool that makes results repeatable. The unpacking of that sentiment is that if you achieve success once, it might be a fluke, dependent on timing or the environment, dependent on specific people, etc., but if you have a process which works, you can repeat it mechanically, and achieve success repeatedly and predictably. This is the mental framework within which managers add process to every facet of business over time, hoping to "automate success".

Sometimes it works, sometimes it doesn't. Often process is also used to automate against failure, by adding steps which route around perceived and/or historical breakdown points. This, more often than not, is where there be landmines.

Imagine a hypothetical: you're a manager, grappling with a typical problem of quality and execution efficiency. You want to increase the former without sacrificing the latter (and ideally, while increasing the latter as well). Quality problems, as you know, come from rushing things into production without enough checks and sign-offs; process can fill that gap easily. But you also know that with enough well-defined process, people become more interchangeable in their work product, and can seamlessly transition between projects, allowing you to optimally allocate resources (in man-months) and increase overall execution efficiency.

So you add process: standard workflows for processing bugs, fields in the tracking system for all the metrics you want to measure, a detailed workflow that captures every state of every work item being worked on, a formalized review process for every change, sign-offs at multiple levels, etc. You ensure that there is enough information populated in your systems that any person can take over any issue at any time, and you'll have full visibility into the state of your org's work at all times. Then you measure your metrics, but something is wrong: efficiency hasn't increased (which was expected; it will take time for people to adjust to the new workflows and input all the required data into the systems), but quality hasn't increased either. Clearly something is still amiss.

So you add more process: more stringent and comprehensive testing requirements, automated and manual, at least two developers and one manager reviewing every change which goes into the code repository, formalized test plans which must be submitted and attested to along with change requests, more fields to indicate responsible parties at each stage, more automated static analysis tools, etc. To ensure that the processes are followed, you demand accountability, tying sign-off for various stages to performance metrics for responsible employees. Then you sit back and watch, sure that this new process is sufficient to guarantee positive results.

And yet... still no measurable improvement in overall perceived product quality. Worse, morale is declining: many employees feel stifled by the new requirements (as they should; those employees were probably writing the bugs before), they are spending large amounts of time populating the process data, and it's taking longer to get fixes out. This, in turn, is affecting customer satisfaction; you try to assure customers that the increased quality will compensate for the longer lead times, but privately your metrics do not actually support this either. The increased execution efficiency is still elusive as well: all the data is there to move people between projects seamlessly, but for some reason people still suffer a productivity hit when transitioned.

Clearly what you need is more training and expertise, so you hire a Scrum master, and contract for some Scrum training classes. Unsure where everyone's time is actually going, you insist that people document their work time down to 10-minute intervals, associating each block of time with the applicable ticket, so that time can be tracked and optimized in the metrics. You create tickets for everything: breaks, docs, context switches, the works. You tell your underling managers to scrutinize the time records, and find out where you are losing efficiency and where you need more process. You scour the metrics, hoping that the next required field will be the one which identifies the elusive missing link between the process and the still-lacking quality improvements.

This cycle continues, until something breaks: the people, the company, or the process. Usually it's one of the first two.

In the aftermath, someone asks what happened. Process, metrics, KPI's: these were the panaceas which were supposed to lead to the nirvana of efficient execution and high quality, but paradoxically, the more that were added, the more those goals seemed to suffer. Why?

Aside: If you know the answer, you're probably smarter than almost all managers in most large companies, as the above pattern is what I've seen (to some degree) everywhere. Below I'll give my take, but it is by no means "the answer", just an opinion.

The core problem with the above, imho, is a misunderstanding of what leads to quality and efficiency. Quality, as it turns out, comes from good patterns and practices, not gating and process. Good patterns and practices can come from socializing that information (from people who have the knowledge), but more often than not they come from practice and learned lessons. The quantity of practice and learned lessons comes from velocity, which is the missing link above.

Process is overhead: it slows velocity, and decreases your ability to improve. Some process can be good, but only when the value to the implementers exceeds the cost. This is the second major problem in the above hypothetical: adding process that only has value to the overseers is rarely, if ever, beneficial. If the people doing the work don't think the process has value to them, then it almost certainly has net negative value to the organization. Overseers are overhead; their value is only realized if they can increase the velocity of the people doing the work, and adding process rarely does this.

Velocity has another benefit too: it also increases perceived quality and efficiency. The former happens because all software has bugs, but what customers perceive is how many bugs escape to production, and how quickly they are fixed. By increasing velocity, you can achieve pattern improvement (aka: continuous improvement) in the code quality itself. This decreases the number of overall issues as a side-effect of the continuous improvement process (both in code, and in culture), with a net benefit which generally exceeds any level of gating, without any related overhead. If you have enough velocity, you can even increase automated test coverage, for "free".

You're also creating an environment of learning and improvement, lower overhead, fewer restrictions, and more drive to build good products among your employees who build things. That tends to increase morale and retention, so when you have an issue, you are more likely to still have the requisite tribal knowledge to quickly address it. This is, of course, a facet of the well-documented problem with considering skill/knowledge workers in terms of interchangeable resource units.

Velocity is the missing link: being quick, with low overhead, and easily pivoting to what is important, without trying to formalize and/or add process to everything. There was even a movement a while ago which captured at least some of the ideals fairly well, I thought: it was called Agile Development. It seems like a forgotten ideal in the environments of KPI's, metrics, and top-heavy process, but it's still around, at least in some corners of the professional world. If only it didn't virtually always get lost with "scale", formalization, and adding "required" process on top of it.

Anyway, all that is a bit of rambling, with which I hope to leave the reader with this: if you find yourself in a position where you have an issue with quality and/or efficiency, and you feel inclined to add more process to improve those outcomes, consider carefully if that will be the likely actual outcome (and as necessary, phone a friend). Your org might thank you eventually.

 

Sunday, March 17, 2024

Some thoughts on budget product development, outsourcing

I've been thinking a bit about the pros and cons of budget/outsourced product development in general. By this, I mean two things, broadly: either literally outsourcing to another org/group, or conducting development in regions where labor is cheaper than where your main development would otherwise be conducted (the latter being, presumably, where your main talent and expertise resides). These are largely equivalent in my mind and experience, so I'm lumping them together for purposes of this topic.

The discussion has been top-of-mind recently, for a few reasons. One of the main "headline" reasons is all the issues that Boeing is having with their airplanes; Last Week Tonight had a good episode about how aggressive cost-cutting efforts have led to the current situation there, where inevitable quality control issues are hurting the company now (see: https://www.youtube.com/watch?v=Q8oCilY4szc). The other side of this same coin, which is perhaps more pertinent to me professionally, is the proliferation of LLM's to generate code (aka: "AI agents"), which many people think will displace traditional, more highly compensated human software developers. I don't know how much of a disruption to the industry this will eventually be, but I do have some thoughts on the trade-offs of employing cheaper labor in an organization's product development.

Generally, companies can "outsource" any aspect of product development, and this has been an accessible practice for some time. This is very common in various industries, especially for so-called "commoditized" components; for example, the automobile industry has an entire sub-industry for producing all the various components which are assembled into automobiles, usually acquired from the cheapest vendors. This is generally possible for any components which are not bespoke: that is, in any industry where components are standardized and can be assembled into larger products.

Note that this is broadly true in the software context as well: vendors sell libraries with functionality, open source libraries are commonly aggregated into products, and component re-use is fairly common in many aspects of development. This can even be a best practice in many cases, if the component library is considered to be among the highest-quality and most robust implementations of the functionality (see: the standard library in C++, for example). Using a robust library which is well-tested across various usage instances can be a very good strategy.

Unfortunately, this is less true in the hardware component industries: since high-quality hardware typically costs more (in materials and production costs), it's generally less feasible to use the highest-quality components from a cost perspective. There is a parallel in first-party product development, where your expected highest-quality components will usually cost more (due to the higher costs for the people who produce them). Thus, most businesses make trade-offs between quality and costs, and where quality is not a priority, tend to outsource.

The danger arises when companies start to lose track of this trade-off, and/or misunderstand the trade-offs they are making, and/or sacrifice longer-term product viability for short-term gains. Each of these can be problematic for a company, and each is an inherent danger in outsourcing parts of development. I'll expand on each.

Losing track of the trade-offs happens when management is aware of them when it starts outsourcing, but over time they get lost in the details and the constant pressure to improve profit margins, etc. For example, a company might outsource a quick prototype, then be under market pressure to keep iterating on it, while losing track of (and not accounting for) the inherent tech debt associated with the lower-quality component. This can also happen when the people tracking products and components leave, and new people are hired without knowledge of the previous trade-offs. This is dangerous, but generally manageable.

Worse than the above is when management doesn't understand the trade-offs they are making. This is obviously indicative of poor and incompetent management, yet time and time again companies outsource components without properly accounting for the higher long-term costs of maintaining and enhancing those components, and suffer as a result. Boeing falls into this category: by all accounts their management thought they could save costs and increase profits by outsourcing component production, without accounting for the increased costs of integration and QA (which would normally imply higher overall costs for any shipping and/or supported product). That's almost always just egregious incompetence on the part of the company's management, of course.

The last point is also on display at Boeing: sacrificing long-term viability for short-term gains. While it's unlikely this was the motivation in Boeing's case, it's certainly a common MO with private equity company ownership (for example) to squeeze out as much money as possible in the short term, while leaving the next owners "holding the bag" for tech debt and such from those actions. Again, this is not inherently bad, not every company does this, etc.; this is just one way companies can get into trouble, by using cheaper labor for their product development.

This brings me, in a roundabout way, to the topic of using LLM's to generate code, and "outsource" software product development to these agents. I think, in the short term, this will pose a substantial risk to the industry in general: just as executives in large companies fell in love with offshoring software development in the early 2000's, I think many of the same executives will look to reduce costs by outsourcing their expensive software development to LLM's as well. This will inevitably have the same outcomes over the long run: companies which do this, and do not properly account for the costs and trade-offs (as per above), will suffer, and some may fail as a result (it's unlikely blame will be properly assigned in these cases, but when companies fail, it's almost always due to bad executive management decisions).

That said, there's certainly also a place for LLM code generation in a workflow. Generally, any task which you would trust to an intern, for example, could probably be completed by an LLM, with the same quality of results. There are some advantages to using interns (eg: training someone who might get better, lateral thinking, the ability to ask clarifying questions, etc.), but LLM's may be more cost effective. However, if companies largely stop doing on-the-job training at scale, this could pose some challenges for the industry longer-term, and ultimately drive costs higher. Keep in mind: generally, LLM's are only as "good" as the sum total of average information online (aka: the training data), and this will decline over time as LLM output pollutes the training data set.

One could argue that outsourcing is almost always bad (in the above context), but I don't think that's accurate. In particular, outsourcing, and the pursuit of short-term profits over quality, does serve at least two valuable purposes in the broader industry: it helps new companies get to market with prototypes quickly (even if these ultimately need to be replaced with quality alternatives), and it helps older top-heavy companies die out, so they can be replaced by newer companies with better products, as their fundamentally stupid executives make dumb decisions in the name of chasing profit margins (falling into one or more of the traps detailed above). These are both necessary market factors, which help industries evolve and improve over time.

So the next time some executive talks about outsourcing some aspect of product development, either to somewhere with cheaper labor or to an LLM (for example), you can take some solace in the fact that they are probably helping contribute to the corporate circle of life (through self-inflicted harm), and for each stupid executive making stupid decisions, there's probably another entrepreneur at a smaller company who better understands the trade-offs of cheaper labor, is looking to make the larger company obsolete, and will be looking for quality product development. I don't think that overall need is going to vanish any time soon, even if various players shuffle around.

My 2c, anyway.