Friday, April 5, 2024

The problem of "thrashing"

"Thrashing" is a general term/issue in computer science, which refers to the situation (in the abstract) in which multiple "work items" are competing for the same set of resources, and each work item is being processed in chunks (ie: either in parallel, or interleaved), and as a result the resource access ping-pongs between the different work items. This can be very inefficient for the system if switching access to the resources causes overhead. Here's the wiki page on the topic: https://en.wikipedia.org/wiki/Thrashing_(computer_science)

There are numerous examples of thrashing issues in software development, such as virtual memory page faults, cache access, etc. There is also thread context thrashing: when you have too many threads competing for CPU time, the overhead of just doing thread context switching (which is generally only a few hundred CPU cycles per switch) can still overwhelm the system. When thrashing occurs, it is generally observed as a non-linear increase in latency/processing time relative to the work input (ie: the latency graph "hockey sticks"). At that point, the system is in a particularly bad state (and, ironically, a very common critical problem in orgs is that additional diagnostic processes get triggered to run in that state, based on performance metrics, which can then cause systems to fail entirely).
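
To make the "hockey stick" concrete, here's a minimal C++ sketch which splits a fixed amount of CPU work across an increasing number of threads. The thread counts and iteration counts are arbitrary placeholders, and the exact shape of the curve will depend on your OS scheduler and core count, but once the thread count far exceeds the hardware's, the added wall-clock time is essentially pure scheduling/context-switching overhead:

    // Build with: g++ -std=c++17 -O2 thrash_demo.cpp -pthread
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // A fixed total amount of CPU work, split across a varying number of
    // threads; threads beyond the core count add only switching overhead.
    static void spin(unsigned long iters) {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < iters; ++i) x += i;
    }

    int main() {
        const unsigned long total_work = 400000000UL;
        for (unsigned n : {4u, 64u, 512u, 4096u}) {
            auto start = std::chrono::steady_clock::now();
            std::vector<std::thread> threads;
            for (unsigned i = 0; i < n; ++i)
                threads.emplace_back(spin, total_work / n);
            for (auto& t : threads) t.join();
            std::chrono::duration<double> secs =
                std::chrono::steady_clock::now() - start;
            std::printf("%5u threads: %.3fs\n", n, secs.count());
        }
        return 0;
    }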

To reduce thrashing, you generally want to try to do a few things:

  • Reduce the amount of pending parallel/interleaved work items on the system
  • Allocate work items with more locality if possible (to prevent thrashing relative to one processing unit, for example)
  • Try to allow discrete work items to run to completion (eg: running them longer without switching), to reduce the context switching overhead (see the sketch after this list)
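
As a simplified sketch of the first and third points, a counting semaphore can cap the number of in-flight items, so each admitted item runs with minimal competition. This is just an illustration of the idea, not production code; a real system would feed a fixed-size thread pool from a work queue rather than spawning a thread per item:

    // Build with: g++ -std=c++20 -O2 bounded_demo.cpp -pthread
    #include <algorithm>
    #include <semaphore>
    #include <thread>
    #include <vector>

    // Cap in-flight work at roughly the hardware thread count, so each
    // admitted item runs to completion instead of being interleaved with
    // hundreds of competitors. (hardware_concurrency() can return 0,
    // hence the max() guard.)
    std::counting_semaphore<> slots(
        std::max(1u, std::thread::hardware_concurrency()));

    void process(int item) {
        slots.acquire();   // wait for a free slot rather than piling on
        // ... do the actual work for `item` here ...
        slots.release();
    }

    int main() {
        // Note: a thread per item is itself wasteful; a real system would
        // feed a fixed-size thread pool from a work queue instead.
        std::vector<std::thread> workers;
        for (int i = 0; i < 1000; ++i)
            workers.emplace_back(process, i);
        for (auto& w : workers) w.join();
        return 0;
    }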

Now, while all of the above is well-known in the industry, I'd like to suggest something related, but which is perhaps not as well appreciated: the same problems can and do occur with respect to people within an organization and process.

People, as it turns out, are also susceptible to some amount of overhead when working on multiple things, and task switching between them. Moreover, unlike computers, there is also some overhead for work items which are "in flight" for people (where they need to consider and/or refresh those items just to maintain the status quo). The more tasks someone is working on, and the more long-lived work items are in flight at any given time, the more overhead exists for that person to manage those items.

In "simple" jobs, this is kept minimal on purpose: a rote worker might have a single assigned task, or a checklist, so they can focus on making optimal progress on the singular task, with the minimal amount of overhead. In more complex organizations, there are usually efforts to compartmentalize and specialize work, such that individual people do not need to balance more than an "acceptable" number of tasks and responsibilities, and to minimize thrashing. However, notably, there are some anti-patterns, specific to development, which can exacerbate this issue.

Some notable examples of things which can contribute to "thrashing", from a dev perspective:

  • Initiatives which take a long time to complete, especially where other things are happening in parallel
  • Excessive process around code changes, where the code change can "linger" in the process for a while
  • Long-lived branches, where code changes need to be updated and refreshed over time
  • Slow pull-request approval times (since each outstanding pull-request is another in-progress work item, which requires overhead for context switching)
  • Excessive "background" organizational tasks (eg: email management, corporate overhead, Slack threads, managing-up tasks, reporting overhead, side-initiatives, etc.)

Note that there is a human cost to thrashing as well: people want both to be productive and to see their work have positive impacts, and thrashing hurts both. As a manager, you should be tracking the amount of overhead and "thrashing" that your reports are experiencing, and doing what you can to minimize it. As a developer, you should be wary of processes (and potentially organizations) where there are systems in place (or proposed) which contribute to the amount of thrashing likely to happen while working on tasks, because this has a non-trivial cost, and the potential to "hockey stick" the graph of time wasted dealing with overhead.

In short: thrashing is bad, and it's not just an issue which affects computer systems. Not paying attention to this within an org can have very bad consequences.


Monday, March 25, 2024

The problem with process

Note: This post might more accurately be titled "one problem with process", but I thought the singular had more impact, so there's a little literary license taken. Also, while this post is somewhat inspired by some of my work experiences, it does not reflect any particular person or company, but rather a hypothetical generalized amalgamation.

There's an adage within management, which states that process is a tool which makes results repeatable. The unpacking of that sentiment is that if you achieve success once, it might be a fluke, dependent on timing or the environment, dependent on specific people, etc., but if you have a process which works, you can repeat it mechanically, and achieve success repeatedly and predictably. This is the mental framework within which managers add process to every facet of business over time, hoping to "automate success".

Sometimes it works, sometimes it doesn't. Often process is also used to automate against failure, by adding steps which avoid perceived and/or historical breakdown points. This, more often than not, is where there be landmines.

Imagine a hypothetical: you're a manager, grappling with a typical problem of quality and execution efficiency. You want to increase the former, without sacrificing the latter (and ideally, increasing the latter as well). Quality problems, as you know, come from rushing things into production without enough checks and sign-offs; process can fill that gap easily. But you also know that with enough well-defined process, people become more interchangeable in their work product, and can seamlessly transition between projects, allowing you to optimally allocate resources (in man-months), and increase overall execution efficiency.

So you add process: standard workflows for processing bugs, fields in the tracking system for all the metrics you want to measure, a detailed workflow that captures every state of every work item that is being worked, a formalized review process for every change, sign-offs at multiple levels, etc. You ensure that there is enough information populated in your systems such that any person can take over any issue at any time, and you'll have full visibility into the state of your org's work at all times. Then you measure your metrics, but something is wrong: efficiency hasn't increased (which was expected; it will take time for people to adjust to the new workflows and input all the required data into the systems), but quality hasn't increased either. Clearly something is still amiss.

So you add more process: more stringent and comprehensive testing requirements, automated and manual, at least two developers and one manager reviewing every change which goes into the code repository, formalized test plans which must be submitted and attested to along with change requests, more fields to indicate responsible parties at each stage, more automated static analysis tools, etc. To ensure that the processes are followed, you demand accountability, tying sign-off for various stages to performance metrics for responsible employees. Then you sit back and watch, sure that this new process is sufficient to guarantee positive results.

And yet... still no measurable improvement in overall perceived product quality. Worse, morale is declining: many employees feel stifled by the new requirements (as they should; those employees were probably writing the bugs before), they are spending large amounts of time populating the process data, and it's taking longer to get fixes out. This, in turn, is affecting customer satisfaction; you try to assure them that the increased quality will compensate for the longer lead times, but privately your metrics do not actually support this either. The increased execution efficiency is still fleeting as well: all the data is there to move people between projects seamlessly, but for some reason people still suffer a productivity hit when transitioned.

Clearly what you need is more training and expertise, so you hire a Scrum master, and contract for some Scrum training classes. Unsure where everyone's time is actually going, you insist that people document their work time down to 10 minute intervals, associating each block of time with the applicable ticket, so that time can be tracked and optimized in the metrics. You create tickets for everything: breaks, docs, context switches, the works. You tell your underling managers to scrutinize the time records, and find out where you are losing efficiency, and where you need more process. You scour the metrics, hoping that the next required field will be the one which identifies the elusive missing link between the process and the still lacking quality improvements.

This cycle continues, until something breaks: the people, the company, or the process. Usually it's one of the first two.

In the aftermath, someone asks what happened. Process, metrics, KPI's: these were the panaceas which were supposed to lead to the nirvana of efficient execution and high quality, but paradoxically, the more that were added, the more those goals seemed to suffer. Why?

Aside: If you know the answer, you're probably smarter than almost all managers in most large companies, as the above pattern is what I've seen (to some degree) everywhere. Below I'll give my take, but it is by no means "the answer", just an opinion.

The core problem with the above, imho, is that there is a misunderstanding of what leads to quality and efficiency. Quality, as it turns out, comes from good patterns and practices, not gating and process. Good patterns and practices can come from socializing that information (from people who have the knowledge), but more often than not come from practice, and learned lessons. The quantity of practice and learned lessons comes from velocity, which is the missing link above.

Process is overhead: it slows velocity, and decreases your ability to improve. Some process can be good, but only when the value to the implementers exceeds the cost. This is the second major problem in the above hypothetical: adding process for the value of the overseers is rarely if ever beneficial. If the people doing the work don't think the process has value to them, then it almost certainly has net negative value to the organization. Overseers are overhead; their value is only realized if they can increase the velocity of the people doing the work, and adding process rarely does this.

Velocity has another benefit too: it also increases perceived quality and efficiency. The former happens because all software has bugs, but customers perceive how many bugs escape to production, and how quickly they are fixed. By increasing velocity, you can achieve pattern improvement (aka: continuous improvement) in the code quality itself. This decreases the number of overall issues as a side-effect of the continuous improvement process (both in code, and in culture), with a net benefit which generally exceeds any level of gating, without any related overhead. If you have enough velocity, you can even increase automated test coverage, for "free".

You're also creating an environment of learning and improvement, lower overhead, fewer restrictions, and more drive to build good products among your employees who build things. That tends to increase morale and retention, so when you have an issue, you are more likely to still have the requisite tribal knowledge to quickly address it. This is, of course, a facet of the well-documented problem with considering skill/knowledge workers in terms of interchangeable resource units.

Velocity is the missing link: being quick, with low overhead, and easily pivoting to what's important, without trying to formalize and/or add process to everything. There was even a movement a while ago which captured at least some of the ideals fairly well, I thought: it was called Agile Development. It seems like a forgotten ideal in the environments of KPI's, metrics, and top-heavy process, but it's still around, at least in some corners of the professional world. If only it didn't virtually always get lost with "scale", formalization, and adding "required" process on top of it.

Anyway, that was a bit of rambling, but I hope to leave the reader with this: if you find yourself in a position where you have an issue with quality and/or efficiency, and you feel inclined to add more process to improve those outcomes, consider carefully whether that will be the likely actual outcome (and as necessary, phone a friend). Your org might thank you eventually.


Sunday, March 17, 2024

Some thoughts on budget product development, outsourcing

I've been thinking a bit about the pros and cons of budget/outsourcing product development in general. By this, I mean two things, broadly: either literally outsourcing to another org/group, or conducting development in regions where labor is cheaper than where your main development would be conducted (the latter being, presumably, where your main talent and expertise resides). These are largely equivalent in my mind and experience, so I'm lumping them together for purposes of this topic.

The discussion has been top-of-mind recently, for a few reasons. One of the main "headline" reasons is all the issues that Boeing is having with their airplanes; Last Week Tonight had a good episode about how aggressive cost-cutting efforts have led to the current situation there, where inevitable quality control issues are hurting the company now (see: https://www.youtube.com/watch?v=Q8oCilY4szc). The other side of this same coin, which is perhaps more pertinent to me professionally, is the proliferation of LLM's to generate code (aka: "AI agents"), which many people think will displace traditional more highly-compensated human software developers. I don't know how much of a disruption to the industry this will eventually be, but I do have some thoughts on the trade-offs of employing cheaper labor to an organization's product development.

Generally, companies can "outsource" any aspect of product development, and this has been an accessible practice for some time. This is very common in various industries, especially for so-called "commoditized" components; for example, the automobile industry has an entire sub-industry for producing all the various components which are assembled into automobiles, usually acquired from the cheapest vendors. This is generally possible in any industry where components are standardized rather than bespoke, and can be assembled into larger products.

Note that this is broadly true in the software context as well: vendors sell libraries with functionality, open source libraries are commonly aggregated into products, and component re-use is fairly common in many aspects of development. This can even be a best-practice in many cases, if the component library is considered to be among the highest quality and most robust implementations of the functionality (see: the standard library in C++, for example). Using a robust library which is well-tested across various usage instances can be a very good strategy.

Unfortunately, this is less true in the hardware component industries, since high-quality hardware typically costs more (in materials and production costs), so it's generally less feasible to use the highest quality components from a cost perspective. There is a parallel in first-party product development, where your expected highest quality components will usually cost more (due to the higher costs for the people who produce the highest quality components). Thus, most businesses make trade-offs between quality and costs, and where quality is not a priority, tend to outsource.

The danger arises when companies start to lose track of this trade-off, and/or misunderstand the trade-offs they are making, and/or sacrifice longer-term product viability for short-term gains. Each of these can be problematic for a company, and each is an inherent danger in outsourcing parts of development. I'll expand on each.

Losing track of the trade-offs happens when management is aware of the trade-offs when starting to outsource, but over time those trade-offs get lost in the details and the constant pressure to improve profit margins, etc. For example, a company might outsource a quick prototype, then be under market pressure to keep iterating on it, while losing track of (and not accounting for) the inherent tech debt associated with the lower quality component. This can also happen when the people tracking products and components leave, and new people are hired without knowledge of the previous trade-offs. This is dangerous, but generally manageable.

Worse than the above is when management doesn't understand the trade-offs they are making. This is obviously indicative of poor and incompetent management, yet time and time again companies outsource components without properly accounting for the higher long-term costs of maintaining and enhancing those components, and suffer as a result. Boeing falls into this category: by all accounts their management thought they could save costs and increase profits by outsourcing component production, without accounting for the increased costs of integration and QA (which would normally imply higher overall costs for any shipping and/or supported product). That's almost always just egregious incompetence on the part of the company's management, of course.

The last point is also on display at Boeing: sacrificing long-term viability for short-term gains. While it's unlikely this was the motivation in Boeing's case, it's certainly a common MO with private equity company ownership (for example) to squeeze out as much money as possible in the short term, while leaving the next owners "holding the bag" for tech debt and such from those actions. Again, this is not inherently bad, not every company does this, etc.; this is just one way companies can get into trouble, by using cheaper labor for their product development.

This brings me, in a roundabout way, to the topic of using LLM's to generate code, and "outsourcing" software product development to these agents. I think, in the short term, this will pose a substantial risk to the industry in general: just as executives in large companies fell in love with offshoring software development in the early 2000's, I think many of the same executives will look to reduce costs by outsourcing their expensive software development to LLM's as well. This will inevitably have the same outcomes over the long run: companies which do this, and do not properly account for the costs and trade-offs (as per above), will suffer, and some may fail as a result (it's unlikely blame will be properly assigned in these cases, but when companies fail, it's almost always due to bad executive management decisions).

That said, there's certainly also a place for LLM code generation in a workflow. Generally, any task which you would trust to an intern, for example, could probably be completed by an LLM, with the same quality of results. There are some advantages to using interns (eg: training someone who might get better, lateral thinking, the ability to ask clarifying questions, etc.), but LLM's may be more cost effective. However, if companies largely stop doing on-the-job training at scale, this could pose some challenges for the industry longer-term, and ultimately drive costs higher. Keep in mind: generally, LLM's are only as "good" as the sum total of average information online (aka: the training data), and this will also decline over time as LLM output pollutes the training data set.

One could argue that outsourcing is almost always bad (in the above context), but I don't think that's accurate. In particular, outsourcing, and the pursuit of short-term profits over quality, does serve at least two valuable purposes in the broader industry: it helps new companies get to market with prototypes quickly (even if these ultimately need to be replaced with quality alternatives), and it helps older top-heavy companies die out, so they can be replaced by newer companies with better products, as their fundamentally stupid executives make dumb decisions in the name of chasing profit margins (falling into one or more of the traps detailed above). These are both necessary market factors, which help industries evolve and improve over time.

So the next time some executive talks about outsourcing some aspect of product development, either to somewhere with cheaper labor or to an LLM (for example), you can take some solace in the fact that they are probably helping contribute to the corporate circle of life (through self-inflicted harm), and for each stupid executive making stupid decisions, there's probably another entrepreneur at a smaller company who better understands the trade-offs of cheaper labor, is looking to make the larger company obsolete, and will be looking for quality product development. I don't think that overall need is going to vanish any time soon, even if various players shuffle around.

My 2c, anyway.

Monday, February 19, 2024

Mobile devices and security

Generally, passwords are a better form of security than biometrics. There are a few well-known reasons for this: passwords can be changed, cannot be clandestinely observed, are harder to fake, and cannot be taken from someone unwillingly (eg: via government force, although one could quibble about extortion as a viable mechanism for such). A good password, used for access to a well-designed secure system, is probably the best known single factor for secure access in the world at present (with multi-factor including a password as the "gold standard").

Unfortunately, entering complex passwords is generally arduous and tedious, and doubly so on mobile devices. And yet, I tend to prefer using a mobile device for accessing most secure sites and systems, with that preference generally only increasing as the nominal security requirements increase. That seems counter-intuitive at first glance, but in this case the devil is in the details.

I value "smart security"; that is, security which is deployed in such a way as to increase protection, while minimizing the negative impact on the user experience, and where the additional friction from the security is proportional to the value of the data being protected. For example, I use complex and unique passwords for sites which store data which I consider valuable (financial institutions, sensitive PII aggregation sites, etc.), and I tend to re-use password on sites which either don't have valuable information, or where I believe the security practices there to be suspect (eg: if they do something to demonstrate a fundamental ignorance and/or stupidity with respect to security, such as requiring secondary passwords based on easily knowable data, aka "security questions"). I don't mind entering my complex passwords when the entry is used judiciously, to guard against sensitive actions, and the app/site is otherwise respectful of the potential annoyance factor.

Conversely, I get aggravated with apps and sites which do stupid things which do nothing to raise the bar for security, but constantly annoy users with security checks and policies. Things like time-based password expiration, time-based authentication expiration (especially with short timeouts), repeated password entry (which trains users to type in passwords without thinking about the context), authentication workflows where the data flow is not easily discernible (looking at most OAuth implementations here), etc. demonstrate either an ignorance of what constitutes "net good" security, or a contempt for the user experience, or both. These types of apps and sites are degrading the security experience, and ultimately negatively impacting security for everyone.

Mobile OS's help mitigate this, somewhat, by providing built-in mechanisms to downgrade the authentication systems from password to biometrics in many cases, and thus help compensate for the often otherwise miserable user experience being propagated by the "security stupid" apps and sites. By caching passwords on the devices, and allowing biometric authentication to populate them into forms, the mobile devices are "downgrading" the app/site security to single factor (ie: the device), but generally upgrading the user experience (because although biometrics are not as secure, they are generally "easy"). Thus, by using a mobile device to access an app/site with poor fundamental security design, the downsides can largely be mitigated, at the expense of nominal security in general. This is a trade-off I'm generally willing to make, and I suspect I'm not alone in this regard.

The ideal, of course, would be to raise the bar for security design for apps and sites in general, such that security was based on risk criteria and heuristics, and not (for example) based on arbitrary time-based re-auth checks. Unfortunately, though, there are many dumb organizations in the world, and lots of these types of decisions are ultimately motivated or made by people who are unable or unwilling to consider the net security impact of their bad policies, and/or blocked from making better systems. Most organizations today are "dumb" in this respect, and this is compounded by standards which mandate a level of nominal security (eg: time-based authentication expiration) which make "good" security effectively impossible, even for otherwise knowledgeable organizations. Thus, people will continue to downgrade the nominal security in the world, to mitigate these bad policy decisions, with the tacit acceptance from the industry that this is the best we can do, within the limitations imposed by the business reality in decision making.
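
To make "risk criteria and heuristics" a bit more concrete, here's a minimal sketch of what risk-based step-up authentication could look like. Every signal name, weight, and threshold below is an invented placeholder for illustration, not any real product's policy:

    // A toy risk-based step-up decision; all signals, weights, and
    // thresholds are illustrative assumptions only.
    struct SessionSignals {
        bool new_device;          // first time seen on this device?
        bool new_location;        // unusual geo/IP for this account?
        bool sensitive_action;    // eg: changing payout details
        double days_since_auth;   // age of the session credential
    };

    enum class AuthAction { Allow, Biometric, Password };

    AuthAction decide(const SessionSignals& s) {
        double risk = 0.0;
        if (s.new_device)       risk += 0.5;
        if (s.new_location)     risk += 0.3;
        if (s.sensitive_action) risk += 0.4;
        risk += 0.02 * s.days_since_auth;  // age is one signal, not a hard cutoff

        if (risk >= 0.8) return AuthAction::Password;   // full re-auth, used sparingly
        if (risk >= 0.4) return AuthAction::Biometric;  // cheap step-up
        return AuthAction::Allow;                       // don't nag the user
    }

    #include <cstdio>
    int main() {
        SessionSignals checkout{ false, true, true, 10.0 };
        const char* names[] = { "allow", "biometric", "password" };
        std::printf("decision: %s\n", names[static_cast<int>(decide(checkout))]);
        return 0;
    }

The point is that the expensive check (the password) gets reserved for genuinely risky contexts, while routine access stays frictionless, rather than nagging everyone on a timer regardless of context.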

It's a messy world; we just do the best we can within it.


Sunday, February 18, 2024

The Genius of FB's Motto

Why "Move Fast and Break Things" is insightful, and how many companies still don't get it

Note: I have never worked for FB/Meta (got an offer once, but ended up going to Amazon instead), so I don't have any specific insight. I'm sure there are books, interviews, etc., but the following is my take. I like to think I might have some indirect insight, since the mantra was purportedly based on observing what made startups successful, and I've had some experience with that. See: https://en.wikipedia.org/wiki/Meta_Platforms#History

If you look inside a lot of larger companies, you'll find a lot of process, a lot of meetings, substantial overhead with getting anything off the ground, and a general top-down organizational directive to "not break anything", and "do everything possible to make sure nothing has bugs". I think this stems from how typical management addresses problems in general: if something breaks, it's seen as a failure or deficiency in the process [of producing products and services], and it can and should be addressed by improving the "process". This philosophy leads to the above, but that's not the only factor. For example, over time critical people move on, and that can lead to systems which everyone is afraid to touch, for fear of "breaking something" (which, in the organizational directives, is the worst thing you can do). These factors create an environment of fear, where your protection is carefully following "the process", which is an individual's shield against blame when something goes wrong. After all, deficiencies in the process are not anyone's fault, and as long as the process is continually improved, the products will continue to get better and have fewer deficiencies over time. That aggregate way of thinking is really what leads to the state described.

I describe that not to be overly critical: for many people in those organizations, this is an unequivocal good thing. Managers love process: it's measurable, it has metrics and dashboards, you can do schedule-based product planning with regular releases, you can objectively measure success against KPI's, etc. It can also be good for IC's, especially those who aspire to have a steady and predictable job, where they follow and optimize their work product for the process (which is usually much harder than optimizing for actual product success in a market, for example). Executives love metrics and predictable schedules, managers love process, and it's far easier to hire and retain "line workers" than creatives, and especially passionate ones. As long as the theory holds (ie: that optimal process leads to optimal business results), this strategy is perceived as optimal for many larger organizations.

It's also, incidentally, why smaller companies can crush larger established companies in markets. The tech boom proved this out, and some people noticed. Hence, Facebook's so-called hacker mentality was enshrined.

"Move fast" is generally more straightforward for people to grasp: the idea is to bias to action, rather than talking about something, particularly when the cost of trying and failing is low (this is related to the "fail fast" mantra). For software development, this tends to mean there's significantly less value in doing a complex design than a prototype: the former takes a lot of work and can diverge significantly from the finished product, while the latter provides real knowledge and lessons, with less overall inefficiency. "Most fast" also encapsulates the idea that you want engineers to be empowered to fix things directly, and not go through layers of approvals and process (eg: Jira) to get to a better incremental product state sooner. Most companies have some corporate value which aligns with this concept.

"Break things" is more controversial; here's my take. This is a direct rebuke of the "put process and gating in place to prevent bugs" philosophy, which otherwise negates the ability to "move fast". Moreover, though, this is also an open invitation to risk product instability in the name of general improvement. It is an acknowledgement that development velocity is fundamentally more valuable to an organization than the pursuit of "perfection". It is also an acknowledgement of the fundamental business risk of having product infrastructure which nobody is willing to touch (for fear of breaking it), and "cover" to try to make it better, even at the expense of stability. It is the knowing acceptance that to create something better, it can be necessary to rebuild that thing, and in the process new bugs might be introduced, and that's okay.

It's genius to put that in writing, even though it might be obvious in terms of the end goal: it's basically an insight and acknowledgement that developer velocity wins, and then a codification of the principles which are fundamentally necessary to optimize for developer velocity. It's hard to overstate how valuable that insight was and continues to be in the industry.

Why the mantra evolved to add "with stable infrastructure"

I think this evolution makes sense, as an acknowledgement of a few additional things in particular, which are both very relevant to a larger company (ie: one which has grown past the "build to survive" phase, and into the "also maintain your products" phase):

  • You need your products to continue to function in the market, at least in terms of "core" functionality
  • You need your internal platforms to function, otherwise you cannot maintain internal velocity
  • You want stable foundations upon which to build, to sustain (or continue to increase) velocity as you expand product scope

I think the first two are obvious, so let me just focus on the third point, as it pertains to development. Scaling development resources linearly with code size doesn't work well, because there is overhead in product maintenance and inter-person communication. Generally you want to raise the level of abstraction involved in producing and maintaining functionality, such that you can "do more with less". However, this is not generally possible unless you have reliable "infrastructure" (at the code level) which you can build on top of, with high confidence that the resulting product code will be robust (at least insofar as it relies on that infrastructure). This, fundamentally, allows scaling development resources linearly with product functionality (not code size), which is a much more attainable goal.

Most successful companies get to this point in their evolution (ie: where they would otherwise get diminishing returns from internal resource scaling based on overhead). The smart ones recognize the problem, and shift to building stable infrastructure as a priority (while still moving fast and breaking things, generally), so as to be able to continue to scale product value efficiently. The ones with less insightful leadership end up churning with rewrites and/or lack of code reusability, scramble to fix compounding bugs, struggle with code duplication and legacy tech debt, etc. This is something which continues to be a challenge to even many otherwise good companies, and the genius of FB/Meta (imho) is recognizing this and trying to enshrine the right approach into their culture.

That's my take, anyway, fwiw.

Saturday, August 26, 2023

"Toxic" Answers

Preface: This observation is not intended to call out any specific people.

Something I've observed in the work environment: a tendency from some types of people to provide what I would term "toxic answers". This is when, broadly speaking, someone on a team asks a question (re tech, process, how to do something, etc.), and someone else provides an "answer" which is not really helpful. This can take several forms:

  • Reference to existing documentation which is out of date, incomplete, or inaccurate
  • Reference to process which is superficially related, but not germane to the actual question
  • Reference to something which someone else has stated to be the answer, but which is not actually the answer, and which the person echoing it has not personally verified
  • Some related commentary which expresses opinions on the topic, and pretends to answer the question, but isn't actually actionable
  • Commentary which expands the scope of the question to include more questions/work, without answering the original question
  • etc.

Obviously the above could be deemed "unhelpful", but why do I think of these responses as "toxic"? I will explain.

In a work context, you have various levels of understanding of the topics discussed, ranging from your subject matter experts (with in-depth knowledge) to your high level managers (with usually just buzzword familiarity), and levels in between. When someone on a team asks a question, and someone else (especially a more senior person) provides a "toxic" answer, this typically has a few effects:

  • The manager(s) believe the question has been addressed by the person providing the response, even though it has not
  • The asking person might be disinclined to pursue the topic further, and thus (at best) waste time working on it solo, because they feel they cannot inquire further
  • This can create more work for the person asking (in the case of a response which expands the scope), which creates a negative motivation to seek help
  • In the case of false/misleading or out of date information, this can waste lots of time going down paths which are ultimately not fruitful
  • If the information is known by the person asking to be unhelpful, it can strain the working relationships
  • It generally "shuts down" the discussion, with the question effectively unanswered
  • Worse, it propagates an inaccurate/damaging perception of value to the team:
    • The person asking the question should (possibly) get credit for reaching out for something difficult/nuanced, but instead they are likely perceived as less capable of independently solving problems
    • The person providing the response should (probably) be viewed negatively for damaging the team dynamics and time management, but instead will likely get credit from their management for providing timely and helpful answers

In addition to the damage above, it can be challenging even for a "good" employee to navigate the process of trying to improve this behavior, depending on the perceptions of the employees. The secondary harm of someone providing toxic answers is that over time, they are perceived as more valuable team members by their management, so negative feedback about their answers or behavior is typically seen as more of a negative for the reporters than the subject. This is an observable effect within teams, of course: you don't want to criticize the person who management views as a "star employee". This compounds the effects, ultimately driving the actually more productive employees to seek roles elsewhere, away from the toxic influences which they cannot modify.

My advice to companies and managers, with respect to the above, would be this: do proactive follow-ups for inquiries where the outcome is unobvious, and ask the team members if the answers provided led to actual resolutions. Assume people on the team are not going to proactively raise concerns about people viewed as "untouchable" or senior within the org, and factor that into your information gathering. Be on the lookout for people who just provide links for answers, without checking if the information referenced actually solved the issue presented. And understand that your best employees are not the ones providing the most "this might be related" type answers, but the ones providing the most actionable and accurate answers. If you don't identify and curtail people providing toxic answers within a team, you're going to have problems over the long run.


Tuesday, August 22, 2023

An amusing employment opportunity interaction

So I was recently doing some casual employment opportunity exploration (as one should do periodically, even if the situation is not pressing; if nothing else, just to see what else might be out there, and to keep one's interviewing skills updated), and a funny thing happened.

I was being screened by a developer as part of a normal process, and got a typical "test" problem to write an implementation for. In this case, it was something which would be real-world applicable, but still small enough to be feasible for an interview time slot. Germane to the story is that, for this interview, the other party was not using an online shared text editor for sample code, but rather just had me share my favorite (or handy) IDE/editor from my local system to write the code in.

Now, for this instance the position I was being evaluated for was primarily Windows-based, so naturally I opened Visual Studio, and switched from my default most recent personal project to a new blank file, where I took down the problem description as described. As I was doing this, though, I realized that what the counterparty was describing was something I had already written for my own open-source library, which was the same project which I had already had open in Visual Studio.

So I asked if I could just show him the solution I had already written for my open source library, and explain it to him, rather than writing it again. He said that was okay, since I already had it open, and I did so. The total explanation took about a minute, he was satisfied that I fully grasped the solution (I'm sure the working and unit tested code helped with that), and we were done with that section of the examination.

Now obviously interviews don't usually go like this (almost never, in fact), but it was pretty funny to happen to have code readily available which solved the exact problem being asked about, including being computationally optimal and templated already, which I could just point at. I feel like, notwithstanding efforts to make interviews objective and separable from any previous work, it would really save a lot of time if one could just point to working code one had previously written (say, open source utility libraries), and assert that you can write code based on those previous efforts. I wouldn't expect that to be the norm (especially since it's comparatively easy to fake), but I can attest that it's pretty cool when it does happen like that, to an expedient and positive outcome. :)