Allow me to posit a hypothetical. Imagine you are at a company, and someone (perhaps you) has just completed a project to implement some complex feature, and you have been tasked with ensuring the implementation is documented, such that other/future developers can understand how the implementation works. For the sake of argument, we'll assume the intent and high level design have already been documented, and your task is to capture the specifics of the actual implementation (for example, ways in which it diverged from the original design, idiosyncrasies of the implementation, corner cases, etc.). We'll also assume you have free rein to select the best tooling, format, storage, etc. for the documentation, with the expectation that all of these are considered in your work product.
Note: This is, in my experience, a not uncommon request from management, especially in larger companies, so it seems like a reasonable topic for general consideration.
Let's see how it plays out, looking at the various design aspects under consideration, and what the best selections for each might be.
Considerations:
Documentation locality
One aspect which is certainly worth considering is locality: that is, where the documentation will live. Documentation in an external repository location can be both hard to locate (particularly for developers who are less familiar with the org's practices), and hard to keep in sync with the code (because it's easy for the code to be updated, and the documentation to be neglected). In concept, the documentation should be as "close" to the code as possible. An oft-quoted downside of putting documentation within a source code repository is that it cannot be as easily edited by non-developers, but in this case that will not be a concern, since presumably only developers will have direct knowledge of the implementation anyway. So in concept, the best place for this documentation is in the code repository, and as close to the implementation code as possible, to minimize the chances of one being updated and not the other.
Language and dialect
This might seem like a trivial consideration, particularly if you've only worked within smaller orgs with relatively homogeneous cultures and dev backgrounds, but I would suggest that it is not. Consider:
- Not all the developers may speak the same native language(s), and nuance may be lost when reading non-primary languages
- Some developers (or managers) may object to casual nomenclature for business products, but conversely not all developers may want to read, or be capable of writing, business professional text
- There's also the question of style; for example, writing in textual paragraphs, vs writing in terse bullet points and such
In the abstract, the choice of language and dialect should be such that:
- All developers can read and understand the nuances expressed in the documentation
- The language used does not create undue friction by being either too, or not enough, "business professional"
- The writing style should be able to express the flow and semantics of the code in a comprehensible manner, while allowing for the various special-cases
- For example, there should be a manner in which to express special-case notes on specific areas of the implementation, like footnotes or annotations
- There should also be a way to capture corner cases, and perhaps which cases are expected to work, and which are not
Sync with implementation
This was alluded to in the locality point, but it's important that the documentation stay in sync with the implementation, to the maximum extent possible. If the documentation is out of sync, then it is not only worthless for understanding that piece of the code, but perhaps even a net negative, as a developer trying to understand the implementation from the documentation might be misled, and waste time due to bad assumptions based on the documentation. So in addition to locality (ie: docs near the code), we want to ensure that it is as easy as possible for developers to update the documentation at the same time they make any code changes, so that they will be able and inclined to do so.
Expedient vs comprehensive
It would be a bit remiss to not also mention the trade-off in the initial production of the documentation, between being expedient and being comprehensive, and additionally how much the above trade-offs might impact the speed at which the documentation could be produced. Every real-world org is constrained by available resources and time, and presumably you will have some time limit for this project as well. So the quicker you can produce documentation, and the more comprehensive it is, the better your performance on this task will be.
So, what to do?
Admittedly, those readers who have thought about or performed this task already probably have some good ideas at this point, and perhaps the more intelligent readers have already figured out where this is going, just based on the objective analysis above. To recap the considerations, though:
- We need something in a form which we can produce quickly, but is also as comprehensive a description of the implementation as possible
- The language used for the documentation must be readable by all the developers who are familiar enough with the code to work on it, regardless of their native language(s)
- The documentation must be unquestionably work appropriate (no swear words, slang, obscure references, etc.), but also terse enough to provide value without being excessively verbose
- There must be some mechanism in the structure to provide footnotes for implementation choices, corner cases, tested inputs, etc.
- The documentation should be as close to the code as possible, such that it's easy to find, and there is a minimal risk of it getting out of sync with the actual implementation over time
- It must impose as little overhead as is reasonable for updating the documentation along with changes in the implementation over time
- Note: This is often the hardest thing to get "right" with docs in general, since the value add for future readers must be greater than both the initial production time, and the maintenance time, for documentation to be a net positive value at all
Now, the above might seem like a tall order with lots of hard to answer questions, but let me point out something which might make these decisions a bit easier. A programming language, such as it is, is fundamentally just a way to describe what you want the computer to do in a human readable form. Assuming the hypothetical selection of the same language as the implementation for the documentation, this would be:
- Able to be produced reasonably quickly
- Readable to all developers who would be familiar with the implementation code
- Unquestionably work appropriate
- Able to provide footnotes (via comments, or ancillary code such as unit tests)
- Very close to the code (could be in the same files, in fact, right next to the implementation)
- ... but it would still have some non-trivial overhead to keep in sync with the actual production code
But wait... we can solve that last problem fairly trivially, by just eliding the actual copy or translation of the code into nominal documentation form, and just rely on the code itself! Now we have gained:
- Produced instantly (once the implementation is done, the documentation is also implicitly done)
- Zero overhead to keep in sync with the actual implementation (since they are the same)
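To make the "footnotes via comments, or ancillary code such as unit tests" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented purely for illustration (the `split_total` function, the billing scenario, and the "design diverged" note); it is not taken from any real codebase. The idea is only that a comment can carry the "why", while the tests record which corner cases are expected to work and which are not.

```python
import unittest
from typing import List


def split_total(total_cents: int, parts: int) -> List[int]:
    """Split total_cents into `parts` shares that sum exactly to the total."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    if total_cents < 0:
        # Footnote: negative totals are not a supported input in this sketch.
        raise ValueError("total_cents must not be negative")
    base, remainder = divmod(total_cents, parts)
    # Footnote (hypothetical divergence from the design): the leftover cents
    # are handed to the first shares, one cent each, so the sum stays exact.
    return [base + 1 if i < remainder else base for i in range(parts)]


class SplitTotalCornerCases(unittest.TestCase):
    """These tests double as the list of supported and unsupported cases."""

    def test_remainder_goes_to_first_shares(self):
        self.assertEqual(split_total(100, 3), [34, 33, 33])

    def test_zero_total_is_supported(self):
        self.assertEqual(split_total(0, 2), [0, 0])

    def test_non_positive_parts_are_rejected(self):
        with self.assertRaises(ValueError):
            split_total(100, 0)

    def test_negative_total_is_rejected(self):
        with self.assertRaises(ValueError):
            split_total(-1, 2)
```

If someone later changes how the remainder is distributed, the test fails in the same change, which is about as "in sync" as a footnote can get.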
"But hold on", you might object, "what if the code is incomprehensible?" That is a valid question in the abstract, but I would counter with two observations:
- If the code is incomprehensible, and you can write more comprehensible documentation (ie: the complexity is in overhead of the implementation, not inherent to the problem space), then you can fix the code to make it more comprehensible
- If the problem space is inherently complex, then side-by-side documentation will not be less complex, and the code itself is often just as easy for a developer to read and understand as any other form of documentation
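As a rough illustration of the first observation, here is a sketch of the same (entirely hypothetical) pricing rule written twice in Python: once in a form that would need separate documentation to explain it, and once restructured so that the names carry the explanation. The function names, the `Order` type, and the discount rule itself are all assumptions made up for this example.

```python
from dataclasses import dataclass


# Before: correct, but a reader has to guess what t, n, x and 90 mean.
def p(t, n, x):
    return t * 90 // 100 if n > 10 and not x else t


# After: the same behavior, with the explanation folded into the code.
BULK_DISCOUNT_PERCENT = 10
BULK_MINIMUM_ITEMS = 11


@dataclass(frozen=True)
class Order:
    total_cents: int
    item_count: int
    discount_exempt: bool


def qualifies_for_bulk_discount(order: Order) -> bool:
    """An order gets the bulk discount only if it is large enough and not exempt."""
    return order.item_count >= BULK_MINIMUM_ITEMS and not order.discount_exempt


def price_cents(order: Order) -> int:
    """Return the price in cents, rounding any discounted price down."""
    if not qualifies_for_bulk_discount(order):
        return order.total_cents
    return order.total_cents * (100 - BULK_DISCOUNT_PERCENT) // 100
```

Anything a side-by-side document would have said about the first version ("n is the item count, the discount is 10%, exempt orders are skipped") is now stated by the second version directly, and cannot drift away from it.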
Wait, what did we just conclude?
We just concluded, based on an objective analysis of all the various design considerations, that the best way to document a software implementation is to not do any documentation at all, because every single thing you could do is worse than just allowing the code to be self-documenting. You should improve the structure of the code as applicable and possible, and then tell your management that the task is done, and the complete functional documentation is in the repository, ready to be consumed by any and all future developers. Then maybe find some productive work to do.
Note: Selling this to managers, particularly bad ones, might be the hardest part here, so I'm being slightly knowingly flippant. However, I do think the conclusion above is correct in general: wherever possible within an org, code should be self-documenting, and any other form of documentation for an implementation is strictly worse than this approach.
PS: I'm aware that some people who read this post probably have already internalized this, as this is fairly common knowledge in the industry, but hopefully it was at least a somewhat entertaining post if you made it this far and already were well-aware of what the "right" answer here was. For everyone else, hopefully this was informative. :)