This is a brand new document as of June 14, 2025. The ideas are not new - I’ve been thinking about and experimenting with these concepts for decades. But it’s still somewhat stream-of-thought and only lightly edited. It should become more refined, complete, and better organized in the near(ish) future
I’ve tried to find the “perfect documentation system” many times over the years. It’s a quest for which I’ve intentionally rationed my efforts, to keep it from becoming a major obsessive distraction from “real work”. But I consider good documentation to be extremely valuable, so I keep trying. I’ve tried a lot of different approaches. None of them have been ideal, although some come much closer than others.
The following items are my expectations for a “perfect documentation system”. I’ve found systems which provide a good (sometimes excellent) solution for each of these. A few can cover most of them. Nothing really satisfies all of them. I’m not certain any system can perfectly satisfy all of them, as some combinations introduce a bit of contradictory tension. This requires tradeoffs and compromise, which at best will be based on intentional educated choices among the available options. Ideally, making the right choices will result in the optimum available system for a specific project at that time.
Some Ingredients of a Perfect Documentation System
Separation of concerns
Creating and maintaining excellent documentation requires quite a bit of flexibility in terms of both authoring and rendering content. The degree to which a system achieves this has a major impact on it’s long term viability.
Ideally, the “source” for documentation should be concerned
- primarily with enriching semantic content, with extensible custom semantics for key domain-specific concepts
- secondarily with calling out explicit associations (links) to other content
- although most cross-referencing should be auto-generated
- especially when using associations to custom sematic concepts as opposed to links to a specific location in related content
- minimally with formatting, especially for documentation source “embedded” in other files ( i.e. comments or docstrings in a Python file)
DRY
I endeavor to keep source code - both programming and documentation - as DRY as reasonably possible. However, uncompromising adherence to that ideal can become problematic in a documentation system - it may be possible, but often imposes significant tradeoffs in other aspects of the system. A little flexibility here can go a long way. But only a little, it should still be mostly DRY.
One specific desirable directly DRY bit for documentation source is a shorthand/abbreviation mechanism for frequently repeated domain/project specific bits - especially those that might change.
- this is intentionally vague, but some examples include
- global definitions
- semantic tagging
- substitution text in meta-content, such as a relative base path for links to another section
- all of which needs to be inheritable, i.e. use defined bits from higher/enclosing scopes
Semantic Tagging
There should be a concise mechanism for tagging ontology, i.e. semantic annotation to facilitate creation and use of an external (relative to individual documentation source files) domain/project specific knowledge graph. At a minimum, this requires
- a concise syntax for identifying references to entities within the knowledge graph
- without knowing the location of the entity’s definition, only that it ought to have a system-unique definition somewhere
- a syntax for defining ontology : entities types, including their attribute and potential relationship types
- a syntax for defining taxonomy : entity instances, including actual attributes and relationships to other entities
For example, consider the concept “Project”.
- In the context of this particular site and it’s documentation,
a “Project” is, well, a project in the regular sense of the word.
But it also
- has it’s source code in one (or more) git repos
- is primarily implemented in a specific language
- may be related to other projects (as parents, children, peers, …)
Therefore I would like to be able to
- define the ontology necessary to capture these semantics
- identify entity types
- Project, repo, programming language, …
- identify relationships
- Projects can be associated with 0..N repos, languages, …
- Repos have a primary origin URL, …
- with these potentially inheriting and refining definitions from higher levels
- identify entity types
- mark and annotate entities within the documentation
to capture
- LCPF
- is a Project
- has a repo at https://github.com/Lumensalis/LCPF
- is written in Python
- LCPPF
- is written in C++
- LumenSalis.com (this website)
- is written in MDX and Typescript/Next.js
- refers to all the other projects
- LWUI (the Lumensalis WebUI)
- is a sub-project of LCPPF
- written in Typescript/Next.js
There are numerous existing approaches which already do some (or all) of this. However, finding one that plays well with the rest of the “Perfect Documentation System” isn’t so easy.
- creating the ontology “metadata” description
- some existing solutions are limited in scope and can’t fully support the desired semantic tagging feature set
- more advanced options (especially those based on XML schemas) can be extremely tedious without UI assistance
- may require using a different format/syntax than most other documentation source
- using the ontology within documentation source
- ideally is done with an editor with awareness of and full support for
- the documentation source format
- the general file source format
- example : documentation source is contained within comment blocks in a C++ source file
- the knowledge graph ontology (i.e. entities types, …)
- the knowledge graph taxonomy (i.e. entities, …)
- context sensitive auto-completion and warning/error indicators fully incorporating all of the above
- ideally is done with an editor with awareness of and full support for
There are also many systems which can partially annotate semantics automatically, especially for API documentation using tools like Doxygen.
Global Definitions
There should be a concise mechanism for establishing reusable aliases for
documentation content. For example, my current CircuitPython framework project is
rather unimaginatively named the Lumensalis CircuitPython Framework, and
referred to as the LCPF throughout the documentation. Eventually I might come up with a better name.
If and when I do, it would be preferable to have a simple way to specify
a project wide alias to change the displayed representation of LCPF whenever
it is used.
Note that this is not just a simple search and replace, as it only impacts displayed text. For example, given the Markdown source
as opposed to LCPF [device configuration](Projects/LCPF/Concepts/DeviceConfiguration) ...
- LCPF in
in the LCPF
would be replaced - LCPF in
[device configuration](Projects/LCPF/Concepts/DeviceConfiguration)
would NOT be replaced
Locality
The source for documentation is often directly correlated with artifacts defined by other “source” forms, such as software code, electronic schematics, complex layered object/vector/mesh graphic formats, etc. Several factors become significantly impacted by locality, or the “distance” between the source of the artifact and the source of it’s documentation. The greater the distance between the artifact source and the documentation source, the more challenging it becomes to keep the documentation up to date.
Various systems for “documentation in the source code” provide excellent locality as the source for the code and documentation can be adjacent in the same file. Most of these also introduce other benefits - and sometimes challenges - in a documentation system.
On the other hand, a requirement to keep document source in separate files reduces locality. This may be mandated intentionally, perhaps by corporate policy. Or it may be imposed by technical integration limitations do to binary source file formats, inflexible tools for source editing (especially with WYSIWYG editors for graphical content), etc. Separating artifact source and document source files into distinct locations (directories, CMS systems, SCM repos, …) further exacerbates the problem.
Familiarity
Anything close to a “Perfect Documentation System” will have a learning curve for anyone needing to author documentation content. This makes incorporating existing, widely used approaches or systems like advantageous.
Consider perhaps the most widely used current approach for source documentation, Markdown:
- most developers already use it
- most IDEs and devOps pipelines already have some degree of support for it
- github, gitlab, etc. automatically auto-format Markdown code
- it’s “just text” so it works everywhere
- it’s relatively easy to read “raw” without a formatted preview
There are many other potential syntaxes / formats for documentation source, some (all?) of which are arguably better than Markdown in the wider context of a “Perfect Documentation System”, but Markdown gets an A+ for familiarity and tooling.
Self documenting code
In theory, perfect code needs no documentation … HOWEVER
- code cannot be written perfectly in an imperfect language
- AND there are no perfect languages
- THEREFORE “self documenting code”, while itself being a Good Thing, can almost always be improved with well written additional documentation
There are many systems for “self documenting code”, such as Doxygen and Sphynx (both of which support C++, Python, and a variety of other languages ), PyDoc, JSDoc , …
They usually include
- prescribed techniques and syntax for including documentation source within
the programming language source code
- Typically embedded within comments
- often with domain specific markup for enhanced semantics, such as the
ubiquitous
@param ...
for documenting parameter names
- often augmented by analyzing the parsed AST surrounding or adjacent to comment-embedded documentation to extract context (associated class, method, file, …)
- occasionally can extract further documentation relevant content from language specific annotations/attributes/metadata
- usually requires some post processing pass to extract the documentation content
- although sometimes makes content directly available directly within the application without any need for additional processing, such as Python docstrings
semantic content by parsing the source, augmented by additional descriptive documentation source (typically in comments, sometimes in various language specific annotation/attribute/metatdata extension syntax such as python docstrings )
Minimal formatting in content
Documentation source should only minimally concerned with formatting. The more specific the formatting in the source, the more challenging it becomes to render the documentation in a high quality form
Simple formatting concepts like a Markdown-ish _emphasize this_
seem reasonable, but even
these can quickly become problematic. For example, if mentioning the name of an
object type in documentation source, it “feels” right to identify the type
in some manner which causes it to stand out1.
The problem here is that lacking a better alternative, those who author documentation
part time as an adjunct to other primary responsibilities may indeed add some form of emphasis,
but it will almost inevitably become applied inconsistently. Something like it "feels" right to identify the \<OurTypeConcept>type\</OurTypeConcept>
allows the semantics to be clearly and unambiguously specified in the documentation
source, while ensuring that anything marked <OurTypeConcept>
will be
formatted consistently throughout an entire rendered form of the documentation.
Of course, it’s not quite that simple…
Links and Referencing
Embedded links are an important documentation feature.
Tooling
IDE integrations, plugins for dynamic rendering, … there is a significant ecosystem out there of various frameworks, modules, extensions, plugins, etc… for assisting with authoring, rendering, and deploying your documentation.
The right ones can significantly improve the authoring experience, particularly with solid support for
- “syntax” (of the document source form) checking
- intelligent link management / cross referencing
- auto complete
- project wide automated bad link detection (i.e. ‘lint’ for your doc source links)
- preview
Intelligent indexing
- Search indexing
- auto semantic linking
Flexible Rendering
An ideal rendering system allows a common shared set of documentation source
to be cleanly rendered to many different output formats, such as PDF, HTML, RTF, XML, …
It also allows consistent theme/styling to be applied through the entire
body of documentation, especially for domain or even project-specific semantics.
For example, you might want all \<OurTypeConcept>
spans in bold with a
slightly larger font size and a highlighted background for HTML output,
but have it in a bold italicized monospace font for a custom PDF output
intended for printing handouts at a conference presentation where you want match the
recommended stylistic conventions for that conference’s proceedings.
Temporal awareness
Documentation changes over time. The ability to rerender documentation based on the source as it existed at a point in the past is important. At the very least, the ability to rerender documentation for prior version releases is critical.
This, along with other useful features like forks and branches, is more or less “free” when documentation source resides in standard files with a text-based format simply by using a good SCM (typically git). However, if your documentation source is maintained in a CMS (for instance, WordPress) this might prove more challenging.
Documentation approach for this site
I have been primarily using a mix of PyDoc style annotations and Markdown within Python and C++ source for a while, but Markdown is a bit limited - especially for additional free-form documentation outside of program source code (for example, the documentation content in these pages on Lumensalis.com).
Part of the reason I’m overhauling this site is to experiment with MDX as
a primary format for document content. This includes identifying usability
issues. For example, having a central DRY location to identifying project-specific
definitions for terms - like MDX - so that it’s use can be detected
and prettied up automatically instead of needing to manually
add a markdown link every time I mention [MDX](/terms#MDX)
. It looks like
implementing this should be feasible with a combination of custom MDX
components and post-processing.
MDX seems to be a significant improvement over vanilla Markdown, but there are still some important things it doesn’t do, especially in terms of custom semantics. I could hack in some semantics support with custom components, but since that wouldn’t be part of the “core” MDX feature set, they won’t get any special treatment from MDX integrations like IDE extensions. That’s unfortunate, because the widespread availability of MDX tooling is one of it’s major advantages.
There are other potential documentation source formats, such as AsciiDoc , DocBook , and reStructuredText , and even LaTeX have deeper support for some features useful in software documentation, and provide extension mechanisms for semantic tagging that aren’t as removed from the core functionality.
I’ve used BoostBook/DocBook, and while it provided great coverage for the kinds of things I want to be able to express in documentation source, it has it’s own issues (although it’s been a while since I tried a newer release).
Footnotes
-
as Markdown source, this would be
"feels" _right_ to __identify__ the ___type___ in some ~~manner~~ which <sub>causes</sub> it to <sup>stand</sup> <ins>out</ins>
↩