A Better Changelog

Changelogs are a frequently overlooked aspect of software release management. I’m going to outline the different approaches to keeping them, and describe a dumb yak shave I undertook to improve the situation for my own projects.

Changelog approaches

No changelog

A disturbing number of projects don’t keep any changelog. Those responsible are awful people who clearly don’t want anybody using their software. Sure does make releases easier, though.

Source control driven

Leveraging existing tools is always wise. On the face of it, using a Git/Hg/etc commit log as a changelog is a big time saver. Unfortunately this falls down in practice:

  • With no massaging, real world commit histories are frequently unreadable & fail to serve the purpose of a changelog.
  • Massaging the commit log of any nontrivial/community project is a lot of effort.
  • Massaging commit history almost always requires heavy interactive rebasing, which does not play well with tools like GitHub.

Manually updated

Inevitably, those seeking quality changelogs go for a “fully curated” solution, either initially or arriving at it after getting fed up with source control changelogs.

Writing changelogs by hand results in useful output for minimal effort, but has one major deficiency: it’s a pain to manage if you keep multiple release branches. Multi-release-line, hand-written changelogs mean either lots of manual copy/paste, or (almost worse) forgetting to do so, resulting in inaccurate changelogs.

Hybrid

Fabric’s changelog (from the early 1.x days through the 1.6 release; source, output) takes a hybrid approach: manually curated, but organized such that merging between branches is much more natural. It’s a single commit-log-like timeline of discrete entries, instead of the traditional per-release lists.

Unfortunately it’s not the most natural to read - even with a ‘howto’ at the top it’s frequently confusing for users. Ideally, we can improve this so it’s both readable and easy to maintain.

A new solution

I decided to automate the process I’d been expecting users to perform by hand: parse my ‘stream formatted’ changelog into a per-release format. (Originally I considered a wholly new, more machine-oriented, source format - which turned out to be unnecessary.)

The problems

Naively, one could parse a copy of my changelog into releases by following the normal methodology of putting ‘feature’ or ‘support’ items into their subsequent X.Y.0 releases, and ‘bugfix’ items into bugfix releases (X.Y.Z where Z != 0).
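To make the naive heuristic concrete, here is a minimal sketch. The tuple-based stream format and the `naive_buckets` name are assumptions for illustration only, not how my changelog is actually stored: the stream runs oldest-first, and each release marker simply claims whatever has piled up for its kind of release.

```python
def naive_buckets(stream):
    """Assign entries to releases using only the entry type.

    `stream` is oldest-first; each item is either ("release", "X.Y.Z")
    or (kind, issue_number) with kind in {"feature", "support", "bug"}.
    (Hypothetical format, chosen to keep the example small.)
    """
    # Entries waiting for the next feature release vs. the next bugfix release.
    pending = {"feature": [], "bugfix": []}
    buckets = {}
    for kind, value in stream:
        if kind == "release":
            patch = int(value.split(".")[2])
            # X.Y.0 takes features/support; X.Y.Z (Z != 0) takes bugs.
            target = "feature" if patch == 0 else "bugfix"
            buckets[value] = pending[target]
            pending[target] = []
        elif kind == "bug":
            pending["bugfix"].append(value)
        else:
            pending["feature"].append(value)
    return buckets


stream = [
    ("feature", 1),
    ("release", "1.0.0"),
    ("bug", 2),
    ("feature", 3),
    ("release", "1.0.1"),
    ("release", "1.1.0"),
]
print(naive_buckets(stream))
```

Note that the sketch has no idea which release line a pending item belongs to; it only knows what came before the next marker.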

However, applying that to a single copy of the existing changelog doesn’t suffice, because my original design relied on branch differences:

  • The contents of a bugfix release can only be determined by viewing the changelog from its release branch - in a newer branch’s copy, the feature items for the next feature release are mixed in with bugfixes.
  • Similarly, “major” bugs released as part of feature releases can’t be told apart from bugs in bugfix releases, without comparing branches.

In addition, backported features or support issues can’t be reliably determined - if a given change was made in, say, 1.5 but also ported back to 1.4 and 1.3, this isn’t distinguishable from the perspective of 1.5’s changelog.

The solutions

Adding an optional ‘keyword’ to the issue number field is trivial, given how my changelog setup currently works. This allows me to:

  • Add a backported keyword, which tells the new parser to add feature/support items to bugfix releases as well as their feature release;
  • Add a major keyword for use on bug items so they get added to feature releases instead of bugfix releases.
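The two keywords boil down to a small decision table. The sketch below is illustrative only: the "number, then optional keyword" field layout and both function names are assumptions on my part, not the actual parser code.

```python
def parse_issue_field(text):
    """Split an issue field such as "123 major" into (number, keyword).

    Assumes the keyword, if present, follows the number after whitespace.
    """
    parts = text.split()
    number = int(parts[0])
    keyword = parts[1] if len(parts) > 1 else None
    return number, keyword


def release_kinds(kind, keyword=None):
    """Return which kinds of release an entry should be listed under."""
    if kind == "bug":
        # "major" bugs ship with feature releases instead of bugfix ones.
        return {"feature"} if keyword == "major" else {"bugfix"}
    if kind in ("feature", "support"):
        # "backported" items appear in bugfix releases as well.
        if keyword == "backported":
            return {"feature", "bugfix"}
        return {"feature"}
    raise ValueError(f"unknown entry kind: {kind!r}")


number, keyword = parse_issue_field("123 major")
print(number, keyword, release_kinds("bug", keyword))
```

Entries without a keyword fall through to the plain heuristic, so existing changelog lines keep working unmodified.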

I then used a one-time script (seek.rb) to tell me which entries needed this metadata retroactively applied.

Finally, and definitely not least, I overhauled my existing Sphinx extensions to perform the parsing needed. At a high level (leave a comment if you’d like me to explain the gory details in another post; the raw code is linked below):

  • It replaces the bug/support/feature, and release, ReST roles with intermediate objects (versus what they were before, fully HTML-ifiable node lists);
  • Prior to writing HTML, those objects are retrieved and organized into per-release buckets, going by the above heuristics & keywords;
  • Those release buckets are then turned into display-friendly nodes (headers + sub-lists) and used to replace the stream-like list from the initial parse.
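As a rough picture of the first and last steps above, here is a plain-Python stand-in. The real extension works with docutils nodes; the `Issue` class, the `render` function and the text layout here are hypothetical and only meant to show the shape of "intermediate objects in, headers + sub-lists out".

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """Intermediate object standing in for a parsed bug/support/feature role."""
    kind: str
    number: int
    description: str


def render(buckets):
    """Turn {release: [Issue, ...]} into header + sub-list text."""
    lines = []
    for release, issues in buckets.items():
        lines.append(release)
        lines.append("-" * len(release))  # underline acts as the header
        for issue in issues:
            lines.append(f"* [{issue.kind}] #{issue.number}: {issue.description}")
        lines.append("")  # blank line between releases
    return "\n".join(lines)


buckets = {
    "1.1.0": [Issue("feature", 3, "Add a new flag")],
    "1.0.1": [Issue("bug", 2, "Fix crash")],
}
print(render(buckets))
```

Keeping the entries as plain objects until the very end is what makes the regrouping step possible; once they are display nodes, there is nothing left to sort.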

Final result

It’s pretty simple:

I’m happy with the result. It gives me nearly hassle-free maintenance of as many release branches as I want to maintain, and gives users an easy-to-read (or so I hope) changelog.

EDITED TO ADD: The following day I did the work required to rip this out into its own codebase so I could use it in multiple projects; the result is Releases. A quick example of porting a (small) project’s handwritten changelog to using Releases can be found in Invoke’s history.
