The Well-Maintained Test: 12 Questions for New Dependencies


Joel Spolsky’s well-known Joel Test is a quick heuristic for gauging a software engineering team’s technical chops. I’ve come up with a similar test that we can use to decide whether a package we’re considering adding as a dependency is well-maintained.

I do not have the hubris to name the test after myself, so I present: The Well-Maintained Test.

The Test

Answer “yes” or “no” to the questions below by checking the new dependency’s website (if any), its project page (npm, PyPI, etc.), and its source control hosting (GitHub, GitLab, etc.).

The package scores one point for each “yes”. You’ll have to determine how many points are required to pass, based on your risk tolerance.

Bear in mind, whenever you answer “no”, that is an opportunity to contribute! You may find some issues are easily remediated.

  1. Is it described as “production ready”?
  2. Is there sufficient documentation?
  3. Is there a changelog?
  4. Is someone responding to bug reports?
  5. Are there sufficient tests?
  6. Are the tests running with the latest <Language> version?
  7. Are the tests running with the latest <Integration> version?
  8. Is there a Continuous Integration (CI) configuration?
  9. Is the CI passing?
  10. Does it seem relatively well used?
  11. Has there been a commit in the last year?
  12. Has there been a release in the last year?

The Questions in Detail

Let’s examine each question in a bit more depth.

1. Is it described as “production ready”?

We want to see evidence that the maintainers consider the software ready for use in production.

The documentation shouldn’t have any banners or wording implying a future stable release.

The version number should not be a pre-release, alpha, beta, release candidate, etc. Note that some maintainers stick with a “zero version number” like 0.4.0, even when they consider the package production ready.
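
For Python packages, we can check the version number mechanically with the packaging library, which implements the same PEP 440 version parsing that pip uses. A rough sketch, not part of the test itself:

```python
# A rough check for Python packages: flag pre-release or "zero version"
# numbers using the packaging library (the same PEP 440 parsing pip uses).
from packaging.version import Version

def looks_production_ready(version_string: str) -> bool:
    version = Version(version_string)
    if version.is_prerelease:  # covers alpha, beta, rc, and dev releases
        return False
    # A zero major version isn't disqualifying on its own, but it does
    # warrant checking what the documentation itself claims.
    return version.major > 0

print(looks_production_ready("1.2.0"))     # True
print(looks_production_ready("2.0.0rc1"))  # False
print(looks_production_ready("0.4.0"))     # False - read the docs instead
```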

2. Is there sufficient documentation?

If we can’t find information on what the package currently does, it seems doubtful that depending on it in future will be easy.

“Sufficient” varies based upon: the scope of the library, the ecosystem, and your preferences.

Documentation comes in many forms: a README file, a documentation site, a wiki, blog posts, etc. Hopefully the package doesn’t make you hunt for it.

3. Is there a changelog?

A changelog, or a release notes page, is vital for our ability to update the package. The changelog is the main place for communication of breaking changes. (A case for changelogs is made at keepachangelog.com.)

Changelogs come in many forms: a single file, a documentation section, GitHub release descriptions, etc. Again, hopefully the package doesn’t make you hunt for it.

Note that some projects “have a changelog”, but it stopped being updated early in the project’s history. So check that the changelog covers recent releases.

4. Is someone responding to bug reports?

If recent bug reports have gone unanswered, it may be a sign that the package is no longer maintained. It’s worth ignoring any “spammy” open issues, and checking recently closed issues, since closing issues is itself a sign of activity.

Check for issues like “is this still maintained?”… the answer is probably “no”, per Betteridge's law of headlines.
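
For projects hosted on GitHub, we can skim recent issue activity with a quick script against the REST API. A rough sketch, where “owner” and “repo” are placeholders for the real project:

```python
# A rough sketch: list the most recently updated issues for a GitHub
# repository, to eyeball whether anyone is responding. "owner" and "repo"
# are placeholders.
import json
from urllib.request import urlopen

def recent_issues(owner: str, repo: str, count: int = 10) -> None:
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/issues"
        f"?state=all&sort=updated&direction=desc&per_page={count}"
    )
    with urlopen(url) as response:
        issues = json.load(response)
    for issue in issues:
        # The issues endpoint also returns pull requests; skip those.
        if "pull_request" in issue:
            continue
        print(issue["state"], issue["updated_at"], issue["title"])

recent_issues("owner", "repo")
```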

5. Are there sufficient tests?

Tests give us confidence that future changes will not result in bugs.

Again, “sufficient” is context-dependent: testing norms in our language and ecosystem, ease of testing the functionality, and personal preferences.

Measurement of test coverage is normally a sign that the tests are higher quality. With coverage, maintainers can at least tell when changes affect untested code.

If there’s no proof of coverage, it’s worth opening a few test files, to check that they aren’t auto-created empty skeletons.

6. Are the tests running with the latest <Language> version?

Most programming languages iterate regularly. Python has annual releases, as does JavaScript (ECMAScript). If a package we’re considering doesn’t support the latest version, it may prevent us from upgrading.

We can grant some leeway for very recent language versions. If Python 3.10 was released last Tuesday, we cannot expect every package to be up to date.

Testing against a new language version can be an easy way to contribute. Often the new version only needs adding to the test matrix, although that may reveal some bugs.
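
For Python packages, the Trove classifiers on PyPI give a rough picture of which versions a package declares support for. They’re only a declared proxy, with the CI configuration as the real source of truth, but they’re quick to check. A sketch:

```python
# A rough sketch for Python packages: read the Trove classifiers from the
# PyPI JSON API to see which Python versions the package declares support
# for. Classifiers are a declared proxy - the CI config is the source of truth.
import json
from urllib.request import urlopen

def declared_python_versions(package: str) -> list[str]:
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        info = json.load(response)["info"]
    prefix = "Programming Language :: Python :: "
    versions = [
        c.removeprefix(prefix)
        for c in info["classifiers"]
        if c.startswith(prefix)
    ]
    # Keep only specific versions like "3.12", dropping entries such as
    # "3", "3 :: Only", or "Implementation :: CPython".
    return [v for v in versions if v[0].isdigit() and "." in v]

print(declared_python_versions("django"))
```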

7. Are the tests running with the latest <Integration> version?

<Integration> here could mean a framework that the package is based on, like Django, or something the package interfaces with, like PostgreSQL. It could mean several things, in which case we can check them all.

The same conditions apply as for the latest <Language> version. And again, adding tests for a new version may be an easy way to contribute.

8. Is there a Continuous Integration (CI) configuration?

If there are tests, it’s likely there’s a CI system set up, such as GitHub Actions. We should check that this is in place, and running correctly for recent changes.

9. Is the CI passing?

Some projects configure CI but then ignore it or leave it unmaintained. CI may be failing for one or more <Language> or <Integration> versions. If this has gone on for a while, it is a sign that maintenance is lagging.

Sometimes CI failure is caused by a single small bug, so fixing it may be a quick contribution. It can also be the case that old versions of <Language> or <Integration>s can simply be dropped.
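
For projects using GitHub Actions, we can glance at the conclusions of the latest runs via the REST API. A rough sketch, with “owner”, “repo”, and the “main” branch name as placeholders:

```python
# A rough sketch: check the conclusions of the latest GitHub Actions runs
# via the REST API. "owner", "repo", and the "main" branch are placeholders.
import json
from urllib.request import urlopen

def recent_ci_runs(owner: str, repo: str, branch: str = "main") -> None:
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/actions/runs"
        f"?branch={branch}&per_page=5"
    )
    with urlopen(url) as response:
        runs = json.load(response)["workflow_runs"]
    for run in runs:
        # "conclusion" is None while a run is still in progress.
        print(run["created_at"], run["name"], run["conclusion"])

recent_ci_runs("owner", "repo")
```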

10. Does it seem relatively well used?

We can guesstimate usage by checking recent download counts, and to a lesser extent, popularity metrics like GitHub’s “stars”. Many package indexes, like npm, show download counts on package pages. For PyPI, we can use pypistats.org.

We can only judge usage relative to similar packages, and to the popularity of any <Integration>s and of our <Language> itself. A particularly niche tool may see minimal usage, but it might still beat any “competitor” packages.
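
pypistats.org also exposes a JSON API, so we can pull monthly download counts for the candidate and its alternatives side by side. A rough sketch, where the package names are placeholders:

```python
# A rough sketch: compare monthly download counts via pypistats.org's JSON
# API. The package names here are placeholders - substitute the candidate
# and its alternatives.
import json
from urllib.request import urlopen

def monthly_downloads(package: str) -> int:
    url = f"https://pypistats.org/api/packages/{package}/recent"
    with urlopen(url) as response:
        return json.load(response)["data"]["last_month"]

for name in ("candidate-package", "competitor-package"):
    print(name, monthly_downloads(name))
```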

11. Has there been a commit in the last year?

Maintainers tend to abandon packages rather than explicitly mark them as unmaintained. So the probability of future maintenance drops off the longer a project has not seen a commit.

We’d like to see at least one recent commit as a “sign of life”.

Any cutoff is arbitrary, but a year aligns with most programming languages’ annual release cadence.
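
For GitHub-hosted projects, this check is easy to script against the REST API. A rough sketch, with “owner” and “repo” as placeholders:

```python
# A rough sketch: fetch the most recent commit date from the GitHub REST API
# and check it falls within the last year. "owner" and "repo" are placeholders.
import json
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen

def committed_within_a_year(owner: str, repo: str) -> bool:
    url = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page=1"
    with urlopen(url) as response:
        latest = json.load(response)[0]
    committed_at = datetime.fromisoformat(
        latest["commit"]["committer"]["date"].replace("Z", "+00:00")
    )
    return committed_at > datetime.now(timezone.utc) - timedelta(days=365)

print(committed_within_a_year("owner", "repo"))
```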

12. Has there been a release in the last year?

A backlog of unreleased commits can also be a sign of inattention. Active maintainers may have permission to merge but not release, with the true “owner” of the project absent.
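
For PyPI packages, we can check the date of the newest upload via the PyPI JSON API. A rough sketch, where the package name is a placeholder:

```python
# A rough sketch for PyPI packages: find the newest file upload time across
# all releases and check it falls within the last year. The package name is
# a placeholder.
import json
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen

def released_within_a_year(package: str) -> bool:
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        releases = json.load(response)["releases"]
    upload_times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for f in files
    ]
    return max(upload_times) > datetime.now(timezone.utc) - timedelta(days=365)

print(released_within_a_year("candidate-package"))
```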


Other Considerations

Here are some other things we should consider alongside the test.

License

Before considering a dependency, we should check its license is compatible with our needs. The main concern is the GPL family of licenses, which can be restrictive.

Quality

Carefully created packages tend to be carefully maintained. It can be hard to pin down what makes code and documentation “good”. But we can briefly inspect the package’s code and documentation to get a feel for it.

Reputation

If the package’s maintainers are known for other high-quality packages, that counts in its favour. Reputation will probably correlate with the test.

Skin in the Game

If the maintainers are using the package themselves, in a production application, we know they have a vested interest in its continued development. Such “skin in the game” is a valuable signal. But it can be hard to determine whether a package is actually in use (and how much), as most organizations do not openly discuss this.

Size

Update (2022-05-27): Added after mention by Christopher Trudeau on The Real Python Podcast.

Small packages are less of a risk than big ones. If you’re considering adopting a small package that you think you could later copy into your project, or rewrite in a few hours or days, you can lower the bar. But if you’re thinking of taking on a large “platform package”, such as a web framework, you’ll want to be stricter.

Metricificated Alternatives

There are a few projects that attempt to quantify how well-maintained packages are. The resulting metrics exhibit the same problems as all metrics: Garbage In, Garbage Out (GIGO), not accounting for harder-to-acquire but important data, incentivizing superficial action, etc.

We should use such tools with caution. They are more useful as guides for our investigation than as absolute answers.

(I was originally going to cover a couple of these projects here, but they didn’t come out very favourably. I don’t want to just shit on others’ work.)

Fin

Thanks to the following people for their contributions to the original Twitter thread and discussion: @fabiocerqueira, @hugovk, @rmcomplexity, Dan Palmer, Daniel Hepper, Frank Wiles, Gordon Wrigley, Henry Schreiner III, Julius Šėporaitis, Jürgen Gmach, Tom Viner, Will McGugan, and Zellyn Hunter.

May you find well-maintained packages for your projects,

—Adam

