How to clean up unused code with Git

Time to go digging through your project’s ancient history…

Projects often accumulate unused code. This cruft makes a project harder to navigate, and can confuse future development or debugging.

One of the reasons unused code sits around is that it can be hard to determine whether it is still in use. In this post we’ll look at a few tools in Git that can help you gain some confidence that it’s safe to remove a piece of old code.

A general process

Here’s a general process for finding and deleting unused code:

  1. Find a piece of code that you suspect to be unused: a function, CSS class, template file, image, etc.
  2. Check the current code doesn’t seem to use it.
  3. Check the history to try and find when it was last used.
  4. Delete it!

Git cannot help you with #1, since it knows nothing about the programming languages or tools you’re using. You’ll have to find code that you suspect is unused first, by searching, checking test coverage, or other means.

But once you have found potentially unused code, Git provides several tools that can help with #2 and #3. Let’s take a look.

Find current usage with git grep (or rg)

The git grep command allows you to search through your project’s files for a regular expression. It only searches files tracked with Git, so it won’t give you results from generated or installed files. It provides a more flexible alternative to the find-in-all-files features in text editors.

For example, imagine you find a possibly unused CSS class called splash. You can search through the repository for usage like so:

$ git grep splash

If this outputs nothing, there are no matches, and you can be reasonably sure that the class is unused.
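A few extra git grep flags can make the results easier to read. The sketch below wraps them in two helper functions; the names find_usage and is_unreferenced are made up for illustration, and both assume you run them from inside the repository.

```shell
# Hypothetical helper: list whole-word matches for a name.
#   -n  print line numbers
#   -w  match only at word boundaries (so "unsplash" is skipped)
#   -F  treat the name as a fixed string, not a regular expression
find_usage() {
    git grep -n -w -F -e "$1"
}

# Hypothetical helper: succeed only when the name appears nowhere
# in tracked files. git grep -q prints nothing and exits non-zero
# when there are no matches, so the status is inverted here.
is_unreferenced() {
    ! git grep -q -F -e "$1"
}
```

For example, find_usage splash lists each match as path:line:text, and is_unreferenced splash && echo "no matches" only prints when the search comes back empty.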

Dynamically generated names

Search results cannot, unfortunately, give you 100% confidence that a given piece of code is unused. Most programming contexts allow dynamic references, so there could be arbitrary code that creates the name you’re searching for:

In [1]: "spl" + "ash"
Out[1]: 'splash'

…even obscurely:

In [2]: "hsalps"[::-1]
Out[2]: 'splash'

Such code is pretty unlikely for simple function calls, single-word CSS classes, etc. But it is more common for templating part of a multi-word name, such as:

body_class = "splash-" + ("red" if form.errors else "blue")

As such, you might search for a single part of the name only, e.g. just “splash” instead of “splash-red”.

Some familiarity with the language and project will also help you know what dynamic name techniques are in use.
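One way to spot partially templated names is to list every distinct name that begins with the stem you care about. This is only a sketch: list_variants is a made-up helper name, it assumes a reasonably recent Git that supports git grep’s -o (--only-matching) flag, and it assumes the stem contains no regex special characters.

```shell
# Hypothetical helper: count each distinct name starting with a stem.
#   -h  omit file names from the output
#   -o  print only the matching part of each line
#   -E  use extended regular expressions
list_variants() {
    git grep -h -o -E -e "${1}[-_[:alnum:]]*" | sort | uniq -c | sort -rn
}
```

Running list_variants splash would show a dangling entry like “splash-” if some code builds the rest of the name at runtime, which is a hint to go and read that code before deleting anything.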

Limit searched files

You might get many false positives in your git grep results if the term you’re searching for has several meanings in the project. Imagine “splash” was a term used both in the CSS, for “splash screen”, and the data model, representing wave height calculations. The search for “splash” could contain many irrelevant results, related to the second meaning:

$ git grep splash
example/core/models.py:splash = models.IntegerField(
example/core/models.py:wave_height = self.splash * self.force
example/core/forms.py:"splash",

You can limit the files searched by providing Git pathspecs. For example, to limit the search to just HTML files:

$ git grep splash -- '*.html'

…or HTML and JavaScript files:

$ git grep splash -- '*.html' '*.js'

That’ll help see the wood for the trees.
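Pathspecs can also exclude files, which is handy when it’s easier to say where the irrelevant results live. A minimal sketch, using a made-up helper name grep_outside and assuming you run it from the repository root:

```shell
# Hypothetical helper: search everywhere except paths matching a pattern.
# "." means "everything under the current directory", and the
# ":(exclude)" pathspec magic removes matching paths from that set.
grep_outside() {
    git grep -n -e "$1" -- . ":(exclude)$2"
}
```

For example, grep_outside splash '*.py' would skip the data model results above, and grep_outside splash example/core would skip that whole directory.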

Go faster with ripgrep (rg)

ripgrep, or rg, is an alternative super-fast file searcher. It obeys .gitignore files by default, so it also won’t give you results from generated or installed files. It’s worth checking out since on even moderately sized projects, its performance advantage will save you time.
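As a rough equivalent of the git grep searches above, here is a sketch of a ripgrep wrapper. The helper name rg_usage is made up, and it assumes rg is installed and that you run it from the repository root.

```shell
# Hypothetical helper: whole-word ripgrep search limited by a glob.
#   -n  print line numbers
#   -w  match only at word boundaries
#   -F  treat the name as a fixed string
#   -g  only search files matching the glob
# The explicit "." path tells rg to search the current directory.
rg_usage() {
    rg -n -w -F -g "$2" -e "$1" .
}
```

For example, rg_usage splash '*.html' searches only HTML files, skipping anything your .gitignore excludes.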

Check the history with git log

Chesterton’s fence states we should not remove something until we understand why it was put there in the first place. So even if you are pretty sure that a piece of code is no longer used, it’s worth checking the Git log before deleting it. It’s not always possible to get a clear answer, but it’s worth trying.

Find changes with the pickaxe, git log -S

git log’s -S flag takes a string, and limits the log to only those commits that added or removed that string (more precisely, commits that changed the number of times it occurs). The flag is also known as “the pickaxe”, since it helps you mine the log. Hi-ho, it’s back through the log we go!

It’s useful to combine -S with (at least):

  • -p (--patch), which shows commit diffs, so you can see which lines were affected.
  • --stat, which shows stats, so you can see which files were changed in a commit.

(I use an alias which sets these flags and some others.)

For example, to look for use of the “splash” class, you could run:

$ git log -p --stat -S splash
...
commit 01234567890abcdef01234567890abcdef012345
Author: Adam Johnson <...>
Date:   Tue Sep 3 13:15:18 2019 +0000

    Remove splash screen template
---
 example/templates/splash.html  | 83 -----------------------------------------------------------------------------------
 1 file changed, 83 deletions(-)

diff --git a/example/templates/splash.html b/example/templates/splash.html
deleted file mode 100644
index f5902e35f..000000000
--- a/example/templates/splash.html
+++ /dev/null
@@ -1,83 +0,0 @@
-{% extends 'modal.html' %}
-
-{% block content %}
-  <div class="splash">
...

If you’re in luck you’ll find a “smoking gun” commit like the above. Looks like the relevant HTML was removed in 2019, but someone forgot to also remove the corresponding CSS!
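When you only want the headline, you can ask for just the newest commit that added or removed the string. A minimal sketch, with last_change as a made-up helper name:

```shell
# Hypothetical helper: show the most recent commit that added or
# removed the given string, as "short-hash date subject".
#   -1            stop after the first (newest) matching commit
#   --date=short  print dates as YYYY-MM-DD
last_change() {
    git log -1 --date=short --format='%h %ad %s' -S "$1"
}
```

For example, last_change splash prints one line for the newest matching commit, or nothing at all if the string never appeared in the history you’re searching.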

If you’re not so lucky, you may see many unrelated results. These false positives mean you have to look back through several commits to find relevant results. You can help avoid this by again using pathspecs to limit the search:

$ git log -p --stat -S splash -- '*.html' '*.js'

…or:

Use a regex with --pickaxe-regex

By default -S performs plain string matching. But if you add --pickaxe-regex, the search string will be interpreted as a regular expression, for great search power.

For example, imagine using -S had many false positives due to words containing “splash”, like “unsplash” or “splashing”. You could skip over these results by using a regex search:

$ git log -p --stat --pickaxe-regex -S '\bsplash\b'

This regex uses the \b special sequence to check for word boundaries, so it will only match the whole word “splash”.

Note there is also the -G flag for git log, which searches for a regex. The difference is that it matches any added or removed line in a commit’s diff, even when the number of occurrences stays the same. Thus, it can give some extra results from commits that merely edited or moved a line containing the match.
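To see how the two flags can differ, here is a sketch with two made-up helpers that print only commit subjects. With -S a commit is listed when the number of matches changed; with -G it is listed when any added or removed line matches.

```shell
# Hypothetical helper: subjects of commits where the number of
# matches for the regex changed (the pickaxe, in regex mode).
pickaxe_commits() {
    git log --format='%s' --pickaxe-regex -S "$1"
}

# Hypothetical helper: subjects of commits whose added or removed
# lines match the regex, whether or not the match count changed.
diff_grep_commits() {
    git log --format='%s' -G "$1"
}
```

A commit that only tweaks the padding inside a .splash rule on the same line would show up in diff_grep_commits splash but not in pickaxe_commits splash, since the word “splash” was neither added nor removed overall.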

Check out history with git blame

A final tool is git blame, which shows git history per line of a file. You can use this to check when a potentially unused piece of code was last touched, and follow it back to when it was first added.

For example, to try to find the history of the CSS class, you might blame the file:

$ git blame -- example/static/main.css
abcdef1234 (Adam Johnson 2019-01-01 12:34:56 +0100   1) * {
abcdef1234 (Adam Johnson 2019-01-01 12:34:56 +0100   2)  box-sizing: border-box;
...
1234abcdef (Adam Johnson 2020-02-02 20:20:02 +0000  99) .splash {
3456abcdef (Adam Johnson 2020-04-02 16:20:02 +0000 100)   padding: 2em;
...

It looks like the .splash rule may have been added in 1234abcdef. You can use git show 1234abcdef to see that commit. If this commit didn’t add the rule, and instead just moved it around or adjusted the line in some way, you can check the older blame to go back further in time. To do so, pass the commit hash with ~1, to mean “the commit before”:

$ git blame 1234abcdef~1 -- example/static/main.css

You can then iterate until you have found satisfactory history.
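If iterating by hand gets tedious, git log’s -L option can follow a range of lines back through history for you. A sketch, with line_history as a made-up helper name; it assumes the pattern contains no “/” and that it matches in the current version of the file:

```shell
# Hypothetical helper: show every commit that touched a short range
# of lines, starting at the first line matching a regex.
#   $1  regex locating the first line of the range
#   $2  file to inspect
#   $3  size of the range in lines (defaults to 3)
line_history() {
    git log --date=short --format='%h %ad %s' -L "/$1/,+${3:-3}:$2"
}
```

For example, line_history splash example/static/main.css lists each commit that changed those lines, each followed by the relevant part of its diff.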

I find it easier to use a GUI for this task, since it is fiddly to iteratively run git blame and git show. With GitHub, you can use the “blame” button on a commit page, and then click the “layered box” icon to jump to the previous blame. See the relevant docs for more info.

And finally... delete that code!

After you have investigated to your satisfaction, you can smash up Chesterton’s fence and remove that unused code. To help code reviewers and future code archaeologists, summarize your investigation in your commit description:

Remove some unused styles

  • .splash unused since 0123456789.
  • .splash-purple added in 1234abcdef, but never used.
  • ...

(Lil tip: when removing multiple things, build your commit message in a scratch file as you go.)
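One possible way to manage such a scratch file from the shell is sketched below. The file location and the helper names start_notes, add_note, and commit_with_notes are all made up for illustration; the only Git feature relied on is git commit’s -F flag, which reads the commit message from a file.

```shell
# Hypothetical scratch file, kept outside the repository.
notes=${TMPDIR:-/tmp}/cleanup-notes.txt

# Start the message with a subject line followed by a blank line.
start_notes() {
    printf '%s\n\n' "$1" > "$notes"
}

# Append one bullet point per thing you have investigated.
add_note() {
    printf '  * %s\n' "$1" >> "$notes"
}

# Commit the staged changes, using the scratch file as the message.
commit_with_notes() {
    git commit -F "$notes"
}
```

For example, run start_notes 'Remove some unused styles' once, call add_note after each investigation, then stage your deletions and run commit_with_notes.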

Then look around the room with satisfaction at a cleanup well-executed.

Fin

May your code base grow ever cleaner,

—Adam

