Nobody's just reading your code

Feb 25, 2018

A guest post by Stephen Malina, my partner in crime on Mu.

Most programmers agree that we don't read enough code. The interviews in Peter Seibel's book, “Coders at Work”, highlight a comical contradiction: almost all the programmers interviewed by Seibel recommend that others read code for fun, but none of them routinely do so themselves. Seibel even asked Hal Abelson (of SICP fame) directly about this phenomenon:

“I want to dig a little deeper on this. You, like many other people, say programmers should read code. Yet when I ask what code have you read for fun or edification, you—also like many other people—answer that you read students’ code, which is your job, and review code at Google, which is also your job. But it doesn’t sound like you sit down of an evening with a nice printout and read it.”

Seibel, James Hague and others have all tried to justify why code reading is so uncommon, and they make good points. But perhaps the conversation is led astray by use of the word ‘read’. I wonder if Abelson and the others would have had more examples if Seibel had asked them what code they had learned about for fun. Perhaps the word ‘read’ put them in a passive frame of mind, causing them to filter out programs they'd hacked on?

We all read code already; it’s just that we usually read when we want to edit. And the comprehension that questions about reading are really after comes from both reading and writing, interleaved in complex ways.

That hacking produces better comprehension than passive, linear reading fits with what we know about learning. Barbara Oakley, Herbert Simon, Cal Newport, and Anders Ericsson all describe how solid understanding emerges from active exploration, critical examination, repetition, and synthesis. Hacking beats passive reading on three out of four of these criteria:

Active exploration: When you hack, you want to eventually produce a change in the codebase. This desire guides your path through the code. When you read passively you let the code’s linear flow guide you.
Critical examination: When you hack, you evaluate existing code in light of the change you want to make. Deciding what to use and remove keeps you from accepting the existing system as canon. When you read linearly, you lack a goal against which you can critically examine the existing code.
Synthesis: To change the program as you desire, you synthesize existing code with new code.
Repetition: Neither hacking nor linear reading involves useful repetition, unless you treat the change you want to make like a kata and mindfully re-implement it multiple times.

Learning through hacking also leverages the natural structure of a codebase. Good books guide their readers through a series of questions and their answers, but codebases are inherently non-linear, like a map. You can ask an infinite number of questions of a map. How far is it from A to B? Which is the nearest town to C? But you can’t expect a map to tell you what questions to ask, and it makes no sense to read a map linearly from top to bottom, left to right.

Reframing reading as ‘navigation’ suggests that our conventional discussions of clean code and interfaces ignore the things that actually make unfamiliar code accessible to outsiders. Clean, solidified abstractions are like well-marked, easy-to-follow paths through a forest — very useful if they lead in the direction we need to go, but less useful when we want to blaze arbitrary new paths through the forest.

Instead, let's focus on guiding exploration, making it easier for readers to answer their own questions about codebases. I’m still figuring out how to do this; so far I have just a couple of preliminary ideas:

Suggest features in your code that make good exercises for re-implementation. Provide an initial Git commit without the feature, give readers hints where necessary, and link them to the actual change plus others’ attempts at producing it.
Rather than conceiving of documentation as something that explains individual modules, focus on overviews of how the modules fit together (like Fabien Sanglard's for Git).
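
To make the second idea a little more concrete, here is a minimal sketch of generating such an overview automatically. It assumes a Python codebase, and the names `imports_of` and `import_map` are hypothetical helpers made up for this illustration, not part of any existing tool. It reads each module's source and reports which of the project's other modules it imports, which is one crude way to show how the modules fit together.

```python
from __future__ import annotations

import ast


def imports_of(source: str) -> set[str]:
    """Return the top-level module names imported by one file's source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" counts as a dependency on "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" also counts as a dependency on "a";
            # relative imports (level > 0) are skipped in this sketch
            found.add(node.module.split(".")[0])
    return found


def import_map(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each module to the sibling modules it imports.

    `sources` maps a module name to its source text. Imports of anything
    outside `sources` (standard library, third-party) are left out, so the
    result shows only how the project's own modules relate to each other.
    """
    result = {}
    for name, text in sources.items():
        siblings = set(sources) - {name}
        result[name] = sorted(imports_of(text) & siblings)
    return result


if __name__ == "__main__":
    # A made-up four-module project, purely for illustration.
    project = {
        "cli": "import sys\nimport parser\nimport render\n",
        "parser": "from lexer import tokenize\n",
        "lexer": "import re\n",
        "render": "import parser\n",
    }
    for module, deps in import_map(project).items():
        print(f"{module} -> {', '.join(deps) or '(no project imports)'}")
```

A map like this answers only one kind of question (who depends on whom), and it sees only static imports, not what happens at run time. It is a starting point for a hand-written overview, not a replacement for one.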

Afterword

Others have explored similar ideas from different perspectives:

“How To Be A Hacker”: Eric Raymond discusses what he calls “the incremental-hacking cycle”, a process by which someone gradually expands their understanding of a codebase by making bigger and bigger changes to it.
“How to read a math textbook”: David MacIver describes a problem- and theorem-driven approach for learning math which you could adapt to reading programs.
“The Benjamin Franklin method of reading programming books”: James Koppel's take on Anders Ericsson.

comments

Kartik Agaram, 2018-02-27: This was a fun post to watch develop, not least because it proved such fertile ground for forming new associations in my thinking around the subjects of programming literacy and comprehending the global structure of codebases. For example:

a) Conventional approaches to making codebases readable are like still lifes or depictions of a single scene. But since codebases are non-linear, what they need is something more like a mural. Something that can be read in many different ways. Like a Thangka (https://en.wikipedia.org/wiki/Thangka), or a codex:

(from http://publicdomainreview.org/2017/12/07/pods-pots-and-potions-putting-cacao-to-paper-in-early-modern-europe) Something with multiple "centers" of narrative, to use Dave West's terminology (http://akkartik.name/davewest-ducksRhemes.pdf)

b) We typically focus on the ability to find the right information. Your post is pointing out that knowing what to ignore is under-emphasized, and the desire to modify provides a powerful "ignore this" heuristic.

c) There's the old story about how paving cowpaths is often a better way to decide where footpaths should go. (The best source I could find is the second occurrence of 'footpath' in http://tomslee.net/2008/03/mr-googles-guid.html) In these terms, perhaps we programmers are prematurely paving narratives in our codebase. In the terms of James C Scott (https://www.ribbonfarm.com/2010/07/26/a-big-little-idea-called-legibility), programmers are being Authoritarian High Modernist, in assuming that they understand all the ways future readers may try to read their code. Some humility for the inherent illegibility of the activity of programming (particularly how other people read code) may be in order.

d) There's a body of work on how the human brain comprehends messes (https://www.ribbonfarm.com/2017/01/05/tendrils-of-mess-in-our-brains). If you see your codebase as a mess, that results in some set of ideas for improvements. But the jungle metaphor results in a whole other set of ideas.

e) When we programmers read code, we do so purposefully but non-linearly. Like Bruce Willis in "Die Hard" (http://www.bldgblog.com/2010/01/nakatomi-space): barging through walls, moving through elevator shafts and air-conditioning ducts. However, when we write code, we assume an idealized reader who will read our creation aimlessly but in order, content to follow our lead in all regards. This is a pretty big disconnect in all our minds.

f) Often you have something useful to tell the reader, but no obvious place to put it that the reader is likely to see at the right time. Perhaps making code easier to explore is just a matter of creating Schelling points (https://en.wikipedia.org/wiki/Focal_point_%28game_theory%29).

Shalabh, 2018-02-27: Something that would really help me with new codebases is having a clear map of what the live, running system looks like. What are the static and dynamic structures, what are the events and when are they fired, etc. The live system parts can often be very different from the static file based 'modules' and low level functions, but it is primarily how I understand the purpose and function of the system and its parts.

What would be even better is if this live system map is hyperlinked to the static source files, so I can click on an object and jump to the source class definition, for instance.

Andres Moreno, 2018-02-27: Yes! Often trying to get a system up and running on my own machine is the hardest part of the battle! Even with something like a front-end framework, this involves quite a bit of spelunking!

I really appreciate repositories with clear instructions for how to build as well as perspective on the key run time decisions and complications so that I can troubleshoot effectively.

Kartik Agaram, 2018-02-27: Yes, this has long been a priority for me. With Mu, for example, you get up and running in 3 commands:

git clone https://github.com/akkartik/mu
cd mu
./mu

That's it! A big reason for this is minimizing dependencies. Every library a tool needs, every compiler that it requires a specific version of, is a place where somebody will trip up and have a bad experience.

ibi sum, 2018-02-27: Professionally, Code-Reading is definitely a refined skill .. it's one of those techniques that gets better, the more you do it - and it may be strenuous at first. I have found over the years (30+) I've been in the software business, that it's not enough to read code - you have to build it, and run it too. Without these other steps, you're missing out on a key component of comprehension: application.

That said, I believe that the resistance to code-reading as a common method is derived from an unwillingness to look things up. Too many times we read code without actually knowing what's happening behind the words and symbols - and this, I feel, is where the "Great IDE Debate" comes into play. You can read all you like - but if you can't build it, or can't get hints or even executables from the environment you're using to do the reading, then that is definitely a hindrance. But yet, IDEs like to hide a lot of details about what's really happening under the hood, and so having a comprehension of the build process is also key. Code-reading is not enough: Code-building is also significant.

So a big part of developing code-reading skills is in refining a) one's ability to use the IDE as a tool to navigate and build, and b) not depending too much on skill-a), also.

Denis Bell, 2018-03-09: I agree totally. Just scanning through code is definitely not enough. These are the six things I think are needed to fully understand code: 1. Understand the problem being solved by the program/application. 2. Ensure that you are comfortable with the programming language. 3. Draw the high level components of the entire code base. Writing stuff down helps make things much clearer. 4. Understand the module that you are reviewing. 5. Run the code. 6. Modify the module in question.

ashok bakthavathsalam, 2018-05-16: ...and if possible, write some test cases, if the code base doesn't have any.

John Fenley, 2018-02-27: I really like the commentary audio tracks on DVDs that I buy. I've thought before about a tool to make an audio walkthrough of code... some of the explanations behind why decisions were made might be better done by starting a recording and talking a little about a section of code. It would open the door to much more insight than just the code, and may help people understand the nuance behind why something was done the way it was, as well as help make a more personal connection to the individual behind the code.

Kartik Agaram, 2020-02-25: "..in Peter Seibel's Coders at Work, Donald Knuth was the only person interviewed who read others' programs regularly.."

https://news.ycombinator.com/item?id=18698651#18699718

Comments gratefully appreciated. Please send them to me by any method of your choice and I'll include them here.