Gestalt Principles for Data Visualization

Common Fate, Parallelism and Connectedness

Introduction

Movement, and the implication of movement, can help readers identify the patterns a chart displays, but it can also damage that ability if implemented carelessly. Understanding the gestalt principles related to movement and implied movement is critical to well-crafted data visualization. The basic symbols used throughout these examples are simply lines and circles, but how they move, their angles, and their visual relationships with each other produce different visual structures.

Even in charts that don't use motion, these principles are important. Graphical primitives such as lines have slope, a sort of visual potential energy, and can mean one kind of relationship in the context of a line chart and a very different kind in the context of a dendrogram or network visualization. Well-designed charts anticipate the miscommunication that occurs when a reader interprets a graphic in one way while the chart uses it in another.

Common Fate

Of the traditional gestalt principles, common fate stands out because it refers to visual structure apparent in animated graphics. When we see shapes moving, we group together the shapes that are moving in the same direction and at the same speed.

In the case of this example, this signal is accurate. If you examined the code, you'd see there are two groups of circles being called and animated. If the circles' movement were aligned only by accident, or if external factors not relevant to the data caused shapes to move as a group, then the reader would receive a false (and strong) visual signal.
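The grouping a reader performs here can be approximated in a few lines. What follows is a minimal Python sketch, not the code behind the example: the circle names and velocities are hypothetical, and treating "same direction and speed" as "same velocity after rounding" is an assumption made for illustration.

```python
from collections import defaultdict


def group_by_common_fate(velocities, precision=1):
    """Group shape ids whose (dx, dy) velocity vectors match after rounding."""
    groups = defaultdict(list)
    for shape_id, (dx, dy) in velocities.items():
        key = (round(dx, precision), round(dy, precision))
        groups[key].append(shape_id)
    return sorted(groups.values())


# Hypothetical per-frame velocities for five circles (illustrative values only).
velocities = {
    "a": (2.0, -1.0),
    "b": (2.0, -1.0),
    "c": (2.0, -1.0),
    "d": (-1.5, 1.0),
    "e": (-1.5, 1.0),
}

groups = group_by_common_fate(velocities)
print(groups)
```

If the velocities come from the data, the groups this produces are the ones the chart intends. If they come from something incidental, such as a layout transition, the same grouping still appears to the reader, which is the false signal described above.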

Parallelism

If we draw the paths of the animated circles, we see another principle available for use and misuse. Parallelism can be thought of as the fossilized animation of elements: lines with the same or very similar slopes are visually associated as being part of the same group. Here we see two groups, mirroring the same two groups from the common fate example.*

Charts that use lines in a traditional way, to show time series, see parallelism always working in their favor. Shared slopes are a meaningful grouping because they indicate a shared pattern of increase or decrease over time. This is particularly prominent in slopegraphs, which rely on parallelism as their main grouping method. If this were a slopegraph, then we would see three things decreasing in value at the same rate while two things increase in value at the same rate, and those groups would be more meaningful than the fact that two of the datapoints share almost the same destination (the third and fourth lines).
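The slopegraph reading can be made concrete with a small sketch. The five lines and their values below are hypothetical, chosen only to match the description (three decreasing at one rate, two increasing at another, and the third and fourth ending close together); the tolerance used to decide that two slopes are "the same" is likewise an assumption.

```python
from itertools import combinations


def group_by_slope(lines, tolerance=0.5):
    """Group line names whose change in value (end - start) is nearly equal."""
    groups = []  # list of (representative_slope, [names])
    for name, (start, end) in lines.items():
        slope = end - start
        for representative, members in groups:
            if abs(representative - slope) <= tolerance:
                members.append(name)
                break
        else:
            groups.append((slope, [name]))
    return [members for _, members in groups]


# Hypothetical slopegraph values: name -> (value at start, value at end).
lines = {
    "first": (90, 70),
    "second": (70, 50),
    "third": (50, 30),
    "fourth": (11, 31),
    "fifth": (25, 45),
}

slope_groups = group_by_slope(lines)

# The pair of lines whose end values are closest together.
closest_destination = min(
    combinations(lines, 2),
    key=lambda pair: abs(lines[pair[0]][1] - lines[pair[1]][1]),
)

print(slope_groups)
print(closest_destination)
```

Grouping by slope puts the third and fourth lines in different groups even though they end next to each other, which mirrors how parallelism outweighs the shared destination when reading a slopegraph.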

But parallelism is also a key principle to account for in other methods of data visualization. That's because so many charts use lines to indicate relationships between elements. In these cases, a line is a topological indicator, showing that two elements are explicitly connected.

* We might also see a triangle with radiating bars (an unfortunate side effect of the size of the example). The ability to visually shift between two groups of three parallel lines and the triangle with radiating bars points to the importance of multistability in data visualization, a more complex topic for a later time.

Connectedness

To understand this better, we need to keep our lines, bring back our circles, and add a few more circles at the ends of the lines that lack them. This effectively merges the two lines and three circles in the lower middle of the screen via the principle of connectedness.

There now exist four shapes, three of which are made up of two circles and a line and one shape that is made up of three circles and two lines. This could be a very simple network diagram showing people connected to friends. Or it might be a diagram showing where minerals are extracted and where they're refined. In either case, the strong signal coming from the layout of the lines and circles makes a reader want to assume that circles near each other are somehow related and that those connected by similarly sloped lines are related.
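Counting the shapes that connectedness produces is the same as finding the connected components of a graph. The sketch below assumes nine hypothetical circle names and five lines arranged to match the description; it is not derived from the example's actual data.

```python
def connected_shapes(circles, lines):
    """Return the groups of circles joined, directly or indirectly, by lines."""
    neighbors = {circle: set() for circle in circles}
    for u, v in lines:
        neighbors[u].add(v)
        neighbors[v].add(u)

    seen = set()
    shapes = []
    for circle in circles:
        if circle in seen:
            continue
        # Depth-first walk collecting everything reachable from this circle.
        stack = [circle]
        seen.add(circle)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in neighbors[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        shapes.append(sorted(component))
    return shapes


# Hypothetical layout: three two-circle shapes and one three-circle shape.
circles = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
lines = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h"), ("h", "i")]

shapes = connected_shapes(circles, lines)
print(shapes)
```

These components are the only groupings the data actually asserts. Any further grouping a reader infers from nearby circles or similarly sloped lines comes from the layout, not from the connections.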

For this reason, unless there's an unambiguous directionality to a visualization of connectedness, the elements are typically not laid out in discrete steps like this. A genealogical chart, for example, would make sense having steps such as these. For data visualization of networks where there are no obvious steps, a force-directed network layout is a common choice.

Networks

Network visualization is roundly criticized in part because it contains so much noise from these principles. While the basic concept of having symbols like circles represent nodes and lines represent connections between nodes seems to be an attractive one, readers are often befuddled by anything more than a simple visual representation of a network. One primary reason for this is that proximity does not necessarily indicate similarity in network visualization. But most of the other difficulties stem from the principles described in this essay.

Connectedness is problematic in network visualizations because some nodes are placed above unrelated edges. False parallelism signals are always at play because node position is only meaningful in a relative sense (this is not a geographic map where position is latitude and longitude), and so lines with the same angle are more likely to be unrelated than related. During the animation phase of force-directed network layouts, the trajectory of nodes might correctly indicate groups (when connected nodes are dragged along for the ride with a more central node they are connected to). In complicated networks with many factors at play, this movement can also indicate false structures.
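One rough way to gauge how much false parallelism a layout contains is to list the pairs of edges that are drawn at nearly the same angle yet share no node. The sketch below makes two simplifying assumptions: "unrelated" is reduced to "no shared endpoint", and the node positions and edges are hypothetical rather than the output of any particular layout algorithm.

```python
import math
from itertools import combinations


def edge_angle(positions, edge):
    """Undirected angle of an edge in radians, in the range [0, pi)."""
    (x1, y1), (x2, y2) = positions[edge[0]], positions[edge[1]]
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def false_parallels(positions, edges, tolerance_degrees=5.0):
    """Pairs of edges with nearly equal angles that share no endpoint."""
    tolerance = math.radians(tolerance_degrees)
    found = []
    for first, second in combinations(edges, 2):
        if set(first) & set(second):
            continue  # edges that share a node are related by topology
        difference = abs(edge_angle(positions, first) - edge_angle(positions, second))
        difference = min(difference, math.pi - difference)  # angles wrap at pi
        if difference <= tolerance:
            found.append((first, second))
    return found


# Hypothetical node positions and edges (illustrative only).
positions = {
    "a": (0, 0), "b": (10, 10), "c": (20, 0),
    "d": (30, 10), "e": (0, 20), "f": (10, 20),
}
edges = [("a", "b"), ("c", "d"), ("e", "f"), ("b", "c")]

parallels = false_parallels(positions, edges)
print(parallels)
```

Here the edges a-b and c-d are parallel on screen but have nothing to do with each other in the data, so a reader who groups them is responding to the layout alone.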

Complex Edges

Adjusting the settings of the force algorithm can help combat problems brought on by connectedness. Parallelism is less easily handled. One possible solution is to represent connections with curved edges instead of straight lines. Sometimes curved edges also encode the directionality of the connection. Parallels are harder to see between curves, so curved edges visually dampen the parallelism signal and become more like the generic markers of topology that they are.

A problem with curved edges is that they increase the length of the lines and therefore increase the graphical space given to lines on a chart. A line on a dendrogram or network visualization that is stretched out across the screen is not necessarily more important or powerful than a line that is very short. In fact, it is more commonly the case that these lines are less powerful, though possibly more significant because they represent a connection between otherwise disparate elements. Curving the edges further increases the amount of ink per datapoint.
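The extra ink can be estimated directly. The sketch below draws an edge as a quadratic Bézier curve whose control point is the midpoint pushed perpendicular to the straight line, then approximates the curve's length by sampling. The `curvature` parameter and the endpoint coordinates are hypothetical, and this is only one of several ways curved edges are drawn.

```python
import math


def curved_edge_length(p0, p1, curvature=0.3, samples=100):
    """Approximate length of a quadratic Bezier edge between p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    # Control point: the midpoint, offset perpendicular to the straight edge.
    cx = (x0 + x1) / 2 - dy * curvature
    cy = (y0 + y1) / 2 + dx * curvature

    length = 0.0
    previous = (x0, y0)
    for i in range(1, samples + 1):
        t = i / samples
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        length += math.dist(previous, (x, y))
        previous = (x, y)
    return length


# Hypothetical edge spanning 100 units across the screen.
start, end = (0.0, 0.0), (100.0, 0.0)
straight_length = math.dist(start, end)
curved_length = curved_edge_length(start, end, curvature=0.3)
flat_length = curved_edge_length(start, end, curvature=0.0)

print(straight_length, curved_length, curved_length / straight_length)
```

With the curvature set to zero the curve collapses to the straight line, and any nonzero curvature yields a longer path, so the ratio of the two lengths gives a simple measure of how much additional graphical weight the curve adds to the edge.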

Edge bundling provides a more sophisticated method for disrupting parallelism and better aligning the graphical display of topological relationship with the data. It also avoids the problem of increasing graphical signal of edges by collapsing them into shared arteries. Unfortunately, like many complex data visualization methods, edge bundling is both difficult and expensive to implement.

Conclusion

Complex data visualization methods often use motion and explicit graphical representations of connectedness. In those charts, remember that the whole is not greater than the sum of the parts but rather the whole is other than the sum of the parts. The individual graphical elements, and the subaltern wholes that those elements create due to connection, parallelism or common fate (or due to other gestalt principles) will always exist for a reader. These will prove attractive escapes from trying to understand the overall pattern if it is presented poorly, further increasing the challenge of using complex data visualization. This is why you see readers fixate on a particularly interesting cluster of a network visualization, or a prominent path in a dendrogram, and declare it to be valuable and powerful while other readers insist that the whole thing is an unreadable "hairball".

Elijah Meeks - April 2015