waldo

  testthat, waldo

  Hadley Wickham

We’re stoked to announce the waldo package. waldo is designed to find and concisely describe the difference between a pair of R objects. It was designed primarily to improve failure messages for testthat::expect_equal(), but it turns out to be useful in a number of other situations.

You can install it from CRAN with:

install.packages("waldo")

waldo basics

There’s really only one function in waldo that you’ll ever use: waldo::compare(). Its job is to take a pair of objects and succinctly display all differences. When comparing atomic vectors, compare() uses the diffobj package by Brodie Gaslam to show additions, deletions, and changes:

# addition
compare(c("a", "b", "c"), c("a", "b"))

#> `old`: "a" "b" "c"
#> `new`: "a" "b"


# deletion
compare(c("a", "b"), c("a", "b", "c"))

#> `old`: "a" "b"    
#> `new`: "a" "b" "c"


# modification
compare(c("a", "b", "c"), c("a", "B", "c"))

#> `old`: "a" "b" "c"
#> `new`: "a" "B" "c"
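Since waldo grew out of test failure messages, it can help to see compare() used programmatically. The sketch below assumes that compare() returns its report as a character vector that is empty when no differences are found; assert_no_diff() is a hypothetical helper written for illustration, not part of waldo or testthat.

```r
library(waldo)

# Hypothetical helper (not part of waldo): stop with waldo's report
# if the two objects differ, otherwise return TRUE invisibly.
assert_no_diff <- function(x, y) {
  diffs <- compare(x, y)
  if (length(diffs) > 0) {
    stop(paste(diffs, collapse = "\n\n"), call. = FALSE)
  }
  invisible(TRUE)
}

# Identical inputs pass silently
assert_no_diff(c("a", "b"), c("a", "b"))

# Different inputs raise an error whose message is the waldo report
res <- try(assert_no_diff(c("a", "b"), c("a", "B")), silent = TRUE)
inherits(res, "try-error")
```

This is only a rough stand-in for what a testing framework does with the comparison; in real tests you would normally reach for testthat's own expectations.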

For large vectors with small changes, waldo shows only a little context around each change rather than every element that is the same:

compare(c("X", letters, letters), c(letters, letters, "X"))

#> `old[1:4]`: "X" "a" "b" "c"
#> `new[1:3]`:     "a" "b" "c"
#> 
#> `old[51:53]`: "x" "y" "z"    
#> `new[50:53]`: "x" "y" "z" "X"

Depending on the size of the differences and the width of your console you’ll get one of three displays. The default display shows the vectors one atop the other. If there’s not enough room for that, the two vectors are shown side-by-side. And if there’s still not enough room for side-by-side, then each element is shown on its own line:

with_width <- function(width, code) {
  withr::local_options(width = width)
  code
}

old <- c("x", "y", "a", "b", "c")
new <- c("y", "a", "B", "c", "d")

with_width(80, compare(old, new))

#> `old`: "x" "y" "a" "b" "c"    
#> `new`:     "y" "a" "B" "c" "d"

with_width(20, compare(old, new))

#>     old | new    
#> [1] "x" -        
#> [2] "y" | "y" [1]
#> [3] "a" | "a" [2]
#> [4] "b" - "B" [3]
#> [5] "c" | "c" [4]
#>         - "d" [5]

with_width(10, compare(old, new))

#> old vs new
#> + "x"
#>   "y"
#>   "a"
#> - "B"
#> + "b"
#>   "c"
#> - "d"

Where colour is available, additions are coloured in blue, deletions in yellow, and changes in green.

Nested objects

For more complex objects, waldo drills down precisely to the location of differences, using R code to describe their location. Unnamed lists show the position of changes:

compare(list(factor("x")), list(1L))

#> `old[[1]]` is an S3 object of class <factor>
#> `new[[1]]` is an integer vector (1)

But most complex lists have names, so if they’re available waldo will use them:

compare(
  list(x = list(y = list(z = 3))),
  list(x = list(y = list(z = "a")))
)

#> `old$x$y$z` is a double vector (3)
#> `new$x$y$z` is a character vector ('a')
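The `old` and `new` prefixes in these paths are just labels. As a small sketch, assuming the x_arg and y_arg arguments of compare(), you can swap them for names that better match your situation; the labels "expected" and "actual" here are arbitrary choices for illustration.

```r
library(waldo)

# Same comparison as above, but with custom labels in the reported paths
out <- compare(
  list(x = list(y = list(z = 3))),
  list(x = list(y = list(z = "a"))),
  x_arg = "expected",
  y_arg = "actual"
)
out
```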

If the named values are the same but in different positions, waldo just reports the difference in names:

compare(
  list(x = 1, y = 2),
  list(y = 2, x = 1)
)

#> `names(old)`: "x" "y"    
#> `names(new)`:     "y" "x"
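If the order of named elements doesn't matter to you, one option is to normalise it before comparing. This is a minimal sketch using only base R; sort_by_name() is a hypothetical helper, not something waldo provides.

```r
library(waldo)

# Hypothetical helper: reorder a named list alphabetically by name
sort_by_name <- function(x) x[order(names(x))]

a <- list(x = 1, y = 2)
b <- list(y = 2, x = 1)

# As-is, the two lists differ in the order of their names
compare(a, b)

# After sorting by name, the lists are identical, so nothing is reported
diffs <- compare(sort_by_name(a), sort_by_name(b))
length(diffs)
```

Note that this only reorders the top level; nested lists would need the same treatment applied recursively.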

waldo also reports on differences in attributes:

compare(
  structure(1:5, a = 1),
  structure(1:5, a = 2)
)

#> `attr(old, 'a')`: 1
#> `attr(new, 'a')`: 2
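Sometimes attribute differences are noise rather than signal. The sketch below assumes the ignore_attr argument of compare(), which I believe accepts TRUE to skip all attributes; treat it as a blunt tool, since attributes often carry meaning.

```r
library(waldo)

x <- structure(1:5, a = 1)
y <- structure(1:5, a = 2)

# By default the differing attribute is reported
compare(x, y)

# With attributes ignored, the underlying integer vectors are the same
diffs <- compare(x, y, ignore_attr = TRUE)
length(diffs)
```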

And it can recurse arbitrarily deep:

x <- list(a = list(b = list(c = structure(1, d = factor("a")))))
y <- list(a = list(b = list(c = structure(1, d = factor("a", levels = letters[1:2])))))
compare(x, y)

#> `levels(attr(old$a$b$c, 'd'))`: "a"    
#> `levels(attr(new$a$b$c, 'd'))`: "a" "b"

To illustrate how you might use waldo in practice, I include two case studies below. They both come from my colleagues at RStudio, who have been trying it out prior to its public debut.

Case study: GitHub API

The first case study comes from Jenny Bryan. She was trying to figure out precisely what changed when a certain request to the GitHub API was performed with and without authentication:

# Use default auth
x1 <- gh::gh("/repos/gaborcsardi/roxygenlabs")
# Suppress auth
x2 <- gh::gh("/repos/gaborcsardi/roxygenlabs", .token = "")

# Strip part of the results that might expose my GitHub credentials
attr(x1, "response") <- NULL
attr(x1, ".send_headers") <- NULL
attr(x2, "response") <- NULL
attr(x2, ".send_headers") <- NULL

The individual objects are rather complicated!

str(x1, list.len = 10)

#> List of 77
#>  $ id               : int 229545533
#>  $ node_id          : chr "MDEwOlJlcG9zaXRvcnkyMjk1NDU1MzM="
#>  $ name             : chr "roxygenlabs"
#>  $ full_name        : chr "gaborcsardi/roxygenlabs"
#>  $ private          : logi FALSE
#>  $ owner            :List of 18
#>   ..$ login              : chr "gaborcsardi"
#>   ..$ id                 : int 660288
#>   ..$ node_id            : chr "MDQ6VXNlcjY2MDI4OA=="
#>   ..$ avatar_url         : chr "https://avatars3.githubusercontent.com/u/660288?v=4"
#>   ..$ gravatar_id        : chr ""
#>   ..$ url                : chr "https://api.github.com/users/gaborcsardi"
#>   ..$ html_url           : chr "https://github.com/gaborcsardi"
#>   ..$ followers_url      : chr "https://api.github.com/users/gaborcsardi/followers"
#>   ..$ following_url      : chr "https://api.github.com/users/gaborcsardi/following{/other_user}"
#>   ..$ gists_url          : chr "https://api.github.com/users/gaborcsardi/gists{/gist_id}"
#>   .. [list output truncated]
#>  $ html_url         : chr "https://github.com/gaborcsardi/roxygenlabs"
#>  $ description      : chr "Experimental roxygen tags and extensions"
#>  $ fork             : logi FALSE
#>  $ url              : chr "https://api.github.com/repos/gaborcsardi/roxygenlabs"
#>   [list output truncated]
#>  - attr(*, "method")= chr "GET"
#>  - attr(*, "class")= chr [1:2] "gh_response" "list"

While all.equal() identifies that there is a difference, it doesn’t make it easy to see what the difference is:

all.equal(x1, x2)

#> [1] "Names: 3 string mismatches"                           
#> [2] "Length mismatch: comparison on first 76 components"   
#> [3] "Component 74: Modes: list, NULL"                      
#> [4] "Component 74: Lengths: 3, 0"                          
#> [5] "Component 74: names for target but not for current"   
#> [6] "Component 74: current is not list-like"               
#> [7] "Component 75: Modes: character, numeric"              
#> [8] "Component 75: target is character, current is numeric"
#> [9] "Component 76: Mean relative difference: 0.5"

waldo makes it easy: the request with auth returns an extra permissions key, and its temp_clone_token is an empty string rather than NULL.

waldo::compare(x1, x2)

#> `old` is length 77
#> `new` is length 76
#> 
#>      names(old)          | names(new)              
#> [71] "open_issues"       | "open_issues"       [71]
#> [72] "watchers"          | "watchers"          [72]
#> [73] "default_branch"    | "default_branch"    [73]
#> [74] "permissions"       -                         
#> [75] "temp_clone_token"  | "temp_clone_token"  [74]
#> [76] "network_count"     | "network_count"     [75]
#> [77] "subscribers_count" | "subscribers_count" [76]
#> 
#> `old$permissions` is a list
#> `new$permissions` is absent
#> 
#> `old$temp_clone_token` is a character vector ('')
#> `new$temp_clone_token` is NULL
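If you'd like to try this pattern without calling the GitHub API, here is a rough offline sketch. The two lists are hypothetical stand-ins for the authenticated and anonymous responses, with invented field values; only the shape of the difference mirrors the real case.

```r
library(waldo)

# Hypothetical stand-ins for the two API responses (values are invented)
with_auth <- list(
  name = "roxygenlabs",
  permissions = list(admin = FALSE, push = FALSE, pull = TRUE),
  temp_clone_token = ""
)
without_auth <- list(
  name = "roxygenlabs",
  temp_clone_token = NULL
)

# Base R cross-check: which names appear only in the authenticated response?
only_with_auth <- setdiff(names(with_auth), names(without_auth))
only_with_auth

# waldo reports both the missing element and the changed value
diffs <- compare(with_auth, without_auth)
diffs
```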

Case study: Spatial data

The second case study comes from Joe Cheng, who received a request from Roger Bivand to update map data bundled in the leaflet package. Roger Bivand had helpfully provided the updated data, but Joe wanted to understand exactly what had changed:

old <- readRDS("storms-old.rds")
new <- readRDS("storms-new.rds")

Again, the individual objects are complicated:

str(old, list.len = 5, max.level = 5)

#> Loading required package: sp

#> Formal class 'SpatialLinesDataFrame' [package "sp"] with 4 slots
#>   ..@ data       :'data.frame':  24 obs. of  3 variables:
#>   .. ..$ Name    : Factor w/ 24 levels "ALPHA","ARLENE",..: 1 2 3 4 5 6 7 8 9 10 ...
#>   .. ..$ MaxWind : num [1:24] 45 60 35 65 60 130 140 75 60 45 ...
#>   .. ..$ MinPress: num [1:24] 998 989 1002 991 980 ...
#>   ..@ lines      :List of 24
#>   .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
#>   .. .. .. ..@ Lines:List of 1
#>   .. .. .. ..@ ID   : chr "1"
#>   .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
#>   .. .. .. ..@ Lines:List of 1
#>   .. .. .. ..@ ID   : chr "2"
#>   .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
#>   .. .. .. ..@ Lines:List of 1
#>   .. .. .. ..@ ID   : chr "3"
#>   .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
#>   .. .. .. ..@ Lines:List of 1
#>   .. .. .. ..@ ID   : chr "4"
#>   .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
#>   .. .. .. ..@ Lines:List of 1
#>   .. .. .. ..@ ID   : chr "5"
#>   .. .. [list output truncated]
#>   ..@ bbox       : num [1:2, 1:2] -101.4 10.7 6.6 68.8
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:2] "x" "y"
#>   .. .. ..$ : chr [1:2] "min" "max"
#>   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
#>   .. .. ..@ projargs: chr "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

all.equal() is a bit more helpful here, at least getting us to the right general vicinity:

all.equal(old, new)

#> [1] "Attributes: < Component \"proj4string\": Attributes: < Names: 1 string mismatch > >"                         
#> [2] "Attributes: < Component \"proj4string\": Attributes: < Length mismatch: comparison on first 2 components > >"
#> [3] "Attributes: < Component \"proj4string\": Attributes: < Component 2: 1 string mismatch > >"

But waldo gets us right to the change: the definition of the spatial projection has changed, and it now contains a comment with a lot more data.

waldo::compare(old, new)

#> old@proj4string@projargs vs new@proj4string@projargs
#> - "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
#> + "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
#> 
#> `comment(old@proj4string)` is absent
#> `comment(new@proj4string)` is a character vector ('GEOGCRS["unknown",\n    DATUM["World Geodetic System 1984",\n        ELLIPSOID["WGS 84",6378137,298.257223563,\n            LENGTHUNIT["metre",1]],\n        ID["EPSG",6326]],\n    PRIMEM["Greenwich",0,\n        ANGLEUNIT["degree",0.0174532925199433],\n        ID["EPSG",8901]],\n    CS[ellipsoidal,2],\n        AXIS["longitude",east,\n            ORDER[1],\n            ANGLEUNIT["degree",0.0174532925199433,\n                ID["EPSG",9122]]],\n        AXIS["latitude",north,\n            ORDER[2],\n            ANGLEUNIT["degree",0.0174532925199433,\n                ID["EPSG",9122]]]]')
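The storms data isn't included here, but the way waldo walks into S4 slots can be tried on a toy object. This is a minimal sketch: ToyCRS is a hypothetical class that loosely imitates the projargs slot of sp's CRS class, and the projection strings are shortened for illustration.

```r
library(methods)
library(waldo)

# Hypothetical S4 class standing in for sp's CRS class
setClass("ToyCRS", representation(projargs = "character"))

old_crs <- new("ToyCRS", projargs = "+proj=longlat +ellps=WGS84")
new_crs <- new("ToyCRS", projargs = "+proj=longlat +datum=WGS84")

# waldo should describe the difference in terms of the slot that changed
diffs <- compare(old_crs, new_crs)
diffs
```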