NIPS 2017 — Day 2 Highlights

Emmanuel Ameisen · Published in Insight · 7 min read · Dec 6, 2017

Emmanuel Ameisen, Ben Regner, Jeremy Karnowski

A taxonomy of Machine Learning Datasets from Kate Crawford’s bias talk

Want to learn about applied Artificial Intelligence from leading practitioners in Silicon Valley or New York? Learn more about the Insight Artificial Intelligence Fellows Program.

Are you a company working in AI and would like to get involved in the Insight AI Fellows Program? Feel free to get in touch.

We are back with some highlights from the second day of NIPS. A lot of fascinating research was showcased today, and we are excited to share some of our favorites with you. If you missed them, feel free to check our Day 1 and Day 3 Highlights!

One of the most memorable sessions of the first two days was today’s invited talk by Kate Crawford, about bias in Machine Learning. We recommend taking a look at the feature image of this post, representing modern Machine Learning datasets as an attempt at creating a taxonomy of the world. Since we already covered a talk on the topic yesterday, we’ll give the spotlight to some other topics below.

Capturing uncertainty

Model uncertainty from “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?”

For many Deep Learning applications, receiving a single point prediction from our model is insufficient. When models inform high-stakes decisions, such as diagnosing patients or steering a self-driving car, we would like a measure of how confident we are in our predictions. Unfortunately, most Deep Learning models aren’t great at effectively measuring the certainty of their predictions. Recently, the field of Bayesian Deep Learning has been growing, in part because it can address these questions by measuring the variance of a model’s predictions. We present one of the results of this research below.

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

Alex Kendall, Yarin Gal

The authors of the paper present a unified Bayesian framework to identify and estimate two types of uncertainty:

  • Aleatoric uncertainty: Uncertainty due to noise inherent in the observations, such as sensor noise for example, which cannot be reduced by acquiring more data.
  • Epistemic uncertainty: Uncertainty due to an incomplete modeling of the data, which can be explained away by acquiring more examples.

By modeling their network this way, the authors both improve model performance by a couple of percentage points compared to non-Bayesian approaches, and surface high-variance examples in a practical self-driving application (see figure above).
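To make this more concrete, here is a minimal PyTorch-style sketch of a common recipe for estimating both kinds of uncertainty in a regression setting: the network predicts a mean and a log-variance per input (aleatoric), and dropout is kept active at test time so that repeated stochastic forward passes give a spread of predictions (epistemic). The architecture and names below are illustrative assumptions, not the authors’ exact model.

```python
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    """Toy regression net that predicts a mean and a log-variance per input."""
    def __init__(self, d_in=10, d_hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(), nn.Dropout(p=0.5),
        )
        self.mean_head = nn.Linear(d_hidden, 1)
        self.log_var_head = nn.Linear(d_hidden, 1)  # aleatoric (data) noise

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.log_var_head(h)

def heteroscedastic_loss(y, mean, log_var):
    # Attenuated regression loss: noisy points are allowed a larger predicted variance.
    return (0.5 * torch.exp(-log_var) * (y - mean) ** 2 + 0.5 * log_var).mean()

def mc_dropout_predict(model, x, n_samples=50):
    """Epistemic uncertainty via MC dropout: keep dropout on and sample."""
    model.train()  # keeps Dropout layers stochastic at test time
    means, ale_vars = [], []
    with torch.no_grad():
        for _ in range(n_samples):
            mean, log_var = model(x)
            means.append(mean)
            ale_vars.append(torch.exp(log_var))
    means = torch.stack(means)                       # (n_samples, batch, 1)
    epistemic = means.var(dim=0)                     # spread across stochastic passes
    aleatoric = torch.stack(ale_vars).mean(dim=0)    # predicted data noise
    return means.mean(dim=0), aleatoric, epistemic
```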

Model Interpretability

Class explanation illustration from “Streaming Weak Submodularity: Interpreting Neural Networks on the Fly”

As Machine Learning models become more sophisticated, the increase in accuracy they provide often comes at the cost of being harder to interpret. In most practical cases, such as when approving loans or predicting criminal activity, understanding the factors that contributed to a model’s decision is crucial in order to debug, validate, and regulate models. Interpretability is also one of the most useful tools for detecting and reducing some of the bias we see in models. A lot of research has recently focused on providing explanations for arbitrary models, and some exciting developments were presented today.

A Unified Approach to Interpreting Model Predictions

Scott M Lundberg, Su-In Lee

This paper unifies several recent approaches that have been successfully used to interpret models. Methods such as LIME repeatedly perturb inputs and record how the model’s prediction changes in order to fit a linear model around a given instance, thus providing local interpretability. The authors show that LIME and other recent methods all fall within the category of additive feature attribution methods. They then propose a unified measure of feature importance, SHAP values (SHapley Additive exPlanations). SHAP values provide explanations that match human intuition more closely than LIME, while being only slightly more expensive to compute.
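As a rough illustration of the additive attribution idea, here is a small NumPy sketch that estimates Shapley-style attributions for a single instance by averaging each feature’s marginal contribution over random permutations, with “removed” features replaced by a background value. It is a sketch of the generic Monte Carlo Shapley recipe, not the authors’ SHAP implementation; the `predict` and `background` arguments are assumptions about your setup.

```python
import numpy as np

def shapley_attributions(predict, x, background, n_permutations=200, seed=0):
    """Monte Carlo estimate of per-feature Shapley values for one instance.

    predict:    callable mapping a (n, d) array to (n,) model outputs
    x:          the (d,) instance to explain
    background: a (d,) reference vector used to 'remove' features
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_permutations):
        order = rng.permutation(d)
        current = background.copy()
        prev_out = predict(current[None, :])[0]
        for j in order:
            current[j] = x[j]                  # add feature j to the coalition
            new_out = predict(current[None, :])[0]
            phi[j] += new_out - prev_out       # marginal contribution of j
            prev_out = new_out
    # On average, the attributions sum to f(x) - f(background).
    return phi / n_permutations
```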

Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

Ethan Elenberg, Alexandros Dimakis, Moran Feldman, Amin Karbasi

Here, the authors attempt to optimize another aspect of black-box explainers: their speed. The paper introduces a new method built on an efficient streaming algorithm that provides explanations up to 10 times faster than the aforementioned LIME. Most of the contribution lies in the STREAK algorithm, which offers a more efficient way to find the features that maximize the likelihood of an instance being classified in a particular class.
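The streaming flavor can be pictured with a toy sketch like the one below: features arrive one at a time and are kept only if their marginal gain to the explanation objective clears a threshold, so each feature is examined once rather than re-scored over many greedy rounds. This illustrates the general thresholded streaming idea under simplified assumptions; it is not the exact STREAK algorithm from the paper.

```python
from typing import Callable, Iterable, List, Set

def streaming_select(gain: Callable[[Set[int], int], float],
                     feature_stream: Iterable[int],
                     k: int,
                     threshold: float) -> List[int]:
    """Toy single-pass selection: keep a feature if its marginal gain is large.

    gain(S, f) should return how much adding feature f to the current set S
    improves the explanation objective (e.g. the class likelihood).
    """
    selected: Set[int] = set()
    for f in feature_stream:
        if len(selected) >= k:
            break
        if gain(selected, f) >= threshold:
            selected.add(f)
    return sorted(selected)

# Example usage with a made-up objective where each feature has a fixed gain.
values = {0: 0.05, 1: 0.4, 2: 0.02, 3: 0.3, 4: 0.25}
chosen = streaming_select(lambda S, f: values[f], values.keys(), k=3, threshold=0.2)
print(chosen)  # -> [1, 3, 4]
```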

Since most explainers operate on individual instances, a common way to get global explanations for a model is to sample many examples in a representative way, explain each of them individually, and aggregate those explanations. This process is usually painfully slow, so progress toward faster explainers can have a big impact on our ability to understand models.
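Concretely, a global picture is often built with a loop like the one below: sample representative instances, run a local explainer on each, and average the absolute attributions per feature. The `explain_instance` callable is an assumed placeholder for any local explainer (LIME, SHAP, or a STREAK-style method), not a specific library API.

```python
import numpy as np

def global_importance(explain_instance, X_sample):
    """Average absolute per-feature attributions over sampled instances.

    explain_instance: callable mapping a (d,) instance to a (d,) attribution vector.
    X_sample:         (n, d) array of representative instances.
    """
    attributions = np.stack([explain_instance(x) for x in X_sample])  # (n, d)
    return np.abs(attributions).mean(axis=0)  # one importance score per feature
```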

Beating Poker

Don’t let the complicated name fool you: the paper described below stood out as an incredibly impressive piece of work, and was rightly rewarded with a best paper award by the NIPS reviewers. Using the methods described, the authors built an artificial intelligence named Libratus that soundly beat several professionals in a heads-up no-limit Texas hold ’em poker match.

Safe and Nested Subgame Solving for Imperfect-Information Games

Noam Brown, Tuomas Sandholm

Artificial Intelligence takes on poker

Let’s unpack the title, starting from the back. In games with perfect information, the board is visible to all players; examples include Chess and Go. Although significant technical barriers were overcome to produce human-level (or superior) performance on games with perfect information, these games have the advantage that, as the game is played, the space of possible future moves is constrained by previous moves. Thus, a subgame develops that is independent of other possible subgames, enabling a more tractable solution space.

In contrast, in games with imperfect information, players are not aware of the complete game state. For example, in poker a player knows the cards in their own hand, but not those of their opponents. These kinds of games are ubiquitous and apply to many real-life situations, like negotiations or auctions. With imperfect information, as a game develops, the strategy for a given subgame may depend on a separate subgame. If you’ve ever conceded an argument in order to catch a friend acting hypocritically later, you’ve experienced how subgames can affect each other under imperfect information.

What are safe subgame solutions? This gets a little complicated, and I encourage you to go to the paper. With imperfect information, the decision space might be too large to compute all possible solutions, but a simplified abstraction of the game can often be derived and solved, yielding a blueprint strategy. An unsafe subgame solution assumes the opposing player will play according to this simplified strategy for the abstracted game. However, this leads to easily exploitable behavior. Going back to the argument example, you are exploiting your friend’s strategy, which assumes they will act consistently with what they say. A safe subgame solution instead assumes nothing about the opponent’s strategy, but does assume that the blueprint strategy is correct. This gives the player a strategy to follow, but also allows deviations from that strategy as the opponent’s strategy changes.

Finally, the “nested” part comes in when the opponent takes actions that are not part of the simplified abstraction. One way to handle these cases is to simply map them to the nearest action in the abstracted game, which was the state of the art before this paper. A better way is to generate a new subgame, which may itself be a simplified abstraction. New, nested subgames can be generated if the opponent continues to take actions outside the new abstraction. In the argument case, and this is getting a bit artificial, if your friend doesn’t express shame as expected but instead argues a side point, your strategy of feeling self-righteous falls apart and you are forced to adapt.
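A heavily simplified sketch of the difference: when the opponent bets an amount that is not in the action abstraction, the older “round to the nearest abstract action” approach snaps it onto the blueprint, while nested subgame solving builds and re-solves a fresh subgame rooted at the actual bet. Everything below (the bet sizes, the `blueprint` lookup, the `solve_subgame` placeholder) is illustrative pseudo-structure, not the Libratus implementation.

```python
from bisect import bisect_left

ABSTRACT_BETS = [100, 200, 400, 800]  # bet sizes the blueprint strategy knows about

def round_to_abstraction(bet: int) -> int:
    """Pre-existing approach: map an off-tree bet to the nearest abstract bet."""
    i = bisect_left(ABSTRACT_BETS, bet)
    candidates = ABSTRACT_BETS[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda b: abs(b - bet))

def nested_subgame_response(bet: int, game_state, blueprint, solve_subgame):
    """Nested idea: re-solve a new subgame rooted at the opponent's real bet.

    solve_subgame stands in for a safe subgame solver that respects the
    values implied by the blueprint strategy.
    """
    if bet in ABSTRACT_BETS:
        return blueprint[(game_state, bet)]      # bet is already in the abstraction
    subgame = (game_state, bet)                  # root a fresh subgame at the real action
    return solve_subgame(subgame, blueprint)     # refined strategy for this subgame
```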

Again, I encourage you to read the paper for more details. It solves a problem many people thought was years away from a solution, and its ideas will likely play a key role in developing increasingly sophisticated artificial intelligence. Situations with imperfect information are ubiquitous in the real world, and we now have even better tools to tackle them. Many people are excited about the potential of sophisticated AI; this paper suggests that’s a safe bet.

Day 2 has been packed, and we have been blown away by some of the work we’ve seen. We hope you enjoyed our recap! Keep an eye on our blog for more live updates in the next few days, or follow Jeremy and Manu on Twitter.

Want to learn about applied Artificial Intelligence from leading practitioners in Silicon Valley or New York? Learn more about the Insight Artificial Intelligence Fellows Program.

Are you a company working in AI and would like to get involved in the Insight AI Fellows Program? Feel free to get in touch.
