
Hunting through the ICLR 2017 submissions

Important note: There are far too many papers for me to have accurately selected all of the interesting ones! Every time I go through the list again, I find an additional set of papers for me to read. This is dangerous as this is already a formidable list ;)

If you're interested in tackling the list of papers yourself, check out the ICLR 2017 Conference Track submissions. Bonus points if your over-eagerness to read all the papers crashes the web server again ;)

When I add (biased) it's because I'm an author on the paper or a colleague of one of the authors. I will note that I believe my bias to have a good foundation - my colleagues and I produce good work ;)

With a basis in math

Word Vectors

Tying word vectors has been shown to be insanely effective for language models. This advantage likely extends to other specific tasks as well.
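
A minimal sketch of what tying looks like in PyTorch - assuming a toy LSTM language model where the embedding and hidden sizes match, which is what makes the weight sharing possible (the model and sizes here are hypothetical, not taken from any particular submission):

    import torch
    import torch.nn as nn

    class TiedLM(nn.Module):
        """Toy language model with tied input/output embeddings."""
        def __init__(self, vocab_size=10000, emb_dim=400):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, emb_dim, batch_first=True)
            self.decoder = nn.Linear(emb_dim, vocab_size)
            # Tie the softmax weights to the embedding weights: one matrix
            # now serves as both the input lookup and the output projection,
            # cutting the parameter count substantially for large vocabularies.
            self.decoder.weight = self.embedding.weight

        def forward(self, tokens):
            hidden, _ = self.rnn(self.embedding(tokens))
            return self.decoder(hidden)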

Recurrent Neural Networks

The "sparse things are better things" category:

  • Training Long Short-Term Memory with Sparsified Stochastic Gradient Descent
    Not directly applicable yet but it involves Nvidia researchers and they explicitly note that "These redundant MAC operations can be eliminated by hardware techniques to improve the energy efficiency and the training speed of LSTM-based RNNs" ;)
  • Exploring Sparsity in Recurrent Neural Networks
    Baidu have a long history of optimizing RNNs. This work reduces the size of RNN weights by up to 90%, resulting in a speed-up of 2-7x. This is also highly useful for mobile where memory accesses (and hence model size) are a primary drain on battery life (h/t to Mat Kelcey for telling me that). A rough sketch of this style of magnitude pruning follows the list below.
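
To make the sparsity idea concrete, here is a rough sketch of magnitude pruning - zeroing out the smallest-magnitude entries of a recurrent weight matrix - rather than the exact thresholding schedules either paper uses (the matrix size and the 90% target are purely illustrative):

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        """Zero out the smallest-magnitude entries of a weight matrix."""
        threshold = np.quantile(np.abs(weights), sparsity)
        return np.where(np.abs(weights) < threshold, 0.0, weights)

    # Hypothetical hidden-to-hidden weight matrix, pruned to ~90% sparsity.
    W_hh = np.random.randn(512, 512)
    W_hh_sparse = magnitude_prune(W_hh, sparsity=0.9)
    print((W_hh_sparse == 0).mean())  # roughly 0.9 of the entries are now zero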

Training recurrent neural networks is still fraught with terror. I've written previously about orthogonality in RNN weights. These works explore the recurrence within RNNs through that lens.
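
As a refresher on the orthogonality lens - my own minimal sketch, not code from any of these submissions - an orthogonal hidden-to-hidden matrix can be sampled from a QR decomposition, and because it preserves the norm of the hidden state it gives one intuition for why it helps against exploding and vanishing gradients:

    import numpy as np

    def orthogonal_init(size):
        """Sample an orthogonal matrix via QR decomposition of a random Gaussian."""
        q, r = np.linalg.qr(np.random.randn(size, size))
        # Flip column signs using the diagonal of R so the result is
        # uniformly distributed over orthogonal matrices.
        return q * np.sign(np.diag(r))

    W_hh = orthogonal_init(256)
    # W^T W is (numerically) the identity, so repeated multiplication by W_hh
    # neither shrinks nor blows up the hidden state's norm.
    print(np.allclose(W_hh.T @ W_hh, np.eye(256), atol=1e-6))  # True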

Machine Translation

Doesn't quite fit a category