
About Me

My name is Stephen Merity, though I'm most commonly referred to as Smerity.

I've been lucky enough to work with fascinating people and groups over the years, including the MetaMind and Salesforce ohana, Google Sydney, Freelancer.com, the Schwa Lab at the University of Sydney, the team at Grok Learning, the non-profit Common Crawl, and IACS @ Harvard. You can read a fuller history in my resume or on LinkedIn. Feel free to contact me directly at smerity@smerity.com or stalk me on my various social networks!

Publications

An Analysis of Neural Language Modeling at Multiple Scales (2018) [pdf]
Stephen Merity*, Nitish Shirish Keskar*, Richard Socher (* equal contribution)
arXiv

Scalable Language Modeling: WikiText-103 on a Single GPU in 12 hours (2018)
Stephen Merity*, Nitish Shirish Keskar*, James Bradbury, Richard Socher (* equal contribution)
SysML

Regularizing and Optimizing LSTM Language Models (2018) [pdf]
Stephen Merity*, Nitish Shirish Keskar*, Richard Socher (* equal contribution)
ICLR 2018

Revisiting Activation Regularization for Language RNNs (2017) [pdf]
Stephen Merity, Bryan McCann, Richard Socher
ICML 2017

A Flexible Approach to Automated RNN Architecture Generation (2017) [pdf]
Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher
ICLR 2018 Workshop

Quasi-Recurrent Neural Networks (2016) [pdf]
James Bradbury*, Stephen Merity*, Caiming Xiong, Richard Socher (* equal contribution)
ICLR 2017

The WikiText Long Term Dependency Language Modeling Dataset (2016) [dataset]
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

Pointer Sentinel Mixture Models (2016) [pdf]
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
ICLR 2017

Dynamic Memory Networks for Visual and Textual Question Answering (2016) [pdf]
Caiming Xiong*, Stephen Merity*, Richard Socher (* equal contribution)
ICML 2016

Integrated Tagging and Pruning via Shift-Reduce CCG Parsing (2011) [pdf][bib]
Stephen Merity (supervisor: Dr James R. Curran)
Honours Thesis (First Class + University Medal), The University of Sydney, Sydney, Australia

Best Student Presentation: Frontier Pruning for Shift-Reduce CCG Parsing (2011) [pdf][bib][presentation]
Stephen Merity and James R. Curran
Proceedings of the 2011 Australasian Language Technology Association Workshop, ALTA 2011

Accurate Argumentative Zoning with Maximum Entropy models (2009) [pdf][bib]
Stephen Merity, Tara Murphy and James R. Curran
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, ACL-IJCNLP 2009, pages 19-26

Talks

State-of-the-Art Large Scale Language Modeling in 12 Hours With a Single GPU (March 2018)
NVIDIA GPU Technology Conference, San Jose (California)

Backing off towards simplicity - why complex isn't necessarily better in deep learning (October 2017)
Data Institute SF Annual Conference, San Francisco (California)

Backing off toward simplicity: Understanding the limits of deep learning (September 2017)
O'Reilly AI Conference, San Francisco (California)

Attention and Memory in Deep Learning Networks (May 2017)
ai.bythebay.io, San Francisco (California)

Attention and Memory in Deep Learning Networks (March 2017)
Strata + Hadoop World 2017, San Jose (California)

New Methods for Memory, Attention, and Efficiency in Neural Networks (November 2016)
Netflix, Los Gatos (California)

The Pointer Sentinel Mixture Model (October 2016)
Stanford University's Deep Learning Reading Group, Palo Alto (California)

The Frontiers of Memory and Attention in Deep Learning (September 2016)
Quora, Mountain View (California)

Dynamic memory networks for visual and textual question answering (April 2016)
NVIDIA GPU Technology Conference 2016, San Jose (California)

Dynamic memory networks for visual and textual question answering (March 2016)
Strata + Hadoop World 2016, San Jose (California)

Using the whole web as your dataset (Video Link) (July 2015)
Dato Science Summit & Dato Conference 2015, San Francisco (California)

Internet Scale Analytics With Common Crawl (Video Link) (May 2015)
Big Data, Analytics & Machine Learning Israeli Innovation Conference, Tel Aviv

A Web Worth of Data: Common Crawl for NLP (Video Link) (April 2015)
Text By The Bay, San Francisco (California)

Common Crawl for NLP (November 2014)
Web-Scale Natural Language Processing in Northern Europe, Oslo

Experiments in web scale data (November 2014)
Big Data Beers, Berlin

AWS at Common Crawl (October 2014)
Advanced Amazon Web Services, San Francisco (California)

Measuring the impact: Google Analytics (July 2014)
Open Data Bay Area, San Francisco (California)

Machine Learning Made Scary (All New Content from Cyberdyne Systems & Aperture Labs) (March 2013)
Sydney DataPreneurs, Sydney

Data Science for Managers (February 2013)
General Assembly, Sydney

Start-up Metrics (Smetrics?): Inspiration from Dave McClure & David Jones (December 2012)
Incubate.org.au, Sydney

Machine Learning for your Robotic Army: A Crash Course using Python's Scikit-Learn (October 2012)
Sydney Python (SyPy), Sydney

NCSS

No About Me would be complete without mentioning the National Computer Science School (NCSS), a summer school run at the University of Sydney for talented high school students from all over Australia. Having had the privilege of going there eight times (first in 2007 as a student, then in 2008 as a returning student, and finally as a tutor in 2009, 2010, 2011, 2012, 2013, and 2014), I can say it has been one of the most delightful experiences of my entire life.

Over a little more than a week, students are introduced to a programming language, taken step by step through a set of educational challenges, and then work together to launch a fully fledged working product. Past products have included a search engine, a social network, a group of maze-navigating robots, and every variation in between!

If you're a student, or know someone of about that age, sign up for NCSS or the NCSS Challenge!