Project Malmo competition returns with student organizers and a new mission: To democratize reinforcement learning

Published August 9, 2019

By Noboru Sean Kuno, Senior Research Program Manager

During a game with friends after my wedding ceremony, I was asked to name my favorite movie, and I replied Star Wars. That was about two decades ago, and, yes, it’s still the case. I especially like Return of the Jedi. The third installment in the original trilogy is almost perfect to me. Luke Skywalker returns to fight back against the Empire as a member of the Rebel Alliance with the help of his old friend Han Solo and new friends the Ewoks. It’s a must-see as far as I’m concerned. Third stories have proven to be special in other franchise masterpieces, too, such as The Lord of the Rings, Back to the Future, and Indiana Jones.

The MineRL competition is the third in a trilogy of a different sort: contests based on Project Malmo, an AI experimentation platform built on top of Minecraft. It’s distinguishing itself from other contests and its Malmo predecessors in really exciting ways.

MineRL is the first of its kind to put a premium on agent training efficiency, and we believe it’s the first competition to explicitly take advantage of an approach that combines reinforcement learning and imitation learning with a large dataset. And while The Malmo Collaborative AI Challenge in 2017 was organized by Microsoft and The Multi-Agent Reinforcement Learning In Malmo (MARLO) Competition in 2018 was co-organized by Microsoft, Queen Mary University of London, and CrowdAI (now AIcrowd), this year’s competition was proposed by and is based on the work of students from Carnegie Mellon University.

The power of competition

CMU PhD student William Guss, the competition’s lead organizer, has long been interested in doing machine learning in Minecraft, drawn to the game by the ability of its open-world environment to reflect the nature of real-world tasks and challenges. It’s why researchers here at Microsoft Research like it, too. William was intrigued by Project Malmo but saw that limitations in current reinforcement learning tools and methods were making it difficult to fully take advantage of the unique training ground provided by the game and platform. State-of-the-art reinforcement learning systems require rapidly increasing amounts of samples and computing resources, making it hard to replicate and improve those systems, let alone apply them in the real world. Additionally, the reward functions reinforcement learning employs aren’t conducive to specifying the kind of general intelligence researchers hope their agents can eventually achieve.

In response, William, Brandon Houghton, and several other CMU students developed technology to record the completion of various tasks in Minecraft, creating a large-scale dataset of human demonstrations called MineRL-v0. They realized, though, the dataset wouldn’t be nearly as valuable without more efficient algorithms to use it. Having seen the success of machine learning competitions such as the ImageNet challenge in galvanizing research in a particular direction, they began considering a competition designed around sample-efficient and imitation-based reinforcement learning using their dataset. With this in the back of their minds and ready to release the dataset, they reached out to Microsoft about collaborating in general. Both parties came to the realization that partnering for a competition was a natural fit, and the MineRL competition was born.

Making AI more inclusive

In the competition, which is in partnership with Queen Mary University of London, AIcrowd, Preferred Networks, and Microsoft, participants have to develop a system to obtain a diamond in Minecraft using only four days of training time and no more than 10 million samples. To put the challenge into perspective, it’s taken between 44 million and more than 200 million samples to train deep reinforcement learning models to play Atari 2600 games as well as a person. These imposed training limitations are important for encouraging efficiency, which the CMU team envisions serving the larger goal behind the competition’s design: the democratization of reinforcement learning.
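To make the sample limit concrete, here is a minimal sketch of how a training loop might track its environment samples against a cap, along with the arithmetic behind the Atari comparison. The names `SampleBudget`, `SampleBudgetExceeded`, and `reduction_factor` are hypothetical and written for illustration only; they are not part of the MineRL package or the competition's evaluation infrastructure, and the only figures taken from the competition are the 10 million, 44 million, and 200 million sample counts quoted above.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of the MineRL package or the competition's official tooling.

COMPETITION_SAMPLE_CAP = 10_000_000   # competition limit quoted in the article
ATARI_LOW_SAMPLES = 44_000_000        # low end of the Atari range quoted above
ATARI_HIGH_SAMPLES = 200_000_000      # "more than 200 million", so a lower bound


class SampleBudgetExceeded(RuntimeError):
    """Raised when training asks for more environment samples than allowed."""


class SampleBudget:
    """Counts environment samples used during training against a fixed cap."""

    def __init__(self, max_samples):
        self.max_samples = max_samples
        self.used = 0

    @property
    def remaining(self):
        return self.max_samples - self.used

    def consume(self, n=1):
        """Record n samples, refusing the request if it would exceed the cap."""
        if self.used + n > self.max_samples:
            raise SampleBudgetExceeded(
                f"requested {n} samples with only {self.remaining} remaining"
            )
        self.used += n


def reduction_factor(baseline_samples, cap=COMPETITION_SAMPLE_CAP):
    """How many times more samples the baseline used than the cap allows."""
    return baseline_samples / cap
```

In a training loop, one would call `budget.consume()` once per environment step and stop when `SampleBudgetExceeded` is raised. With the figures above, `reduction_factor` works out to 4.4 for the low end of the Atari range and at least 20 for the high end, which gives a rough sense of how much more sample-efficient competition entries need to be.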

Reinforcement learning is so dependent on data and computing resources that only those with access to them are able to work in and make contributions to the space, limiting the scope and pace of advancement. Inclusivity is so integral to what the competition is trying to accomplish that its infrastructure includes computational and travel grants, provided by Microsoft, to support those underrepresented in the research community in participating in the competition and traveling to the 2019 Conference on Neural Information Processing Systems (NeurIPS). MineRL is part of the NeurIPS competition track, and the CMU team will host a workshop showcasing methods from the competition at the conference. As William put it, “The concentration of computational power and resources to those currently within the field and already with the means to research reinforcement learning, in some sense, impacts those underrepresented communities the most.” With MineRL, the CMU team hopes to lower the barriers to entry by making reinforcement learning more sample efficient.

Get in on the competition

The first round of the competition is open on the AIcrowd platform, and submissions are being accepted until October 25, 2019. The CMU team’s MineRL Python package, which includes a Malmo extension and tools for downloading the MineRL-v0 dataset, has already been downloaded more than 10,000 times, and more than 700 teams have signed up for the competition, the most sign-ups for a NeurIPS competition. “Seeing the work that we’ve put into this competition having a tangible effect on the research community has been the most fulfilling aspect of organizing,” William told us.

If you want to learn more about MineRL-v0, which is more than 60 million samples strong, check out the paper “MineRL: A Large-Scale Dataset of Minecraft Demonstrations.” The CMU team will be presenting the paper at the 2019 International Joint Conference on Artificial Intelligence, Aug. 10–16 in Macao, China. To contribute to the dataset, visit the MineRL server that has been set up for data collection.

“Our collaboration with the team led by CMU has been fantastic,” said Katja Hofmann, Research Lead of Project Malmo and Principal Research Manager of Microsoft Research Cambridge. “I am very happy to see such an exciting competition being organized on Project Malmo, which we have developed and made open source to the research community. This competition is a great example of how the platform enables a very wide range of research.”

Return of the Jedi is not just the third story of the original trilogy; it opened up the prequel trilogy and the sequel trilogy. We’re looking forward to seeing another story of ambitious students who take advantage of the Malmo platform to pursue their research agenda.

The MineRL competition organizing team

William H. Guss, Carnegie Mellon University
Mario Ynocente Castro, Preferred Networks
Cayden Codel, Carnegie Mellon University
Katja Hofmann, Microsoft Research
Brandon Houghton, Carnegie Mellon University
Noboru Kuno, Microsoft Research
Crissman Loomis, Preferred Networks
Keisuke Nakata, Preferred Networks
Stephanie Milani, University of Maryland, Baltimore County and Carnegie Mellon University
Sharada Mohanty, AIcrowd
Diego Perez Liebana, Queen Mary University of London
Ruslan Salakhutdinov, Carnegie Mellon University
Shinya Shiroshita, Preferred Networks
Nicholay Topin, Carnegie Mellon University
Avinash Ummadisingu, Preferred Networks
Manuela Veloso, Carnegie Mellon University
Phillip Wang, Carnegie Mellon University