Unleashing the Power of AI: Revolutionizing Rubik’s Cube Solving

Robert Bates
4 min read · Jul 17, 2023
[Image: My Cubotino]

Imagine a world where a simple toy like the Rubik’s Cube can be conquered effortlessly by a robot. Sounds like a distant dream, right? Well, not anymore. Let me take you on a journey of how I transformed this classic puzzle into a mind-boggling spectacle using the latest advancements in Artificial Intelligence (AI) and machine learning.

Fresh out of college, I was on a mission to explore the endless possibilities that lay ahead. Eager to hone my programming skills, I stumbled upon the fascinating realm of open source projects. Among them, one project stood out: a 3D-printable, Rubik’s Cube-solving robot called Cubotino. Inspired by this innovation, I embarked on my own venture to create an iOS app, which I named CubePress.

CubePress became the bridge between an iOS device and the marvelous Cubotino. With this app, users could scan the Rubik’s Cube, find the solution, and wirelessly command the robot to conquer the puzzle. The journey to create CubePress led me down an exciting path of learning, especially in the realm of training machine learning algorithms.

To scan the Rubik’s Cube effectively, I turned to the power of machine learning. After all, visual recognition is one of AI’s most prominent applications, from self-driving cars to facial recognition on our smartphones. Hence, I used Apple’s Create ML framework to build an image classification model. This kind of model assigns each picture to one of a fixed set of categories based on its content, which is what ultimately helped me decipher the puzzle’s complex patterns.

Traditionally, scanning a Rubik’s Cube involves multiple steps: taking pictures of each side, extracting individual squares, and determining which face each square belongs to. However, I wanted to break free from the limitations imposed by color-dependent identification. By employing machine learning, my app could detect any visual identifier (patterns, numbers, or symbols), rendering the color scheme irrelevant.

[Image: two cubes. A color-dependent identification approach would work with the left cube, but not the right.]

But here’s the catch: machine learning thrives on data. And lots of it. To achieve the high accuracy I desired, I needed to collect thousands of examples for each face piece. Imagine the painstaking process of photographing over 12,000 images one by one! Luckily, I devised techniques to streamline this data collection.

Leveraging the Cubotino’s ability to rotate and flip the cube, I captured all six sides using the device’s camera. Then, I extracted individual square images and fed them to my algorithm. With the help of a solved cube in a known starting orientation, sorting the incoming images became a breeze. Furthermore, I utilized special images generated by combining multiple pictures from various parts of the cube. This creative approach allowed me to generate multiple “new” training images without additional photoshoots.
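The sorting step can be sketched in a few lines of Python. This is only an illustration, not the code CubePress uses: the capture order, the function name `label_squares`, and the file naming scheme are all assumptions. The idea it shows is that when the cube is solved and starts in a known orientation, every square in a given capture belongs to the same face, so the capture index alone determines the label.

```python
# Hypothetical order in which the robot presents the six faces to the camera.
# The real sequence depends on the robot's rotate/flip routine.
CAPTURE_ORDER = ["U", "B", "D", "F", "R", "L"]


def label_squares(session_id, capture_order=CAPTURE_ORDER, squares_per_face=9):
    """Pair every extracted square with the face label implied by its capture.

    Assumes a solved cube in a known starting orientation, so all nine
    squares seen in capture i carry the label capture_order[i].
    Returns a list of (label, file_name) tuples.
    """
    labelled = []
    for capture_index, face in enumerate(capture_order):
        for square_index in range(squares_per_face):
            # Hypothetical naming scheme: session, capture number, square number.
            name = f"{session_id}_c{capture_index}_s{square_index}.png"
            labelled.append((face, name))
    return labelled
```

One full pass of the robot yields 54 labelled squares without anyone sorting images by hand, which is why repeating the pass under different conditions scales so well.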

[Image: Some examples of test images.]

Creating a robust training set presented its own challenges. I had to ensure that my model could adapt to different conditions, so I focused on varying three factors: light level, subject alignment, and cube type. Over-representing specific conditions would bias the model toward them and blind it to crucial details. To avoid this, I balanced my training set by adding or removing images during an iterative process. Through these efforts, I managed to elevate the model’s accuracy from around 65% to an impressive 95%.
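One simple way to do the "removing images" half of that balancing is to downsample every condition to the size of the smallest one. The sketch below assumes each training sample is a small record tagged with its capture condition; the `balance` helper and the record fields are hypothetical, and the actual process was iterative and manual rather than a single pass like this.

```python
import random
from collections import defaultdict


def balance(samples, key, seed=0):
    """Downsample so every group has as many samples as the smallest group.

    `key` maps a sample to its group, for example its light level or
    cube type. A fixed seed keeps the selection reproducible between runs.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)
    if not groups:
        return []

    target = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    balanced = []
    for name in sorted(groups):
        # Pick `target` samples from each group without replacement.
        balanced.extend(rng.sample(groups[name], target))
    return balanced
```

For example, `balance(samples, key=lambda s: s["light"])` would keep equal numbers of bright and dim images. Downsampling throws data away, so collecting more images for the under-represented condition is usually the better fix when it is practical.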

To simplify the extraction of individual squares, I devised an alignment method using the smartphone’s camera feed. Users would align their device with the cube, using the on-screen boxes as a guide. Though the alignment could be slightly off due to small hand movements, the machine learning approach compensated for these discrepancies.
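Once the cube sits inside the on-screen guide, cutting out the nine squares is plain arithmetic. The following sketch assumes the guide is a square region given by its top-left corner and side length in pixels. The function name `grid_boxes` and the `inset` parameter are assumptions for illustration: the inset shrinks each crop slightly so a small misalignment is less likely to pull in a neighboring square.

```python
def grid_boxes(left, top, size, n=3, inset=0.15):
    """Return n*n crop boxes as (left, top, right, bottom), in row-major order.

    `left`, `top`, and `size` describe the square guide region in pixels.
    `inset` is the fraction of each cell trimmed from every edge
    (an assumed safety margin, not something from the original app).
    """
    cell = size / n
    margin = cell * inset
    boxes = []
    for row in range(n):
        for col in range(n):
            x0 = left + col * cell + margin
            y0 = top + row * cell + margin
            x1 = left + (col + 1) * cell - margin
            y1 = top + (row + 1) * cell - margin
            boxes.append((round(x0), round(y0), round(x1), round(y1)))
    return boxes
```

Each tuple can be passed to any image library's crop routine to produce the individual square images that go to the classifier.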

When it came to feeding the algorithm, Apple’s Create ML interface proved to be a user-friendly tool. I supplied a folder of sub-folders containing example images of squares, and each sub-folder name became a training category. The interface offered various options to augment the training data, but I trod carefully. Altering the images could either improve the model’s accuracy or mislead it, so I kept the alterations to a minimum.
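Because the sub-folder names become the categories, it is worth checking the folder tree before training. The sketch below does not call Create ML at all; it only mirrors the folder convention and reports how many images each category holds. The helper name `categories_from_folders` and the list of accepted file extensions are assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def categories_from_folders(root):
    """Map each sub-folder name (the category) to its image count.

    Files directly under `root` and non-image files are ignored.
    """
    counts = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1
            for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

A quick look at the returned counts makes lopsided categories obvious before any training time is spent.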

What makes machine learning truly remarkable is its ability to perceive the world differently from human eyes. By manipulating the training images, I drew attention to crucial aspects and obscured accidental ones. I essentially gave my model a compound eye with lenses seeing colors differently. This diverse perspective significantly enhanced its accuracy in extrapolating the actual colors.

What started as a mere side project soon blossomed into an extraordinary app, thanks to the power of machine learning. The versatility of this approach opened doors to novel possibilities, revolutionizing how we conquer complex puzzles. Of course, constructing a robust training set was no small feat, requiring clever techniques to tackle vast amounts of data and avoid algorithmic bias. Machine learning offers incredible potential in deciphering intricate trends that defy conventional definition, but it also demands careful guidance to teach it what to look for.

As I continue on this AI-driven journey, my aim remains to push boundaries and unlock new realms of possibility. Who knows what exciting frontiers lie ahead? One thing is for sure: with the right data, a dash of ingenuity, and the power of AI, we can transform the ordinary into the extraordinary.
