MIT's Bitcoin-Inspired 'Enigma' Lets Computers Mine Encrypted Data

MIT says it's found a new, more efficient way to blend data mining with the privacy protections of encryption.

The cryptography behind bitcoin solved a paradoxical problem: a currency with no regulator, that nonetheless can’t be counterfeited. Now a similar mix of math and code promises to pull off another seemingly magical feat by allowing anyone to share their data with the cloud and nonetheless keep it entirely private.

On Tuesday, a pair of bitcoin entrepreneurs and the MIT Media Lab revealed a prototype for a system called Enigma, designed to achieve a decades-old goal in data security known as “homomorphic” encryption: A way to encrypt data such that it can be shared with a third party and used in computations without it ever being decrypted. That mathematical trick---which would allow untrusted computers to accurately run computations on sensitive data without putting the data at risk of hacker breaches or surveillance---has only become more urgent in an age when millions of users constantly share their secrets with cloud services ranging from Amazon and Dropbox to Google and Facebook. With bitcoin's tricks in their arsenal, Enigma's creators say they can now pull off computations on encrypted data more efficiently than ever.1

“You can see it as a black box,” says Guy Zyskind, an MIT Media Lab graduate researcher and one of Enigma’s creators. “You send whatever data you want, and it runs in the black box and only returns the result. The actual data is never revealed, neither to the outside nor to the computers running the computations inside.”

Enigma’s technique---what cryptographers call "secure multiparty computation"---works by mimicking a few of the features of bitcoin’s decentralized network architecture: It encrypts data by splitting it up into pieces and randomly distributing indecipherable chunks of it to hundreds of computers in the Enigma network known as “nodes.” Each node performs calculations on its discrete chunk of information before the user recombines the results to derive an unencrypted answer. Thanks to some mathematical tricks the Enigma creators implemented, the nodes are able to collectively perform every kind of computation that computers normally do, but without accessing any other portion of the data except the tiny chunk they were assigned.
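
The split-compute-recombine pattern described above can be illustrated with a toy example. The sketch below uses additive secret sharing, a standard building block of secure multiparty computation, and is not Enigma's actual protocol. The modulus, the function names, the five-node setup and the salary figures are all assumptions made for illustration, and this simple form only supports addition; general computation requires more machinery.

```python
import secrets

# Assumed modulus for the toy arithmetic (a Mersenne prime); not taken from Enigma.
PRIME = 2**61 - 1

def split(secret, n_nodes):
    """Split a number into n_nodes random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]

def node_add(share_a, share_b):
    """Work done by a single node: it only ever sees its own two shares."""
    return (share_a + share_b) % PRIME

def recombine(shares):
    """Done by the user: adding all partial results reveals the answer."""
    return sum(shares) % PRIME

# Two hypothetical private values, each split across five nodes.
salary_a = split(52000, 5)
salary_b = split(61000, 5)

# Each node adds the two shares it holds, learning nothing about either input.
partials = [node_add(a, b) for a, b in zip(salary_a, salary_b)]

total = recombine(partials)
print(total)  # 113000
```

Any single share is a uniformly random number, so a node holding one learns nothing on its own. Only the sum of all the partial results carries meaning.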

To keep track of who owns what data---and where any given data's pieces have been distributed---Enigma stores that metadata in the bitcoin blockchain, the unforgeable record of messages copied to thousands of computers to prevent counterfeit and fraud in the bitcoin economy. (Like other bitcoin-style decentralized crypto schemes, Enigma's architecture can seem almost like a Rube Goldberg machine in its complexity. For a full technical explanation, read the project's whitepaper here. In addition to that whitepaper, Zyskind and Nathan say they plan to publish the open-source code for the project by the end of the summer.)

"I can take my age, this one piece of data, and split it into pieces, and give it to ten people," says Zyskind. "If you ask each one of those persons, they have only a random chunk. Only by combining enough of those pieces can they decrypt the original data."
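
Zyskind's example, in which only "enough" of the pieces can recover the original, matches what cryptographers call threshold secret sharing. A minimal sketch using Shamir's scheme follows. The article does not say which scheme Enigma uses, and the age of 34, the threshold of four, the modulus and the function names are assumptions chosen for illustration.

```python
import secrets

# Assumed prime modulus for the toy arithmetic; not taken from Enigma.
PRIME = 2**61 - 1

def make_shares(secret, threshold, n_shares):
    """Hide the secret as the constant term of a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (needs Python 3.8+)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

age = 34
pieces = make_shares(age, threshold=4, n_shares=10)  # one piece per person

print(reconstruct(pieces[:4]))   # 34: any four pieces are enough
print(reconstruct(pieces[5:9]))  # 34: a different four work too
```

With fewer than four pieces, every possible age is equally consistent with what the holders can see, which is why a small group of colluding holders learns nothing.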

It's important to note that any new and unproven encryption scheme should be approached with caution. But if Enigma's encryption works as its creators promise, it would have vast implications. Private databases could be hosted and queried in the cloud without any risk of revealing the database's contents. It could also enable a search engine to return search results without ever seeing the user's unencrypted search request. Enigma's creators suggest the project could also enable Internet users to safely share all sorts of data with pharmaceutical companies and advertisers without any privacy risks---the companies could run computations on the encrypted data and get useful results without being able to see any specific user's data. "No one wants to give their data to some company when you don’t know what they’ll do with it," says Oz Nathan, Enigma's co-creator. "But if you have guaranteed privacy, data analysis can be a lot more powerful. People will actually be willing to share more."

The Enigma creators are far from the first to suggest a scheme for achieving homomorphic encryption's goals; IBM researcher Craig Gentry achieved a major breakthrough in 2009 when he came up with the first fully homomorphic encryption scheme---a mathematical technique that allowed any computation to be performed on encrypted data with no security compromises and none of Enigma's complex network of distributed computers. But Gentry's method was also extremely slow: Performing a computation such as a Google search using it could take as much as a trillion times longer than doing the same task without encryption. Since then, Gentry has dramatically sped up the process, but it still multiplies the time necessary for a calculation by close to a millionfold.

Enigma's creators say their decentralized encryption process, on the other hand, only multiplies the computing requirements for a calculation by less than 100 fold. They hope to further reduce that in the near future to a tenfold increase. They also note that the computing requirements for any Enigma computation depend on the number of nodes involved. The more computers involved, the more secure the user's data, but the slower the process.

A considerable hurdle for Enigma, however, is that it requires that hundreds or even thousands of users adopt the system and run its code before it can start working securely. To get that initial buy-in, Nathan and Zyskind have created an incentive scheme: Every time someone requests a computation from the Enigma network, he or she pays a bitcoin fee. A tiny part of that money is paid to a computer in the bitcoin network to record Enigma's metadata in the blockchain. But a larger portion of the fee goes to the nodes in the Enigma network as a reward for storing and processing the user's encrypted data. And the Enigma software can also be configured to reward the owner of the data, so that an Enigma customer, like an advertiser, can pay users for the privilege of mining their data---but without ever seeing it in a decrypted form.

That attempt to recruit as many nodes as possible is designed to combat a fundamental vulnerability in Enigma's scheme: If enough Enigma nodes work together, they can team up to decrypt and steal the user's data. But that kind of collusion isn't likely, says Zyskind. He compares the problem to a so-called "51 percent attack" in bitcoin, in which a majority of the bitcoin nodes collectively agree to take over the blockchain and defraud users. That sort of bitcoin attack has never occurred, Zyskind points out, and he says the same malicious collaboration problem in Enigma is even less likely.

To keep Enigma nodes honest and ensure that the nodes' computations are accurate, the system also includes a "security deposit" that each must pay in bitcoin to join the network. If a node is found by other nodes in the network to be dishonest, its deposit is seized and distributed to the other nodes. "It all balances out and kills the incentive for people to cheat," says Zyskind.
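
The deposit mechanism can be pictured as simple bookkeeping. The sketch below is a toy model, not Enigma's code: the article does not say how cheating is detected or how a seized deposit is divided, so the even split, the satoshi amounts, the node names and the `slash_deposit` function are all assumptions.

```python
def slash_deposit(deposits, cheater):
    """Seize the cheater's deposit and split it evenly among the remaining nodes.

    deposits maps a node name to its deposit in satoshis (whole numbers).
    The even split is an assumption; the article only says the deposit is
    "distributed to the other nodes".
    """
    if cheater not in deposits:
        raise ValueError("unknown node")
    honest = sorted(name for name in deposits if name != cheater)
    if not honest:
        raise ValueError("no other nodes to receive the deposit")
    share, remainder = divmod(deposits[cheater], len(honest))
    updated = {name: deposits[name] + share for name in honest}
    # Hand out leftover satoshis one at a time so no money is created or lost.
    for name in honest[:remainder]:
        updated[name] += 1
    return updated

deposits = {"node_a": 1000, "node_b": 1000, "node_c": 1000, "node_d": 1000}
after = slash_deposit(deposits, "node_c")
print(after)  # {'node_a': 1334, 'node_b': 1333, 'node_d': 1333}
```

The total held across the network is unchanged by the penalty, which means a cheater's loss is exactly the honest nodes' gain.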

Zyskind and Nathan's adviser on Enigma is Sandy Pentland, a well-known MIT data scientist who gained fame for his work in data-mining social interactions. In one experiment, for instance, Pentland's researchers put sensor devices called "sociometers" around hundreds of subjects' necks within work environments, and used the resulting data about who talked to whom and even in what tone of voice to learn lessons about what type of group within the office was most productive or who its real managers were, as opposed to those with the highest titles on the org chart.

Enigma may be able to make that mining of deeply personal data safer from a privacy perspective. "My work...has always explored a future where sensors and computers are far more ubiquitous than they are today," Pentland writes in an email to WIRED. "The advent of bitcoin changed these discussions profoundly by adding tools to protect privacy in a whole new way. Enigma is the result of that collision between bitcoin and privacy and security research."

If Enigma can enable computation on encrypted data, says Zyskind, perhaps it can eventually entice users to make even more data available for mining, without the Big Brother fears that data mining usually brings with it.

"How can we do more with data, and from a privacy perspective, how can we protect it?" Zyskind asks. "This is a way to get data privacy now."

1 Correction 7/1/2015: An earlier version of the story referred to Enigma as a type of homomorphic encryption. In fact, it's designed to achieve the same goal as homomorphic encryption---computations on encrypted data---but uses a different technique known within cryptography as secure multiparty computation.