The Effects of the Spectre and Meltdown Vulnerabilities

On January 3, the world learned about a series of major security vulnerabilities in modern microprocessors. Called Spectre and Meltdown, these vulnerabilities were discovered by several different researchers last summer, disclosed to the microprocessors’ manufacturers, and patched—at least to the extent possible.

This news isn’t really any different from the usual endless stream of security vulnerabilities and patches, but it’s also a harbinger of the sorts of security problems we’re going to be seeing in the coming years. These are vulnerabilities in computer hardware, not software. They affect virtually all high-end microprocessors produced in the last 20 years. Patching them requires large-scale coordination across the industry, and in some cases drastically affects the performance of the computers. And sometimes patching isn’t possible; the vulnerability will remain until the computer is discarded.

Spectre and Meltdown aren’t anomalies. They represent a new area to look for vulnerabilities and a new avenue of attack. They’re the future of security—and it doesn’t look good for the defenders.

Modern computers do lots of things at the same time. Your computer and your phone simultaneously run several applications—or apps. Your browser has several windows open. A cloud computer runs applications for many different computers. All of those applications need to be isolated from each other. For security, one application isn’t supposed to be able to peek at what another one is doing, except in very controlled circumstances. Otherwise, a malicious advertisement on a website you’re visiting could eavesdrop on your banking details, or the cloud service purchased by some foreign intelligence organization could eavesdrop on every other cloud customer, and so on. The companies that write browsers, operating systems, and cloud infrastructure spend a lot of time making sure this isolation works.

Both Spectre and Meltdown break that isolation, deep down at the microprocessor level, by exploiting performance optimizations that have been implemented for the past decade or so. Basically, microprocessors have become so fast that they spend a lot of time waiting for data to move in and out of memory. To increase performance, these processors guess what data they’re going to receive and execute instructions based on that. If the guess turns out to be correct, it’s a performance win. If it’s wrong, the microprocessors throw away what they’ve done without losing any time. This feature is called speculative execution.
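To make that concrete, here is a minimal sketch in C of the pattern the public Spectre write-ups call a bounds-check bypass. The array names are placeholders rather than working exploit code: the point is only that the body of the if() can run speculatively with an out-of-range index before the size check resolves, and the cache line it touches remains measurable even after the speculative work is thrown away.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative Spectre-variant-1-style gadget (placeholder names). */
uint8_t array1[16];
uint8_t array2[256 * 4096];   /* probe buffer: one page per possible byte value */
size_t  array1_size = 16;

void victim_function(size_t x) {
    if (x < array1_size) {                    /* branch the CPU may mispredict       */
        uint8_t value = array1[x];            /* speculative, possibly out of bounds */
        volatile uint8_t sink = array2[value * 4096]; /* leaves a cache footprint    */
        (void)sink;
    }
}
```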

Spectre and Meltdown attack speculative execution in different ways. Meltdown is more of a conventional vulnerability; the designers of the speculative-execution process made a mistake, so they just needed to fix it. Spectre is worse; it’s a flaw in the very concept of speculative execution. There’s no way to patch that vulnerability; the chips need to be redesigned in such a way as to eliminate it.

Since the announcement, manufacturers have been rolling out patches to these vulnerabilities to the extent possible. Operating systems have been patched so that attackers can’t make use of the vulnerabilities. Web browsers have been patched. Chips have been patched. From the user’s perspective, these are routine fixes. But several aspects of these vulnerabilities illustrate the sorts of security problems we’re only going to be seeing more of.

First, attacks against hardware, as opposed to software, will become more common. Last fall, vulnerabilities were discovered in Intel’s Management Engine, a remote-administration feature on its microprocessors. Like Spectre and Meltdown, they affected how the chips operate. Looking for vulnerabilities on computer chips is new. Now that researchers know this is a fruitful area to explore, security researchers, foreign intelligence agencies, and criminals will be on the hunt.

Second, because microprocessors are fundamental parts of computers, patching requires coordination between many companies. Even when manufacturers like Intel and AMD can write a patch for a vulnerability, computer makers and application vendors still have to customize and push the patch out to the users. This makes it much harder to keep vulnerabilities secret while patches are being written. Spectre and Meltdown were announced prematurely because details were leaking and rumors were swirling. Situations like this give malicious actors more opportunity to attack systems before they’re guarded.

Third, these vulnerabilities will affect computers’ functionality. In some cases, the patches for Spectre and Meltdown result in significant reductions in speed. The press initially reported 30%, but that only seems true for certain servers running in the cloud. For your personal computer or phone, the performance hit from the patch is minimal. But as more vulnerabilities are discovered in hardware, patches will affect performance in noticeable ways.

And then there are the unpatchable vulnerabilities. For decades, the computer industry has kept things secure by finding vulnerabilities in fielded products and quickly patching them. Now there are cases where that doesn’t work. Sometimes it’s because computers are in cheap products that don’t have a patch mechanism, like many of the DVRs and webcams that are vulnerable to the Mirai (and other) botnets—groups of Internet-connected devices sabotaged for coordinated digital attacks. Sometimes it’s because a computer chip’s functionality is so core to a computer’s design that patching it effectively means turning the computer off. This, too, is becoming more common.

Increasingly, everything is a computer: not just your laptop and phone, but your car, your appliances, your medical devices, and global infrastructure. These computers are and always will be vulnerable, but Spectre and Meltdown represent a new class of vulnerability. Unpatchable vulnerabilities in the deepest recesses of the world’s computer hardware is the new normal. It’s going to leave us all much more vulnerable in the future.

This essay previously appeared on TheAtlantic.com.

Posted on January 26, 2018 at 6:12 AM • 56 Comments

Comments

Hamish January 26, 2018 7:04 AM

What I’m not seeing is much in the way of commentary on:

  • the cost impact on customers due to the additional compute resources needed to cover any performance deficit after patching for speculative execution. This could actually lead to more cash for chip manufacturers and everyone between them and the customer.
  • the lack of succinct advice on what must be patched to protect against Meltdown or Spectre. To illustrate, imagine a server running a VM, running a container, with a GUI, and a browser. As the user of the browser, how many layers need to be protected to secure my browsing activity? Bare-metal firmware and OS + VM firmware and guest OS + container software and OS + browser + any other compiled binaries on systems along that path? That’s a lot of pain and not necessarily all under my control.

The industry reaction to these vulnerabilities has been shocking, from forcibly rebooting cloud compute instances without warning, to a lack of authoritative advice on how much patching will give you the protection you need, depending on who you are.

Who? January 26, 2018 7:21 AM

From the entry on this blog:

Spectre and Meltdown attack speculative execution in different ways. Meltdown is more of a conventional vulnerability; the designers of the speculative-execution process made a mistake, so they just needed to fix it.

The designers will not fix it. In fact, I think they cannot fix it. Operating system developers are now working on software fixes to avoid out-of-order execution being abused to read data from any memory address mapped to the memory space of the rogue process that wants to exploit it. I wish there were a microcode-based fix for this race condition. Right now I think most teams are working on isolating the kernel and userland memory spaces.
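For readers wondering what “isolating the kernel and userland memory spaces” looks like in practice, here is a rough conceptual sketch of the page-table-isolation idea the OS vendors are shipping. Every identifier below is a hypothetical stand-in; real kernels do this in low-level entry/exit code, not portable C.

```c
/* Conceptual sketch of kernel page-table isolation (KPTI-style mitigation).
 * All names are hypothetical stand-ins for illustration only. */
typedef struct page_table page_table_t;   /* opaque page-table root */

struct address_space {
    page_table_t *user_view;     /* user pages plus a minimal entry trampoline */
    page_table_t *kernel_view;   /* user pages plus the full kernel mappings   */
};

/* Hypothetical helper standing in for a CR3 write on x86. */
static void load_page_table_root(page_table_t *root) { (void)root; }

/* On every syscall or interrupt, switch to the view that maps the kernel. */
static void on_kernel_entry(struct address_space *as) {
    load_page_table_root(as->kernel_view);
}

/* Before returning to user mode, drop the kernel mappings again, so user-mode
 * speculation has essentially no kernel data mapped that it could reach. */
static void on_kernel_exit(struct address_space *as) {
    load_page_table_root(as->user_view);
}
```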

First, attacks against hardware, as opposed to software, will become more common. Last fall, vulnerabilities were discovered in Intel’s Management Engine, a remote-administration feature on its microprocessors. Like Spectre and Meltdown, they affected how the chips operate.

Intel ME is, in some ways, just software and it can be fixed if Intel wants. Don’t count on Intel working on these fixes, though. They announced two months ago that Intel ME 6.x and 7.x are EOL’d and will not receive the fix for the attack you are describing, even though Intel acknowledges these ME releases are vulnerable too. A shame, and we have not even started talking about backdoors in firmware!

In my humble opinion, Meltdown and Spectre are not comparable to Intel ME. Meltdown and Spectre are truly hardware design flaws, comparable to Rowhammer. Intel ME is just an operating system, even if it can be only fixed by Intel. As said above, Intel is not exactly willing to fix these vulnerabilities in machines older than four years.

Meltdown, Spectre and Rowhammer are hardware bugs that can only be fixed by replacing the broken hardware. There are no true fixes for these bugs, just what the industry calls “mitigations,” i.e. changes in software that do not fix these vulnerabilities but try to make them harder to exploit.

Hardware bugs are a new research field, and anything discovered here will have a much higher impact on our computing infrastructure.

Who? January 26, 2018 7:44 AM

The most serious danger of hardware bugs is that the industry just tries to sweep these bugs under the carpet (e.g., by increasing the refresh frequency on DDR3 memory, or applying partial workarounds in the case of Spectre v2) at a performance/power cost to the customers.

Some day wrongdoers will learn to exploit these now-dormant bugs in a very effective way, because the bugs were never really fixed.

Bauke Jan Douma January 26, 2018 8:58 AM

@Bruce
“Now that researchers know this is a fruitful area to explore, security researchers, foreign intelligence agencies, and criminals will be on the hunt.”
(emphasis mine)

Shouldn’t that read:
“Now that researchers know this is a fruitful area to explore, security researchers, intelligence agencies, and criminals will be on the hunt.”

Petre Peter January 26, 2018 9:39 AM

It seems that the ability to process is turning into the ability to predict, and the ability to predict is turning into a need to access data rather than a need for faster clock speeds. Dear Speculative Execution, welcome to the privacy battle.

TheInformedOne January 26, 2018 9:51 AM

Used to be, the “Art of Compromise” was ply’d by script kiddies in their parents’ basements looking for braggin’ rights on the BBS. Those were the days when hacking was more entertaining than malicious. Nowadays, the “Art of Compromise” is a commercial venture undertaken by the dark-web greed-fueled mafia of the world’s developing nations. Therefore, it has naturally progressed from simple scripting or Nigerian Prince emails to AI-assisted attack-tool automation. Every attack vector is explored and x-ploited to increase profits. Who says the U.S. has a monopoly on capitalism? Certainly not the hackers. Anyone with a Raspberry Pi or old discarded Android phone can now get “paid to play”….

echo January 26, 2018 10:20 AM

Sorry to be awkward, but besides a legal commentary on UK/EU consumer rights law, has anyone done a human rights and equality impact study of the dangers presented by these vulnerabilities? How many people globally will be harassed or imprisoned or die because of them? Is it unreasonable to suggest a scheme such as a colour coded bar be used to highlight the assurance level of consumer goods and level of risk?

On a lesser note I’m disappointed these kinds of vulnerabilities are being pushed into hardware. I always assumed hardware people would be more careful.

de La Boetie January 26, 2018 10:54 AM

The vulnerabilities are emphatically not new, and side channels have been known to competent processor designers forever. In the race for dodgy optimisations and profit without giving users extra cores (that would harm their wonderful world of profit gouging and product segmentation), they have skimped on and ignored responsible engineering because, as discussed many times here, they do not bear any significant share-price harm or corporate-executive harm. No jail, waltz off with those bonuses!

I would also very much like to know – and will not likely do so – what knowledge the NSA and their partners had of this class of vulnerability, for how long, and whether it was raised in the VEP.

Hacker Uno January 26, 2018 11:38 AM

For at least the last couple years, I’ve argued that trying to lock down an endpoint device in an enterprise environment is a futile effort. The ME, Meltdown, and Spectre vulns I believe basically prove my point. And, don’t even get me started on containers!

Instead of worrying about the endpoint devices, it is the network that needs to be locked down and thoroughly monitored. There must be 100% cleartext capture of all traffic crossing any trust boundary; 100% log capture of every endpoint and infrastructure device; 100% netflow logging of all infrastructure device flows; and real-time monitoring and analysis of all of this captured and logged traffic.

Also, all outside-the-enterprise enterprise devices (e.g., corporate laptops, cell phones, etc.) must be configured to only allow internet connectivity to occur thru a VPN tunnel into the corporate network, so all traffic to/from that device can be captured and monitored; if it is a corporate device, then it must always be subjected to corporate network security policies.

In my professional opinion, an organization’s best security ROI is money spent on the SOC, providing it the best possible staffing, training, infrastructure, and tools. Money spent elsewhere is mostly security theater.

VinnieG January 26, 2018 11:56 AM

@Bauke Jan Douma: No. “Now that researchers know this is a fruitful area to explore, security researchers, AS WELL AS FOREIGN AND DOMESTIC INTELLIGENCE AGENCIES AND OTHER CRIMINALS will be on the hunt.”

VinnyG January 26, 2018 12:06 PM

@echo re: “I’m disappointed these kinds of vulnerabilities are being pushed into hardware. I always assumed hardware people would be more careful.”
I have an interesting question. From what quarter came the pressure on Intel to go so far down the speculative-execution road? It’s difficult to believe that it was solely a response to competition: aside from the minuscule challenge from AMD (and it’s arguable that Intel requires AMD to survive to avoid anti-trust scrutiny), Intel appears to almost entirely lack competitors in its market sectors, at least those from which it makes a profit. A self-initiated drive to excel? All the evidence points to Intel expending only those resources necessary to maximize shareholder value, in the classic manner of a corporation. If the incentive wasn’t directly financial, where else could it have come from?

keiner January 26, 2018 2:23 PM

Sorry, but Linus’s analysis (of the (non-)patches) is more skilled.

And, by the way: What exactly was Intel doing since last summer? Sitting on its hands and hoping nobody would publish this? This is all highly criminal. In Russia some guys would go to Siberia for this. Guys, in the US you are real looooosers.

Who? January 26, 2018 3:52 PM

@ keiner

The selective disclosure of this vulnerability to close collaborators only is a disgrace. I doubt, however, that the Congress of the United States will do the right thing about these corporations.

On the other hand, why is there not a tenth question in the letter? It is, perhaps, the question that best summarizes what happened in the last months:

Why did the embargo last until the end of the Christmas sales period?

albert January 26, 2018 4:33 PM

@keiner,
Cool, indeed. Congress did something good. Must be an accident. Hope they follow up on it. A public hearing would be nice.

@Who?,
“…Why did the embargo last until the end of the Christmas sales period?…”
Because Intel’s fiscal year ends in December. 🙂

. .. . .. — ….

Charlotte H January 26, 2018 6:01 PM

Meltdown is more of a conventional vulnerability; the designers of the speculative-execution process made a mistake, so they just needed to fix it. Spectre is worse; it’s a flaw in the very concept of speculative execution.

I don’t agree, or maybe don’t understand the distinction Bruce is trying to make. With Meltdown, CPU speculation has visible effects across two tasks (kernel and some ordinary program). With Spectre, same basic idea, except it’s two userspace tasks. In either case, the root cause is the same: state from one task can be seen by another that shouldn’t be able to see it.

Spectre isn’t a flaw in speculative execution as a concept. The speculator state is task state, associated with the process/task being speculated for; like all such state it needs to be saved and restored (or cleared) on task switches. Intel got greedy by hiding this as a microarchitectural detail, and doing so incorrectly; the fix is “just” to expose it, or to ensure it can’t be shared between processes (as they ensure for the TLB, via PCID or by clearing on CR3 write). Or to the extent it’s useful to provide cross-task speculation, be damn sure there’s a documented safe way to manage that.

(Not mentioned in Bruce’s quote is cache state, which will be harder to fix. Obviously each cache entry will need to be linked to a process somehow. And then there’s hyperthreading functional-unit-occupancy side channels, which are really hard to fix… though easy for the OS to punt on by requiring all hyperthreads to come from the same process.)
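Since the cache is the channel through which all of this speculative activity becomes visible, it may help to see how small the measurement primitive is. Below is a hedged Flush+Reload-style sketch using x86 intrinsics; the cycle threshold is illustrative and would need calibration on a real machine.

```c
/* Sketch of a Flush+Reload cache probe: flush a line, let the victim run,
 * then time a reload. A fast reload means the victim touched that line. */
#include <stdint.h>
#include <x86intrin.h>

#define CACHED_THRESHOLD_CYCLES 100   /* illustrative cutoff, machine dependent */

static void flush(volatile uint8_t *addr) {
    _mm_clflush((const void *)addr);  /* evict the line from every cache level */
}

static int was_accessed(volatile uint8_t *addr) {
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);  /* read timestamp counter */
    (void)*addr;                      /* reload the line flushed earlier */
    uint64_t elapsed = __rdtscp(&aux) - start;
    return elapsed < CACHED_THRESHOLD_CYCLES;
}
```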

Charlotte H January 26, 2018 6:34 PM

@Who?

The designers will not fix it. In fact, I think they cannot fix it. Operating system developers are now working on software fixes to avoid out-of-order execution being abused to read data from any memory address mapped to the memory space of the rogue process that wants to exploit it.

Do you mean it can’t be fixed in microcode or there’s no theoretical way to design a CPU that’s immune? The problem can be worked around in software, and fixed in future CPUs with less of a performance impact.

You’re right that the ME is “just software”. It can be disabled—not easily now, but a BIOS update could make it easy and/or fix the bug(s). Perhaps what Bruce should have said, and which is relevant here, is that the boundaries between hardware and software have been getting more and more blurry. I have a Pentium 3 that coreboot requires no binary blobs for, but it still has upgradeable microcode. Earlier Intel CPUs had non-upgradeable microcode, or no microcode at all. (Ironically, the CPU used in the ME doesn’t accept microcode.) Later CPUs can’t be made to use DRAM without a binary blob.

@echo

Is it unreasonable to suggest a scheme such as a colour coded bar be used to highlight the assurance level of consumer goods and level of risk?

I’m going to say this is obviously unreasonable, for now. The affected CPUs are used at all assurance levels, including in government installations. If the NSA et al. allowed top-secret data to be processed on these CPUs, having failed to find or perhaps even look for these flaws—or put their governments at risk by refusing to disclose—how can we expect anyone to give us a reliable evaluation?

A decade from now it might be doable. Even safety certification is a fairly recent trend in mainstream CPUs/SOCs, and that’s just guarding against errors. “Random” events have measurable probabilities and distributions, and there’s a limit to how much HW/SW developers can fuck up and still get something that mostly-works. Formal verification can help there. With security, a one-in-seven-billion chance can affect everyone at the same time; and formal verification is harder, because the model you’re checking against has to avoid bugs and oversights too.

Taz January 26, 2018 8:11 PM

I still can’t find verified data on which older processors are immune to Spectre.

Would have thought this might have been the first response?

Gerard van Vooren January 27, 2018 12:41 AM

@ keiner,

And, by the way: What exactly was Intel doing since last summer?

I can’t tell you because I am not an Intel guy, but my probably lousy guess would be that they hoped it would just disappear. But I wonder what “solutions” Intel comes up with to really solve this crap.

This is all highly criminal. In Russia some guys would go to Siberia for this. Guys, in the US you are real looooosers.

Well, welcome to the corrupted world of greed, the world of Capitalism.

XOR January 27, 2018 3:08 AM

“It’s looking like optimization is always premature.”

Optimization requires a holistic whole picture approach. Faster isn’t faster if the wheels come off.
Pushing out a patch that reboots servers or slows/bricks unaffected chips isn’t responsiveness.
Lying to try to avoid an unfavorable public opinion reality doesn’t get you out of the actual woods.

There are ways to speculatively compute things without allowing some random pointer to access a forbidden cache one bit at a time; there’s no question there are ways to avoid this problem in design. The question is: what’s the NEXT problem that we’re not currently looking for, and does anyone trust Intel to look comprehensively for it, or even admit it if they do find it?

Clive Robinson January 27, 2018 4:21 AM

@ Cassandra,

It looks like things were lining up for Meltdown and Spectre to be discovered a lot earlier than some people may have first thought.

They have been lining up since the 1980s, if not before. In Intel’s case they have way too much backwards compatibility that they are trying to carry forward[1].

However the problem is well known, and its security dangers likewise.

If you look back on this blog you will find one of my mantras is “Efficiency-v-Security”, and as I’ve explained, as you increase efficiency you open up time-based side channels that can relatively easily be exploited. Unless you really know what you are doing in the design domain, the net result is that security decreases faster than efficiency rises.

This is the fundamental problem behind the four issues that have arisen with Spectre and Meltdown.

But specifically, Cache Timing Attacks were known in the Open Community well before the start of the AES competition. We have good reason to believe that the NSA finessed both NIST and the AES competition to ensure that cache timing attacks would be in most if not all implementations of AES, so that the leakage of key information would be recoverable not just within the computer but fairly far out into the network (i.e. past many LAN-to-WAN transitions).

So yes, I would expect parts of the NSA to be fully cognizant of the cache timing and similar issues that are the cause of Spectre and Meltdown, “no ifs, buts or maybes”.

@ ALL,

The real underlying problem is one of physics and trying to cheat natures laws.

In this case it is the speed of light in an ideal medium, given as “C” in most equations, which is a little under “3 times 10 to the eight meters per second” or 300,000,000 m/s. Sounds a lot, but it really isn’t. To see why: time is in effect the inverse of frequency (t = 1/f), thus as clock speeds go up, distances go down, way down. At 1GHz light travels ~1ft, that is the 12 inch / 30cm ruler you have on your desk, so a 3GHz clock gives 4 inches or 100mm tops. At one picosecond (a speed engineers talk about routinely) you are looking at a distance similar to a small grain of sand, which is way, way less than the dimensions of IAx86 chips.
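As a worked check of those figures (free-space values only, ignoring velocity factor):

$$d = \frac{c}{f}: \qquad \frac{3\times10^{8}\ \mathrm{m/s}}{1\ \mathrm{GHz}} \approx 30\ \mathrm{cm}, \qquad \frac{3\times10^{8}\ \mathrm{m/s}}{3\ \mathrm{GHz}} \approx 10\ \mathrm{cm}, \qquad c \times 1\ \mathrm{ps} \approx 0.3\ \mathrm{mm}.$$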

If you work out how far the “round trip distance” is on a Printed Circuit Board (PCB) you will be somewhat surprised at just how short distances need to be. But remember that because of the board dielectric and transmission-line effects the effective distance is reduced. This reduction is known as “velocity factor”: in ordinary coax you are looking at 0.6-0.7C, and in the twisted line that is used in Cat1-and-above Unshielded Twisted Pair (UTP) and similar cables it can get down to 0.1C.

It’s one reason why the bus speed between the CPU chip pins and the memory chip pins is kept down to ~0.333GHz, or 1/10th of the clock speeds on chips. Thus the distance would be around 50cm, or a little over one and two-thirds feet round trip, in free space. But with velocity factor, “data clocking” and “gate delay” in chips, and PCB routing issues to ensure the PCB traces are the same length, you are down to around 15cm or 6 inches distance max…

This 1/10th-of-the-internal-CPU-clock speed is why CPU chips have multi-level “Cache Memory” on board, to bring the effective distance between the memory and the internal Arithmetic Logic Unit (ALU), where the actual “Data Processing” is done, down, way down.

But another fly hits the ointment at high speed, which is to do with von Neumann architectures and virtual memory. The von Neumann architecture shares data and instructions in the same memory, which has significant implications: OSs as we know them cannot run without “instructions being treated as data to load programs” etc., and the sharing of data and instructions causes real slow-down issues the closer you get to the ALU. So most modern CPUs are internally Harvard architecture and thus have separate data and instruction caches close in. But the separation of instructions and data means there are addressing issues which need to be dealt with. Likewise the internal CPU view of addresses, called the “logical address” view, is for similar reasons not that of the external memory with its “physical address” view. The changing of this view is traditionally done by the Memory Management Unit (MMU), but… the IAx86 was designed with “a poor man’s MMU” known as “segmentation”, done via the segment registers, which are as close to the ALU as the other registers and thus inside of the address translation done in the MMU that was added later. This unfortunately is a major source of backwards-compatibility issues[1] that most other CPU architectures just do not have, for very good reason.

Thus close to the ALUs in a core you have separate data and instruction caches that use some warped internal logical addressing. On the other side of the MMU are other on-chip caches that use physical addresses, which make the physical RAM look both a lot closer and a lot faster. So it’s possible for the likes of loops to only use the caches and not external memory. Unfortunately, due to the idea of using multiple cores and CPU chips for parallel processing, there is a whole load of other logic involved in keeping the real external memory in sync with the internal cache memory. Like the instruction-decode logic this runs at the CPU clock speed and thus generates even more heat.

Which brings out the issue of “efficiency”: the drive, for marketing reasons, would have been the old “Specmanship” speed of work done for a chosen test type. This invariably means more heat at any given efficiency rating, so to avoid “heat death” efficiency had to be improved, which mostly means taking things out to minimise component count etc. This has the side effect of opening up time-based side channels over and above the other reductions in security…

But ultimately security in computers boils down to one thing, “Memory”, because it holds the configuration and settings information. The only thing protecting the security is the MMU, the configuration page tables of which are kept in “memory”. Even the “Secure enclaves”, “Management Engine” and other things are controlled by the contents of “Memory”.

Thus if you can get at the “Memory” you “Own the System”. There are three basic ways to protect memory,

1, Physical segregation
2, Logical segregation.
3, Encryption.

Physical segregation uses entirely separate memory chips with entirely separate address and data buses. This is not just expensive in components, it also needs entirely separate address and data lines, which means increased PCB real estate, which makes things difficult at best, which is why it is rarely if ever used these days.

Which is why logical segregation via VM and page tables is the usual solution for security these days. But Rowhammer made it obvious that “reach down” attacks were more than possible. Further, Spectre and Meltdown have shown that “reach around” attacks are possible, as are the long-known sideways attacks through I/O and DMA. There are also “bubbling up” attacks from low-level gates and “test harness” logic that have yet to be weaponised or have open POC code developed; this is one of the “scary things” about “supply chain poisoning”, which is why the DoD put quite a bit of money into developing defences, with the most promising lines having now “Gone Black”.

Which leaves us with encryption.

Encryption currently is, always has been and probably always will be both a problem and messy.

Memory encryption is not a new idea; it’s been around for over a third of a century. The problem is it’s never really been implemented in the correct way, for a number of reasons. The simple comparison to understand why is with mutable storage such as hard disks: the fairly well-known difference between Full Disk Encryption (FDE) and File Encryption.

FDE only gives protection to “data at rest”, whereas File Encryption protects whilst the system is up and running with multiple users.

The same applies to RAM encryption: using a single key for the whole RAM might protect against attackers getting at the external-to-the-CPU RAM with logic analysers and the like, but it will not protect against multiple users in the system. Like File Encryption, you need to protect each user process/thread space using different encryption keys. Thus if one user’s data does get pulled into cache by another user’s process, its value is effectively meaningless.

What it will not protect against is applications that have multiple threads that should be kept secure from each other, unless the thread spaces can be encrypted from each other. Which might be difficult in some applications.

The point to remember is “security was taken out to improve throughput”. You cannot expect to put the security back without incurring a penalty at least proportional to that which was originally taken out…

Look at it this way, take the safety features out of a car and it will either go faster or further for the same fuel. But you will die or be injured a lot more easily than with the safety features that have a time/cost penalty. The choice is yours, go faster but at greater risk, or be safer at a more pedestrian rate…

The same applies to the Intel CPU chips… The real trick in both cases is finding the engineering “sweet spots”, but that as they say is a conversation for another day.

[1] Another major source of backwards-compatibility issues is that the IAx86 instruction set is overly complex, and way beyond what most programmers can get their heads around effectively. The reason is that a complex instruction set is a form of compression, which means more “information” per external bus clock cycle is transferred, which speeds things up, or at least that was the idea. The problem is a very large and complex instruction-decode logic which chews up power and makes things way, way more complex than they need to be, especially with the management of data… Which has knock-on effects.

(required) January 27, 2018 5:01 AM

“Intel CPU chips… The real trick in both cases is finding the engineering “sweet spots”,

The real trick is not putting in ridiculous system-on-die backdoors with NO password requirement whatsoever – and then pretending you never did that when you get caught, then slowly over YEARS rolling out rudimentary and unverified “solutions” to the problem that you baked into hardware deliberately while shamelessly punting that large chunk of responsibility remaining to individual vendors/OEMs who will do minimal-if-anything about it. Proving PR is the #1 metric of a successful microprocessor leader, and people are stupid. That’s the real trick.

echo January 27, 2018 5:17 AM

@Charlotte

I agree with what you say about formal verification and the difficulties of disclosing vulnerabilities in equipment used for high-security purposes. I was thinking more of an informal, guesstimate-level consumer indicator similar to the graphic bar used for energy-efficiency ratings on lightbulbs. A different colour bar might be used for informal self-certification versus independently audited formal certification. Not everyone is an expert, and I believe this kind of thing is a useful way of communicating without being patronising, plus it can be a gateway for educational purposes.

@Clive

Thanks. Your essay was interesting. (Old stuff but always good.) Some software vendors have previously sold caching binary translators to run code for different CPUs, and for Windows a kind of super WINE which ran Windows natively on other OSs until copyright/patent and manufacturer issues halted this. The question I have is: is it possible to provide a secure caching binary translator, like Google’s on-the-fly binary patching, to mitigate issues with Intel or other CPUs and allow us all to move on from historically bad designs?

tfb January 27, 2018 6:57 AM

@VinnyG

Although it may seem like Intel are a monopoly who can simply rest on their laurels, and hence have no need to implement speculative execution, I don’t think this is true: while they are dominant in some markets they are not dominant in others, and I don’t think there’s a strong reason why the processors used in those markets would not come to dominate the markets Intel currently dominate.

Two example markets:

  • desktop and laptop processors, where Intel are dominant (although there are many more non-Intel processors in a desktop than there are Intel ones, most of the money goes to the Intel ones);
  • phone and tablet processors, where ARM processors are dominant, manufactured by a number of companies.

Let’s assume that the processors in phones are five years behind those in desktops and laptops in terms of performance (the exact number doesn’t matter). There is aggressive competition in the phone market for higher performance, so processors there will continue to increase in performance for some time. If Intel simply rest on their laurels and stop working to make their processors much faster, phone processors will catch up in about five years, at which point they will be as fast as Intel processors and much better in terms of power consumption.

At that point the only thing that is keeping people from switching to phone-derived processors in desktops and laptops is binary compatibility. This was a critical issue for many years as the dominance and non-portability of Windows meant that running x86 (and later x86-64) code was a required feature of any desktop processor. I don’t think that’s really true any more: Windows is not the massively dominant platform it once was, and Windows itself is now much more portable. There will still be legacy code and legacy installations which require Intel processors but, well, at that point Intel will be relegated to supporting legacy platforms: everything new, on desktops, laptops and servers, will be built around non-Intel processors.

So, yes, Intel are ahead, and are dominant in some markets, but if they want to remain dominant they need to keep ahead of the competition.

tfb January 27, 2018 7:08 AM

@Hamish

I think the cost impact is interesting. I work somewhere where we use very large, Intel-based, computers to run numerical simulations: the current machines cost us something over a hundred million dollars I think.

Like most HPC applications we do lots of I/O so we are likely to be quite badly hit. And apparently we are supposed to patch the machines (it’s not clear why since the allocation unit is an entire node: it may be they’re worried about something getting privileged access on a node and then spreading to other nodes, or it may be just reflex).

If it costs us 20% in performance then either we’ll be 20% later for various projects for which we are already late, or we need a great extra mass of nodes (much more than 20% since we’re up against scaling limits for our code).

This is tens of millions of dollars of damage, in one installation.

Charlotte H January 27, 2018 10:23 AM

@echo

I was thinking more of an informal, guesstimate-level consumer indicator similar to the graphic bar used for energy-efficiency ratings on lightbulbs. A different colour bar might be used for informal self-certification versus independently audited formal certification.

I have difficulty with the “informal self-certified” idea. Take a look at locks sometime (e.g. padlocks, bicycle locks)—many rate themselves on a scale of 1 to 10, and it’s largely bullshit. Nobody rates themselves less than a 5—I’ve had little difficulty picking/shimming some of those—and some brands expand their own scale above 10 for high-end locks.

With strict objective criteria the idea could have some merit. We might say a “medium-security” product has to have auto-updates for at least X years, has to patch “major” security incidents within Y days, etc. It’s just that people who should know better have made some really boneheaded mistakes, like last year’s Intel ME bug where it would accept a blank password (or all those IoT devices that check passwords in Javascript or a client app). Self-certification mechanisms are checklists, and it’s really hard to enumerate everything that can go wrong.

External certification is better, but it’s a race to the bottom and hugely asymmetric. The manufacturer will look at everyone who can give the requisite stamp of approval and pick the cheapest or least stringent. So we’ll get one or a few people looking at it from a defense point of view, with strict time/cost limitations, whereas capable attackers number in the tens of thousands or more. And attackers can spend almost as much as the attack is worth: if something allows them to steal $100 billion at 10% risk of being caught, there’s an expected profit at any exploit-development cost below $90 billion.
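As a quick check of that arithmetic (treating the only downside of getting caught as losing the payoff):

$$E[\text{profit}] = 0.9 \times \$100\,\text{B} - C > 0 \quad\Longleftrightarrow\quad C < \$90\,\text{B}.$$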

Those are our best “practical” ideas right now, and I feel like many of the products with embarrassing flaws could’ve jumped any hurdle to get the “highest” certification. Then one day, instantly, we’ll notice one of them’s been 100% insecure the whole time. And we know from experience that a (“Warhol”) worm can affect 90% of those machines in 15 minutes, and the discloser isn’t always the first one to have found it, so we just kind of have to hope we’ll get a patch before it has blown up. I’m not saying the ideas are worthless, nor do I know they’ll give more than an illusion of security. In any case we’ll need something better, and all I can say for sure is that reduction of TCB size/interaction-complexity will need to play some kind of role.

keiner January 27, 2018 11:44 AM

@ Gerard vV

Watch over the next months how Intel is handled, compared to Volkswagen, which (correctly) had to hand over some billions in the US (unfortunately not in Europe; in Germany tax money was handed out to buyers of new diesel cars, which finally ended up at Volkswagen).

Gerard van Vooren January 27, 2018 12:36 PM

@ keiner,

What is this, wishful thinking or knowledge? AFAIK only a couple of small universities have filed class-action suits. I haven’t heard anything about the EU yet.

hmm January 27, 2018 2:22 PM

@ Keiner

  • Krzanich’s stock deals… if they had prior knowledge + tried to hide it for the stock value, double it.

I hope the EU rips them a new backdoor and shows the US what actual regulation should look like.

SpaceLifeForm January 27, 2018 2:27 PM

@XOR

Premature optimization is the root of all evil.

Quote from the smart Donald. Knuth.

Bong-Smoking Primitive Monkey-Brained Spook January 27, 2018 2:48 PM

@SpaceLifeForm,

Premature optimization is the root of all evil.

More likely a quote by Vātsyāyana. I take it the Kama Sutra hasn’t reached your galaxy yet.

Grauhut January 27, 2018 4:42 PM

@Gerard: Intel is tbtf… 😉

@Bruce: “The press initially reported 30%, but that only seems true for certain servers running in the cloud.”

“Certain servers” is every high-IOPS big-data system using SSD caching locally, for instance. All IOPS-optimized, SSD-cached systems, like storage servers for hypervisors, are hit hard when patched. My ZILs hate it! 😉

Who? January 27, 2018 5:23 PM

@ Charlotte H

Do you mean it can’t be fixed in microcode or there’s no theoretical way to design a CPU that’s immune? The problem can be worked around in software, and fixed in future CPUs with less of a performance impact.

I know almost nothing about microcode, how it works or what it can do. I have read that the Meltdown fix consists of isolating the kernel and user address spaces. It would be great if a microcode update could fix the underlying problem (a race condition that sometimes allows a rogue process to access memory addresses it should not be allowed to read). This would allow us to use the shared cache in a supposedly secure way. On the other hand, we do not want these serious vulnerabilities fixed on only the three most recent microarchitectures. The industry needs a global response for machines that cannot be easily replaced.

An immune CPU is possible; think of anything earlier than a Pentium Pro processor. I think the fix for new microarchitectures will be deploying a cache per physical core (I am not sure about hyperthreading), but then threads that share a core (or virtual core) should not be considered isolated, if I am right.

On December 4th I ordered a new workstation to replace a twenty-one-year-old PC that I plan to use for less computing-intensive tasks. Ironically, I asked for a workstation with a four-core seventh-generation i5 processor, as that assures the processor will not support hyperthreading—in other words, less heat emitted, no bottlenecks when all “virtual” cores are used at once and, most important, no side channels or other bugs as a consequence of hyperthreading technology, like this well-known one:

http://www.daemonology.net/papers/htt.pdf

Now my twenty-year-old PC is the only one immune to Spectre and Meltdown, and this new workstation has a huge set of side channels (right now three are known, but there may be more discovered in the next months).

Who? January 27, 2018 5:28 PM

@ keiner, albert

An eleventh question that should be asked:

OK, all this selective disclosure is to protect us. Fine. Then why has Intel released a vulnerable microarchitecture (“Coffee Lake”) during the embargo? Why have OEMs manufactured new computers based on this broken microarchitecture?

Nick P January 28, 2018 12:16 AM

re origins and immunity

I told Colin Percival here that cache-based channels were discovered in the early 1990s. I gave some links in that and a referenced comment, since I was responding to a flurry of them. Thomas Ptacek shows what mainstream security thinks of that old research and the high-assurance field here, which I slam hard. Less so on side-channel vs covert-channel terminology, as we’ve even debated that here. Anyone wondering about mitigations for hardware leaks can look at this list of examples.

re Intel impact

Who knows. I predicted they’d make plenty of money this year. Financial results are in: they’re doing great. Anyone thinking they should’ve tried to make more secure CPUs should keep in mind they lost a billion or so attempting that. They’re instead doing the backward compatibility and lock-in that meets their actual goals of high profits for executives and shareholders. Incentives like that are why we need regulation.

Alyer Babtu January 28, 2018 5:22 AM

Improvements should come from a better understanding of the problem. Thus, there is never a place for “optimization”.

Clive Robinson January 28, 2018 5:35 AM

@ Nick P,

They’re [Intel] instead doing the backward compatibility and lock-in that meets their actual goals of high profits for executives and shareholders. Incentives like that are why we need regulation.

What we need is “the right regulation”; given current US legislative trends, what we would actually get is likely to be worse than no regulation…

Part of the problem in this respect is "Fines not Terms" mentality from the various USG entities when it comes to sanctions. We see this with the Finance Industry, with eye-wateringly large fines given to banks and the like, who just pass the cost on to their customers with a bit more to generate a bit of slack for the "Next Taxing". It kind of looks good till you realise the banks and large corps are set up to not just pass the cost to their customers, but also write off the fine against tax. So the net impact on the company is marginal at best…

In the case of Intel’s CEO, his share trading is something anyone reading about it with more than half a brain will have questions about along the lines of “how many years…”.

The second thing people will likewise want to know, was why everyone was pushed into secrecy till after the Xmas spending boom, which Intel will have done very nicely out of…

This kind of tells you what legislation Intel’s lobbyists will buy from the legislators.

So I’m guessing the only action of any serious kind would come from Europe.

The problem in Europe like most other places is the feeling that you would be cutting your own throat if you did hit Intel hard.

There is after all no compulsion other than profit for Intel to sell into Europe or elsewhere in the world. Thus they could simply “turn off the gas” on the supply-line side of things, by rigging the contracts and terms they have with major suppliers. Microsoft did similar in the past, so Intel could do the same. In fact they would only need to limit the supply of “new CPUs”, allowing only the near-EOL or less popular, lower-profit CPUs in.

The result would be much the same, which is that “fear of being less competitive” would put pressure on the politicos to ease up on Intel…

As we are going to see with the current “crisis”, most people do not use even a fraction of the CPU power they have, so the 30-50% slowdown will have next to no discernible effect on them. It’s only the high-end users who will be hit. But that’s where it gets interesting. One of the reasons various large US corps set up data centers in Europe was to stop competition developing. Intel rattling the chain may well act to their advantage…

The downside though is Intel’s perceived effective monopoly market. Well, things are changing in that area and have been for a while. The running costs of the Intel architecture and its price points are very uncompetitive and getting worse. It’s why some are looking at ARM (which used to be British) with FPGAs for supercomputer and cloud applications. Fujitsu for instance have indicated that their future lies in the ARM direction for the chips in their future supercomputers. Rosy as Intel’s position appears to be, they have some quite genuine fears. The desktop market is dying as you watch; most home computing is based on “portable convenience” and thus “good battery life”, both of which have favoured ARM for quite some time now. Heck, you can buy smartphones with more grunt and less power consumption than desktops of even five years ago.

So this has the potential to be a several bowls of pop-corn event…

Winter January 28, 2018 8:11 AM

@Clive
“Part of the problem in this respect is “Fines not Terms” mentakity from the various USG entities when it comes to sanctions. ”

I noticed that the European banking crisis alleviated after shareholders and other investors were relieved of their ownership without compensation if their bank collapsed. Effective bankruptcy is a good tool to incentivize owners to select sane management. It also takes care of bonuses paid out in shares.

Alyer Babtu January 28, 2018 8:00 PM

Perhaps better hardware design through principles, and safe efficiency, by cribbing/adapting from Michael A. Jackson (“not the singer”), Principles of Program Design (1975). E.g. see Chapter 12., Optimization, p. 251 “Rule 1 – don’t do it, Rule 2 – don’t do it yet”; and, p. 254 “the structuring technique presented in this book tends to produce the fastest possible program”.

Who? January 29, 2018 3:46 AM

Can cloud computing be secured against Meltdown and Spectre?

Just a random thought, I am not (and will never be) a cloud computing user.

Let us suppose I am a wrongdoer. The Spectre and Meltdown vulnerabilities require fixes at both the operating-system and microcode levels (the latter is true at least for 6th-generation and later Intel processors).

A cloud computing customer expects the cloud service provider to do its part by fixing both the microcode and the host operating system. The customer will patch the guest operating system that runs on the hypervisor.

What happens if a wrongdoer rents a hypervisor from a cloud service provider and installs an unpatched guest operating system on it? Will it break the protection against Meltdown and Spectre and allow the wrongdoer to read all the memory, compromising other guests running on the same computer? In other words, can an unpatched guest operating system compromise the security of other (patched) guest operating systems running on the same physical hardware?

Cassandra January 29, 2018 5:14 AM

@Clive Robinson @Nick P

Clive, thank you very much for your piece on “Efficiency-vs-Security” which, as ever, makes for both interesting and educational reading for those wishing to learn.

Nick P, thank you for the links in your post regarding High Assurance systems and things that seem to be ancient history (and therefore unregarded) to some people new to ‘the industry’. I had only very peripheral involvement in High Availability systems, but even with my sparse experience, I see many interesting and questionable decisions made in system design these days. I’m old enough to know better, but I still get surprised by people repeatedly and hastily implementing the mistakes of the past. There is a well-known aphorism penned by George Santayana that springs to mind.

+++

The history of exploitable side-channel attacks is long, and not confined to microprocessor cpus. As this declassified NSA document shows, exploitation of side-channels certainly started at least as early as 1943 [ TEMPEST: A Signal Problem ]. As usual, this is preaching to the converted, but it may be an interesting nugget for some.

+++
An interesting 2007 paper Cache Based Remote Timing Attack on the AES

We present experiments and concrete evidence that our attack can be used to obtain secret keys of remote cryptosystems if the server under attack runs on a multitasking or simultaneous multithreading system with a large enough workload.

Current thinking appears to be that you won’t easily know if Meltdown or Spectre variants have been used against you. I have not seen any reports of the techniques being used actively, but absence of evidence is not evidence of absence. This should give anyone running services in ‘the cloud’ pause for thought, especially if processing data that is sensitive in any way. And, of course, a PC or mobile phone, regarded from the viewpoint of an attacker, is simply another remote multitasking or simultaneous multithreading system.

Who? January 29, 2018 9:27 AM

@ hmm

Can you believe they told CHINA before they told the US GOVERNMENT?

Indeed, I believe it. I feel safer this way. Telling the US Government about Meltdown and Spectre first would have put Intel and the research teams that found these vulnerabilities at risk of receiving an NSL.

hmm January 29, 2018 12:36 PM

@ Who

You’re maybe underthinking it. Intel already has NSLs in hand. What do you think Intel ME is?
*(Despite any and all Reddit denials by the same guy who sold his Intel stock a year+ ago.)

But they told Chinese partners including Lenovo and others before they told the US, which basically is 100% failing their own vuln disclosure policy and giving the Chinese government a chance to exploit the vuln in realtime before the US partners even know it’s possible. That’s more than a fail if they knew about it a year ago. That’s a crime with international security implications.

“I feel safer this way” – Well don’t let anyone tell you how to feel, but that’s not logical.

Clive Robinson January 29, 2018 1:04 PM

@ Cassandra,

The history of exploitable side-channel attacks is long, and not confined to microprocessor cpus.

We know it was demonstrated to the British legation in the US during WWII that the telex One Time Tape super-encipherment system was vulnerable to time-based attacks.

Apparently the pull-in and release times for the XOR function (see Vernam Cipher patent from 1917) were sufficiently different that you could strip off the super encipherment by eye on an oscilloscope…

It’s one of the reasons Winston Churchill signed off on a Canadian professor of engineering designing a replacement.

Although Benjamin “Pat” Bayly designed the Rockex, which solved one set of problems, there were other TEMPEST issues with the power supply. It’s an interesting design, especially in its tricks to ensure the ciphertext out was always in the A-Z range.

Who? January 29, 2018 2:52 PM

@ hmm

Like lots of people here, I do not like Intel ME at all. I would bet it is compromised by the US Government; Intel ME is the perfect place to install a permanent and powerful backdoor on our computers.

I do not want to see it happening to the processors running on our computers too.

In a perfect world the way Intel has managed this issue would have been a shame. US-CERT has more experience managing these incidents and coordinating the industry as a whole, but the way the IC collects vulnerabilities makes me feel not secure at all. Do you really think the US Government would have allowed this powerful design flaw to be fixed? I guess these bugs would have been silently hidden once communicated through an official channel.

In the last year it seems that industry and government are not playing together at all. Industry has been weakened and hit by the IC, and the White House does not have very good relationships with industry leaders.

The way Intel has managed this issue is a shame, but it is better than having another WannaCry.

Intel and its customers have made a lot of mistakes since June. But as I see it, the way this information has been shared with international partners at least assures that the industry will try to fix these vulnerabilities.

I would call it “the OpenBSD way”: provide a full and timely description of any security incident so it can be managed by development teams around the world. Security by obscurity and embargoes are not good allies.

Intel’s mismanagement of this security incident at least assures that we know the vulnerability exists. Now we need a serious commitment to fix it, a commitment that is critical from those actors that are the only ones able to fix certain parts of these vulnerabilities (e.g. microcode).

Who? January 29, 2018 2:59 PM

By the way, now that I use the word “embargo”… Intel and its customers had seven months to fix these vulnerabilities. A good way to prove embargoes are useless is looking at what is happening right now. The way this incident is being managed after seven months is no better than an incident that is publicly disclosed as soon as it is discovered. What do we have now? Broken microcode, bad patches, ugly fixes, unwanted reboots, poor performance, withdrawn patches…

Clive Robinson January 29, 2018 7:33 PM

@ Who?,

A good way to prove embargoes are useless is looking at what is happening right now. The way this incident is being managed after seven months is no better than an incident that is publicly disclosed as soon as it is discovered.

It proves not just that “embargoes” are useless, it also proves that “free markets” that are actually monopolies / cartels are likewise useless, as they are not markets as most people would think of them. But it will also prove that there are other knock-on effects that can be used to create faux markets for rent seekers.

I could go on at length about why Intel has “rigged the market” over the years and the fact the US Government has turned a blind eye to Intel / Microsoft and other US high tech companies doing so. But I won’t as this will turn the conversation into one about the “politics of regulation” and miss a more important point, that will almost certainly cost more.

My actual point is that this whole Spectre / Meltdown situation shows another significant failing, which is rather more insidious (but is driven by my previous point).

Since the floating-point bug some years ago now, Intel has tried to run their hardware product like a software product. That is, by developing a policy of “Patch not Repair” with “Deficient product roll out”. They have taken steadily worse steps with product reliability to speed up “time to market”. So much so that they are now well beyond the point where they ship products so unreliable that they have to be patched every time they are turned on or reset. Which gives rise to all sorts of security issues, not all of which are immediately obvious, nor is their eventual toll on society.

Further, it has pushed Intel into a position where, to keep up with supposedly “market driven forces”, they have gone into a significant tail spin… Which, unless it is checked, will result in an unfortunate crash of their own making.

Which raises the question of “Have Intel gone beyond the tipping point?” and if they have “What will be the result to the industry?”.

Both Intel and Microsoft have put themselves into this “Patch not Repair” behaviour, which in turn has led to “Deficient product roll out” as the norm. Thus both of them are shipping unreliable, “not fit for market” products from their combined monopoly positions.

One of the fundamental reasons they have been able to do this is “Net Neutrality”, which has allowed them to avoid the real asymmetric costs of repair[1]. It is those asymmetric costs that drove the whole electronics industry into improving quality control back in the 1960s, not decimating it for short-term “Marketing Driven” whims.

As far as I’m aware, nobody has asked the real question about how a “rent seeking business model” will affect the “patch not repair” marketing-driven ethos now that net neutrality is gone in the US?

Look at it this way: as a network service provider you see two of the richest businesses in the world “free loading” off of your business. Now that you no longer have to worry about “Net Neutrality”, you can put a “toll gate” on your stretch of the “information super highway”. Thus you can charge not only Intel and Microsoft a high fee for allowing their “patch” packets into your network, you can also, as the old telephone service providers did, charge your customers for receiving the “patch” packets. All on the excuse of “load leveling” or even “Safety” etc.[2]

The net result, even for the patch-conscientious, will be a delay in patching, thus opening up the “attack window” for the likes of ransomware etc.

We are moving into “Interesting Times” whether or not we like it, as the “rent seekers” have their way with us…

[1] I’ve mentioned this in the past. Put simply, it costs a lot less, by several orders of magnitude, to ship product out than to get product back for repair. But there is also a hidden side to transport costs which economists tend to ignore, which is “Distance costs”, which place a fundamental size constraint on a market. This has a “hidden hand effect” that allows competition in a geographic way, which allows new competitors to enter a market. When distance costs approach zero the size of a market tends to the maximum, which makes the market “local” in effect; thus a “winner takes all” effect crushes out competition, resulting in a monopoly market place.

[2] Such are the joys of an unregulated transportation market, which history has taught us for thousands of years is a bad idea, as it gives rise to high taxation and inefficient service provision. Imagine if you will every major road junction turning into a “toll booth” and how that would affect your drive to work[3].

[3] But we don’t have to imagine: the few places that have tried it have caused an outcry from motorists who see unfettered access to roads not as a privilege but a necessity. Some will pay up, many will find “rat runs” or other ways around the tolls. Thus the rent seekers will put up the toll cost in terms of fees, time or both, whilst having a significant knock-on effect in environmental damage due to the inefficiencies they create.

David February 3, 2018 9:09 AM

Hopefully Intel is working on a hardware solution to this flaw. An obvious solution is adding a fully isolated device that performs scheduled encryption of all sensitive information that is not currently being used for computation; decryption is then done only when the request takes longer than Meltdown’s access time. Such a solution would help not only against Meltdown, but also against any attempt to get a password without touching the keyboard.

sahlberg February 19, 2018 2:57 AM

Hunt • January 26, 2018 10:59 AM
There is evidence NSA has been known to be on the hunt for exploitation of hardware bugs in the past.

What are you talking about?
Every single intelligence agency from every single country has teams looking for these exploits.

Singling out NSA for being “bad” for doing this is just stupid.
