So, we are in Meltdown! Thanks for this, Intel. Intel has been having a very hard time lately. AMD has more or less blindsided them with Ryzen, and they’ve been rushing new platforms to market and compressing their release schedules in order to respond.
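It’s worth stating here that a lot of media spots on this keep citing individual companies like Apple or Intel. The truth is that this affects far more than one vendor. Meltdown itself hits Intel’s x86 processors hardest, along with a few ARM designs, while the closely related Spectre flaws affect chips from Intel, AMD and ARM alike (ARM being its own architecture rather than an x86 implementation). Together that covers the vast majority of processors in the entire world. While it’s true that the ultimate source of this issue is designs descended from Intel’s 8086 processors, AMD also had input into the overall x86 architecture.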
Having said all that, Intel is now facing one of the worst hardware issues ever to hit any technology. If you think that sounds over-dramatic, you’re wrong. I think it is completely fair to hold Intel responsible for this. Some security researchers are referring to Meltdown as an industry-wide catastrophe. It’s genuinely a complete nightmare. I dread to think of the industry-wide costs this is going to incur. It would be next to impossible to calculate on a global scale.
Since so many CPUs implement the same architecture (x86), all chips based on the design are affected in some way. Basically the entire world is about to lose a potentially significant portion of processing power. Everyone. It boggles the mind.
The issue is basically a problem with the way chips police the boundary between their two modes, kernel mode and user mode. The kernel is trusted and therefore has access to lots of sensitive data, whereas user mode has a lower security status. This separation is supposed to protect sensitive data by preventing user mode processes from reading the contents of protected memory. Since this issue allows user processes to read kernel mode memory, operating systems now have to hide the kernel’s memory mappings whenever user code is running (the fix is known as kernel page-table isolation). Every switch between kernel and user mode then involves swapping those mappings and flushing cached address translations, far more often than would otherwise be required. This way, even if a nefarious user mode process tries to read the memory, there isn’t anything useful left lying around.
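If you want to see whether your own machine has this mitigation switched on, patched Linux kernels (roughly 4.15 onwards) report it in a small status file under sysfs. Here is a minimal sketch, assuming a Linux system; the function name `meltdown_status` is my own, and on other operating systems or older kernels the file simply won’t exist.

```python
from pathlib import Path

# Status file exposed by patched Linux kernels; absent on other systems.
MELTDOWN_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def meltdown_status(path=MELTDOWN_STATUS):
    """Return the kernel's one-line status report, or None if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: not Linux, or a kernel without the report.
        return None


if __name__ == "__main__":
    status = meltdown_status()
    if status is None:
        print("No status file found (not Linux, or an older kernel)")
    else:
        print(status)
```

The exact wording of the report varies by kernel and CPU, but it generally says whether the chip is considered vulnerable and which mitigation, if any, is active.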
This switching and cache flushing is the source of the potential performance impact. Obviously, if you are performing tasks that don’t cause much context switching the impact will be minimal. However, if the task performs many context switches, such as virtualisation, the performance hit will increase accordingly.
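To get a rough feel for which workloads are exposed, you can compare a loop that keeps asking the kernel for something with one that never leaves user mode. This is only a sketch, not a proper benchmark: the helper names are mine, `os.stat` is just a convenient call that enters the kernel, and the numbers will vary a lot between machines and kernels.

```python
import os
import time


def time_loop(fn, iterations=200_000):
    """Return the average number of seconds per call of fn."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


def syscall_heavy():
    # os.stat asks the kernel for file metadata, so every call enters kernel mode.
    os.stat(".")


def compute_only():
    # Pure arithmetic stays in user mode the whole time.
    sum(range(10))


if __name__ == "__main__":
    print(f"syscall-heavy: {time_loop(syscall_heavy) * 1e9:.0f} ns per call")
    print(f"compute-only:  {time_loop(compute_only) * 1e9:.0f} ns per call")
```

Running it before and after a mitigation is enabled should show the syscall-heavy figure moving much more than the compute-only one, which is the pattern described above.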
If I were a large-scale hosting provider or cloud computing platform, I’d be really concerned. Imagine the worst-case scenario where 30% of your computing resources vanished overnight. Poof! Gone. If you had been running your platform with anything less than around 50% headroom in relation to the required processing power, you’ve got a significant investment in more hardware ahead of you.
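The headroom figure is easy to sanity-check. If a fraction `capacity_loss` of your capacity disappears, what remains still has to cover demand, so the spare capacity you needed beforehand, relative to demand, is 1 / (1 - capacity_loss) - 1. The function name and the smaller sample losses below are my own, purely for illustration; 30% is the worst case mentioned above.

```python
def required_headroom(capacity_loss):
    """Spare capacity needed, as a fraction of demand, to absorb a capacity loss."""
    if not 0 <= capacity_loss < 1:
        raise ValueError("capacity_loss must be in the range [0, 1)")
    # Remaining capacity is (1 - loss) of the original and must still meet demand.
    return 1 / (1 - capacity_loss) - 1


for loss in (0.05, 0.10, 0.30):
    print(f"{loss:.0%} loss -> {required_headroom(loss):.0%} headroom needed")
```

For a 30% loss this works out at about 43% headroom, so “around 50%” is a sensible round figure with a bit of margin on top.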
But what is really incredible is that this issue isn’t new. What? Yes. This issue was first discussed in some depth in 1995, in a white paper called “Intel 80x86 Processor Architecture: Pitfalls for Secure Systems”. This paper also referenced older papers from 1992. Over 25 years ago.
Yup, this issue has been kicking around for a very, very long time.
Also published on Medium.