Computer cockups for dummies
This past week, we have heard of a big problem affecting many modern processors. There are three varieties, known as Spectre (types 1 and 2) and Meltdown (yes, all good computer whoopsies need to have a name).
What follows is a simplified set of analogies to try to explain to non-nerds what all the fuss is about. It has the Mom seal of approval, and features several clarity enhancing suggestions by Mom. ☺
For nerds: This blog entry attempts to explain what is going on with the aforementioned problems in a way that normal people can get an understanding of the underlying cause. It provides a rough overview of the issue whilst trying not to get too technical. There are various in-depth descriptions already available, but not a one of them written in terms my mother could understand. The Pi Foundation article on why the Pi isn't affected introduces terms such as Scalar and Superscalar, and tries to explain it using Python syntax. To non-nerds, a python is a snake or any one of a group of British comedians....
In computer terms, there are principally two levels of "security". The top level with all of the privilege is the operating system. That is some complicated software that manages your computer and its behaviour. Windows, iOS, Android... The less privileged level is the user level. This is where applications such as word processors and web browsers exist. Why? Simple. A word processor is only supposed to be concerned with its own activities, not messing around with other things.
Now it is quite normal for the application to make requests of the operating system. A word processor might ask "is the Control key being pressed?" or "load this file from disc". At this point, the operating system will take over, the processor will switch up to a higher level of privilege, and it will perform the task. When this is done, the processor will switch back down to lesser privilege and return to the application to carry on.
Computer speak complicated? Try this. Imagine you are a little girl making a cake with mummy. You (the little girl) are only allowed to do certain things. You can't freely use knives or mess around with the oven. Mummy might assist you in doing something 'dangerous', but it's mummy in charge.
I'm sure you've already figured out that mummy is representing the operating system and the little girl is an application.
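For nerds: the asking-mummy dance can be seen from a program's point of view in a few lines of Python. This is only a sketch. The os module functions used here are, on typical systems, thin wrappers around requests to the operating system; the scratch file and its contents are made up for the illustration.

```python
import os
import tempfile

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello from user level")   # request: "write these bytes"
os.close(fd)                             # request: "I'm done with this file"

# Each os.* call below hands control to the operating system, which does
# the privileged work and then returns to our (unprivileged) program.
fd = os.open(path, os.O_RDONLY)          # request: "open this file"
data = os.read(fd, 100)                  # request: "give me up to 100 bytes"
os.close(fd)
os.remove(path)                          # request: "delete this file"

print(data.decode())
```

The program never touches the disc itself. It asks, waits, and carries on with whatever it is handed back.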
Now for the second part of the equation. A processor, despite being capable of showing you TV programmes from around the world and Twitter and Facebook... is actually a rather stupid device. It performs a list of instructions. Like baking a cake - you need a certain amount of flour, some butter, eggs, etc etc. The processor will do these instructions one by one.
Now here's a slightly more complicated part. A processor, in order to behave more efficiently, uses something known as a pipeline. While modern pipelines are horribly complex, they still essentially boil down to the three phases known as fetch, decode, execute.
That is to say, the process of dealing with an instruction such as ADD this TO that can be broken down as follows:
Retrieve the instruction from memory
Work out what the instruction actually is
Do it
Working out what the instruction is matters because, for instance, if you are simply adding two numbers, you only need to activate the logic circuits for adding numbers. The parts of the processor that, say, swap things around? Not needed this time.
To put this into context with our cake recipe, we look at the recipe and note that we need flour (retrieve), we get the flour pack and measure out so many grams (decode), then we put the flour into the mixing bowl (execute).
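For nerds: the fetch, decode, execute cycle can be sketched as a toy machine in Python. Everything here is invented for illustration (the instruction names ADD, SUB and HALT, the single "accumulator" register, the run function); a real processor does this in hardware and is vastly more involved.

```python
# A toy "processor": memory holds instructions, one register holds a number.
program = [
    ("ADD", 5),
    ("ADD", 7),
    ("SUB", 2),
    ("HALT", 0),
]

def run(program):
    accumulator = 0          # the single register of our toy machine
    pc = 0                   # program counter: which instruction is next
    while True:
        instruction = program[pc]        # fetch: retrieve it from memory
        opcode, operand = instruction    # decode: work out what it is
        pc += 1
        if opcode == "ADD":              # execute: do it
            accumulator += operand
        elif opcode == "SUB":
            accumulator -= operand
        elif opcode == "HALT":
            return accumulator
        else:
            raise ValueError(f"unknown instruction {opcode!r}")

result = run(program)
print(result)   # 5 + 7 - 2 = 10
```

This toy does the three steps one after another for each instruction. A pipeline overlaps them, so that while one instruction is being executed the next is already being decoded and the one after that is being fetched.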
Now for the next part of our equation. Decisions.
Computers make millions of decisions every second. These are not big decisions, just things like IF something THEN do this. Back to our cake recipe analogy - you can get two types of flour. You can get plain white flour and you can get self-raising flour. IF flour IS NOT self-raising THEN ... add raising agent
We have interrupted the smooth flow of our recipe with a decision. We are being asked, if we are using plain flour, to add a raising agent to help the cake rise.
Now the problem for people who design processors is that decisions do not work well with a pipeline. When the processor has to go and do something else as a result of a decision, the pipelined instructions being fetched and being decoded must be thrown away. Why? Because we're not doing that, we're doing something else now. This isn't an error, it is simply a side effect of how a pipeline works.
Imagine - to explain this - three schoolgirls. Girl 3 has a bucket. Girl 1 picks up a pebble and gives it to Girl 2, who then gives it to Girl 3 who puts it in the bucket. Their teacher will tell them to stop when there are exactly fifty pebbles in the bucket. At this point, Girl 1 will have just picked up another pebble and Girl 2 will be about to hand her pebble over. But the bucket is full. So they must drop their pebbles. The teacher empties the bucket, and tells the girls to begin again.
As you will have noticed, Girl 3 will have nothing to do for a few moments. She will be waiting for a pebble to be picked up by Girl 1, given to Girl 2, and then passed to her for placement in the bucket.
The nerdy term for this is a "pipeline stall". In effect, the processor will have nothing to do while it is waiting for the new instructions to arrive through the pipeline. This waiting may be something in the order of a billionth of a second. However, if there are a million decisions, you can imagine how these tiny delays can add up and affect performance.
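For nerds: here is a rough back-of-envelope model of how those dropped pebbles add up. It assumes a simple three-stage pipeline where every decision that sends the processor elsewhere wastes the two instructions already in flight. The function name and the figure of 200,000 decisions are made up for the example.

```python
STAGES = 3  # fetch, decode, execute

def cycles_needed(instructions, taken_decisions):
    """Ticks to finish, in a simplified three-stage pipeline model."""
    fill = STAGES - 1                         # ticks before the first result
    wasted = taken_decisions * (STAGES - 1)   # pebbles dropped on each restart
    return instructions + fill + wasted

smooth = cycles_needed(1_000_000, 0)          # no decisions at all
bumpy = cycles_needed(1_000_000, 200_000)     # one in five goes elsewhere

print(smooth)   # 1000002
print(bumpy)    # 1400002
```

In this model the same million instructions take about 40% longer once the decisions are counted, which is why processor designers care so much about avoiding stalls.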
Now an additional problem arises when an application calls the operating system. We not only need to refill the pipeline, but around this call the state of the processor needs to be remembered and restored. Mummy and girl may be making the cake on the kitchen table. In order to have a big place to work, all the things on the table (the coffee dose holder, the vase, etc) will have been removed and put aside. When the cake has been made, these things will be put back again. In computer terms, we refer to this as a "context switch" and we generally call them "expensive" as it takes time and effort for the processor to perform these actions.
The final part of our equation is something called a "cache". You see, processors these days operate at insane speeds. My PC runs at around two and a half gigahertz, my phone only slightly slower. That means the ticks that synchronise the fetch, decode, execute system are ticking 2,500,000,000 times every second. Because of lots of complicated laws of physics, the processor can operate at these sorts of speeds in a tiny, carefully designed space. However, the rest of the machine is utterly incapable of this. A circuit board just cannot operate so quickly. Which means the computer's memory cannot operate as fast as the processor. In order to act as a buffer between the slow main memory inside the computer and the fast processor, there is a small amount of additional memory built into the processor itself.
It is a very small amount, but this is okay as processors frequently do repetitive tasks. Let's say you want to make a picture black and white. Well, this means looking at a single dot on the screen, adding up its colour values, then deciding how much 'grey' this dot would become, before changing the dot to be grey instead of coloured. For a FullHD television picture, this would mean performing the same task 2,073,600 times, the only thing that would change is which dot is being looked at. Because the recent instructions will have been copied into the cache, the processor can retrieve them over and over from there instead of waiting on slow memory.
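For nerds: a small simulation shows why a tiny cache is enough for repetitive work. This is a sketch under loud assumptions. The TinyCache class is invented, it only counts hits and misses, and it throws out the least recently used entry when full; real caches are organised rather differently. The ten "instruction addresses" stand in for the handful of instructions that turn one dot grey, and only a single row of 1,920 dots is processed to keep it quick.

```python
from collections import OrderedDict

class TinyCache:
    """A small least-recently-used cache that only counts hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)     # mark as recently used
        else:
            self.misses += 1                    # slow trip to main memory
            self.lines[address] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the oldest entry

cache = TinyCache(capacity=16)
loop_body = range(10)      # ten instruction addresses, repeated for each dot

for dot in range(1920):    # one row of a FullHD picture
    for address in loop_body:
        cache.access(address)

print(cache.misses)   # 10
print(cache.hits)     # 19190
```

Only the very first pass through the loop has to wait on slow memory. Every pass after that is served from the cache.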
A lot of time and energy has been expended in trying to find ways to make processors faster and more efficient. While there are hundreds of different types of processor, the three big ones are:
Intel's x86 family (and compatibles) which is mainly used in big desktop machines. These offer brute force, but are large and consume a fair bit of power.
The ARM core which is mainly used in efficient mobile devices that need to operate for a long time from ever-smaller batteries.
MIPS processors which tend to turn up quietly doing their stuff in WiFi adaptors, webcams, television/satellite receivers, etc.
One of the big ideas of the last couple of decades is something known as "speculative execution". That is to say, when the processor reaches a decision point, it will not wait around to find out which way the decision goes. Instead it will make an educated guess and begin to execute the instructions along the guessed path straight away. Then, when the decision actually comes to be made (the execute part of the pipeline), one of two things happens. If the guess was right, the pipeline is already full of the right instructions. Instant saving! If the guess was wrong, the processor throws away the work it did along the wrong path and starts again on the correct one, which costs about the same as the stall it would have had anyway.
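For nerds: a common way of deciding which instructions to run ahead of time is for the processor to guess how a decision will go based on how it went before. The sketch below uses the simplest possible guesser, "same as last time", which is an assumption for illustration and far cruder than anything in a real processor. The function name and the loop pattern are made up.

```python
def count_wrong_guesses(outcomes):
    """Guess each decision goes the same way as the previous one did."""
    guess = True            # arbitrary first guess
    wrong = 0
    for actual in outcomes:
        if guess != actual:
            wrong += 1      # speculative work thrown away, pipeline refilled
        guess = actual      # remember what really happened
    return wrong

# A loop that goes round 9 times and then exits, run 100 times over.
outcomes = ([True] * 9 + [False]) * 100
wrong = count_wrong_guesses(outcomes)

print(len(outcomes))   # 1000
print(wrong)           # 199
```

Even this crude guesser gets about four decisions in five right, so the pipeline stays full most of the time.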
Now what is happening with these newly discovered problems (Spectre and Meltdown) is an abuse of various parts of the processor. A program without any privilege will attempt to read a little bit of memory from the operating system. This will fail, as unprivileged programs cannot access internal stuff (or else your security passwords and encryption keys and such would be accessible to everything and everyone). The problem is, the speculative execution will have already tried it, and there will be traces of the failed instruction left in the cache, traces that depend on the data that the unprivileged program is not permitted to access. By careful arrangement of instructions and timing (things in the cache come back noticeably quicker than things that are not), it has been discovered that this creates a weakness - if the traces are in the cache, and a program can tell what is in the cache, you can see where this is going. The procedure may only be able to extract a single character or two at a time, and operating system workspace may amount to hundreds of millions of characters. However, as we've already demonstrated, computers are really good at doing little things over and over and over. It'll take some time, but it is now technically possible to extract all of the privileged memory with a program that has no privilege. Worse than that, in theory a JavaScript application in a web browser may also be capable of doing the same thing - that is to say that advertisers and such on certain web sites could try to read private information from the heart of your computer.
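For nerds: the principle of learning a secret from cache timing can be shown with a pretend cache. To be loud about it, this is a toy model and not an exploit. Nothing here touches real hardware, the access times are invented numbers, and every name (ToyCache, speculative_peek, recover_byte, leak) is hypothetical. The assumption modelled is that the doomed instructions leave exactly one cache slot warmed up, and that which slot it is depends on the secret value.

```python
FAST, SLOW = 1, 100   # pretend access times: cached vs. not cached

class ToyCache:
    def __init__(self):
        self.cached = set()

    def flush(self):
        self.cached.clear()

    def timed_access(self, slot):
        """Return a pretend access time, and leave the slot cached."""
        cost = FAST if slot in self.cached else SLOW
        self.cached.add(slot)
        return cost

def speculative_peek(cache, secret_byte):
    # Stand-in for the doomed instructions: the read is refused, but not
    # before one slot, chosen by the secret value, was pulled into the cache.
    cache.cached.add(secret_byte)

def recover_byte(cache):
    # Time all 256 possible slots; the one that answers quickly was touched.
    timings = [cache.timed_access(slot) for slot in range(256)]
    return timings.index(min(timings))

def leak(secret):
    cache = ToyCache()
    recovered = []
    for byte in secret:
        cache.flush()                      # start each round with a cold cache
        speculative_peek(cache, byte)
        recovered.append(recover_byte(cache))
    return bytes(recovered)

leaked = leak(b"hunter2")
print(leaked)   # b'hunter2'
```

The attacking code never reads the secret directly. It only ever asks "how long did that take?", one character at a time.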
But, wait, why is everybody seemingly affected (there's like zero correlation between ARM and x86) and why wasn't this known about earlier?
In answer to that, the reason it seems that everybody is affected is due to how modern processors perform speculative execution. There is also a related technique known as "out of order execution" where a processor will try to rearrange instructions for greater efficiency. Everybody demands more and more from a processor. My first Android phone had a simple web browser and could record 480p (307,200 dots, just like old TV) video. My current phone can run Firefox and record QuadHD video (3,686,400 dots, 12 times larger), and it's barely bigger than the first. We just want more and more and more. So a lot of effort has been invested in looking for inefficiencies and devising ways to work around them.
While this problem has been a part of processor design for at least the past decade, exploiting it was actually pretty complicated to make work. The set of instructions may turn out to be fairly simple, but the process behind them is less so. That's why it's taken so long for this discovery to come to light.
Another big change in the past decade is the enhanced security of operating systems. Way back when, the "security" of Windows was a joke. The default user (that would be you) was granted full administrator level access which meant any program running (as you) would have pretty complete access to much of the machine. Malware and viruses were rife. It's no surprise that many antivirus products were created to try to plug the obvious holes.
However as operating systems become more and more advanced and secure, those who wish to break into machines are starting to turn their attention to bypassing the operating system by exploiting issues inherent in the very hardware that the machine is made from.
And that's where we stand today. At the edge of the realisation that enhanced security and enhanced speed are mutually exclusive. New processor designs will have ways to operate without these problems, but in time we will no doubt see another world-shattering problem turn up as we strive to make our processors faster and faster yet.
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted.
RIPA notice: No consent is given for interception of page transmission.