The Pipeline


A conventional processor executes instructions one at a time, just as you would expect from the way you write your code. Each execution can be broken down into three parts, and anybody who has learned this stuff at college will have 'fetch, decode, execute' burned into their memory.

In English...

  1. Fetch
    Retrieve the instruction from memory.
    Don't get all techie: whether the instruction comes from system memory or the processor cache is irrelevant, because the instruction is not loaded 'into' the processor until it is specifically requested. The cache simply serves to speed things up. By loading chunks of system memory into the cache, the processor can satisfy many more of its instruction fetches from the cache instead of waiting for system memory. This is necessary because processors are very fast (StrongARMs, 200MHz+; Pentiums, into the GHz range!) and system memory is not (33, 66, or 133MHz). To see the effect the cache has on your processor, use *Cache Off.
  2. Decode
    Figure out what the instruction is, and what is supposed to be done.
  3. Execute
    Perform the requested operation.
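
The three stages above can be sketched as a loop. This is only a toy: the four-instruction "machine" (LOAD, ADD, HALT), the accumulator, and every function name here are invented for illustration and have nothing to do with the real ARM instruction set. It is written in Python simply because that is easy to run anywhere. Each stage is charged one clock tick, and nothing overlaps.

```python
# A toy machine, invented for illustration: this is NOT ARM code.
# Each "instruction" is a string held in a list that stands in for memory.
memory = ["LOAD 5", "ADD 3", "ADD 10", "HALT"]

def fetch(pc):
    """Stage 1: retrieve the raw instruction at address pc."""
    return memory[pc]

def decode(raw):
    """Stage 2: work out which operation it is, and its operand (if any)."""
    parts = raw.split()
    operand = int(parts[1]) if len(parts) > 1 else None
    return parts[0], operand

def execute(op, operand, acc):
    """Stage 3: perform the operation and return the new accumulator value."""
    if op == "LOAD":
        return operand
    if op == "ADD":
        return acc + operand
    return acc  # HALT changes nothing

def run():
    """Run the program one instruction at a time, counting clock ticks."""
    acc, pc, ticks = 0, 0, 0
    while True:
        raw = fetch(pc)
        op, operand = decode(raw)
        acc = execute(op, operand, acc)
        ticks += 3  # one tick per stage, nothing overlapped
        if op == "HALT":
            return acc, ticks
        pc += 1

result, ticks = run()
print(result, ticks)
```

Note that the tick counter goes up by three for every instruction, which is the cost that the rest of this page sets out to reduce.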

Each of these operations is performed in step with the electronic 'heartbeat', the clock. Example clock rates for several microprocessors used in Acorn products:

Machine                 Processor    Clock rate
BBC microcomputer       6502         2MHz
Acorn A310-A3000        ARM 2        8MHz
Acorn A5000             ARM 3        25MHz
Acorn A5000/I           ARM 3        30MHz
RiscPC600               ARM610       33MHz
RiscPC700               ARM710       40MHz
Early PC co-processor   486SXL-40    33MHz (not 40!)
RiscPC (StrongARM)      SA110        202MHz - 278MHz+

As shown in the PC world, processors are running into GHz speeds (1,000,000,000 ticks/sec), which necessitates a lot of speed tweaks (huge amounts of cache, an extremely optimised pipeline) because there is no way the rest of the system can keep up. Indeed, the rest of the system is likely to be operating at a quarter of the speed of the processor.

The RiscPC is designed to work, I believe, at 33MHz. That is why people thought the StrongARM wouldn't give much of a speed boost. However, the small size of ARM programs, coupled with a rather large cache, made the StrongARM a viable proposition in the RiscPC. It bottlenecked horribly, but other factors meant that this wasn't so visible to the end-user, so the result was a system which is much faster than the ARM710.

More recently, there is the Kinetic StrongARM processor card. This attempts to alleviate the bottlenecks by installing a big wodge of memory directly on the processor card and using that. It even goes so far as to copy the entirety of RISC OS into that memory so you aren't kept waiting for the ROMs (which are slower even than RAM).
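
To put rough numbers on that mismatch, here is a back-of-envelope sketch using the 202MHz and 33MHz figures quoted above. The cost model is an assumption made purely for illustration: a cache hit is charged one processor tick and a miss is charged one full bus cycle, which is far simpler than any real memory system. The hit rate is a made-up parameter, not a measured value.

```python
def cycle_time_ns(mhz):
    """Length of one clock tick in nanoseconds."""
    return 1000.0 / mhz

cpu_mhz = 202  # StrongARM figure quoted in the text
bus_mhz = 33   # the speed the RiscPC is believed to be designed for

# How many processor ticks go by during a single tick of the slower bus.
ticks_per_bus_cycle = cpu_mhz / bus_mhz

def average_fetch_ticks(hit_rate, hit_cost=1.0, miss_cost=ticks_per_bus_cycle):
    """Average processor ticks per instruction fetch.

    ASSUMED model: a hit costs one processor tick, a miss costs one bus
    cycle. Real systems are more complicated; this only shows the trend.
    """
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost

print(f"processor tick: {cycle_time_ns(cpu_mhz):.2f} ns")
print(f"bus tick:       {cycle_time_ns(bus_mhz):.2f} ns")
for rate in (0.0, 0.5, 0.95):  # hypothetical hit rates
    print(f"hit rate {rate:.2f}: {average_fetch_ticks(rate):.2f} ticks per fetch")
```

The higher the hit rate, the closer the average gets to a single tick, which is one way of seeing why a large cache and small programs hide so much of the bottleneck.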

There is an obvious way to get more work out of every clock tick. Since these three stages (fetch, decode, execute) are fairly independent, would it not be possible to:

     fetch   instruction #3
     decode  instruction #2
     execute instruction #1

     ...then, on the next clock tick...

     fetch   instruction #4
     decode  instruction #3
     execute instruction #2

     ...and on the tick after that...

     fetch   instruction #5
     decode  instruction #4
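     execute instruction #3

In practice, the answer is yes, and this is exactly what a pipeline is. Once the pipeline is full, an instruction finishes on every clock tick instead of on every third one. Simply by doing this, you have just made your processor three times faster!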
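
The overlap shown above can be counted out with a short sketch. It assumes an idealised three-stage pipeline in which every stage takes exactly one tick and nothing ever stalls; the function names are made up for this example.

```python
STAGES = ["fetch", "decode", "execute"]

def pipeline_schedule(n_instructions):
    """For each tick, return a dict of stage name -> instruction number."""
    depth = len(STAGES)
    schedule = []
    for tick in range(n_instructions + depth - 1):
        row = {}
        for position, stage in enumerate(STAGES):
            instr = tick - position + 1  # instruction numbers start at 1
            if 1 <= instr <= n_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule

def ticks_unpipelined(n):
    """Every instruction takes all three ticks before the next one starts."""
    return n * len(STAGES)

def ticks_pipelined(n):
    """Two ticks to fill the pipeline, then one instruction finishes per tick."""
    return n + len(STAGES) - 1

for tick, row in enumerate(pipeline_schedule(5)):
    print(tick, row)

n = 1000
print(ticks_unpipelined(n), ticks_pipelined(n))
```

Under these assumptions the speed-up is n * 3 / (n + 2), which creeps towards three as the program gets longer but never quite reaches it, because the first two ticks are spent filling the pipeline.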

Now, it isn't a perfect solution.

Copyright © 2001 Richard Murray