The ARM family
The ARM family is not exactly simple to figure out. ARM cores don't match architecture numbers, letter suffixes are not always straightforward, and while looking for information on the ARM926 core in my PVR, the only thing I can say for certain is it is an ARM9 processor, and apparently nobody knows what the '26' suffix means. Um...
So here's a potted history of the ARM family.
It started with the ARM2. There were some ARM1s, but not many. It was the Acorn Archimedes that arrived in the world with an ARM2 onboard. Then came the ARM3.
There never was an ARM4 or ARM5.
By this time, Apple, DEC, and Intel had got on board. Thus were born the ARM6 and ARM7 processors, in a variety of guises. These were used in the RiscPC range, among other devices. DEC developed the StrongARM, and as part of a settlement this was given to Intel who refined it and created their own XScale processor, which has since been passed on to somebody else. By the way, an ARM8 did exist and was to be an upgrade for the RiscPC, but it never went anywhere as the StrongARM blew it out of the water, so to speak.
ARM (the company)'s fortunes took a turn for the better with the release of the ARM7TDMI. The suffix means Thumb, Debug, Multiplier, ICE: it offered Thumb 16 bit instruction support, JTAG debug, an enhanced multiplier, and an on-chip EmbeddedICE debugger module. It has been widely deployed in embedded applications.
Now, hold your breath.
ARM design and license cores, not processors, and the core numbers are not architecture numbers. So the ARM1 is architecture ARMv1, while ARM2 and ARM3 are architecture ARMv2 (strictly, the ARM3 is ARMv2a); even if the difference between an ARM2 and an ARM3 is greater than that between an ARM2 and an ARM1.
There is no ARM4, nor an ARM5.
The ARM6 and ARM7 are ARMv3. The ARM7TDMI is ARMv4(T) while the ARM7EJ is ARMv5(TEJ). The StrongARM and ARM8 are ARMv4 as well. The ARM9 is ARMv5, except for the ARM9TDMI which is ARMv4(T). The ARM11 is ARMv6.
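Since the core number tells you so little about the architecture, it may help to see the pairings above as a plain lookup table. This is only a sketch: the table holds nothing but the cores named in this section, and the names CORE_ARCHITECTURE, architecture_of, and cores_of are my own inventions for illustration.

```python
# Core name -> architecture version, as listed in the text above.
# The ARM3 is entered as ARMv2a, the finer-grained name for its
# flavour of ARMv2.
CORE_ARCHITECTURE = {
    "ARM1": "ARMv1",
    "ARM2": "ARMv2",
    "ARM3": "ARMv2a",
    "ARM6": "ARMv3",
    "ARM7": "ARMv3",
    "ARM7TDMI": "ARMv4T",
    "ARM7EJ": "ARMv5TEJ",
    "ARM8": "ARMv4",
    "StrongARM": "ARMv4",
    "ARM9TDMI": "ARMv4T",
    "ARM9": "ARMv5",
    "ARM11": "ARMv6",
}


def architecture_of(core):
    """Return the architecture of a core in the table (KeyError if unknown)."""
    return CORE_ARCHITECTURE[core]


def cores_of(architecture):
    """Return, sorted by name, every listed core of the given architecture."""
    return sorted(c for c, a in CORE_ARCHITECTURE.items() if a == architecture)
```

Asking for architecture_of("ARM7TDMI") and architecture_of("ARM7") gives two different answers, which is rather the point: the '7' alone does not pin anything down.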
After that point, everything is called Cortex. There are three families of Cortex. There is the Cortex-A aimed at application situations (this means it has an MMU), the Cortex-R aimed at real-time applications (extremely low latency), and a greatly restricted Cortex-M for microcontroller applications (may be Thumb only, few instructions, tiny pipeline) where ARM will give you "32 bit power at an 8 bit cost", to wean people away from PICs and AVRs and the like. Now I should point out that there is no hint of correlation in the suffix numbering. A Cortex-A3 is not like a Cortex-R3 or even remotely like a Cortex-M3. As it happens, only the M3 exists (A3 and R3 do not). There are holes in the numbering. They are all ARMv7 except the M0 which is ARMv6.
Note, incidentally, the Cortex suffix... A... R... M...
Let's take it a stage further. Thumb. A 16 bit reduced version of the ARM instruction set, right? Was that Thumb, Thumb2 (with some 32 bit instructions), or ThumbEE (?)?
To say this is all rather complicated would be an understatement.
There is a complete list on the other Wiki at http://en.wikipedia.org/wiki/List_of_ARM_microprocessor_cores
Ancient ARM, as used in the Acorn desktop computers, operates in the original mode where the Program Counter was 24 bits (effectively 26 bit as instructions are word aligned) and shared R15 with the processor status: the upper six bits hold the flags and interrupt disable bits, and the lowest two bits give the processor mode. These processors are all, with the exception of the StrongARM, von Neumann architecture, meaning that the cache (if one exists - it doesn't on the ARM1, ARM2, or ARM250) is a combined instruction and data cache.
All versions of RISC OS until Castle's RISC OS 5 for the Iyonix worked in 26 bit mode. The ARM6, ARM7, and StrongARM processors are capable of 26 bit operation for compatibility.
No current processor supports the old 26 bit mode; these are legacy processors in the same vein as the 80386...
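To make the combined PC and status register of the 26 bit processors a little more concrete, here is a minimal sketch that pulls a 32 bit R15 value apart. The layout (N, Z, C, V, I, F in bits 31 to 26, the word-aligned PC in bits 25 to 2, the mode in bits 1 and 0) is the standard 26 bit one; the function name decode_r15_26bit and the dictionary it returns are just my own choices for illustration.

```python
# Mode numbers held in bits 1..0 of R15 on a 26 bit ARM.
MODES_26 = ("USR", "FIQ", "IRQ", "SVC")

# Flag name -> bit position in R15. N, Z, C, V are the condition
# flags; I and F are the IRQ and FIQ disable bits.
FLAG_BITS = (("N", 31), ("Z", 30), ("C", 29), ("V", 28), ("I", 27), ("F", 26))

PC_MASK = 0x03FFFFFC  # bits 25..2: a 24 bit word address, 26 bits of bytes


def decode_r15_26bit(r15):
    """Split a 26 bit mode R15 into program counter, mode, and flags."""
    return {
        "pc": r15 & PC_MASK,
        "mode": MODES_26[r15 & 0x3],
        "flags": {name: bool((r15 >> bit) & 1) for name, bit in FLAG_BITS},
    }
```

Feeding it 0x6C008003 describes code at address 0x8000 running in SVC mode with Z and C set and both interrupts disabled. The mask also shows where the 64MiB limit comes from: 26 bits of byte address is all there is room for.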
Contemporary ARM is a hard call. There are processors capable of running a desktop OS, but that isn't really where the ARM excels. The ARM excels in two areas - those who want a usable fairly feature-packed device that is light on battery consumption, and those who want a powerful and capable embedded system processor.
Therefore, you will find most Android mobile phones run on ARM, as does the Apple iPod/iPhone range. Numerous routers, printers, harddiscs, etc run ARM. However you can also get a Thumb-only ARM with a piddling amount of memory (something like 8KiB SRAM and 64KiB Flash) and a heap of I/O pins, which might sound like a bit of a joke, but it'd be perfectly suitable for microwaves, washing machines, bread makers, and so on.
This, as you can imagine, presents a huge problem when talking contemporary ARM. We can say the later processors switched to Harvard architecture (separate data/instruction caches), we can say some of them have as much as a 13 stage pipeline (that's a lot in ARM terms). We can also say that some offer NEON DSP-like instructions, vector floating point, or... but this would only be looking at one part of a very diverse family.
Perhaps the best way to think of it is that up to the ARM11, it was the ARM microprocessor going from strength to strength. Then following The Great Divide, when everything became a Cortex of some description, the ARM diversified. Massively.
The ARM family is described according to its base architecture version.
ARMv1 and ARMv2/ARMv2a
The ARM1 (ARMv1)
The ARM1 is a fairly little-known processor that was where it all started. It was fitted on expansion and co-processor modules as, at the time, no suitable computer existed. Actually, prior to the creation of the chip in silicon, a simulation was written in BASIC. [see also: http://www.bbcbasic.co.uk/bbcbasic/birthday ] Now that would be code worth looking at! The ARM1 was never commercialised. It was used, in small numbers, on a co-processor board for the BBC Micro family, and also (I believe) on a prototype RISC machine that came before the A3xx range. Once in a while, this equipment turns up on the likes of eBay and sells for a mind-numbing amount of money.
The ARM2 (ARMv2)
It was when RISC OS 2 came along that people started to take notice.
In more technical terms, the 8MHz ARM2 improved upon the ARM1 by implementing integer multiply in silicon and providing a co-processor interface mechanism for the elusive floating point accelerator. In addition, R8 and R9 are now banked in FIQ mode (the ARM1 only banked R10 upwards).
The picture on the right is of an ARM2 processor in an Acorn A3000. Running at 8MHz, and featuring the newer MEMC1a memory controller, this baby-Archimedes (limited expansion, no serial port as standard, one-box design) launched in 1989 managed around 4.6 MIPS.
The ARM3 (ARMv2a)
The next iteration of the ARM processor came in the form of the ARM3. Offering a 4KiB unified cache, plus increased clock speeds (usually 25MHz, although some 33MHz parts exist), it was compatible enough for a carrier board to be designed to plug in where an ARM2 used to be, giving an older machine an instant resurrection as something new.
The ARM3 adds the SWP instruction, which is an atomic instruction to swap data between a register and memory. It is guaranteed not to be interrupted (unlike the corresponding LDR/STR sequence), so it may be used for system semaphores.
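How a swap becomes a semaphore is perhaps easier to see in a toy model. The sketch below is plain single-threaded Python, so nothing in it is actually atomic; it only models what SWP does (store a new value, hand back the old one in a single step) and the test-and-set lock built on top. The names swp, try_acquire, and release, and the use of a dictionary as "memory", are all assumptions of mine for illustration.

```python
FREE, LOCKED = 0, 1


def swp(memory, address, new_value):
    """Model of SWP: write new_value to memory and return the old contents.

    On the real processor the read and the write cannot be separated
    by an interrupt; this Python model merely imitates the result.
    """
    old_value = memory[address]
    memory[address] = new_value
    return old_value


def try_acquire(memory, address):
    """Swap LOCKED in. We own the lock only if FREE came back out."""
    return swp(memory, address, LOCKED) == FREE


def release(memory, address):
    """Hand the lock back by storing FREE."""
    memory[address] = FREE
```

The trick is that a failed attempt is harmless: swapping LOCKED into a word that already holds LOCKED changes nothing, so the caller can simply try again later.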
The first machines to use the ARM3 onboard were Acorn's A5000 (1991) and the A4 laptop (1992, although the A5000 design is derived from the A4!). Clocking at 25MHz (24MHz for the A4), the A5000 offers around 13.5 MIPS, which is about what you'd expect for the difference in clock speed. The Alpha versions of the A5000 (clocked at 33MHz) offer nearly 18 MIPS. By way of comparison, an Intel 80386DX clocking 25MHz provides 8.5 MIPS, while the higher power (and phenomenally expensive in its day) 80486 (any version) generally offered around 20 MIPS at 25MHz.
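The "about what you'd expect" claim is easy enough to check on the back of an envelope. The sketch below uses only the clock and MIPS figures quoted in this section (with "nearly 18" entered as 18.0); the names FIGURES, mips_per_mhz, and scaled_mips are mine.

```python
# name -> (clock in MHz, quoted MIPS), figures as given in the text.
FIGURES = {
    "ARM2 (A3000)": (8, 4.6),
    "ARM3 (A5000)": (25, 13.5),
    "ARM3 (A5000 Alpha)": (33, 18.0),  # the text says "nearly 18"
    "80386DX": (25, 8.5),
    "80486": (25, 20.0),
}


def mips_per_mhz(name):
    """Instructions per clock tick, more or less."""
    mhz, mips = FIGURES[name]
    return mips / mhz


def scaled_mips(name, target_mhz):
    """What the part would manage at another clock if MIPS scaled linearly."""
    return mips_per_mhz(name) * target_mhz
```

An ARM2 naively scaled from 8MHz to 25MHz would give a little over 14 MIPS, so the ARM3's 13.5 is indeed in the right area. The gain is mostly the clock, with the cache being what lets the core run that fast on the same slow memory.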
The ARM250 (ARMv2a)
Late in 1992, Acorn launched new budget machines - two more in the A3000 range (A3010 and A3020) and an A4000 which was a business oriented mini-A5000. The principal difference was a reduction of board complexity with Acorn/ARM's first foray into SoC (System-on-Chip) design in which the main components of the ARM chipset (ARM, MEMC, VIDC, IOC) were coupled together in one piece of silicon. As it happens, initial supply problems meant that early machines did not have an ARM250 but instead a mezzanine board containing the individual chips.
An ARM250 is more or less an ARM3 without the cache, so it ought to be detectable by the lack of a cache but the support of SWP, though the mezzanine version may just behave like a marginally faster ARM2 machine. Clocking 12MHz, this SoC delivers around 7 MIPS.
ARM4 and ARM5
This was the period when ARM spun itself away from Acorn. For whatever reason, there was no ARM4 or ARM5.
ARMv3 covers the processors used at the birth of Acorn's last production machine, the RiscPC. To say that things changed would be quite the understatement.
The general changes from the earlier processors are:
- Full 32 bit addressing capability, with the PC being the complete 32 bits wide and flags in a separate register.
- Also supports the older combined PC+PSR mode, for Acorn never took the opportunity to update RISC OS more than was necessary for utilising the new hardware, which led to a variety of anachronisms (such as a task's slot maxing out at 28MiB on a 128MiB machine due to the 26 bit mode only being physically capable of addressing 64MiB - the rest was clever memory mapping).
- Internal registers increased (due to extra modes, the programmer's view was largely the same).
- Six new processing modes. The originals were renamed USR26, SVC26, IRQ26, and FIQ26. The new ones are User32, Supervisor32, IRQ32, FIQ32, Abort32, Undefined32.
- CPSR/SPSR - Current and Saved status registers, in addition to MRS and MSR to read/write them.
- Endian-agnostic - can access memory in big endian or little endian mode.
- Static operation, so can handle being clock-stopped. Acorn machines don't use this functionality, though other hardware does.
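With the flags moved out of R15 and into the CPSR, the mode is now a five bit field, which is how the old four modes and the six new ones can coexist. Here is a minimal sketch of decoding a CPSR value as read by MRS. The bit positions and mode numbers are the standard ARM ones; the mode names follow the list above, and the function name decode_cpsr is my own.

```python
# Five bit mode field (bits 4..0 of the CPSR). Values below 0x10 are
# the old 26 bit modes, kept for compatibility.
CPSR_MODES = {
    0x00: "USR26", 0x01: "FIQ26", 0x02: "IRQ26", 0x03: "SVC26",
    0x10: "User32", 0x11: "FIQ32", 0x12: "IRQ32", 0x13: "Supervisor32",
    0x17: "Abort32", 0x1B: "Undefined32",
}

# Condition flags stay at the top; I and F move down to bits 7 and 6.
CPSR_FLAG_BITS = (("N", 31), ("Z", 30), ("C", 29), ("V", 28), ("I", 7), ("F", 6))

# All that a 26 bit program counter can reach, in bytes (64MiB).
ADDRESSABLE_26BIT = 1 << 26


def decode_cpsr(cpsr):
    """Split a CPSR value into its mode name and flag bits."""
    mode = CPSR_MODES.get(cpsr & 0x1F, "unknown")
    flags = {name: bool((cpsr >> bit) & 1) for name, bit in CPSR_FLAG_BITS}
    return {"mode": mode, "is_26bit": (cpsr & 0x10) == 0, "flags": flags}
```

So 0x600000D3 decodes as Supervisor32 with Z and C set and both interrupts disabled, while 0x00000003 is plain old SVC26.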
The ARM6 is available in various guises - the ARM60 is the ARM6 processor core as a chip, the ARM61 is a lower power version, the ARM650 has onboard memory and I/O for embedded devices, and the ARM600 has an MMU, cache, and coprocessor bus.
While various versions of the ARM6 core have been made, the ARM610 is the well known one due to its use as a processor for the RiscPC machines. Being an ARM6 core with MMU (Memory Management Unit) and a 4KiB unified cache (similar to that found on the ARM3), but no coprocessor bus, it clocks at 33MHz (other speed versions exist) and offers around 27 MIPS.
The ARM710 is superficially the same as the ARM610. The differences were not so much in the instruction set but the physical design of the processor. The cache is now 8KiB, the TLB and write buffer addresses in the MMU have doubled, and numerous internal changes with respect to timings mean that the processor only clocks marginally faster at 40MHz but offers a whopping 36 MIPS, which is a third faster.
It is around this point that things start to get messy as the family is available with a variety of "options", which are determined by suffix.
The ARMv4 family offers further modifications (depending on the chip), along with signed and unsigned multiply and long multiply.
With the cache now supporting writeback, the pipeline expanded to five stages, and a speculative instruction fetcher, the ARM8 was supposed to offer some 50 MIPS at 55MHz and be a desired update to the RiscPC. However, the StrongARM left it standing and it is now largely forgotten.
StrongARM SA110 (ARMv4)
Developed by Digital (not ARM), it, like the ARM8, increased the pipeline. Additionally, the cache was broken into separate instruction and data caches (Harvard architecture) both of which are 16KiB. A variety of speeds were produced, but people aim for the 200MHz part which theoretically delivers 230 MIPS, and only consumes 1 Watt doing so.
I say theoretically as the RiscPC is hideously limited in that its memory bus clocks a mere 16MHz, the I/O clocks 12MHz, and the podule bus running flat out is 8MHz synchronous. This means that you will only be able to shift data to or from memory at the 16MHz speed, managing a throughput of maybe 20MiB/sec flat out. Talking to I/O (harddiscs, etc) will be about 6MiB/sec with a tailwind.
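To put a rough number on that bottleneck, here is some envelope arithmetic. Everything beyond the 16MHz and 20MiB/sec figures is an assumption of mine: I take the memory bus to be 32 bits wide and treat one word per bus cycle as the (unreachable) ceiling. The function names are made up for the purpose.

```python
MIB = 1024 * 1024


def peak_mib_per_sec(bus_mhz, bus_width_bytes=4):
    """Ceiling throughput if one full bus word moved on every cycle."""
    return bus_mhz * 1_000_000 * bus_width_bytes / MIB


def cycles_per_word(bus_mhz, achieved_mib_per_sec, bus_width_bytes=4):
    """Average bus cycles spent per word at a given real-world throughput."""
    return peak_mib_per_sec(bus_mhz, bus_width_bytes) / achieved_mib_per_sec
```

On those assumptions a 16MHz bus tops out at about 61MiB/sec, so 20MiB/sec works out at roughly three bus cycles per word. Either way, it is a long way short of what a 200MHz core could chew through.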
It worked surprisingly well considering it is, in essence, akin to bolting a jet engine onto a Mini.
The 'T' denotes the introduction of the Thumb instruction set. The Thumb instructions are not only to improve code density, but also to bring the power of the ARM into cheaper devices which may primarily only have a 16 bit datapath on the circuit board (for 32 bit paths are costlier).
When in Thumb mode, the processor executes Thumb instructions. While most of these instructions map directly onto normal ARM instructions, the space saving comes from reducing the number of options and possibilities available. For example, conditional execution is lost (only branches can be conditional), and many instructions can directly access fewer registers. However, given all of this, good Thumb code can perform extremely well in a 16 bit world (as each instruction is a 16 bit entity and can be loaded directly).
This, the ARM7+Thumb+Debug+Multiplier+ICE, is one of ARM's great success stories. Used in large numbers in low power equipment that needs decent processing, from MP3 players to simple routers and even the microcontrollers in µSD cards, the ARM7 saw much application in embedded devices.
In technical terms, it offers a three-stage pipeline, a unified cache of 8KiB (depending on core), and Thumb instruction decoding. The ARM7TDMI also offers the enhanced multiplication abilities first seen in the StrongARM, and delivers about 36 MIPS at 40MHz.
[ARMv7-A, ARMv7-R, ARMv7-M]