The Amélie project
Components - The processor
The processor chosen to be the heart of Amélie is the 6502, expected to be clocked at 2MHz (or 1.8432MHz if using ACIA Xtal).
This processor was chosen primarily because I grew up with Acorn computers, the older of which contained the 6502 (BBC micro, Electron...), and it was also present within other popular computers (the Oric, Apple II...).

In order to further justify the choice of this processor, we must remember that we are building an embedded system. In this case it is prudent to ask ourselves exactly how much computational power we need. A 2MHz 8bit CPU is more than sufficient for this task.


Architecture

A picture from my old school prospectus, made in 1990.
I wonder, who is that person that hasn't been blurred?
Notice that my screen has a lot more stuff on it! We were using EDWORD on Master Compacts. These were the only Master Compacts. The other networked machines in the room were standard BBC Bs.
The NMOS 6502 is an 8 bit processor. It contains two index/general purpose registers, called X and Y, which are 8 bit. The results of mathematical and logical operations are written to the accumulator which is also 8 bit (though a ninth 'carry' bit is available). The program counter (PC) is 16 bit which means it can address 65,536 memory locations (64K).
Page Zero (the first 256 bytes of memory; &0000-&00FF) is used in a special 'efficient' addressing mode; this behaves like regular addressing but is faster. Indirect addressing uses base addresses held in Page Zero.
Page One (&0100-&01FF) is the hardware stack. The 6502 provides a stack of 256 bytes. While this may seem utterly pathetic, remember this processor predates the horrible stack abuses where you could, for example, assemble code "on the stack". The 6502 stack is a true stack, a dumping ground for registers and status. Assuming all registers are saved in an interrupt, our stack needs per interrupt would be only six bytes (two for PC, then one each for PSR, X, Y and A) and you can thus hold 42 levels of interrupt in our 'piddly' stack.
The final six bytes of the memory map (&FFFA-&FFFF) hold the addresses for the three hardware vectors - namely NMI (Non-Maskable Interrupt, unused in Amélie), RESET, and IRQ (Interrupt ReQuest).
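
As a rough illustration of the fixed landmarks just described, here is a small Python sketch (Python is my choice for illustration only, it is not part of Amélie). The 64K image and the &E000 reset address are made-up values; the vector addresses and the six-bytes-per-interrupt figure are the ones given above.

```python
# Sketch of the fixed 6502 memory landmarks described above.
ZERO_PAGE = range(0x0000, 0x0100)   # &0000-&00FF
STACK_PAGE = range(0x0100, 0x0200)  # &0100-&01FF
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ": 0xFFFE}

# Bytes stacked per interrupt if everything is saved:
# PC (2 bytes) + PSR + A + X + Y
BYTES_PER_INTERRUPT = 2 + 1 + 3


def max_interrupt_depth(stack_size=len(STACK_PAGE)):
    """How many fully saved interrupt frames fit in the stack page."""
    return stack_size // BYTES_PER_INTERRUPT


def read_vector(memory, name):
    """Read a little-endian 16-bit vector (low byte first) from a 64K image."""
    addr = VECTORS[name]
    return memory[addr] | (memory[addr + 1] << 8)


# Hypothetical 64K image whose RESET vector points at &E000.
memory = bytearray(0x10000)
memory[0xFFFC] = 0x00
memory[0xFFFD] = 0xE0
```

Here `max_interrupt_depth()` gives the 42 levels mentioned above, and `read_vector(memory, "RESET")` gives the address the processor would start executing from.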

As can be seen from the memory map, this dictates how we lay out our addressing. RAM must be at the lower addresses and the firmware (ROM, EPROM, FlashROM, etc) must be at the higher addresses. It makes sense to place the memory-mapped I/O in between.


Limitations

The 6502 is not without its limitations, and later versions of the processor have made attempts to remove such limitations. For example the 65C02 allows you to directly stack and unstack the X and Y registers. In the NMOS version they must be transferred into the accumulator first. The planned 65CE02 (unreleased?) takes things further with the addition of a new Z register.
The processor may be restricted in only being able to address 64K. This may not be a problem - it is more than enough for many embedded applications; and it is possible to interface to conventional storage solutions (FDC, IDE drive, Flash media...) if data capture and logging is required.
Perhaps of greater concern is that the processor itself is not able to trap undefined instructions or bad memory access and pass control back to the OS to abort an errant task and handle it, perhaps by restarting it or dropping to a debugger. Instead, the instruction will be executed and if it causes a crash, it causes a crash. The Amélie BIOS includes a watchdog that will call the (re)start code if a memory value is decremented down to zero. This should provide a rescue for dead-loops, but if the processor encounters an instruction which aborts processing, or if the stack is messed up and it all goes pear-shaped, the watchdog probably won't help much...
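
The countdown idea behind such a watchdog can be sketched in a few lines of Python. This is only a guess at the general shape, not the Amélie BIOS code: the class, the `kick`/`tick` names, and the assumption that a periodic timer interrupt does the decrementing while healthy code reloads the value are all mine.

```python
class Watchdog:
    """Countdown watchdog: the tick interrupt decrements, healthy code reloads."""

    def __init__(self, reload_value, restart):
        self.reload_value = reload_value
        self.counter = reload_value   # stands in for the watched memory value
        self.restart = restart        # stands in for the (re)start code

    def kick(self):
        """Called by the main loop to show it is still alive."""
        self.counter = self.reload_value

    def tick(self):
        """Called from the periodic timer interrupt (assumed)."""
        self.counter -= 1
        if self.counter == 0:
            self.counter = self.reload_value
            self.restart()
```

A dead-loop never calls `kick()`, so the counter runs down and the restart routine fires. A crash that stops interrupts being serviced at all never calls `tick()` either, which is why the watchdog cannot rescue everything.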
We will, though, be sticking with the original NMOS version of the processor. It isn't perfect, but it isn't broken. However it may be that builds of Amélie will use the CMOS parts (i.e. 65C02, 65C22...), if I can source them, purely on grounds of enhanced efficiency - it will perform better, and for longer, when running on battery power.
Again, I refer you to http://www.6502.org/ for documentation on the 65xx family.


(NMOS) Processor Bugs

  • BCD mode quirks: If an interrupt occurs while in BCD mode, it will remain in BCD mode when entering the interrupt handler; rather than switching automatically to binary mode which is the logical idea.
    Also: The status flags for negative/overflow/zero (NVZ) are not valid when using BCD maths.
  • The 6502 instruction BRK can be used as a software interrupt. In the unfortunate case that a hardware interrupt occurs at the exact same time as the processor is fetching a BRK instruction, the processor will fail to execute the BRK and will behave as if the hardware interrupt is all that happened. What isn't clear from the document I'm reading is what happens upon RTI. I think this is a bug in the return address stacked, so it proceeds as if the BRK was actually just a NOP? Whatever.
  • The JSR may be expected to push the address of the next instruction to the stack, like happens with the interrupts. It does not. It pushes the address of the last byte of the JSR and the RTS instruction corrects this quirk. It is not strictly a bug as things work okay, but when emulating the processor or writing a debugger (etc), you must take into account that the address is wrong.
  • The indirect jump "JMP (<addr>)" is broken in that if the address given is xxFF then it will fetch the destination from xxFF and xx00 (instead of xxFF and xxFF+1). In other words, the increment does not increment the page, just the offset, which wraps back to zero.
  • It is dubious whether this can truly be considered a bug, however the supported instructions are implemented as a hardwired logic array (instead of microcode), so calling undefined instructions can cause a variety of bizarre effects such as "AND accumulator with immediate value, then ROR the result", or "OR accumulator with &EE, AND with immediate value, then store the result in both X and the accumulator".
    There are some potentially more useful instructions, such as "DEC a memory location, then CMP with the accumulator", however these sorts of things are to be avoided as it can depend on the design and implementation of the 6502 used, so different manufacturers may offer differing behaviour on the undefined instructions.
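
For anybody writing an emulator or debugger, the JSR/RTS and indirect JMP quirks above could be modelled roughly as follows. This is a Python sketch with function names of my own invention and made-up test addresses; it only illustrates the address arithmetic, not a full processor core.

```python
def jsr_pushed_address(jsr_addr):
    """JSR is three bytes; the 6502 stacks the address of its LAST byte."""
    return (jsr_addr + 2) & 0xFFFF


def rts_resume_address(pulled_addr):
    """RTS adds one to the pulled address, landing on the next instruction."""
    return (pulled_addr + 1) & 0xFFFF


def jmp_indirect_target(memory, pointer, nmos=True):
    """Destination of JMP (pointer), with the NMOS page-wrap quirk."""
    low = memory[pointer]
    if nmos:
        # High byte address: only the offset within the page increments.
        high_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        high_addr = (pointer + 1) & 0xFFFF
    return low | (memory[high_addr] << 8)


# Hypothetical memory contents around a page boundary.
memory = bytearray(0x10000)
memory[0x20FF] = 0x34   # low byte of the intended target
memory[0x2100] = 0x12   # high byte the programmer expected to be used
memory[0x2000] = 0x56   # high byte the NMOS part actually fetches
```

With these values, `jmp_indirect_target(memory, 0x20FF)` lands at &5634 on the NMOS part, whereas `nmos=False` gives the intended &1234.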
All of these quirks, except the JSR one, have been fixed in the CMOS versions of the processor, but please note the following:
  • The JSR quirk cannot be fixed, as it risks breaking a large amount of 6502 code written with this in mind. In the long run, it is just an oddity and not an actual bug.
  • The undefined instructions all behave as NOPs on CMOS versions of the 6502, but be aware that this is only slightly less 'broken' than the random behaviour of the NMOS processor. The reason for this is that the NOPs have differing instruction lengths and cycle counts. It is as if the undefined instruction is still executed, but the internal logic is somehow disengaged during this process so nothing happens, except PC being updated to move on to the next instruction.
    To see why this is still a problem, consider:
      NOP
      JSR &FED0

    If the NOP is an undefined single-byte instruction, the code will work as defined. But what if that NOP is an undefined instruction of two or three bytes length? The following JSR will be obliterated and taken to be a BNE or an INC instruction, mucking up everything else until the inevitable crash.
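
The effect is easy to check by hand. JSR &FED0 assembles to the three bytes &20 &D0 &FE, and the Python sketch below (the function name and the tiny opcode table are mine, covering only the opcodes needed here) shows what the processor sees next depending on how many bytes the 'NOP' occupies.

```python
# Bytes for "JSR &FED0": opcode &20, then the address low byte first.
JSR_BYTES = bytes([0x20, 0xD0, 0xFE])

# Only the opcodes needed for this illustration.
MNEMONICS = {0x20: "JSR", 0xD0: "BNE", 0xFE: "INC"}


def next_opcode_after_nop(nop_length):
    """Return the mnemonic the CPU sees after a 'NOP' of the given length.

    The undefined opcode itself is one byte; a longer NOP swallows
    (nop_length - 1) of the bytes that follow it.
    """
    swallowed = nop_length - 1
    return MNEMONICS[JSR_BYTES[swallowed]]
```

A one-byte NOP leaves the JSR intact, a two-byte NOP turns it into a BNE, and a three-byte NOP turns it into an INC.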


Interrupt latency

One of the particularly good features of the 6502 is the interrupt latency. This is the time duration between an interrupt occurring and the interrupt handler being called. On the 6502, the interrupt system is extremely quick:

-interrupt happens- 
<vector called>    ; 8 cycles
PHA                ; 3 cycles
TXA                ; 2 cycles
PHA                ; 3 cycles
TYA                ; 2 cycles
PHA                ; 3 cycles

Here the interrupt has been called, PC and PSR stacked, and then (thanks to us) all of the registers are also stacked. The sequence above adds up to 21 cycles. If we add another two cycles for allowing the current instruction to finish prior to the interrupt call (this is an 'average' time-to-complete-half-an-instruction), then it can be seen that our interrupt handler can be entered with full status saved in only 23 cycles. If we were to only use A and X, we could cut out five cycles. If we were to perform a BIT test against specific devices first, we could respond to a critical interrupt in around 15 cycles. Clocking at 2MHz, that would equate to seven and a half microseconds.

Returning from interrupt is performed using:

PLA                ; 4 cycles
TAY                ; 2 cycles
PLA                ; 4 cycles
TAX                ; 2 cycles
PLA                ; 4 cycles
RTI                ; 6 cycles

This adds up to twenty two cycles for a full interrupt return, including restoring all registers. As before, optimisations are possible so you could feasibly halve that number of cycles.

Here is an example of a much more optimised interrupt handler:

-interrupt happens-; 2 cycles allowing completion of current instruction
<vector called>    ; 8 cycles
BIT  via_status    ; 4 cycles
BMI  via_handler   ; 2 cycles
BIT  acia_status   ; 4 cycles
BMI  acia_handler  ; 2 cycles
RTI                ; 6 cycles

The standard Amélie BIOS will not do this as it provides a mechanism for adding handlers for additional interrupt-generating sources, however the code above demonstrates how to handle an interrupt when the ACIA and VIA are your only possible sources of interrupt. Nothing is stacked unnecessarily, and you can see the entire overheads of interrupt processing can be squeezed down to a mere twenty eight cycles, or 14 microseconds.
Obviously, it will require more time to actually process the interrupt. Here we are looking at latency, the time between the interrupt happening and the system being in a position to cope with it.
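
The cycle counts in the three listings can be totted up mechanically. This Python sketch simply copies the per-instruction figures from the listings above (the dictionary labels are mine) and converts totals to microseconds at the 2MHz clock assumed throughout.

```python
CLOCK_HZ = 2_000_000  # the 2MHz clock assumed throughout this page

ENTRY = {"vector": 8, "PHA (A)": 3, "TXA": 2, "PHA (X)": 3, "TYA": 2, "PHA (Y)": 3}
EXIT = {"PLA (Y)": 4, "TAY": 2, "PLA (X)": 4, "TAX": 2, "PLA (A)": 4, "RTI": 6}
OPTIMISED = {"finish instruction": 2, "vector": 8, "BIT via": 4, "BMI via": 2,
             "BIT acia": 4, "BMI acia": 2, "RTI": 6}


def total_cycles(listing):
    """Sum the cycle counts of one listing."""
    return sum(listing.values())


def microseconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles * 1_000_000 / clock_hz
```

The entry listing comes to 21 cycles before the two-cycle allowance for finishing the current instruction, the return listing to 22, and the optimised handler to 28, which `microseconds(28)` converts to 14 microseconds.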

If we can take an example of a serial byte received, we should consider that we will need to read the byte from the ACIA, and then work out where to put it. Some sort of ring buffer. Furthermore, in Amélie's case, this may cause various events to be 'fired' - a byte has been received, a newline was received, buffer is full, and so on. In addition to this, we may need to write back to the ACIA for the purpose of engaging CTS/RTS flow control.
The latency of this is therefore quite important, for the longer it all takes, the slower the serial port can usefully operate. The hardware laid out is a 6551 which has no FIFO. Therefore if bytes arrive faster than they can be picked up, data will be lost and the solution is a slower serial link, which may not be desirable.
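
Since the serial code is not written yet, the following is only a sketch of the sort of ring buffer I have in mind, in Python rather than 6502 code. The class name, the 256-byte default size and the method names are assumptions for illustration; the interrupt handler would be the `put` side and the foreground code the `get` side.

```python
class RingBuffer:
    """Fixed-size byte FIFO of the kind a serial receive handler might fill."""

    def __init__(self, size=256):
        self.data = bytearray(size)
        self.size = size
        self.head = 0    # next slot to write (interrupt side)
        self.tail = 0    # next slot to read (foreground side)
        self.count = 0

    def put(self, byte):
        """Store one received byte; return False if the buffer is full."""
        if self.count == self.size:
            return False          # caller might raise a 'buffer full' event
        self.data[self.head] = byte
        self.head = (self.head + 1) % self.size
        self.count += 1
        return True

    def get(self):
        """Remove and return the oldest byte, or None if empty."""
        if self.count == 0:
            return None
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        return byte
```

A 256-byte buffer is a natural fit for the 6502 because the head and tail then fit in a single 8 bit index register and wrap around for free.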
Consider 9600bps at 8N1. Each byte uses ten bits (1 start, 8 data, 1 stop), so 9,600 bits per second divided by ten gives a potential 960 bytes per second. Thus the serial port will interrupt us a little under one thousand times in a second. At 2MHz this gives us about 2,000 cycles in which to handle each serial byte, assuming there are no other interrupts at the same time.
I have not written the serial code yet, but I would imagine that this is plenty.
Let's try it at 19,200bps - the fastest the 6551 can achieve. This is potentially 1,920 bytes per second which is 1,920 interrupts per second, with a little over 1000 cycles to process each. Possible, but cutting it a lot finer!

Remember, also, that the VIA will be causing 50 ticks per second for the time/watchdog/LED flash/etc. If we blankly "assume" 250 cycles for all this, fifty times, that is 12,500 cycles a second for housekeeping. Do the maths for the 19,200bps and we'll see it is still a little over 1000 cycles per byte. Am I contradicting myself? What am I trying to prove? It's simple. When you add lots of things together, they can certainly start to add up. But a short interrupt latency coupled with two million cycles per second is actually quite a lot. A BBC microcomputer could receive a file from a modem at 9600bps and store it bit-by-bit to an Econet fileserver (with the delay being the network itself) while processing the keyboard and the timer and the screen refresh and all the other stuff that was going on inside. Sure, it's tediously slow by today's standards with 16Mb broadband running on a ridiculously oversized operating system... but back then it coped and it coped well.
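
For those who would rather see the maths done, here is the per-byte cycle budget as a Python sketch. The 2MHz clock, the ten bits per byte, and the 50 ticks at an assumed 250 cycles each all come from the text above; the function names are mine.

```python
CLOCK_HZ = 2_000_000
BITS_PER_BYTE = 10          # 8N1: 1 start + 8 data + 1 stop


def bytes_per_second(baud):
    """Maximum bytes (and so receive interrupts) per second at this baud rate."""
    return baud // BITS_PER_BYTE


def cycles_per_byte(baud, housekeeping_cycles=0, clock_hz=CLOCK_HZ):
    """Cycles available per received byte after periodic housekeeping."""
    return (clock_hz - housekeeping_cycles) / bytes_per_second(baud)


# 50 VIA ticks a second at an assumed 250 cycles each.
HOUSEKEEPING = 50 * 250
```

`cycles_per_byte(9600)` comes out at about 2,083, `cycles_per_byte(19200)` at about 1,041, and `cycles_per_byte(19200, HOUSEKEEPING)` at about 1,035, so the housekeeping barely dents the budget.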

The significance of this? Not a lot right now. It is just musings on the interrupt latency of the 6502.


What about the Z80?

In comparing the 6502's interrupt latency against other processors, we can run into problems with devices such as the Z80 which talk about "machine cycles", as opposed to plain clock ticks. If we assume that a "T cycle" is a clock tick, it can be seen that the interrupt call itself is about 15 cycles. It takes 11 to 15 cycles to push a register, 10 to 14 to pull it, and 14 to return from interrupt. The Z80 only saves PC upon responding to an interrupt. Thus, a simple IRQ and return with nothing else in between takes 29 cycles, more than the 6502.
In favour of the Z80, there is an alternative register set, so it is possible to have a non-reentrant interrupt handler which switches to this alternative set for the interrupt code. This can be implemented using (code suggested by Jonathan Graham Harston):

EX  AF,AF'         ; 4 cycles
EXX                ; 4 cycles

Eight cycles. Faster than the 6502. As a typical Z80 is clocked faster than a 6502, the 29 cycles of the bare interrupt and return plus these 8 make 37 cycles, which at 4MHz take about as long as 18 or 19 cycles of a 2MHz 6502. I am slightly worried about the 11-15 cycles it takes the Z80 to push a register and the 10-14 cycles required to pull it.

This is, sadly, a bit bogus. When we attempt to compare processors in this way we run into all sorts of nasty complications in that a like-for-like is simply not possible. Not only will different processors have different functionality (the Z80 can do a 16 bit addition in one instruction, this needs to be synthesised with seven instructions on a 6502), but also different design practice means that identical-function instructions will probably operate at different speeds. The Z80 complicates this with different cycle types (is a machine cycle a clock tick?), and it is complicated even more by needing to take into account differences in actual processor speed.
The solution is to compare like with like for the behaviour of a more complicated program. Over to Jonathan for the final word on the Z80:

The best way of comparing CPUs is by comparing an identical complex application on each CPU. I do this by comparing BBC BASIC.
6502 BBC BASIC is 16K, Z80 BBC BASIC implementing exactly the same functionality is 12K. 6502 BBC BASIC with a 2MHz CPU runs 2% slower than Z80 BBC BASIC with a 4MHz CPU.


What about the 8088?

While the 8088 has a plus in its 1MB addressing range (even with the hairy segmentation issues), we must realise that it has the odd requirement of a 33% duty cycle clock. Coupled with the data bus being shared with the lower eight bits of the address bus, this processor would appear to require a certain amount of 'support'. In terms of interrupt latency, it takes 15 cycles to push a register word onto the stack, and 12 to restore it. While this may compare with the Z80, the 8088 interrupt return instruction (IRET), which restores the flags and the CS:IP address to return to, takes 44 cycles! This is not an efficient processor, though thankfully things are vastly improved in later incarnations. It isn't beyond the realms of imagination that the interrupt setup and exit sequences alone could run into cycle counts measured in hundreds!


Why is interrupt latency so important?

On the face of it, it isn't. If Amélie is controlling a central heating system, for example, then a few microseconds here or there won't make any difference. Analogue switch thermostats are often so vague that it wouldn't matter much if the heat was turned off seconds (or even an entire minute) after the thermostat interrupt.
On the other hand, if Amélie is applied as a robot (i.e. RICKBOT1) and she's hurtling across a table, then the very last thing you want is a long lag when a sensor says "woah! we're all outta table here guys... guys!?! hello!?!?!". The 6502 could have done something about it before the Z80.
And, well, we'd crash and burn before the 8088 noticed...


More advanced processors
As mentioned at the top of the page, we have to take a quick reality check and assess what we expect. Sure, it'd be fun to code up an ARM7500FE to control the 'bot or the heating, but it'd be criminal - such power and potential doing diddly-squat.
For those of you who like pictures, consider the 6502 against an 80486 (which is a pretty large lump of ceramic):

An 80486 and a 6502 - guess which is which!

The size of the processor may not put you off. After all, you can always use a bigger circuit board, right? Well, let's look underneath:

The 80486 and the 6502 as seen from underneath...

One would be a nightmare to attach to a circuitboard, and one wouldn't. Guess.
Sadly, the ARM7500FE - being a surface-mount kind of chip - is far worse. There are about eight billion legs per square inch...

But we need not worry. The 6502 provides all that we require...


Other 6502 systems
For computers we have an impressive range - all of Acorn's early machines from the System 5s and the Atoms through to the Master 128 were based upon the 6502, progressing to the 65C02 (Master) and 65C102 (Compact/FileStore).
Acorn didn't have great success in America, where their main competitor was Apple's Apple II which - surprise surprise - was also based upon the 6502. The Oric was a 6502 machine, as was the well-known Commodore PET; while the Commodore 64 used the 6510 incarnation. And that's just for starters!

Ironically, until the rise of the 'XT' with MS-DOS, no home computers used the 8086/8088 - nobody took it seriously! It is quite interesting, therefore, that the Psion 3 range of organisers chose the x86 architecture. The processor is a highly optimised V30 core, which is an 8086-compatible part specifically designed (and further refined by Psion for their macrocell) for use in just such a mobile application, eking many hours from two AA cells and permitting state to be maintained from a CR2012 button cell. I wonder if the V30 core was their first choice or if nothing better was available at the time, as the later series 5 (etc) organisers use an ARM core.

In the embedded market, we hark from an era predating PICs and heavily customised microcontrollers. A number of devices were actually mini computers running fixed software, much like Amélie. The Ringdale Megabuffer II was based upon the Z80, while the picture below is of a CASE modem that I salvaged from a dump many years ago. It is a 6502 design with 2K RAM and (I think) 8K of EPROM, plus interfacing.

An ancient modem based around the 6502!

The 6502 turned up in other places - a Prestel/Viewdata box. If you happen to be a certain Gareth Babb that ran a viewdata BBS with a name like CCL4 (?), then you might smile to know that I once called your service back in 1992 using an actual viewdata set-top-box!

Because of its clean interface, the 6502 was widely used in education. When I briefly attended Bridgwater College back circa 1993, our hands-on programming was with the 6502. I don't know if they still use this processor, however Wikipedia lists a number of universities around the world who still use the 6502 to teach assembly language. Make no mistake, when Chuck Peddle designed the 6502 back in 1975, he worked magic.

Today, the 6502 is still available for embedded systems in its various guises. In addition, a 16 bit version is available (65C816), with a 32 bit version in the pipeline. Clock speeds range from the original 1MHz to as far as 20MHz parts.

In comparison, Wikipedia reports that the i386 processor (also widely used in embedded systems with a DOS-like or Unix-like base OS) would no longer be produced as of September 2007. The 6502 is still going strong!


The primary 6502 resource
If you are looking for assistance and information on the 6502 processor and its family of support devices, then you will undoubtedly find lots of things listed in Google.

You don't need to Google. There's one place where you'll find everything to do with the 6502:

http://www.6502.org/

I am not affiliated with this website, though I will certainly recommend it as... well... where d'you think I got all my 65xx datasheets? :-)

© 2007 Rick Murray