In order to further justify the choice of this processor, we must
remember that we are building an embedded system. In this case
it is prudent to ask ourselves exactly how much computational power we
need. A 2MHz 8-bit CPU is more than sufficient for this task.
Architecture
A picture from my old school prospectus, made in 1990.
I wonder, who is that person that hasn't been blurred?
Notice that my screen has a lot more stuff on it! We were using EDWORD
on Master Compacts, the only ones in the room; the other
networked machines were standard BBC Bs.
The NMOS 6502 is an 8-bit processor. It contains two index/general
purpose registers, called X and Y, which are 8-bit. The results of
mathematical and logical operations are written to the accumulator,
which is also 8-bit (though a ninth 'carry' bit is available). The
program counter (PC) is 16-bit, which means it can address 65,536
memory locations (64K). Page Zero (the first 256 bytes of memory;
&0000-&00FF) is used in a special 'efficient' addressing mode; this
behaves like regular addressing but is faster, because the address is
only one byte long. Indirect addressing uses base addresses held in
Page Zero. Page One (&0100-&01FF) is the hardware stack. The 6502
provides a stack of 256 bytes. While this may seem utterly pathetic,
remember this processor predates the horrible stack abuses where
you could, for example, assemble code "on the stack". The 6502
stack is a true stack, a dumping ground for registers and
status. Assuming all registers are saved in an interrupt,
each interrupt would need only six bytes of stack (two for the PC,
plus PSR, X, Y and A), and you can thus hold 42 levels of interrupt in
our 'piddly' stack. The final six bytes of the memory map
(&FFFA-&FFFF) hold the addresses for the three hardware vectors,
namely NMI (Non-Maskable Interrupt, unused in Amélie), RESET, and
IRQ (Interrupt ReQuest).
This memory map dictates how we lay out our addressing. RAM must be
at the lower addresses, because Page Zero and the stack live there,
and the firmware (ROM, EPROM, FlashROM, etc) must be at the higher
addresses, because the vectors live there. It makes sense to place
the memory-mapped I/O in between.
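A small Python sketch of the figures above: how many fully saved interrupt levels fit in Page One, how a vector is read (low byte first), and how an address decoder might split the map. The RAM/I/O/ROM boundaries (&8000 and &C000) are hypothetical, chosen purely for illustration, and are not Amélie's actual decoding.

```python
# Sketch of the 6502 memory-map facts described above.
# ASSUMPTION: the RAM / I/O / ROM boundaries are hypothetical examples.

STACK_BYTES = 256            # Page One, &0100-&01FF
BYTES_PER_INTERRUPT = 6      # PC (2 bytes) + PSR + A + X + Y

NMI_VECTOR = 0xFFFA
RESET_VECTOR = 0xFFFC
IRQ_VECTOR = 0xFFFE

RAM_END = 0x7FFF             # hypothetical: RAM from &0000 (Page Zero, stack)
IO_END = 0xBFFF              # hypothetical: memory-mapped I/O in the middle


def max_nesting(stack_bytes=STACK_BYTES, per_level=BYTES_PER_INTERRUPT):
    """Fully saved interrupt levels that fit on the hardware stack."""
    return stack_bytes // per_level


def read_vector(memory, vector):
    """Vectors are stored low byte first, then high byte."""
    return memory[vector] | (memory[vector + 1] << 8)


def region(addr):
    """Classify an address under the hypothetical map above."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("the 6502 has a 16-bit address space")
    if addr <= RAM_END:
        return "RAM"
    if addr <= IO_END:
        return "I/O"
    return "ROM"             # firmware: must include &FFFA-&FFFF


if __name__ == "__main__":
    print(max_nesting())                         # 42
    print(region(0x01FF), region(RESET_VECTOR))  # RAM ROM
```

Whatever the real boundaries turn out to be, the constraint the sketch encodes is the one stated above: the bottom of the map must be RAM and the top six bytes must be firmware.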
Limitations
The 6502 is not without its limitations, and later versions of the
processor have attempted to remove them. For example, the 65C02
allows you to directly stack and unstack the X and Y
registers; in the NMOS version they must be transferred via
the accumulator first. The 65CE02 takes things further with
the addition of a new Z register.
The processor is also restricted to addressing only 64KB. This may
not be a problem - it is more than enough for many embedded
applications, and it is possible to interface to conventional
storage solutions (FDC, IDE drive, Flash media...) if data capture
and logging are required.
Perhaps of greater concern is that the processor itself is not
able to trap undefined instructions or bad memory accesses and
pass control back to the OS to abort an errant task and handle
it, perhaps by restarting it or dropping to a debugger. Instead,
the instruction will be executed, and if it causes a crash, it
causes a crash. The Amélie BIOS includes a watchdog
that will call the (re)start code if a memory value is
decremented down to zero. This should provide a rescue from
dead-loops, but if the processor encounters an instruction
which halts processing, or if the stack is messed up and it all
goes pear-shaped, the watchdog probably won't help much...
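The watchdog idea can be modelled in a few lines. This is a hypothetical sketch, not the Amélie BIOS code: it assumes the counter is decremented on each of the VIA's 50 ticks per second and reloaded ("kicked") by healthy code, and the reload value of 50 (about one second) is invented. Note that it relies on the timer interrupt still running, which is exactly what a halted processor or a trashed stack takes away.

```python
class Watchdog:
    """Hypothetical model of a RAM-counter watchdog."""

    def __init__(self, reload=50):      # ASSUMPTION: 50 ticks, about one second
        self.reload = reload
        self.counter = reload
        self.restarts = 0

    def kick(self):
        """Healthy code reloads the counter before it can reach zero."""
        self.counter = self.reload

    def tick(self):
        """Called from the periodic timer interrupt."""
        self.counter -= 1
        if self.counter == 0:
            self.restarts += 1          # stands in for calling the (re)start code
            self.counter = self.reload


if __name__ == "__main__":
    dog = Watchdog(reload=3)
    for _ in range(2):
        dog.tick()
    dog.kick()                          # main loop is still alive
    for _ in range(3):
        dog.tick()                      # main loop stuck: no kicks
    print(dog.restarts)                 # 1
```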
We will, though, be
sticking with the original NMOS version of the processor.
It isn't perfect, but it isn't broken. However, it may be
that builds of Amélie will use the CMOS parts
(e.g. 65C02, 65C22...), if I can source them, purely on
grounds of enhanced efficiency - it will perform better, and
for longer, when running on battery power. Again, I refer
you to http://www.6502.org/
for documentation on the 65xx
family.
(NMOS) Processor Bugs
- BCD mode quirks: If an interrupt occurs while in BCD mode, the
processor will remain in BCD mode when entering the interrupt handler,
rather than switching automatically to binary mode, which would be the
logical behaviour. Also, the status flags for negative/overflow/zero
(NVZ) are not valid when using BCD maths.
- The 6502 instruction BRK can be used as a software interrupt.
In the unfortunate case that a hardware interrupt occurs at the exact
same time as the processor is fetching a BRK instruction, the
processor will fail to execute the BRK and will behave as if
the hardware interrupt is all that happened. What isn't clear from the
document I'm reading is what happens upon RTI. I think the bug lies
in the return address stacked, so it proceeds as if the
BRK was actually just a NOP? Whatever.
- You might expect JSR to push the address of the next
instruction to the stack, as happens with interrupts. It does
not. It pushes the address of the last byte of the JSR, and
the RTS instruction corrects for this quirk by adding one. It is not
strictly a bug, as things work okay, but when emulating the processor
or writing a debugger (etc), you must take into account that the
stacked address is one less than the real return address.
- The indirect jump "JMP (<addr>)" is broken in that if the pointer
address given is xxFF, it will fetch the target address from xxFF and
xx00 (instead of xxFF and xxFF+1). In other words, the increment does
not increment the page, just the offset, which wraps back to zero.
- It is dubious whether this can truly be considered a bug; however, the
supported instructions are implemented as a hardwired logic array
(instead of microcode), so calling undefined instructions can cause
a variety of bizarre effects, such as "AND accumulator with
immediate value, then ROR the result", or "OR accumulator with
&EE, AND with immediate value, then store the result in both X and
the accumulator".
There are some potentially more useful instructions, such as "DEC
a memory location, then CMP with the accumulator"; however, these
sorts of things are to be avoided, as their behaviour can depend on the
design and implementation of the 6502 used, so different manufacturers
may offer differing behaviour on the undefined instructions.
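For anyone writing an emulator or debugger, three of these quirks can be sketched in Python as follows. This is an illustrative model, not a cycle-accurate core: the decimal ADC assumes valid packed-BCD operands and only models the Z flag quirk (Z taken from the binary sum, as NMOS parts are commonly documented to behave), and the helper names are hypothetical. The BCD-on-interrupt quirk is also why NMOS interrupt handlers commonly begin with CLD.

```python
def adc_decimal(a, b, carry_in=0):
    """Decimal-mode ADC for valid packed-BCD operands -> (result, carry, z_nmos).

    z_nmos follows the plain binary sum, as NMOS parts are commonly
    documented to do; the 65C02 takes Z from the decimal result.
    """
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    if lo > 9:
        lo += 6                                  # adjust the low digit
    hi = (a >> 4) + (b >> 4) + (1 if lo > 0x0F else 0)
    carry = hi > 9
    if carry:
        hi += 6                                  # adjust the high digit
    result = ((hi << 4) | (lo & 0x0F)) & 0xFF
    z_nmos = ((a + b + carry_in) & 0xFF) == 0
    return result, carry, z_nmos


def jsr_stacked_address(pc):
    """JSR (3 bytes at pc) stacks the address of its own last byte."""
    return (pc + 2) & 0xFFFF


def rts_target(stacked):
    """RTS adds one to the stacked address to reach the next instruction."""
    return (stacked + 1) & 0xFFFF


def jmp_indirect_target(memory, pointer, nmos=True):
    """Target of JMP (pointer); NMOS parts do not carry into the page."""
    lo = memory[pointer]
    if nmos:
        hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0xFF)   # &xxFF -> &xx00
    else:
        hi_addr = (pointer + 1) & 0xFFFF                        # 65C02 behaviour
    return lo | (memory[hi_addr] << 8)


if __name__ == "__main__":
    print(adc_decimal(0x99, 0x01))   # A is &00 with carry, yet NMOS Z is clear
```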
All of these quirks, except the JSR one, have been fixed in the CMOS
versions of the processor, but please note the following:
- The JSR quirk cannot be fixed, as it risks breaking a large
amount of 6502 code written with this in mind. In the long run, it is
just an oddity and not an actual bug.
- The undefined instructions all behave as NOPs on CMOS
versions of the 6502, but be aware that this is only slightly less
'broken' than the random behaviour of the NMOS processor. The reason
for this is that the NOPs have differing instruction lengths
and cycle counts. It is as if the undefined instruction is still
executed, but the internal logic is somehow disengaged during this
process, so nothing happens except the PC being updated to move
on to the next instruction.
To see why this is still a problem, consider:
NOP
JSR &FED0
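If the NOP is an undefined single-byte instruction, the code
will work as intended. But what if that NOP is an undefined
instruction two or three bytes long? JSR &FED0 is stored as the
bytes &20 &D0 &FE, so a two-byte undefined instruction swallows the
&20 and the CPU next executes &D0 as a BNE, while a three-byte one
swallows &20 &D0 and executes &FE as an INC. Either way the JSR is
obliterated, mucking up everything else until the inevitable crash.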
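The byte-level view can be checked with a tiny decoder. This sketch knows only the three opcodes involved (taken from the standard 6502 opcode table) and models the undefined opcode purely by its length; the function name is hypothetical.

```python
# Just the three opcodes involved, from the standard 6502 opcode table.
OPCODES = {
    0x20: "JSR abs",
    0xD0: "BNE rel",
    0xFE: "INC abs,X",
}

JSR_FED0 = bytes([0x20, 0xD0, 0xFE])    # JSR &FED0, operand stored low byte first


def decoded_after(undefined_len, following):
    """Mnemonic executed after an undefined opcode of the given length.

    The undefined opcode swallows (undefined_len - 1) of the following
    bytes as its 'operand'; the next byte is then taken as an opcode.
    """
    return OPCODES[following[undefined_len - 1]]


if __name__ == "__main__":
    for length in (1, 2, 3):
        print(length, decoded_after(length, JSR_FED0))
    # 1 JSR abs
    # 2 BNE rel
    # 3 INC abs,X
```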
Interrupt latency
One of the particularly good features of the 6502 is its short
interrupt latency. This is the time between an interrupt occurring
and the interrupt handler being called. On the 6502, the interrupt
system is extremely quick:
-interrupt happens-
<vector called> ; 7 cycles
PHA ; 3 cycles
TXA ; 2 cycles
PHA ; 3 cycles
TYA ; 2 cycles
PHA ; 3 cycles
Here the interrupt has been called, PC and PSR stacked,
and then (thanks to us) all of the registers are also
stacked. If we add another two cycles for allowing the current
instruction to finish prior to the interrupt call (this is
an 'average' time to complete half an instruction), then it can be
seen that our interrupt handler can be entered with full status saved
in only 22 cycles. If we were to only use A and X, we could cut out
five cycles. If we were to perform a BIT test against specific devices
first, we could respond to a critical interrupt in around 15 cycles.
Clocking at 2MHz, that would equate to seven and a half microseconds.
Returning from interrupt is performed using:
PLA ; 4 cycles
TAY ; 2 cycles
PLA ; 4 cycles
TAX ; 2 cycles
PLA ; 4 cycles
RTI ; 6 cycles
This adds up to twenty two cycles
for a full interrupt return, including restoring all registers. As before, optimisations are possible so you could feasibly
halve that number of
cycles.
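Cycle budgets like these are easy to check mechanically. A minimal Python sketch, with the per-instruction counts copied from the listings above and the 2MHz clock assumed:

```python
CLOCK_HZ = 2_000_000     # the 2MHz clock assumed throughout this page

SAVE_REGISTERS = [("PHA", 3), ("TXA", 2), ("PHA", 3), ("TYA", 2), ("PHA", 3)]
RESTORE_AND_RETURN = [("PLA", 4), ("TAY", 2), ("PLA", 4),
                      ("TAX", 2), ("PLA", 4), ("RTI", 6)]


def total_cycles(sequence):
    """Sum the cycle counts of (mnemonic, cycles) pairs."""
    return sum(cycles for _, cycles in sequence)


def microseconds(cycles, clock_hz=CLOCK_HZ):
    """Wall-clock time for a number of cycles at the given clock."""
    return cycles * 1_000_000 / clock_hz


if __name__ == "__main__":
    print(total_cycles(SAVE_REGISTERS))                       # 13
    print(total_cycles(RESTORE_AND_RETURN))                   # 22
    print(microseconds(total_cycles(RESTORE_AND_RETURN)))     # 11.0
```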
Here is an example of a much more optimised interrupt handler:
-interrupt happens- ; 2 cycles allowing completion of current instruction
<vector called> ; 7 cycles
BIT via_status ; 4 cycles
BMI via_handler ; 2 cycles
BIT acia_status ; 4 cycles
BMI acia_handler ; 2 cycles
RTI ; 6 cycles
The standard Amélie BIOS will not
do this, as it provides a mechanism for adding handlers for
additional interrupt-generating sources; however, the code
above demonstrates how to handle an interrupt when the ACIA
and VIA are your only possible sources of interrupt.
Nothing is stacked unnecessarily, and you can see the
entire overhead of interrupt processing (taking the path where
neither branch is taken) can be squeezed down to a mere twenty
seven cycles, or 13.5 microseconds. Obviously,
it will require more time to actually process the interrupt.
Here we are looking at latency, the
time between the interrupt happening and the system being in a position
to cope with it.
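The BIT/BMI trick works because BIT copies bit 7 of its memory operand into the N flag without touching the accumulator, and both the 6522 VIA's interrupt flag register and the 6551 ACIA's status register use bit 7 to signal an active interrupt. This Python sketch mirrors the priority order of the listing; the status values used are made up for illustration.

```python
def bit_sets_n(value):
    """BIT copies bit 7 of its memory operand into the N flag."""
    return bool(value & 0x80)


def dispatch(via_status, acia_status):
    """Mirror of the BIT/BMI chain: VIA first, then ACIA, else RTI."""
    if bit_sets_n(via_status):          # BIT via_status : BMI via_handler
        return "via_handler"
    if bit_sets_n(acia_status):         # BIT acia_status : BMI acia_handler
        return "acia_handler"
    return "RTI"                        # nobody claimed it: spurious interrupt


if __name__ == "__main__":
    print(dispatch(0x00, 0x88))         # acia_handler
```

Checking the VIA first gives it priority; swapping the two tests would favour the serial port instead.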
If we take the example of a serial byte received, we should consider
that we will need to read the byte from the ACIA, and then work out
where to put it, probably in some sort of ring buffer. Furthermore, in
Amélie's case, this may cause various events to be 'fired' - a
byte has been received, a newline was received, the buffer is
full, and so on. In addition to this, we may need to write
back to the ACIA for the purpose of engaging RTS/CTS flow
control. The latency of this is therefore quite important,
for the longer it all takes, the slower the serial port can
usefully operate. The hardware laid out uses a 6551, which has no
FIFO. Therefore, if bytes arrive faster than they can be picked
up, data will be lost, and the solution is a slower serial
link, which may not be desirable. Consider 9600bps at 8N1.
This means using ten bits per byte (start, 8 data, 1
stop) at 9,600 bits per second. Divided by ten, this is a potential
960 bytes per second. Thus the serial port will interrupt us a little
under one thousand times a second. At 2MHz, this gives us about
2,000 cycles in which to handle each serial byte, assuming
there are no other interrupts at the same time. I have not
written the serial code yet, but I would imagine that this is
plenty. Let's try it at 19,200bps, the fastest rate offered by the
6551's internal baud rate generator. This is potentially 1,920 bytes
per second, which is 1,920 interrupts per second, with a little over
1,000 cycles to process each. Possible, but cutting it a lot
finer!
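The arithmetic generalises easily. This sketch assumes 8N1 framing (ten bits per byte), one interrupt per received byte and the 2MHz clock; the optional per-second allowance for other work is a parameter introduced here for illustration.

```python
CLOCK_HZ = 2_000_000


def bytes_per_second(bps, bits_per_byte=10):
    """8N1: one start bit, eight data bits, one stop bit."""
    return bps / bits_per_byte


def cycles_per_byte(bps, other_cycles_per_second=0, clock_hz=CLOCK_HZ):
    """CPU cycles available per received byte, one interrupt per byte."""
    return (clock_hz - other_cycles_per_second) / bytes_per_second(bps)


if __name__ == "__main__":
    for rate in (9600, 19200):
        print(rate, round(cycles_per_byte(rate)))
    # 9600 2083
    # 19200 1042
```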
Remember, also, that the VIA will
be generating 50 ticks per second for the time/watchdog/LED
flash/etc. If we blankly "assume" 250 cycles for all this,
fifty times, that is 12,500 cycles a second for housekeeping.
Do the maths for 19,200bps and we'll see it is
still a little over 1,000 cycles per byte. Am I
contradicting myself? What am I trying to prove? It's simple.
When you add lots of things together, they can certainly start
to add up. But a short interrupt latency coupled with two
million cycles per second is actually quite a lot. A BBC
microcomputer could receive a file from a modem at 9600bps and
store it bit-by-bit to an Econet fileserver (with the delay
being the network itself) while processing the keyboard and
the timer and the screen refresh and all the other stuff that
was going on inside. Sure, it's tediously slow by today's
standards with 16Mb broadband running on a ridiculously
oversized operating system... but back then it coped and it
coped well.
The significance of this?
Not a lot right now. It is just musings on the interrupt
latency of the 6502.
What about the Z80?
In comparing the 6502's interrupt latency against other
processors, we can run into problems with devices such as the
Z80, whose documentation talks about "machine cycles" as opposed to
plain clock ticks. If we assume that a "T cycle" is a clock tick, it
can be seen that the interrupt call itself is about 15 cycles. It takes
11 to 15 cycles to push a register, 10 to 14 to pull it, and
14 to return from interrupt. The Z80 only saves PC upon
responding to an interrupt. Thus, a simple IRQ and return with
nothing else in between takes 29 cycles, more than the
6502. In favour of the Z80, there is an
alternative register set, so it is possible to have a
non-reentrant interrupt handler which switches to this
alternative set for the interrupt code. This can be
implemented using (code suggested by Jonathan Graham
Harston):
EX AF,AF' ; 4 cycles
EXX ; 4 cycles
Eight cycles. Faster than the 6502. As a typical Z80 is
clocked faster than a 6502 (say 4MHz against 2MHz), our 29+4 cycles
would be equivalent to about 16.5 cycles of the 6502. I am
slightly worried, though, about the 11-15 cycles it takes the Z80 to
push a register and the 10-14 cycles required to pull it.
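The naive conversion used here simply scales a cycle count by the ratio of clock rates, as in this one-function sketch (the 4MHz and 2MHz figures are the ones quoted above). Being purely wall-clock, it captures nothing about what each instruction actually achieves.

```python
def equivalent_cycles(cycles, from_hz, to_hz):
    """Cycles at from_hz expressed as the same wall-clock time at to_hz."""
    return cycles * to_hz / from_hz


Z80_HZ = 4_000_000
M6502_HZ = 2_000_000

if __name__ == "__main__":
    print(equivalent_cycles(29, Z80_HZ, M6502_HZ))      # 14.5
    print(equivalent_cycles(29 + 4, Z80_HZ, M6502_HZ))  # 16.5
```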
This is, sadly, a bit bogus. When we attempt to
compare processors in this way, we run into all sorts of nasty
complications, in that a like-for-like comparison is simply not
possible. Not only will different processors have different
functionality (the Z80 can do a 16-bit addition in
one instruction; this needs to be synthesised with
seven instructions on a 6502), but different
design practice also means that identical-function instructions
will probably operate at different speeds. The Z80 complicates
this with different cycle types (is a machine cycle a clock
tick?), and matters are complicated even more by needing to take into
account differences in actual processor speed. The solution
is to compare like with like, using the behaviour of a more
complicated program. Over to Jonathan for the final word
on the Z80:
The best way
of comparing CPUs is by comparing an identical complex
application on each CPU. I do this by comparing BBC
BASIC. 6502 BBC BASIC is 16K; Z80 BBC BASIC implementing
exactly the same functionality is 12K. 6502 BBC BASIC with a
2MHz CPU runs 2% slower than Z80 BBC BASIC with a 4MHz
CPU.
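To see why a 16-bit addition costs seven instructions on the 6502, here is a Python model of the usual sequence (CLC, then LDA/ADC/STA for the low bytes and again for the high bytes), in which the carry out of the low byte is what links the two halves. The Z80 does the same job with a single ADD HL,rr. The function names are illustrative only.

```python
def adc(a, operand, carry):
    """8-bit binary ADC: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def add16(x, y):
    """Model of the seven-instruction 6502 sequence for a 16-bit add."""
    carry = 0                                    # CLC
    lo, carry = adc(x & 0xFF, y & 0xFF, carry)   # LDA x_lo, ADC y_lo, STA r_lo
    hi, carry = adc(x >> 8, y >> 8, carry)       # LDA x_hi, ADC y_hi, STA r_hi
    return (hi << 8) | lo                        # carry out of the high byte is discarded


if __name__ == "__main__":
    print(hex(add16(0x12FF, 0x0001)))            # 0x1300
```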
What about the 8088?
While the 8088 has a plus in its 1MB
addressing range (even with the hairy segmentation issues), we
must realise that it has the odd requirement of a 33%
duty cycle clock. Coupled with the data bus being multiplexed
with the lower eight bits of the address bus,
this processor would appear to require a certain amount of
'support'. In terms of interrupt latency, it takes 15 cycles to push a
register word onto the stack, and 12 to restore it. While this
may compare with the Z80, the 8088 interrupt return
instruction (IRET), which restores the flags and the CS:IP
address to return to, takes 44 cycles! This is not an efficient
processor, though thankfully things are vastly
improved in later incarnations. It isn't beyond the realms of
imagination that the interrupt setup and exit sequences
alone could run into cycle counts measured in hundreds!
Why is interrupt latency so important?
On the face of it, it isn't.
If Amélie is controlling a central heating system,
for example, then a few microseconds here or there won't make
any difference. Analogue switch thermostats are often so vague
that it wouldn't matter much if the heat was turned off
seconds (or even an entire minute) after the thermostat
interrupt. On the other hand, if Amélie is applied as a
robot (e.g. RICKBOT1) and she's hurtling across a table, then the very
last thing you want is a long lag when a sensor says
"woah! we're all outta table here guys... guys!?! hello!?!?!". The 6502 could have done
something about it before the Z80.
And, well,
we'd crash and burn before the 8088 noticed...
More advanced processors
As mentioned at the top of the page,
we have to take a quick reality check and assess what we
expect. Sure, it'd be fun to code up an ARM7500FE to control the 'bot
or the heating, but it'd be criminal - such
power and potential doing diddly-squat. For those of you
who like pictures, consider the 6502 against an 80486 (which
is a pretty large lump of ceramic):

The size of the processor may not put you off.
After all, you can always use a bigger circuit board, right?
Well, let's look underneath:

One would be a nightmare to attach to a circuit board,
and one wouldn't. Guess. Sadly, the ARM7500FE - being a surface-mount kind
of chip - is far worse. There are about eight billion
legs per square inch...
But we need not worry. The
6502 provides all that we require...
Other 6502 systems
For computers, we have an impressive range - all of Acorn's early machines
from the System 5s and the Atoms through to the
Master 128 were based upon the 6502, progressing to the 65C02 (Master)
and 65C102 (Compact/FileStore). Acorn
didn't have great success in America; their main competitor there was
Apple's Apple II which - surprise surprise - was also based upon
the 6502. The Oric was a 6502 machine, as was the well-known Commodore
PET, while the Commodore 64 used the 6510 incarnation. And that's just
for starters!
Ironically, until the rise of the 'XT'
with MS-DOS, no home computers used the 8086/8088 - nobody
took it seriously! It is quite interesting, therefore,
that the Psion 3 range of organisers chose the x86
architecture. The processor is a highly optimised V30 core
(NEC's 8086-compatible design), further refined by Psion for their
macrocell for use in just such a mobile application, eking many
hours from two AA cells and permitting state to be maintained from a
CR2012 button cell. I wonder if the V30 core was their first choice or
if nothing better was available at the time, as the later Series 5
(etc) organisers use an ARM core.
In the embedded market, we
hark back to an era predating PICs and heavily customised
microcontrollers. A number of devices were actually mini
computers running fixed software, much like Amélie. The Ringdale Megabuffer II
was based upon the Z80, while the picture below is of a CASE
modem that I salvaged from a dump many years ago. It is a 6502
design with 2K RAM and (I think) 8K of EPROM, plus
interfacing.

The 6502 turned up in other
places, such as a Prestel/Viewdata box. If you happen to be a
certain Gareth Babb who ran a viewdata BBS with a name like
CCL4 (?), then you might smile to know that I once called your
service back in 1992 using an actual viewdata
set-top-box!
Because
of its clean interface, the 6502 was widely used in education.
When I briefly attended Bridgwater College back circa
1993, our hands-on programming was with the 6502. I don't know
if they still use this processor, however
Wikipedia lists a number of universities
around the world who still use the 6502 to teach assembly
language. Make no mistake, when Chuck Peddle designed the 6502
back in 1975, he worked magic.
Today,
the 6502 is still available for embedded systems in its
various guises. In addition, a 16 bit version is available
(65C816), with a 32 bit version in the pipeline. Clock speeds
range from the original 1MHz to as far as 20MHz
parts.
In
comparison, Wikipedia
reports that the i386 processor (also widely used in
embedded systems with a DOS-like or Unix-like base OS) would
no longer be produced as of September 2007. The 6502 is still
going strong!
The primary 6502 resource
If you are looking for assistance and information on the 6502
processor and its family of support devices, then you will
undoubtedly find lots of things listed in Google.
You don't need to Google. There's one place
where you'll find everything to do with the
6502:
http://www.6502.org/
I am not affiliated with this website,
though I will certainly recommend it as... well... where d'you
think I got all my 65xx datasheets?
:-)