The bad news first...
Due to the heavy dependence on conio and the use of delay(), this code will only run under 16-bit DOS. I looked at both lcc and OpenWatcom: one has a conio that crashed when used in a 32-bit character-mode application, and the other didn't have it.
So, for now, unless somebody can point me to a conio that works comprehensively in a 32-bit .exe in text mode (graphics not required), AmélieEm will remain old-style DOS.
There is a basic, hacky, RISC OS conversion available too... :-)
Introduction
As I did not have an EPROM eraser, I figured it might be better to write and test Amélie's BIOS and application code in the software domain. Besides, writing an emulator sounded like fun.

The picture above is what AmélieEm says when it is initialising, in case you ever wondered. This stage will take a split second if you are loading AmélieEm from a hard disc. The next thing you will see is Tracey:

Get to know Tracey; she is very versatile. No, she isn't named after my girlfriend; it is from "tracing mode". I'm a geek, remember?
How the display is laid out
The screen is split into three sections:
- The top displays a disassembly of the current instructions on the left. Cursor down will go to the next instruction, but cursor up will back up one byte. Sorry, but the 6502 isn't a word-aligned processor: instructions are one to three bytes long, so there is no way to know where the previous one started. In the middle is the complete status of the processor registers, plus the addressing mode of the currently selected instruction. Ignore "mem" and "tmp"; these values are used internally by the emulation (refer to the source if you want to know what for). On the right are the final 16 bytes of the stack (this is fixed) and the value of the stack pointer.
- The middle of the screen is reserved for I/O emulation status. Here you can see the important VIA registers/status, and the slightly inaccurate cycle counter (it does not add extra cycles when a page boundary is crossed).
- The bottom of the screen serves as a 64-byte dump of memory and the command line. Page Up and Page Down can be used to scroll through the memory. The dump will 'wrap around', and any unused or unallocated addresses will be shown as '00'.
Tracey is versatile. You can alter most aspects of the system here, including poking around in memory (including EPROM!). Pressing RETURN will allow you to single-step instructions. Leaving Tracey will let the emulation run at full speed, which isn't terribly fast; some day I will look at reworking the 6502 core.
Breakpoints
You can set up "breakpoints" which will cause Tracey to reappear just before a specific instruction is executed.
Let's set up a breakpoint now. Press B (for Breakpoint). The command line changes to:

We want to set a breakpoint, so press S. The command line will change to:

We want to set a breakpoint on an address, so press A, and then type in the desired address, F817. The command line will look like:
Press Return to set it. You can recognise breakpoints by the red highlight and the 'B' in the leftmost part of the disassembly.

You can have up to 16 breakpoints active at any one time, and you can also set up break-on-event, as shown in the following picture:

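A 16-entry limit suggests a small fixed table. This is only a sketch: breakpt.c is not published, and the text further down notes that the real breakpoint system uses one of the spare &xB opcodes, so the plain address comparison below is a swapped-in technique. The names breakpoint_set(), breakpoint_clear() and breakpoint_hit() are hypothetical.

```c
#include <stdint.h>

#define MAX_BREAKPOINTS 16   /* the limit quoted above */

/* Hypothetical table layout: one address per slot plus an in-use flag. */
static uint16_t bp_addr[MAX_BREAKPOINTS];
static int      bp_used[MAX_BREAKPOINTS];

/* Add a breakpoint. Returns 0 on success (including "already set"),
   or -1 if all slots are taken. */
int breakpoint_set(uint16_t addr)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_BREAKPOINTS; i++) {
        if (bp_used[i] && bp_addr[i] == addr)
            return 0;                        /* already present */
        if (!bp_used[i] && free_slot < 0)
            free_slot = i;                   /* remember the first empty slot */
    }
    if (free_slot < 0)
        return -1;                           /* table full */
    bp_addr[free_slot] = addr;
    bp_used[free_slot] = 1;
    return 0;
}

/* Remove a breakpoint if present; clearing an unknown address is harmless. */
void breakpoint_clear(uint16_t addr)
{
    for (int i = 0; i < MAX_BREAKPOINTS; i++)
        if (bp_used[i] && bp_addr[i] == addr)
            bp_used[i] = 0;
}

/* Checked before each instruction executes: non-zero means
   "stop and hand control to the tracer". */
int breakpoint_hit(uint16_t pc)
{
    for (int i = 0; i < MAX_BREAKPOINTS; i++)
        if (bp_used[i] && bp_addr[i] == pc)
            return 1;
    return 0;
}
```

For example, breakpoint_set(0xF817) corresponds to the B, S, A, F817 sequence above. Scanning 16 entries before every instruction is cheap but not free, which may be one reason AmélieEm uses a spare opcode instead.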
Changing flags
Unlike several emulators I've seen, Tracey bends over backwards to prompt you when necessary. You do not have to remember arcane incantations just to change the Zero flag...
Tracey prompts you all the way... In fact, there's a help screen, so perhaps all you really need to remember is to press H when you want help!
Emulation principles
The main emulation loop is within wrapper.c. The loop is as follows:
{
    are we stepping? if so, call Tracey (Tracey doesn't return until complete)
    read byte from memory; this is the instruction opcode
    look up addressing mode and cycle count for this instruction
    dispatch the instruction (this means 'execute' it)
    increment cycle count
    patch up after breakpoint call, if breakpoints active
    post-call Tracey (this method is not used at this time)
    poll the hardware devices
    if 10240 cycles have elapsed, check for a keypress (this isn't accurate,
        as cycles are not incremented one by one; and anyway the kbhit()
        call is painfully SLOW)
} repeat loop
That is it in a nutshell.
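The loop above might look something like this in C. It is a minimal sketch, not AmélieEm's wrapper.c: the Cpu struct, run(), tracer(), poll_devices(), poll_keyboard() and the one-entry cycle table are hypothetical stand-ins, the dispatcher treats every opcode as a NOP, and the breakpoint patch-up and post-call steps are left out.

```c
#include <stdint.h>

#define KEY_POLL_INTERVAL 10240UL   /* cycles between keyboard checks */

/* Hypothetical CPU state: only what this loop needs. */
typedef struct {
    uint16_t      pc;
    int           stepping;   /* non-zero: enter the tracer before each instruction */
    unsigned long cycles;     /* running cycle count */
} Cpu;

static uint8_t memory[65536];

/* Base cycle counts indexed by opcode; only NOP (&EA) is filled in here. */
static const uint8_t cycle_table[256] = { [0xEA] = 2 };

static int key_polls = 0;     /* how many times the keyboard was checked */

/* Stubs standing in for Tracey, the dispatcher, device polling and kbhit(). */
static void tracer(Cpu *cpu)               { (void)cpu; }
static void dispatch(Cpu *cpu, uint8_t op) { (void)cpu; (void)op; /* every opcode acts as NOP */ }
static void poll_devices(Cpu *cpu)         { (void)cpu; }
static int  poll_keyboard(void)            { key_polls++; return 0; }

void run(Cpu *cpu, unsigned long max_instructions)
{
    unsigned long next_key_check = KEY_POLL_INTERVAL;

    for (unsigned long n = 0; n < max_instructions; n++) {
        if (cpu->stepping)
            tracer(cpu);                        /* single-step: wait for the user */

        uint8_t opcode = memory[cpu->pc++];     /* fetch; pc wraps at &FFFF */
        unsigned cost  = cycle_table[opcode];   /* look up base cycle count */
        dispatch(cpu, opcode);                  /* execute */
        cpu->cycles += cost;

        poll_devices(cpu);                      /* VIA, ACIA, latch... */

        /* Cycles arrive in steps of 2-7, so compare against a moving
           threshold rather than testing for an exact multiple. */
        if (cpu->cycles >= next_key_check) {
            next_key_check += KEY_POLL_INTERVAL;
            if (poll_keyboard())
                cpu->stepping = 1;              /* hypothetical: a key re-enters the tracer */
        }
    }
}
```

Running 10,000 NOPs through run() accumulates 20,000 cycles and checks the keyboard once (at the 10,240-cycle mark), which shows why a threshold test is used: the counter jumps in whole-instruction steps and will rarely land exactly on a multiple of 10240.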
Address decoding
The address decoding attempts to mimic the sort of logic that would be used on Amélie. It would be simpler (and faster?) to simply test it as "if between &A000 and &A0FF then it is the VIA", but we want to be sure that our memory logic is viable.
A8  = ( (addr >> 8)  & 1 );
A9  = ( (addr >> 9)  & 1 );
A13 = ( (addr >> 13) & 1 );
A14 = ( (addr >> 14) & 1 );
A15 = ( (addr >> 15) & 1 );

/* RAM or ROM? */
wrk = A14 + A15;
if (wrk == 0)
    return RAMSEL;   /* !14 & !15 = RAM at &0000 */
if (wrk == 2)
    return ROMSEL;   /* 14 & 15 = ROM at &E000 */

/* Note that ROM beginning at &E000 is A13 + A14 + A15, so we work by
   picking up on A14 + A15. For specifics of how the hardware
   implementation operates, please refer to the addrdecode schematic. */

/* TEST TWO - I/O STUFF [A15 and A13 are SET, A8 and A9 determine device] */
if ( !A13 || !A15 )
    return 0;

wrk = A8 + (A9 << 1);
switch (wrk)
{
    case 0 : /* !8 & !9 = VIA at &A000 */
        return VIASEL;
    case 1 : /* 8 & !9 = SER at &A100 */
        return SERSEL;
    case 2 : /* !8 & 9 = <unused> at &A200 */
        break; /* invalid device, it is an error... */
    case 3 : /* 8 & 9 = LAT at &A300 */
        return LATSEL;
}
return 0;    /* nothing selected */
What you are actually looking at here is an optimised software version of the NAND, the AND and the 3-to-8 demux. Instead of asking "is (NOT A14 AND NOT A15)?" and then "is (A14 AND A15)?", we can add A14 and A15 together, as each has the value '1' when set. Therefore RAM (neither A14 nor A15) will give zero and ROM (both A14 and A15) will give two; a result of one means neither RAM nor ROM is selected, so we go on to the I/O test.
Similar logic is applied to the I/O selection, though note that this code only implements a 2-to-4 decode.
True address decoding
Obviously the memory decode given above is not optimal, as it is written to explain what we are doing. A few small revisions will shave some instructions and cycles off what is a highly important routine (all memory access goes via this to decode our source device!)...
int address_decode(unsigned int addr)
{
    int A13, A15, wrk = 0;

    wrk = ( (addr >> 14) & 1 ) + ( (addr >> 15) & 1 );
    if (wrk == 0)
        return RAMSEL; /* !14 & !15 = RAM at &0000 (up to max. &3FFF) */
    if (wrk == 2)
        return ROMSEL; /* 14 & 15 = ROM at &E000 (reality is &C000 onwards) */

    A13 = ( (addr >> 13) & 1 ); /* only compute A13/A15 when we need it */
    A15 = ( (addr >> 15) & 1 );
    if ( !A13 || !A15 )
        return 0; /* &4000 to &9FFF (24K) currently unaddressable */

    wrk = ( ((addr >> 8) & 1) + ((addr >> 8) & 2) );
    switch (wrk)
    {
        case 0 : /* !8 & !9 = VIA at &A000 */
            return VIASEL;
        case 1 : /* 8 & !9 = SER at &A100 */
            return SERSEL;
        case 2 : /* !8 & 9 = <unused> at &A200 */
            break; /* invalid device, it is an error... */
        case 3 : /* 8 & 9 = LAT at &A300 */
            return LATSEL;
    }
    return 0;
}
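Because every memory access goes through this routine, it is worth checking that the bit-twiddled version really matches the intended memory map. The sketch below compares a lightly condensed copy of the optimised decoder against a plain range-based reference for all 64K addresses. The numeric device IDs (NOSEL, RAMSEL and so on) are hypothetical, since the real values are not shown; the only assumption that matters is that 0 means "nothing selected".

```c
/* Hypothetical device IDs; 0 means "nothing selected", matching the
   'return 0' paths in address_decode(). */
enum { NOSEL = 0, RAMSEL, ROMSEL, VIASEL, SERSEL, LATSEL };

/* The optimised decoder from above, condensed. */
int address_decode(unsigned int addr)
{
    int A13, A15, wrk;

    wrk = ((addr >> 14) & 1) + ((addr >> 15) & 1);
    if (wrk == 0) return RAMSEL;
    if (wrk == 2) return ROMSEL;

    A13 = (addr >> 13) & 1;
    A15 = (addr >> 15) & 1;
    if (!A13 || !A15) return NOSEL;

    switch (((addr >> 8) & 1) + ((addr >> 8) & 2)) {   /* A8 + 2*A9 */
    case 0:  return VIASEL;
    case 1:  return SERSEL;
    case 3:  return LATSEL;
    default: return NOSEL;                             /* &A200: unused */
    }
}

/* Reference decoder written straight from the memory map. */
int address_decode_ref(unsigned int addr)
{
    addr &= 0xFFFF;
    if (addr <= 0x3FFF) return RAMSEL;     /* &0000-&3FFF */
    if (addr >= 0xC000) return ROMSEL;     /* &C000-&FFFF */
    if (addr <  0xA000) return NOSEL;      /* &4000-&9FFF */
    switch ((addr >> 8) & 3) {             /* &A000-&BFFF: A8/A9 pick the device */
    case 0:  return VIASEL;
    case 1:  return SERSEL;
    case 3:  return LATSEL;
    default: return NOSEL;
    }
}

/* Count the addresses on which the two decoders disagree. */
long decode_mismatches(void)
{
    long bad = 0;
    for (unsigned int a = 0; a <= 0xFFFF; a++)
        if (address_decode(a) != address_decode_ref(a))
            bad++;
    return bad;
}
```

decode_mismatches() should return 0. The reference version also makes one consequence of the partial decode easy to see: A10 to A12 are ignored, so each device repeats every &400 bytes across &A000-&BFFF (for example, &B300 selects the latch just as &A300 does).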
Addressing mode lookup
This is basically two 256-byte tables, one holding the addressing mode and one the cycle count, and the opcode is used as an offset into each. It can be expressed beautifully in ARM code:
lookup_opcode
        ; ON ENTRY:
        ;   R0 = Opcode
        ;   R1 = Pointer to two-word block for opcode information
        ;   R2 = Offset pointer
        ;   R3 = Value read
        ADR     R2, datablock       ; set up pointer
        LDRB    R3, [R2, R0]        ; read addressing mode (via datablock + opcode)
        STR     R3, [R1, #0]
        ADD     R2, R2, #256        ; reposition to second table
        LDRB    R3, [R2, R0]        ; read cycle count
        STR     R3, [R1, #4]
        MOV     PC, R14
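For comparison, here is roughly the same lookup in C, keeping the single 512-byte "datablock" layout so each line can be matched to an instruction above. It is a sketch: the addressing-mode codes (AM_IMP and friends) and the OpInfo struct are hypothetical, and only five opcodes are filled in, with their standard NMOS 6502 base cycle counts.

```c
#include <stdint.h>

/* Hypothetical addressing-mode codes; AmélieEm's own values may differ. */
enum { AM_IMP = 0, AM_IMM, AM_ZP, AM_ABS };

/* Mirrors the two-word block pointed to by R1. */
typedef struct {
    unsigned mode;     /* word 0: addressing mode */
    unsigned cycles;   /* word 1: base cycle count */
} OpInfo;

/* One 512-byte block: addressing modes in the first 256 bytes,
   cycle counts in the second 256. Unlisted entries are zero. */
static const uint8_t datablock[512] = {
    [0x00] = AM_IMP, [0xA9] = AM_IMM, [0xA5] = AM_ZP, [0xAD] = AM_ABS, [0xEA] = AM_IMP,
    [256 + 0x00] = 7,   /* BRK */
    [256 + 0xA9] = 2,   /* LDA #imm */
    [256 + 0xA5] = 3,   /* LDA zp */
    [256 + 0xAD] = 4,   /* LDA abs */
    [256 + 0xEA] = 2,   /* NOP */
};

void lookup_opcode(uint8_t opcode, OpInfo *info)
{
    const uint8_t *p = datablock;   /* ADR  R2, datablock */
    info->mode = p[opcode];         /* LDRB R3, [R2, R0] ; STR R3, [R1, #0] */
    p += 256;                       /* ADD  R2, R2, #256 */
    info->cycles = p[opcode];       /* LDRB R3, [R2, R0] ; STR R3, [R1, #4] */
}
```

Because the opcode is a uint8_t, it can never index past either 256-byte half, so no bounds check is needed, just as in the ARM version.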
The &xB opcodes are undefined on the NMOS 6502, so they have been used to implement various emulator-specific instructions. If you wish to remove this functionality (perhaps to add 65C(E)02 instructions), please be aware that the breakpoint system uses one of these instructions!
Instruction dispatch
We have the instruction opcode. So which instruction is this?
The dispatch has been implemented as a big switch statement listing all 256 possible opcodes, trusting that the compiler can do a good job of making optimised code. The worst, non-optimal case would be:
if (opcode == 0) { opcode_brk(); return; }
if (opcode == 1) { opcode_ora(); return; }
[...]
/* else */
opcode_err(); return;
A better option would be a jump table. Acorn C v5.51, TurboC v2.01 and TurboC++ v1.0 all do this, as it is the sensible approach: you don't need to perform 255 tests to reach the 256th element.
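The same idea can be written by hand in portable C as an array of function pointers, which is one way to get a jump table without relying on how the compiler lowers a switch. This is a sketch: the handler bodies and the last_handler variable are hypothetical and exist only to show which handler ran.

```c
#include <stdint.h>

typedef void (*OpHandler)(void);

static int last_handler = -1;   /* records which handler ran, for demonstration */

/* Hypothetical handlers standing in for the real opcode_*() routines. */
static void opcode_brk(void) { last_handler = 0x00; }
static void opcode_ora(void) { last_handler = 0x01; }
static void opcode_nop(void) { last_handler = 0xEA; }
static void opcode_err(void) { last_handler = -2; }   /* undefined opcode */

static OpHandler dispatch_table[256];

/* Fill every slot, so no opcode can reach a null pointer. */
void init_dispatch(void)
{
    for (int i = 0; i < 256; i++)
        dispatch_table[i] = opcode_err;
    dispatch_table[0x00] = opcode_brk;   /* BRK */
    dispatch_table[0x01] = opcode_ora;   /* ORA (zp,X) */
    dispatch_table[0xEA] = opcode_nop;   /* NOP */
}

/* One indexed load and one indirect call, whatever the opcode. */
void dispatch(uint8_t opcode)
{
    dispatch_table[opcode]();
}
```

Since opcode is a uint8_t and every slot is filled, no range check like the compiler's "cmp bx,255" is needed; undefined opcodes simply land on opcode_err.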
Unfortunately, there isn't much you can do about how crap the x86 processor is, so here is what the x86 version looks like. This code loads a pre-computed address from an array, so it is a jump table in the true sense of the word.
        push    bp
        mov     bp, sp
        mov     bx, word ptr [bp+4]
        cmp     bx, 255
        jbe     @@0
        jmp     @1@3890
@@0:
        shl     bx, 1
        jmp     word ptr cs:@1@C15044[bx]
The jump table itself looks like:
@1@C14538 label word
        dw      @1@98
        dw      @1@122
and each branch point looks like:
@1@98:
        call    near ptr _opcode_brk
        jmp     @1@3914
@1@122:
        call    near ptr _opcode_ora
        jmp     @1@3914
It is almost a spiritual experience working with the ARM processor. Every instruction is exactly one "word" of four bytes, so you can start disassembling at any word-aligned address, as each new word is a new instruction. The side effect of this is that we can dispense with the table of addresses and use this knowledge to poke a new value directly into the Program Counter, as follows:
        CMP     a1, #&ff
        ADDLS   pc, pc, a1, LSL #2
        B       |L000818.J164.dispop|
        B       |L00081c.J163.dispop|
        [...]
|L000818.J164.dispop|
        B       opcode_brk
|L00081c.J163.dispop|
        B       opcode_ora
This is oh-so-close. It would have been really great if the compiler had realised that branching to ..J164.dispop, which itself only does B opcode_brk, is the same as branching to opcode_brk directly.
As a side effect, note that no registers are corrupted for this to work.
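Here is my hand-crafted dispatch code. On the ARM, reading PC gives the address of the current instruction plus 8, so the ADDCC lands on dispatch_table + opcode * 4, skipping the B opcode_inv that catches out-of-range values: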
        CMP     R0, #((dispatch_endoftable - dispatch_table) / 4)
        ADDCC   PC, PC, R0, LSL #2
        B       opcode_inv
dispatch_table
        ; row 0
        B       opcode_brk
        B       opcode_ora
        [...]
dispatch_endoftable
Processor 'internals'
To be described...
Device polling
To be described...
Breakpoints (implementation)
To be described...
Known emulation faults
- 6502 CPU core
  - Minimal NMI support (Amélie doesn't use NMIs; NMIvec points to RSTvec)
  - No "BCD" maths mode
  - No support for 'undocumented' side-effects in the NMOS version of the 6502, except for known processor bugs
  - Basic cycle counting: does not include "additional" cycles
  - May or may not fully support all of the CPU bugs (these need to be enabled, then the core recompiled)
- 6522 VIA core
  - No Timer2
  - Timer1 only works in basic modes (single-shot and countdown, without PB7)
  - No support for serial shifting
  - No support for automatic handshaking
  - Only generates IRQs for Timer1, CAx and CBx events
- unfinished
  - 6551 ACIA core
  - Latch
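The "additional" cycles missing from the cycle counter are mostly page-crossing penalties. On the NMOS 6502, an indexed read (for example LDA abs,X, LDA abs,Y or LDA (zp),Y) takes one extra cycle when adding the index carries into the high byte of the address, and a taken branch costs one extra cycle, plus one more if the target is in a different page from the instruction after the branch. Stores and read-modify-write instructions always take their longer fixed count, so they need no adjustment. A minimal sketch of how these penalties could be computed follows; the function names are hypothetical and not part of AmélieEm.

```c
#include <stdint.h>

/* 1 if base + index lands in a different 256-byte page from base. */
int crosses_page(uint16_t base, uint8_t index)
{
    uint16_t effective = (uint16_t)(base + index);
    return (base & 0xFF00) != (effective & 0xFF00);
}

/* Indexed reads (LDA abs,X; LDA abs,Y; LDA (zp),Y ...): one extra cycle
   when the index carries into the high byte. */
unsigned indexed_read_cycles(unsigned base_cycles, uint16_t base, uint8_t index)
{
    return base_cycles + (unsigned)crosses_page(base, index);
}

/* Relative branches: 2 cycles if not taken, 3 if taken, 4 if taken to a
   different page. next_pc is the address of the instruction after the branch. */
unsigned branch_cycles(uint16_t next_pc, int8_t offset, int taken)
{
    if (!taken)
        return 2;
    uint16_t target = (uint16_t)(next_pc + offset);
    return 3u + ((next_pc & 0xFF00) != (target & 0xFF00));
}
```

For example, LDA &20F0,X with X = &20 reads &2110, crossing from page &20 to page &21, so indexed_read_cycles(4, 0x20F0, 0x20) gives 5.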
Just show me the code!
As AmélieEm has not been finished, sources are not available.
The code is written in plain C, with C-style comments.
While much of AmélieEm is "portable", the user interface parts rely heavily on conio.h and dos.h, which means that at this time only a 16-bit MS-DOS version is available.
AmélieEm compiles on these systems:
- 16-bit DOS (all versions of MS-DOS) = TurboC++ v1.0
  The project files supplied are for use with TurboC++, which is downloadable from Borland (look for the museum).
- 26/32 neutral (all versions of RISC OS) = Acorn C/C++ compiler v5.xx
  Also available is a RISC OS port. While it is designed for use with the newer compiler in a PSR-neutral mode, I anticipate that it shouldn't be too difficult to jiggle the code for most compilers. The older Norcroft will be the easiest to work with, as it is fundamentally the same. The conio and dos implementations make heavy use of _kernel calls, so how it compiles with the likes of EasyC or the RISC OS build of gcc depends more upon the libraries supplied...
For various reasons, AmélieEm does not compile on these systems:
- TurboC v2.01: technical limitation
  Tracey's source is larger than the inbuilt (~64K) limit on source file size.
- lcc-win32 v3.8: missing resource
  We have (mostly?) conio.h but I don't see any delay() function.
- OpenWatcom v1.2: sort of works, but no benefit
  The required parts appear to be present (more or less), but you can't use them in a 32-bit console application, so compiling to a 16-bit application is unlikely to offer anything over TurboC++.
- RISC OS, Unix, Mac, etc...: you'll need to "roll your own"
  Find conio.h and make a delay() routine, and you might be in with a chance... :-)
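On a POSIX system (Unix, Linux, macOS), the delay() half of the problem can be covered with nanosleep(). This is a sketch, assuming Borland's signature void delay(unsigned milliseconds) from dos.h; it does nothing about the conio half (kbhit(), getch(), cursor positioning), which on such systems usually means termios or a library such as curses.

```c
#define _POSIX_C_SOURCE 199309L   /* expose nanosleep() */
#include <errno.h>
#include <time.h>

/* Sketch of a TurboC-style delay(): sleep for at least 'milliseconds' ms. */
void delay(unsigned milliseconds)
{
    struct timespec req, rem;

    req.tv_sec  = (time_t)(milliseconds / 1000u);
    req.tv_nsec = (long)(milliseconds % 1000u) * 1000000L;

    /* nanosleep() can be interrupted by a signal; if so, it reports the
       time still remaining in 'rem', so sleep again for that long. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}
```

Looping on EINTR matters here because an emulator front-end may well receive signals (a terminal resize, for instance), and a plain single call would then return early.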
If you need any help with AmélieEm's code, feel free to contact me.
Modules
acia.c is the ACIA (serial port) emulation.
addrdeco.c simply decodes the given address into a device ID.
appsys.c is a front-end that operates in an application-specific way. The supplied front-end provides "RickBot" functionality. This should be changed if you are implementing something else, such as a central heating controller.
breakpt.c handles the breakpoints.
conio.c (non-DOS only) provides an 'emulation' of the TurboC "conio" library.
dispatch.c is the processor instruction dispatcher. You may find benefits if you replace this with some optimised code; I have written a fast ARM version. Sorry, I don't speak x86.
dos.c (non-DOS only) provides an 'emulation' of the TurboC "dos" library and some hardware functions.
latch.c is the latch emulation.
lookup.c is the part that looks up the cycle count and addressing mode for each instruction. As with dispatch.c, you can probably write more optimal code than your compiler in this instance...
memory.c handles all reading from and writing to memory. This is a candidate for assemblerisation, but it may be quite involved.
opcode.c is the core of the 6502 processor emulation, and it probably needs to be rewritten to run at a decent speed.
romram.c is a short module that allocates memory for the RAM area and the ROM area, and pokes some special values into the RAM area so that the emulator may be detected programmatically, if required.
tracey.c contains all of Tracey's code, which is why it is huge!
via.c is the 6522 VIA emulation.
wrapper.c is the entry point. It organises initialisation and then runs the main execution loop.
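memory.c is not published, so the following is only a sketch of the shape such a module might take. It assumes, hypothetically, that the full 16K of RAM at &0000 and 16K of ROM from &C000 are populated, as in the address decoding section, and it uses plain range checks rather than address_decode() to stay self-contained. The names mem_read(), mem_write() and debug_poke() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sizes, taken from the memory map in the address decoding section. */
#define RAM_SIZE 0x4000u     /* &0000-&3FFF */
#define ROM_BASE 0xC000u
#define ROM_SIZE 0x4000u     /* &C000-&FFFF */

static uint8_t ram[RAM_SIZE];
static uint8_t rom[ROM_SIZE];

/* CPU-side read. Unallocated addresses return &00, which is also what
   Tracey's memory dump shows for them. */
uint8_t mem_read(uint16_t addr)
{
    if (addr < RAM_SIZE)
        return ram[addr];
    if (addr >= ROM_BASE)
        return rom[addr - ROM_BASE];
    /* &A000-&BFFF would be handed to the VIA/ACIA/latch emulations here. */
    return 0x00;
}

/* CPU-side write: ROM and unallocated space silently ignore writes. */
void mem_write(uint16_t addr, uint8_t value)
{
    if (addr < RAM_SIZE)
        ram[addr] = value;
}

/* Debugger-side poke: unlike the CPU, the tracer may also alter EPROM. */
void debug_poke(uint16_t addr, uint8_t value)
{
    if (addr < RAM_SIZE)
        ram[addr] = value;
    else if (addr >= ROM_BASE)
        rom[addr - ROM_BASE] = value;
}
```

Taking the address as a uint16_t means a dump that starts at &FFF0 naturally wraps around to &0000, matching the wrap-around behaviour of Tracey's memory dump.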
Release notes
AmélieEm is not yet 'finished', nor has it really been tested, so I have nothing to add at this time.