The Amélie project: Information: AmélieEm

The Amélie project

Information - AmélieEm

The bad news first...

Due to the heavy dependencies on conio and the use of delay(), this code will only run under 16-bit DOS. I looked at both lcc and OpenWatcom: one has a conio that crashed when used in a 32-bit character-mode application, and the other lacks delay().

So, for now, unless somebody can point me to a conio that works comprehensively in a 32-bit .exe in text mode (graphics not required), AmélieEm will remain old-style-DOS.

This, by the way, means that a RISC OS conversion will not be along any time soon.

Introduction

As I do not, currently, have an EPROM eraser I figured it might be better to write and test Amélie's BIOS and application code in the software domain. Besides, writing an emulator sounded like fun.

[Screenshot: AmélieEm initialising.]

The picture above is what AmélieEm says when it is initialising, in case you ever wondered. This stage will take a split second if you are loading AmélieEm from a hard disc. The next thing you will see is Tracey:

[Screenshot: Tracey, within AmélieEm.]

Get to know Tracey, she is very versatile. No, she isn't named after my girlfriend - it is from "tracing mode". I'm a geek, remember?

The screen is split into three sections:

The top displays a disassembly of the current instructions on the left.
Cursor down will go to the next instruction, but cursor up will back up one byte. Sorry, but the 6502 isn't a word-aligned processor.
In the middle is the complete status of the processor registers, plus the addressing mode of the currently selected instruction. Ignore "mem" and "tmp". These values are used internally by the emulation (refer to the source if you want to know what for).
On the right, the final 16 bytes of the software stack (this is fixed) and the value of the stack pointer.

The middle of the screen is reserved for I/O emulation status. At this point, perhaps the only useful part is the slightly inaccurate cycle counter (it does not add extra cycles for page boundaries being crossed).
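For the curious, the extra cycle that the counter leaves out is the one the 6502 spends when an indexed read lands in a different 256-byte page from its base address. Here is a minimal sketch of how such a check could look. The function name is made up for illustration; it is not something in AmélieEm's source.

```c
#include <assert.h>

/* Hypothetical helper, NOT part of AmélieEm: returns 1 if adding the
   index register to the base address ends up in a different 256-byte
   page (which costs one extra cycle on an indexed read), else 0. */
static unsigned int page_cross_penalty(unsigned int base, unsigned int index)
{
   unsigned int effective;

   assert(index <= 0xFF);                 /* X and Y are 8-bit registers */

   effective = (base + index) & 0xFFFF;   /* addresses wrap at 64K */

   /* Compare only the high byte (the page number) of each address */
   return ((base & 0xFF00) != (effective & 0xFF00)) ? 1 : 0;
}
```

For example, page_cross_penalty(0x12F0, 0x20) gives 1 because &12F0 + &20 lands at &1310, while page_cross_penalty(0x1200, 0x20) gives 0.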

The bottom of the screen serves as a 64 byte dump of memory and the command line.
Page Up and Page Down can be used to scroll through the memory. The dump will 'wrap around', and any unused or unallocated addresses will be seen as '00'.

Tracey is versatile. You can alter most aspects of the system here - including poking around in memory (including EPROM!). Pressing RETURN will allow you to single-step instructions. Leaving Tracey will let the emulation run at full speed. You can set up "breakpoints" which will cause Tracey to reappear just before a specific instruction is executed.

Let's set up a breakpoint now. You would press B (for Breakpoint). The command line changes to:

We want to set a breakpoint, so press S, and then type in the desired address, F817. The command line will look like:

Press Return to set it. You can tell breakpoints by the red highlight and the 'B' in the leftmost part of the disassembly.

[Screenshot: Setting a breakpoint, part 3.]

You can have up to 16 breakpoints active at any one time.
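To give a feel for what a 16-entry limit means in practice, here is a rough sketch of a fixed-size breakpoint table. Be warned that this is only an illustration with invented names: it uses a plain address list, which is probably not how AmélieEm's own breakpt.c goes about it (the real thing makes use of a spare opcode, as mentioned further down).

```c
#include <assert.h>

#define MAX_BREAKPOINTS 16   /* the limit quoted above */

/* Hypothetical storage, NOT taken from breakpt.c */
static unsigned int bp_addr[MAX_BREAKPOINTS];
static int          bp_used[MAX_BREAKPOINTS];

/* Returns the slot holding the breakpoint, or -1 if all 16 are taken. */
static int breakpoint_set(unsigned int addr)
{
   int i;

   assert(addr <= 0xFFFFu);

   for (i = 0; i < MAX_BREAKPOINTS; i++)      /* already set? */
      if (bp_used[i] && bp_addr[i] == addr)
         return i;

   for (i = 0; i < MAX_BREAKPOINTS; i++)      /* find a free slot */
      if (!bp_used[i])
      {
         bp_used[i] = 1;
         bp_addr[i] = addr;
         return i;
      }

   return -1;                                 /* table is full */
}

/* Returns 1 if there is a breakpoint at this program counter value. */
static int breakpoint_hit(unsigned int pc)
{
   int i;

   for (i = 0; i < MAX_BREAKPOINTS; i++)
      if (bp_used[i] && bp_addr[i] == pc)
         return 1;

   return 0;
}
```

With this sketch, breakpoint_set(0xF817) takes the first free slot, and a seventeenth distinct address is refused with -1.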

Unlike several emulations I've seen, Tracey bends over backwards to prompt you when necessary. You do not have to remember arcane incantations just to change the Zero flag...

Simply press S (for Set)

Then press F (for Flag)

Then press Z (for Zero)

And finally press F (for False)

You can see here that the 'Z' flag is now in lower case (this means that it is unset).
[Screenshot: Setting a processor flag, step 4.]

Tracey prompts you all the way...
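The upper-case/lower-case flag display is easy to sketch in C. The bit positions below are the standard 6502 status register layout (Zero is bit 1), but the helper names and the exact "NV-BDIZC" ordering of the text are my assumptions for this example, not a copy of Tracey's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define FLAG_Z 0x02   /* Zero flag: bit 1 of the 6502 status register */

/* Hypothetical helper: set (on != 0) or clear one flag in status byte p. */
static unsigned char set_flag(unsigned char p, unsigned char mask, int on)
{
   return on ? (unsigned char)(p | mask) : (unsigned char)(p & ~mask);
}

/* Hypothetical helper: write the flags as eight characters plus a
   terminator. A set flag is shown in upper case, an unset flag in lower
   case. Bit 5 is unused on the 6502 and always shows as '-'. */
static void flags_to_text(unsigned char p, char out[9])
{
   static const char names[] = "NV-BDIZC";   /* bit 7 down to bit 0 */
   int i;

   assert(out != NULL);

   for (i = 0; i < 8; i++)
   {
      int set = (p >> (7 - i)) & 1;
      out[i] = set ? names[i] : (char)tolower((unsigned char)names[i]);
   }
   out[8] = '\0';
}
```

Clearing Zero with set_flag(p, FLAG_Z, 0) and then calling flags_to_text() gives a lower-case 'z' in the seventh position, which is the effect of the S, F, Z, F key sequence described above.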

Emulation principles

The main emulation loop is within wrapper.c. The loop is as follows:

{

are we stepping? if so, call Tracey
(Tracey doesn't return until complete)

read byte from memory, this is the instruction opcode

look up addressing mode and cycle count for this instruction

dispatch the instruction (this means, 'execute' it)

increment cycle count

patch up after breakpoint call, if breakpoints active

post-call Tracey (this method is not used at this time)

Poll the hardware devices

If 10240 cycles have elapsed, check for a keypress
(this isn't accurate as cycles are not incremented one by one; and anyway the kbhit() call is painfully SLOW)

} loop

That is it in a nutshell.
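For those who prefer C to pseudocode, here is a cut-down sketch of the same loop shape. Everything in it is a stand-in: the structure, the names, and the pretend rule that every opcode costs two cycles are invented for the example, and the Tracey, dispatch and hardware steps are reduced to comments. It also assumes a flat 64K array, which a 16-bit DOS compiler would need to allocate differently. What it does show is why the keypress check compares against a threshold: cycles arrive several at a time, so the count may never be exactly 10240.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_CHECK_INTERVAL 10240UL   /* figure taken from the loop above */

/* Hypothetical machine state, NOT AmélieEm's own structures */
struct machine
{
   unsigned char memory[65536];
   unsigned int  pc;
   unsigned long cycles;
   unsigned long next_key_check;
   unsigned long key_checks;     /* counts how often we would call kbhit() */
};

/* Stand-in for the real lookup: pretend every opcode takes 2 cycles. */
static unsigned int cycles_for(unsigned char opcode)
{
   (void)opcode;
   return 2;
}

static void run_steps(struct machine *m, unsigned long steps)
{
   assert(m != NULL);

   while (steps--)
   {
      unsigned char opcode;

      /* (if stepping, Tracey would be called here) */

      opcode = m->memory[m->pc & 0xFFFF];     /* read the opcode */
      m->pc  = (m->pc + 1) & 0xFFFF;

      /* (the instruction would be dispatched here) */

      m->cycles += cycles_for(opcode);        /* jumps by 2 or more */

      /* (hardware devices would be polled here) */

      if (m->cycles >= m->next_key_check)     /* ">=", not "==" */
      {
         m->key_checks++;                     /* real code: kbhit() */
         m->next_key_check += KEY_CHECK_INTERVAL;
      }
   }
}
```

Starting with next_key_check set to KEY_CHECK_INTERVAL, the first keyboard check in this sketch happens on the 5120th instruction.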

Address decoding

The address decoding attempts to mimic the sort of logic that would be used on Amélie. It would be simpler (and faster?) to simply block it as "if between &A000 and &A0FF then it is the VIA", but we want to be sure that our memory logic is viable.

   A8 = ( (addr >> 8) & 1 );
   A9 = ( (addr >> 9) & 1 );
   A13 = ( (addr >> 13) & 1 );
   A14 = ( (addr >> 14) & 1 );
   A15 = ( (addr >> 15) & 1 );

   /* RAM or ROM? */
   wrk = A14 + A15;

   if (wrk == 0)
      return RAMSEL; /* !14 & !15 = RAM at &0000 */

   if (wrk == 2)
      return ROMSEL; /* 14 & 15 = ROM at &F000 */

   /* TEST TWO - I/O STUFF [A15 and A13 are SET, A8 and A9 determine device] */
   if ( !A13 || !A15 )
      return 0;

   wrk = A8 + (A9 << 1);
   switch (wrk)
   {
      case 0 : /* !8 & !9 = VIA at &A000 */
               return VIASEL;

      case 1 : /* 8 & !9 = SER at &A100 */
               return SERSEL;

      case 2 : /* !8 & 9 = <unused> at &A200 */
               break; /* invalid device, it is an error... */

      case 3 : /* 8 & 9 = LAT at &A300 */
               return LATSEL;
   }

What you are actually looking at here is an optimised software version of the NAND and AND and 3-to-8 demux. Instead of asking "is (NOT A14 AND NOT A15)" and then "is (A14 AND A15)", we can add them, as both have value '1' if active. Therefore RAM (neither A14 nor A15) will be zero and ROM (A14 and A15) will be two.

Similar logic is applied to the I/O selection, though note that this code only implements a 2-to-4 decode.
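One consequence of decoding on address lines rather than on ranges is that each I/O device is mirrored: only A8 and A9 pick the device, so anything from &A000 to &BFFF with the same A8/A9 pattern hits the same chip. The sketch below completes the excerpt into a whole function and sets it beside a range-based equivalent so the two can be compared. The numeric values of the xxxSEL constants and the name NOSEL are my assumptions for the example (the excerpt only tells us that "nothing selected" returns 0).

```c
#include <assert.h>

/* Assumed values; only the 0 for "nothing selected" comes from the excerpt */
enum { NOSEL = 0, RAMSEL, ROMSEL, VIASEL, SERSEL, LATSEL };

/* The excerpt above, completed with a final return for the unused slot. */
static int decode_by_bits(unsigned int addr)
{
   int A8, A9, A13, A14, A15, wrk;

   assert(addr <= 0xFFFFu);

   A8  = (int)((addr >> 8)  & 1);
   A9  = (int)((addr >> 9)  & 1);
   A13 = (int)((addr >> 13) & 1);
   A14 = (int)((addr >> 14) & 1);
   A15 = (int)((addr >> 15) & 1);

   wrk = A14 + A15;
   if (wrk == 0) return RAMSEL;          /* !14 & !15 */
   if (wrk == 2) return ROMSEL;          /*  14 &  15 */

   if (!A13 || !A15) return NOSEL;       /* not in the I/O block */

   switch (A8 + (A9 << 1))
   {
      case 0 : return VIASEL;
      case 1 : return SERSEL;
      case 3 : return LATSEL;
   }
   return NOSEL;                         /* case 2: unused device */
}

/* The same decisions written as address ranges, for comparison. */
static int decode_by_range(unsigned int addr)
{
   static const int io_device[4] = { VIASEL, SERSEL, NOSEL, LATSEL };

   assert(addr <= 0xFFFFu);

   if (addr <= 0x3FFFu) return RAMSEL;   /* A15 = 0, A14 = 0 */
   if (addr >= 0xC000u) return ROMSEL;   /* A15 = 1, A14 = 1 */
   if (addr >= 0xA000u && addr <= 0xBFFFu)
      return io_device[(addr >> 8) & 3]; /* A9, A8 choose the device */
   return NOSEL;
}
```

Both functions agree for every 16-bit address, and both report the VIA at &A400 as well as at &A000, which the simple "between &A000 and &A0FF" test would not.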

Addressing mode lookup

The lookup is basically two 256-byte tables, one holding the addressing mode and one holding the cycle count, with the opcode used as the offset into each. It can be expressed beautifully in ARM code:

lookup_opcode
        ; ON ENTRY:
        ;   R0 = Opcode
        ;   R1 = Pointer to two-word block for opcode information
        ;   R2 = Offset pointer
        ;   R3 = Value read

        ADR     R2, datablock      ; set up pointer
        LDRB    R3, [R2, R0]       ; read addressing mode (via datablock + opcode )
        STR     R3, [R1, #0]
        ADD     R2, R2, #256       ; reposition to second table
        LDRB    R3, [R2, R0]       ; read cycle count
        STR     R3, [R1, #4]
        MOV     PC, R14

The &xB instructions are undefined on the NMOS 6502, so have been used to implement various emulator-specific instructions. If you wish to remove this functionality (perhaps to add 65C(E)02 instructions), please be aware that the breakpoint system uses one of these instructions!
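For comparison, the two-table lookup looks roughly like this in C. This is a sketch of the idea only, not the contents of lookup.c: the structure, the set_opcode() helper and the way the tables get filled in are all invented for the example, and the addressing mode numbers are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout mirroring the ARM version: 256 addressing modes
   followed directly by 256 cycle counts. */
static unsigned char datablock[512];

struct opcode_info
{
   unsigned int mode;     /* addressing mode code (numbering assumed) */
   unsigned int cycles;   /* basic cycle count */
};

/* Invented helper so the example has some data to look up. */
static void set_opcode(unsigned int opcode, unsigned char mode,
                       unsigned char cycles)
{
   assert(opcode <= 255);
   datablock[opcode]       = mode;
   datablock[256 + opcode] = cycles;
}

/* Same job as lookup_opcode above: two indexed byte reads. */
static void lookup_opcode(unsigned int opcode, struct opcode_info *info)
{
   assert(opcode <= 255);
   assert(info != NULL);

   info->mode   = datablock[opcode];         /* first table  */
   info->cycles = datablock[256 + opcode];   /* second table */
}
```

After set_opcode(0xEA, 0, 2), which records NOP as a two-cycle instruction using mode number 0, lookup_opcode(0xEA, &info) fills in info.mode = 0 and info.cycles = 2.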

Instruction dispatch

We have the instruction opcode. So which instruction is this?

The dispatch has been implemented as one big "select" structure (a C switch statement) listing all 256 possible opcodes, trusting the compiler to do a good job of optimising it. The worst case, with no optimisation at all, would be a chain of tests:

if (opcode ==   0) { opcode_brk(); return; }
if (opcode ==   1) { opcode_ora(); return; }
[...]
/* else */           opcode_err(); return;

A better option would be a jump table. Acorn C v5.51 and TurboC v2.01 and TurboC++ v1.0 all do this as it is the sensible approach - you don't need to perform 255 tests to reach the 256th element.
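If you would rather not leave it to the compiler, a jump table can be spelled out by hand in C as an array of function pointers. This is only a sketch of that alternative, not what dispatch.c does. The handler names are borrowed from the examples above, but their bodies here are dummies that just record which one was called.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*opcode_fn)(void);

/* Demonstration only: remembers which handler ran (-1 means error) */
static int last_called;

/* Dummy handlers standing in for the real ones */
static void opcode_brk(void) { last_called = 0x00; }
static void opcode_ora(void) { last_called = 0x01; }
static void opcode_err(void) { last_called = -1;   }

static opcode_fn dispatch_table[256];

/* Fill every slot with the error handler, then add the known opcodes. */
static void init_dispatch(void)
{
   int i;

   for (i = 0; i < 256; i++)
      dispatch_table[i] = opcode_err;

   dispatch_table[0x00] = opcode_brk;
   dispatch_table[0x01] = opcode_ora;
   /* [...] one line per implemented opcode */
}

/* One range check and one indexed call, however many opcodes there are. */
static void dispatch(unsigned int opcode)
{
   if (opcode > 255)
   {
      opcode_err();
      return;
   }

   assert(dispatch_table[opcode] != NULL);   /* init_dispatch() was called */
   dispatch_table[opcode]();
}
```

After init_dispatch(), a call to dispatch(1) goes straight to opcode_ora() without testing for opcode 0 first, and any opcode without its own entry ends up in opcode_err().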

Unfortunately, there isn't much you can do about how crap the x86 processor is, so here is an example of it. This code loads a pre-computed address from an array, so it is a jump table in the true sense of the word.

        push    bp
        mov     bp, sp
        mov     bx,word ptr [bp+4]
        cmp     bx,255
        jbe     @@0
        jmp     @1@3890
@@0:
        shl     bx,1
        jmp     word ptr cs:@1@C15044[bx]

The jump table itself looks like:

@1@C14538 label word
dw @1@98
dw @1@122

and each branch point looks like:

@1@98:
        call    near ptr _opcode_brk
        jmp     @1@3914
@1@122:
        call    near ptr _opcode_ora
        jmp     @1@3914

It is almost a sexual event working with the ARM processor. The instruction positionings are fixed at a "word" of four bytes. You can randomly disassemble anything as a new word is a new instruction.
The side effect of this is we can dispense with the actual jump table and use this knowledge to poke a new value directly into the Program Counter, as follows:

        CMP      a1,#&ff
        ADDLS    pc,pc,a1,LSL #2
        B        |L000818.J164.dispop|
        B        |L00081c.J163.dispop|
        [...]
|L000818.J164.dispop|
        B        opcode_brk
|L00081c.J163.dispop|
        B        opcode_ora

This is oh-so-close. It would have been really great if the compiler had realised that B ..J164.dispop -> B opcode_brk is actually the same thing as calling opcode_brk directly. As a side effect, note that no registers are corrupted for this to work.

Here is my hand-crafted dispatch code:

        CMP     R0, #((dispatch_endoftable - dispatch_table) / 4)
        ADDCC   PC, PC, R0, LSL #2
        B       opcode_inv

dispatch_table
        ; row 0
        B       opcode_brk
        B       opcode_ora
        [...]
dispatch_endoftable

Processor 'internals'

To be described...

Device polling

To be described...

Breakpoints

To be described...

Known emulation faults

6502 CPU core
- Minimal NMI support (Amélie doesn't use NMIs)
- No "BCD" maths mode
- No support for 'undocumented' side-effects in the NMOS version of the 6502
- Basic cycle counting - does not include "additional" cycles
- May or may not fully support all of the CPU bugs (these need to be enabled, then the core recompiled)
6522 VIA core
- No Timer2
- Timer1 only works in basic modes (single-shot and countdown, without PB7)
- No support for serial shifting
- No support for automatic handshaking
- Only generates IRQs for Timer1, CAx and CBx events
- unfinished
6551 ACIA core
- not yet written
Latch
- I don't anticipate any problems with this...

Just show me the code!

The code is written in plain C, with C style comments.

While much of AmélieEm is "portable", the user interface parts rely heavily on conio.h and dos.h which means that at this time only a 16-bit MS-DOS version is available.

AmélieEm compiles on these systems:

16-bit DOS (all versions of MS-DOS) = TurboC++ v1.0
The project files supplied are for use with TurboC++, which is downloadable from Borland (look for the museum).

For various reasons, AmélieEm does not compile on these systems:

TurboC v2.01
Tracey's source is larger than the inbuilt (~64K) limit on source file size.

lcc-win32 v3.8
We have (mostly?) conio.h but I don't see any delay() function.

OpenWatcom v1.2
The required parts appear to be present (more-or-less), but you can't use them in a 32-bit console application, so compiling to a 16-bit application is unlikely to offer anything over TurboC++.

RISC OS, Unix, Mac, etc etc...
Find conio.h and make a delay() routine, and you might be in with a chance... :-)

If you need any help with AmélieEm's code, feel free to contact me.

Modules

addrdeco.c simply decodes the address given to be a device ID.

breakpt.c handles the breakpoints.

dispatch.c is the processor instruction dispatcher. You may find benefits if you replace this with some optimised code; I have written a fast ARM version. Sorry, I don't speak x86.

lookup.c is the part that looks up cycle count and addressing mode for each instruction. As with dispatch.c, you can probably write more optimal code than your compiler in this instance...

memory.c handles all reading and writing from memory. This is a candidate for assemblerisation, but it may be quite involved.

opcode.c is the core of the 6502 processor emulation.

romram.c is a short module that allocates memory for the RAM area and the ROM area.

tracey.c contains all of Tracey's code, which is why it is huge!

via.c is the 6522 VIA emulation.

wrapper.c is the entry point. It organises initialisation and then runs the main execution loop.

Release notes

AmélieEm is not yet 'finished', nor has it really been tested, so I have nothing to add at this time.