The Amélie project
Information - AmélieEm

The bad news first...

Due to its heavy dependence on conio and the use of delay(), this code will only run under 16-bit DOS. I looked at both lcc and OpenWatcom: one has a conio that crashed when used in a 32-bit character-mode application, and the other didn't have it.

So, for now, unless somebody can point me to conio that works comprehensively in a 32-bit .exe in text mode (graphics not required), AmélieEm will remain old-style-DOS.

There is a basic, hacky, RISC OS conversion available too... :-)

 

Introduction

As I did not have an EPROM eraser I figured it might be better to write and test Amélie's BIOS and application code in the software domain. Besides, writing an emulator sounded like fun.

AmélieEm initialising.

The picture above is what AmélieEm says when it is initialising, in case you ever wondered. This stage will take a split second if you are loading AmélieEm from a hard disc. The next thing you will see is Tracey:

AmélieEm running in 'Tracey' mode.

Get to know Tracey, she is very versatile. No, she isn't named after my girlfriend - it is from "tracing mode". I'm a geek, remember?

 

How the display is laid out

The screen is split into three sections:

  • The top displays a disassembly of the current instructions on the left.
    Cursor down will go to the next instruction, but cursor up will back up one byte. Sorry, but the 6502 isn't a word-aligned processor.
    In the middle is the complete status of the processor registers, plus the addressing mode of the currently selected instruction. Ignore "mem" and "tmp"; these values are used internally by the emulation (refer to the source if you want to know what for).
    On the right, the final 16 bytes of the software stack (this is fixed) and the value of the stack pointer.
     
  • The middle of the screen is reserved for I/O emulation status. Here you can see the important VIA registers/status, and the slightly inaccurate cycle counter (it does not add extra cycles for page boundaries being crossed).
     
  • The bottom of the screen serves as a 64 byte dump of memory and the command line.
    Page Up and Page Down can be used to scroll through the memory. The dump will 'wrap around', and any unused or unallocated addresses will be seen as '00'.

Tracey is versatile. You can alter most aspects of the system here - including poking around in memory (including EPROM!). Pressing RETURN will allow you to single-step instructions. Leaving Tracey will let the emulation run at full speed, which isn't terribly fast - some day I will look to reworking the 6502 core.

 

Breakpoints

You can set up "breakpoints" which will cause Tracey to reappear just before a specific instruction is executed.

Let's set up a breakpoint now. You would press B (for Breakpoint). The command line changes to:

Setting a breakpoint, part 1.

We want to set a breakpoint, so press S. The command line will change to:

Setting a breakpoint, part 2.

We want to set a breakpoint on an address, so press A, and then type in the desired address F817. The command line will look like:

Setting a breakpoint, part 3.

Press Return to set it. You can tell breakpoints by the red highlight and the 'B' in the leftmost part of the disassembly.

Setting a breakpoint, part 4.

You can have up to 16 breakpoints active at any one time, and you can also set up break-on-event, as shown in the following picture:

Break-on-event options.

 

Changing flags

Unlike several emulations I've seen, Tracey bends over backwards to prompt you when necessary. You do not have to remember arcane incantations just to change the Zero flag...

Simply press S (for Set)

Setting a processor flag, step 1.

Then press F (for Flag)

Setting a processor flag, step 2.

Then press Z (for Zero)

Setting a processor flag, step 3.

And finally press F (for False)

You can see here that the 'Z' flag is now in lower case (this means that it is unset).

Setting a processor flag, step 4.

Tracey prompts you all the way...
In fact, there's a help screen so perhaps all you really need to remember is to press H when you want help!
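
The upper/lower case convention is simple to reproduce. What follows is only a sketch, not Tracey's actual code: the function name format_flags is made up, and showing '-' for the unused bit 5 is an assumption about the display. The bit positions themselves are the standard 6502 status register layout (N V - B D I Z C, from bit 7 down to bit 0).

```c
#include <stdint.h>

/* Render the 6502 status register as text: upper case = flag set,
   lower case = flag clear. The name format_flags and the '-' used for
   the unused bit 5 are assumptions, not taken from Tracey's source. */
void format_flags(uint8_t p, char out[9])
{
    static const char names[8] = { 'N', 'V', '-', 'B', 'D', 'I', 'Z', 'C' };
    int i;

    for (i = 0; i < 8; i++) {
        int  set = (p >> (7 - i)) & 1;      /* bit 7 (N) is printed first */
        char c   = names[i];

        if (c != '-' && !set)
            c = (char)(c + ('a' - 'A'));    /* clear flag -> lower case */
        out[i] = c;
    }
    out[8] = '\0';
}
```

With only the Zero flag set (a status value of &02) this writes "nv-bdiZc" into the buffer; after clearing it, the 'Z' drops to lower case.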

 

Emulation principles

The main emulation loop is within wrapper.c. The loop is as follows:

{

are we stepping? if so, call Tracey
(Tracey doesn't return until complete)

read byte from memory, this is the instruction opcode

look up addressing mode and cycle count for this instruction

dispatch the instruction (this means, 'execute' it)

increment cycle count

patch up after breakpoint call, if breakpoints active

post-call Tracey (this method is not used at this time)

Poll the hardware devices

If 10240 cycles have elapsed, check for a keypress
(this isn't accurate as cycles are not incremented one by one; and anyway the kbhit() call is painfully SLOW)

} repeat loop

That is it in a nutshell.
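
For readers who prefer C to pseudocode, here is a rough sketch of the same loop. It is not the code from wrapper.c: the Emu structure and every function name in it are invented for illustration, memory is a flat 64K array rather than going through the address decoder, every opcode is treated as a 2-cycle no-op, and the breakpoint patch-up and post-call Tracey steps are left out.

```c
#include <stdint.h>
#include <string.h>

#define KEY_POLL_INTERVAL 10240UL   /* cycles between keyboard checks */

/* Hypothetical emulator state; none of these names are from the source. */
typedef struct {
    uint8_t       mem[65536];   /* flat memory, no device decoding here  */
    uint16_t      pc;           /* program counter                       */
    unsigned long cycles;       /* running cycle count                   */
    unsigned long next_poll;    /* cycle count of the next keyboard poll */
    unsigned long key_polls;    /* number of keyboard polls so far       */
    int           stepping;     /* non-zero = enter Tracey first         */
} Emu;

/* Stand-ins for the real modules (tracey.c, lookup.c, dispatch.c, ...). */
static void tracey(Emu *e)          { e->stepping = 0; }
static void lookup(uint8_t op, int *mode, int *cyc)
                                    { (void)op; *mode = 0; *cyc = 2; }
static void dispatch(Emu *e, uint8_t op, int mode)
                                    { (void)e; (void)op; (void)mode; }
static void poll_devices(Emu *e)    { (void)e; }
static void check_keypress(Emu *e)  { e->key_polls++; }

void emu_init(Emu *e)
{
    memset(e, 0, sizeof *e);
    e->next_poll = KEY_POLL_INTERVAL;
}

/* Run a fixed number of instructions through the loop described above. */
void emu_run(Emu *e, unsigned long instructions)
{
    while (instructions--) {
        int     mode, cyc;
        uint8_t opcode;

        if (e->stepping)
            tracey(e);                  /* does not return until complete */

        opcode = e->mem[e->pc++];       /* fetch; pc wraps at 64K */
        lookup(opcode, &mode, &cyc);    /* addressing mode + cycle count */
        dispatch(e, opcode, mode);      /* 'execute' the instruction */
        e->cycles += (unsigned long)cyc;

        poll_devices(e);

        /* Cycles go up several at a time, so test with >= rather than ==. */
        if (e->cycles >= e->next_poll) {
            check_keypress(e);
            e->next_poll += KEY_POLL_INTERVAL;
        }
    }
}
```

Comparing against a moving threshold is one way of coping with the cycle count not advancing one by one; it also keeps the slow keyboard check down to once per interval.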

 

Address decoding

The address decoding attempts to mimic the sort of logic that would be used on Amélie. It would be simpler (and faster?) to simply block it as "if between &A000 and &A0FF then it is the VIA", but we want to be sure that our memory logic is viable.

   A8  = ( (addr >>  8) & 1 );
   A9  = ( (addr >>  9) & 1 );
   A13 = ( (addr >> 13) & 1 );
   A14 = ( (addr >> 14) & 1 );
   A15 = ( (addr >> 15) & 1 );

   /* RAM or ROM? */
   wrk = A14 + A15;

   if (wrk == 0)
      return RAMSEL; /* !14 & !15 = RAM at &0000 */

   if (wrk == 2)
      return ROMSEL; /*  14 &  15 = ROM at &E000 */

      /* Note that ROM beginning at &E000 is A13 + A14 + A15, so we work by
         picking up on A14 + A15. For specifics of how the hardware
         implementation operates, please refer to the addrdecode schematic. */



   /* TEST TWO - I/O STUFF [A15 and A13 are SET, A8 and A9 determine device] */
   if ( !A13 || !A15 )
      return 0;

   wrk = A8 + (A9 << 1);
   switch (wrk)
   {
      case 0 : /* !8 & !9 = VIA at &A000 */
               return VIASEL;

      case 1 : /*  8 & !9 = SER at &A100 */
               return SERSEL;

      case 2 : /* !8 &  9 = <unused> at &A200 */
               break; /* invalid device, it is an error... */

      case 3 : /*  8 &  9 = LAT at &A300 */
               return LATSEL;
   }

What you are actually looking at here is an optimised software version of the NAND and AND and 3-to-8 demux. Instead of asking "is (NOT A14 AND NOT A15)" and then "is (A14 AND A15)", we can add them, as both have value '1' if active. Therefore RAM (neither A14 nor A15) will be zero and ROM (A14 and A15) will be two.

Similar logic is applied to the I/O selection, though note that this code only implements a 2-to-4 decode.

 

True address decoding

Obviously the memory decode given above is not optimal, as it is written to explain what we are doing. A few small revisions will shave some instructions and cycles off what is a highly important routine (all memory access goes via this to decode our source device!)...

   int  address_decode(unsigned int addr)
   {
      int A13, A15, wrk = 0;

      wrk = ( (addr >> 14) & 1 ) + ( (addr >> 15) & 1 );

      if (wrk == 0)
         return RAMSEL; /* !14 & !15 = RAM at &0000 (up to max. &3FFF) */

      if (wrk == 2)
         return ROMSEL; /*  14 &  15 = ROM at &E000 (reality is &C000 onwards) */

      A13 = ( (addr >> 13) & 1 ); /* only compute A13/A15 when we need it */
      A15 = ( (addr >> 15) & 1 );
      if ( !A13 || !A15 )
         return 0; /* &4000 to &9FFF (24K) currently unaddressable */

      wrk = ( ((addr >> 8) & 1) + ((addr >> 8) & 2) );
      switch (wrk)
      {
         case 0 : /* !8 & !9 = VIA at &A000 */
                  return VIASEL;

         case 1 : /*  8 & !9 = SER at &A100 */
                  return SERSEL;

         case 2 : /* !8 &  9 = <unused> at &A200 */
                  break; /* invalid device, it is an error... */

         case 3 : /*  8 &  9 = LAT at &A300 */
                  return LATSEL;
      }

      return 0;
   }
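
One consequence of decoding on individual address lines is that devices are mirrored. Inside the I/O block only A8 and A9 are examined, so the VIA answers at &A000 and again at &A400, &A800 and so on up to &BFFF. The sketch below restates the decode compactly so that it compiles on its own. The numeric values given to RAMSEL and friends are assumptions (only the names come from the listing), and NOSEL is a made-up name for the "return 0" case.

```c
/* Device IDs: the values are assumptions for illustration; only the
   names RAMSEL..LATSEL come from the listing. NOSEL is hypothetical. */
enum { NOSEL = 0, RAMSEL = 1, ROMSEL = 2, VIASEL = 3, SERSEL = 4, LATSEL = 5 };

/* Same logic as address_decode(), restated compactly. */
int decode(unsigned int addr)
{
    unsigned int a14_a15 = ((addr >> 14) & 1) + ((addr >> 15) & 1);

    if (a14_a15 == 0)
        return RAMSEL;                  /* &0000 - &3FFF */
    if (a14_a15 == 2)
        return ROMSEL;                  /* &C000 - &FFFF */
    if (!((addr >> 13) & 1) || !((addr >> 15) & 1))
        return NOSEL;                   /* &4000 - &9FFF */

    switch ((addr >> 8) & 3) {          /* A8 + 2 * A9 */
        case 0:  return VIASEL;
        case 1:  return SERSEL;
        case 3:  return LATSEL;
        default: return NOSEL;          /* &A200: no device fitted */
    }
}

/* Printable name for a decode result, handy for a memory-map dump. */
const char *device_name(int sel)
{
    switch (sel) {
        case RAMSEL: return "RAM";
        case ROMSEL: return "ROM";
        case VIASEL: return "VIA";
        case SERSEL: return "SER";
        case LATSEL: return "LAT";
        default:     return "---";
    }
}
```

Calling device_name(decode(0xA400)) gives "VIA", the same as for 0xA000, which is what the real hardware would do with this decode logic.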

 

Addressing mode lookup

Basically two 256 byte tables. The instruction is an offset into the table. It can be expressed beautifully in ARM code:

lookup_opcode
        ; ON ENTRY:
        ;   R0 = Opcode
        ;   R1 = Pointer to two-word block for opcode information
        ;   R2 = Offset pointer
        ;   R3 = Value read

        ADR     R2, datablock      ; set up pointer
        LDRB    R3, [R2, R0]       ; read addressing mode (via datablock + opcode )
        STR     R3, [R1, #0]
        ADD     R2, R2, #256       ; reposition to second table
        LDRB    R3, [R2, R0]       ; read cycle count
        STR     R3, [R1, #4]
        MOV     PC, R14
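
The same idea in C (C99, for the designated initialisers) might look like the sketch below. It is not the contents of lookup.c: the MODE_ names and their values are invented, and only four opcodes are filled in, using the standard NMOS 6502 cycle counts. A real table would define all 256 entries in both halves.

```c
#include <stdint.h>

/* Addressing mode codes: hypothetical names and values. */
enum { MODE_IMPLIED = 0, MODE_IMMEDIATE = 1, MODE_ABSOLUTE = 2 };

/* Two 256-byte tables back to back, as in the ARM version:
   bytes 0..255 hold addressing modes, bytes 256..511 hold cycle counts.
   Entries not listed here default to zero. */
static const uint8_t datablock[512] = {
    [0x00] = MODE_IMPLIED,      /* BRK      */
    [0xA9] = MODE_IMMEDIATE,    /* LDA #imm */
    [0xAD] = MODE_ABSOLUTE,     /* LDA abs  */
    [0xEA] = MODE_IMPLIED,      /* NOP      */

    [256 + 0x00] = 7,           /* BRK      */
    [256 + 0xA9] = 2,           /* LDA #imm */
    [256 + 0xAD] = 4,           /* LDA abs  */
    [256 + 0xEA] = 2            /* NOP      */
};

/* info[0] receives the addressing mode, info[1] the cycle count,
   mirroring the two-word block that R1 points at in the ARM code. */
void lookup_opcode(uint8_t opcode, int info[2])
{
    info[0] = datablock[opcode];
    info[1] = datablock[256 + opcode];
}
```

Because the opcode is an 8-bit value, neither index can run off the end of its table, so no range check is needed.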

The &xB instructions are undefined on the NMOS 6502, so have been used to implement various emulator-specific instructions. If you wish to remove this functionality (perhaps to add 65C(E)02 instructions), please be aware that the breakpoint system uses one of these instructions!

 

Instruction dispatch

We have the instruction opcode. So which instruction is this?

The dispatch has been implemented as a big "select" structure listing all 256 possible opcodes, trusting that the compiler can do a good job of making optimised code. The worst non-optimal case would be:

if (opcode ==   0) { opcode_brk(); return; }
if (opcode ==   1) { opcode_ora(); return; }
[...]
/* else */           opcode_err(); return;

A better option would be a jump table. Acorn C v5.51 and TurboC v2.01 and TurboC++ v1.0 all do this as it is the sensible approach - you don't need to perform 255 tests to reach the 256th element.
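
If you would rather not depend on what the compiler makes of the big structure, a jump table can be written by hand in C as an array of function pointers. This is a sketch of that alternative, not what dispatch.c does: the handler names are the ones used above, but their bodies here are stubs that simply record which one was called, and dispatch_init and last_called are invented for the demonstration. It assumes an 8-bit unsigned char.

```c
typedef void (*opcode_fn)(void);

int last_called = -1;                   /* demo only: records the handler hit */

/* Stub handlers; the real ones live in the 6502 core. */
void opcode_brk(void) { last_called = 0x00; }
void opcode_ora(void) { last_called = 0x01; }
void opcode_err(void) { last_called = -2; }

static opcode_fn dispatch_table[256];

/* Fill the table once at start-up: everything is an error until a
   handler is installed for it. */
void dispatch_init(void)
{
    int i;

    for (i = 0; i < 256; i++)
        dispatch_table[i] = opcode_err;

    dispatch_table[0x00] = opcode_brk;
    dispatch_table[0x01] = opcode_ora;
    /* [...] one line per implemented opcode */
}

/* One indexed load and one indirect call, whatever the opcode. */
void dispatch(unsigned char opcode)
{
    dispatch_table[opcode]();
}
```

The cost is the same for opcode 0 as for opcode 255, and as the index is an unsigned char there is no need for the range check seen in the compiler output.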
 

Unfortunately, there isn't much you can do about how crap the x86 processor is, so here is an example of it. This code loads a pre-computed address from an array, so it is a jump table in the true sense of the word.

        push    bp
        mov     bp, sp
        mov     bx,word ptr [bp+4]
        cmp     bx,255
        jbe     @@0
        jmp     @1@3890
@@0:
        shl     bx,1
        jmp     word ptr cs:@1@C15044[bx]

The jump table itself looks like:

@1@C15044 label word
        dw      @1@98
        dw      @1@122

and each branch point looks like:

@1@98:
        call    near ptr _opcode_brk
        jmp     @1@3914
@1@122:
        call    near ptr _opcode_ora
        jmp     @1@3914

 

It is almost a spiritual event working with the ARM processor. Every instruction is a fixed "word" of four bytes, so you can start disassembling at any word boundary and each new word is a new instruction.
The side effect of this is we can dispense with the actual jump table and use this knowledge to poke a new value directly into the Program Counter, as follows:

        CMP      a1,#&ff
        ADDLS    pc,pc,a1,LSL #2
        B        |L000818.J164.dispop|
        B        |L00081c.J163.dispop|
        [...]
|L000818.J164.dispop|
        B        opcode_brk
|L00081c.J163.dispop|
        B        opcode_ora

This is oh-so-close. It would have been really great if the compiler had realised that B ..J164.dispop -> B opcode_brk is actually the same thing as calling opcode_brk directly. As a side effect, note that no registers are corrupted for this to work.

Here is my hand-crafted dispatch code:

        CMP     R0, #((dispatch_endoftable - dispatch_table) / 4)
        ADDCC   PC, PC, R0, LSL #2
        B       opcode_inv

dispatch_table
        ; row 0
        B       opcode_brk
        B       opcode_ora
        [...]
dispatch_endoftable

 

Processor 'internals'

To be described...

 

Device polling

To be described...

 

Breakpoints (implementation) 

To be described...

 

Known emulation faults

  • 6502 CPU core
    • Minimal NMI support (Amélie doesn't use NMIs - NMIvec points to RSTvec)
    • No "BCD" maths mode
    • No support for 'undocumented' side-effects in the NMOS version of the 6502; except for known processor bugs
    • Basic cycle counting - does not include "additional" cycles
    • May or may not fully support all of the CPU bugs (these need to be enabled, then the core recompiled)
       
  • 6522 VIA core
    • No Timer2
    • Timer1 only works in basic modes (single-shot and countdown, without PB7)
    • No support for serial shifting
    • No support for automatic handshaking
    • Only generates IRQs for Timer1, CAx and CBx events
    • unfinished
       
  • 6551 ACIA core
    • extremely basic
       
  • Latch
    • I don't anticipate any problems with this...
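
On the "additional" cycles: on a real 6502 an indexed read (for example LDA abs,X) takes one extra cycle when adding the index carries into the high byte of the address, i.e. when a page boundary is crossed. The sketch below shows how that could be detected. It is not part of AmélieEm, and both function names are invented.

```c
#include <stdint.h>

/* Returns 1 if base + index lands in a different 256-byte page to base. */
int page_crossed(uint16_t base, uint8_t index)
{
    uint16_t effective = (uint16_t)(base + index);  /* wraps at 64K */

    return ((base ^ effective) & 0xFF00u) != 0;     /* high byte changed? */
}

/* Cycle count for an indexed read: the table value, plus one if the
   effective address is in a different page. */
int indexed_read_cycles(int table_cycles, uint16_t base, uint8_t index)
{
    return table_cycles + page_crossed(base, index);
}
```

For LDA &12FF,X with X = 1 the table value of 4 becomes 5; with X = 0 it stays at 4.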

 

Just show me the code!

As AmélieEm has not been finished, sources are not available.

The code is written in plain C, with C style comments.

While much of AmélieEm is "portable", the user interface parts rely heavily on conio.h and dos.h which means that at this time only a 16-bit MS-DOS version is available.

AmélieEm compiles on these systems:

    • 16-bit DOS (all versions of MS-DOS) = TurboC++ v1.0
      The project files supplied are for use with TurboC++, which is downloadable from Borland (look for the museum).
       
    • 26/32 neutral (all versions of RISC OS) = Acorn C/C++ compiler v5.xx
      Also available is a RISC OS port. While it is designed for use with the newer compiler in a PSR-neutral mode, I anticipate that it shouldn't be too difficult to jiggle the code for most compilers. The older Norcroft will be the easiest to work with as it is fundamentally the same. The conio and dos implementations make heavy use of _kernel calls, so how it compiles with the likes of EasyC or the RISC OS build of gcc depends more upon the libraries supplied...

For various reasons, AmélieEm does not compile on these systems:

    • TurboC v2.01 - technical limitation
      Tracey's source is larger than the inbuilt (~64K) limit on source file size.
       
    • lcc-win32 v3.8 - missing resource
      We have (mostly?) conio.h but I don't see any delay() function.
       
    • OpenWatcom v1.2 - sort of works, but no benefit
      The required parts appear to be present (more-or-less), but you can't use them in a 32-bit console application, so compiling to a 16-bit application is unlikely to offer anything over TurboC++.
       
    • RISC OS, Unix, Mac, etc etc... - you'll need to "roll your own"
      Find conio.h and make a delay() routine, and you might be in with a chance... :-)
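
As a starting point for the delay() part, here is a minimal stand-in written with nothing but the standard C library. It is a sketch, not the routine from dos.c: it takes a millisecond count like the TurboC function, but it busy-waits on clock(), which on many systems measures processor time rather than wall-clock time. As the loop keeps the processor busy the two are close enough for a first attempt; a proper port would call whatever sleep or timer facility the host system provides.

```c
#include <time.h>

/* Portable stand-in for TurboC's delay(): wait roughly the given
   number of milliseconds by spinning on the standard clock(). */
void delay(unsigned milliseconds)
{
    clock_t start = clock();
    clock_t ticks = (clock_t)((double)milliseconds * CLOCKS_PER_SEC / 1000.0);

    if (start == (clock_t)-1)
        return;                 /* no clock available: give up, don't hang */

    while (clock() - start < ticks)
        ;                       /* busy-wait */
}
```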

If you need any help with AmélieEm's code, feel free to contact me.

 

Modules

acia.c is the ACIA (serial port) emulation.

addrdeco.c simply decodes the address given to be a device ID.

appsys.c is a front-end that operates in an application-specific way. The supplied front-end provides "RickBot" functionality. This should be changed if you are implementing something else, such as a central heating controller.

breakpt.c handles the breakpoints.

conio.c (non-DOS only) provides an 'emulation' of the TurboC "conio" library.

dispatch.c is the processor instruction dispatcher. You may find benefits if you replace this with some optimised code; I have written a fast ARM version. Sorry, I don't speak x86.

dos.c (non-DOS only) provides an 'emulation' of the TurboC "dos" library and some hardware functions.

latch.c is the latch emulation.

lookup.c is the part that looks up cycle count and addressing mode for each instruction. As with dispatch.c, you can probably write more optimal code than your compiler in this instance...

memory.c handles all reading and writing from memory. This is a candidate for assemblerisation, but it may be quite involved.

opcode.c is the core of the 6502 processor emulation; and it probably needs to be rewritten to run at a decent speed.

romram.c is a short module that allocates memory for the RAM area and the ROM area; and pokes some special values into the RAM area so that the emulator may be detected programmatically, if required.

tracey.c contains all of Tracey's code, which is why it is huge!

via.c is the 6522 VIA emulation.

wrapper.c is the entry point. It organises initialisation and then runs the main execution loop.

 

Release notes

AmélieEm is not yet 'finished', nor has it really been tested, so I have nothing to add at this time.

© 2007 Rick Murray