Processor setup via co-processor 15
and about co-processors

Introduction

ARM processors from the ARM 3 onwards offer various ID and internal configuration facilities through an internal co-processor 15, which you can read from and write to.

The setup is controlled by co-processor 15 registers, accessed with MRC and MCR in non-user mode.

These registers are particular to the processor specified.

ARM 3

Register 0 - Processor identity (read only)
  Bits  0 -  7  Revision of processor
  Bits  8 - 15  Should be '3', identifying processor as an ARM3
  Bits 16 - 23  Manufacturer code (&56 = VLSI Technology Inc.)
  Bits 24 - 31  Designer code (&41 = ARM Ltd)
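
As a rough illustration of the layout above, here is a small Python sketch that splits an identity word into its four byte-wide fields. The function name and the sample value are my own; the sample is simply assembled from the codes listed above with a revision of zero, so treat it as an assumption rather than something read from real hardware.

```python
def decode_arm3_id(value):
    """Split a CP15 register 0 value using the ARM 3 field layout."""
    return {
        "revision":     value & 0xFF,          # bits 0-7
        "processor":    (value >> 8) & 0xFF,   # bits 8-15
        "manufacturer": (value >> 16) & 0xFF,  # bits 16-23
        "designer":     (value >> 24) & 0xFF,  # bits 24-31
    }

# Hypothetical sample: designer &41, manufacturer &56, processor 3, revision 0
sample = 0x41560300
fields = decode_arm3_id(sample)
for name, field in fields.items():
    print(name, hex(field))
```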

Register 1 - Cache flush (write only)
This register is write-sensitive: writing anything to register 1 will cause the cache to be flushed.

Register 2 - Control
  Bit 0 - Turns the cache on (1) or off (0)
  Bit 1 - Determines if user mode and non-user mode use the same address
          mapping. 1 if they do, or 0. Should be 1 for use with MEMC.
  Bit 2 - 0 for normal operation, 1 for special monitor mode (processor
          runs at memory speed and address/data always put on external
          pins even if data fetched from cache - for logic analyser
          to trace the program properly).

  Other bits reserved.

Register 3 - Which areas are cachable
Controls which areas of memory are cachable, in 2Mb chunks.

  Bit 0  - 1 if virtual addresses &0000000-&01FFFFF are cachable, 0 if not
  Bit 1  - 1 if virtual addresses &0200000-&03FFFFF are cachable, 0 if not
  ...
  Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are cachable, 0 if not

Register 4 - Which areas are updateable
Controls which areas of memory are updateable, in 2Mb chunks. Writes to non-updateable memory go to the real memory, not the cache. This is suitable for things like ROMs, since you don't want the cached data to be altered by attempted writes.
```
  Bit 0  - 1 if virtual addresses &0000000-&01FFFFF are updateable, 0 if not
  Bit 1  - 1 if virtual addresses &0200000-&03FFFFF are updateable, 0 if not
  ...
  Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are updateable, 0 if not
```
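
The 2Mb chunk registers all share the same address-to-bit mapping, which can be sketched in Python as below. This is only an illustration of the arithmetic (one bit per 2Mb, thirty-two bits covering &0000000-&3FFFFFF); the function names are hypothetical and the example range is arbitrary, not a recommended setting.

```python
CHUNK_SHIFT = 21  # 2Mb chunks: 1 << 21 bytes each

def chunk_bit(address):
    """Return the bit number that covers a virtual address."""
    if not 0 <= address <= 0x3FFFFFF:
        raise ValueError("address outside &0000000-&3FFFFFF")
    return address >> CHUNK_SHIFT

def region_mask(first, last):
    """Build a register value with one bit set for each chunk
    touched by the inclusive address range first..last."""
    mask = 0
    for bit in range(chunk_bit(first), chunk_bit(last) + 1):
        mask |= 1 << bit
    return mask

print(chunk_bit(0x0200000))                  # 1
print(hex(region_mask(0x0000000, 0x1FFFFFF)))  # 0xffff
```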
Register 5 - Which areas are disruptive
Controls which areas of memory are disruptive, in 2Mb chunks. Writes to disruptive areas of memory cause the cache to be flushed. For example, on an MEMC system the cache holds virtually addressed memory, so a write to physical memory at &2000000-&2FFFFFF would not update a cached copy of the same location, and an attempt to read it through the cache would read back the old contents.
```
  Bit 0  - 1 if virtual addresses &0000000-&01FFFFF are disruptive, 0 if not
  Bit 1  - 1 if virtual addresses &0200000-&03FFFFF are disruptive, 0 if not
  ...
  Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are disruptive, 0 if not
```

Register 2 is set to zero after power-up, and registers 3-5 are undefined. Registers 3-5 should be set up correctly before the cache is switched on. You should always check the processor identity before setting up the registers, unless you are completely certain your code will only ever be executed on an ARM3 processor.

ARM 610

Register 0 - Processor identity (read only)
  Bits  0 -  7  Revision of processor (&1x)
  Bits  8 - 15  Processor identity
  Bits 16 - 23  Manufacturer code (&56 = VLSI Technology Inc.)
  Bits 24 - 31  Designer code (&41 = ARM Ltd)

Register 1 - Control
  Bit  0 - On-chip MMU turned off (0) or on (1)
  Bit  1 - Address alignment fault disabled (0) or enabled (1)
  Bit  2 - Instruction/data cache turned off (0) or on (1)
  Bit  3 - Write buffer turned off (0) or on (1)
  Bit  4 - 26 bit program space if 0, 32 bit program space if 1
  Bit  5 - 26 bit data space if 0, 32 bit data space if 1
  Bit  6 - Early abort mode if 0, late abort mode if 1
  Bit  7 - Little-endian operation if 0, big-endian if 1
  Bit  8 - System bit - controls the ARM610 permission system
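
To make the bit list above a little more concrete, here is a Python sketch that builds a control word from named flags and lists the flags set in a given word. The flag names are invented for the illustration, and the combination shown is arbitrary; it is not meant as a recommended configuration.

```python
# Bit positions taken from the ARM610 list above; the names are hypothetical.
CONTROL_BITS = {
    "mmu": 0,
    "alignment_fault": 1,
    "cache": 2,
    "write_buffer": 3,
    "prog32": 4,
    "data32": 5,
    "late_abort": 6,
    "big_endian": 7,
    "system": 8,
}

def control_value(*enabled):
    """Return a control word with the named bits set."""
    value = 0
    for name in enabled:
        value |= 1 << CONTROL_BITS[name]
    return value

def describe(value):
    """List the names of the bits that are set in a control word."""
    return [name for name, bit in CONTROL_BITS.items() if value & (1 << bit)]

word = control_value("mmu", "cache", "write_buffer")
print(hex(word))       # 0xd
print(describe(word))  # ['mmu', 'cache', 'write_buffer']
```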

Register 2 - Translation Table Base (write only)
Bits 14-31 hold the base of the currently active Level One page table.

Register 3 - Domain Access Control (write only)
This register holds the current access control for domains 0 to 15. Each domain has two bits (domain 0 bits 0,1 ... domain 15 bits 30,31) which may be set as follows:

  00  No Access - Domain fault generated if tried to access
  01  Client    - Accesses are checked against permission bits in
                  section/page descriptor
  10  Reserved  - Currently behaves like no access mode
  11  Manager   - Accesses are NOT checked, permission faults cannot
                  be generated
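
The two-bits-per-domain packing can be sketched in Python as follows. The helper names are my own. As the register is write only on this processor, I am assuming that real code would keep its own copy of the last value written, which is what the `dacr` argument stands for here.

```python
NO_ACCESS, CLIENT, RESERVED, MANAGER = 0b00, 0b01, 0b10, 0b11

def set_domain(dacr, domain, access):
    """Return a copy of the 32 bit value with one domain's two bits replaced."""
    if not 0 <= domain <= 15:
        raise ValueError("domain must be 0 to 15")
    shift = domain * 2
    cleared = dacr & ~(0b11 << shift) & 0xFFFFFFFF
    return cleared | (access << shift)

def get_domain(dacr, domain):
    """Extract the two access bits for one domain."""
    return (dacr >> (domain * 2)) & 0b11

value = set_domain(0, 0, CLIENT)           # domain 0 uses bits 0,1
value = set_domain(value, 15, MANAGER)     # domain 15 uses bits 30,31
print(hex(value))  # 0xc0000001
```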

Register 4 - Reserved - do not attempt to access
Register 5 - Page fault status / TLB flush
When reading, this holds the status of the last data fault (not updated for pre-fetch fault). Only the bottom byte is of significance.
```
  Bits  0 -  3  Status
  Bits  4 -  7  Domain
  Bits  8 - 11  Set to zero
  Bits 12 - 31  Whatever was the last value on the internal data bus
```
When writing to this register, any value written will cause the Translation Look-aside Buffer to be flushed.
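
A minimal Python sketch of picking the two significant fields out of a value read from this register is given below. The function name and the sample word are made up for the illustration, and no attempt is made to interpret the status code itself, as its meanings are not listed here.

```python
def decode_fault_status(value):
    """Return the status and domain fields from the bottom byte."""
    return {
        "status": value & 0xF,         # bits 0-3
        "domain": (value >> 4) & 0xF,  # bits 4-7
    }

# Hypothetical value: the upper bits are whatever was on the data bus
fault = decode_fault_status(0xABCDE035)
print(fault)  # {'status': 5, 'domain': 3}
```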
Register 6 - Data fault address / TLB purge
When reading this register, you can determine the virtual address of the last page fault.

When writing this register, the value given (in bits 14-31) is treated as an address. The TLB will be searched for a corresponding address and if it is found, it is marked as invalid. This is to allow the page table in main memory to be updated and the now-invalid entries in the on-chip TLB to be purged without incurring the penalty of flushing the entire TLB.
Register 7 - IDC flush (write only)
Any data written to this location will cause the IDC (Instruction/Data cache) to be flushed.
Registers 8 to 15 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.

ARM 710

This is similar to the ARM610.

Register 0 - Processor identity (read only)
  Bits  0 -  3  Revision of processor?
  Bits  4 - 15  Processor identity - &710
  Bits 16 - 23  Manufacturer code
  Bits 24 - 31  Designer code (&41 = ARM Ltd)

Register 1 - Control
  Bit  0 - On-chip MMU turned off (0) or on (1)
  Bit  1 - Address alignment fault disabled (0) or enabled (1)
  Bit  2 - Instruction/data cache turned off (0) or on (1)
  Bit  3 - Write buffer turned off (0) or on (1)
  Bit  4 - 26 bit program space if 0, 32 bit program space if 1
  Bit  5 - 26 bit data space if 0, 32 bit data space if 1
  Bit  6 - Early abort mode if 0, late abort mode if 1
  Bit  7 - Little-endian operation if 0, big-endian if 1
  Bit  8 - System bit - controls the ARM710 permission system
  Bit  9 - ROM bit - controls the ARM710 permission system

Register 2 - Translation Table Base (write only)
Bits 14-31 hold the base of the currently active Level One page table.

Register 3 - Domain Access Control (write only)
This register holds the current access control for domains 0 to 15. Each domain has two bits (domain 0 bits 0,1 ... domain 15 bits 30,31) which may be set as follows:

  00  No Access - Domain fault generated if tried to access
  01  Client    - Accesses are checked against permission bits in
                  section/page descriptor
  10  Reserved  - Currently behaves like no access mode
  11  Manager   - Accesses are NOT checked, permission faults cannot
                  be generated

Register 4 - Reserved - do not attempt to access
Register 5 - Page fault status / TLB flush
When reading, this holds the status of the last data fault (not updated for pre-fetch fault). Only the bottom byte is of significance.
```
  Bits  0 -  3  Status
  Bits  4 -  7  Domain
  Bits  8 - 11  Set to zero
  Bits 12 - 31  Whatever was the last value on the internal data bus
```
When writing to this register, any value written will cause the Translation Look-aside Buffer to be flushed.
Register 6 - Data fault address / TLB purge
When reading this register, you can determine the virtual address of the last page fault.

When writing this register, the value given (in bits 14-31) is treated as an address. The TLB will be searched for a corresponding address and if it is found, it is marked as invalid. This is to allow the page table in main memory to be updated and the now-invalid entries in the on-chip TLB to be purged without incurring the penalty of flushing the entire TLB.
Register 7 - IDC flush (write only)
Any data written to this location will cause the IDC (Instruction/Data cache) to be flushed.
Registers 8 to 15 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.

ARM 7500

The registers are exactly the same as the ARM710, except the processor ID (register 0) will be different. The datasheet did not specify what should be expected.

ARM 7500FE

The registers are exactly the same as the ARM710, except the processor ID (register 0) will be different. The datasheet did not specify what should be expected; however, interrogation of the Bush set-top box reveals &41077100.
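
For what it is worth, here is a Python sketch that splits the value &41077100 using the ARM710 identity layout, taking the revision as bits 0-3 and the processor identity as bits 4-15. The function name is my own, and whether the 7500FE is really meant to be read with this exact layout is an assumption.

```python
def decode_arm710_id(value):
    """Split an identity word using the ARM710-style field layout."""
    return {
        "revision":     value & 0xF,           # bits 0-3
        "processor":    (value >> 4) & 0xFFF,  # bits 4-15
        "manufacturer": (value >> 16) & 0xFF,  # bits 16-23
        "designer":     (value >> 24) & 0xFF,  # bits 24-31
    }

bush = decode_arm710_id(0x41077100)
for name, field in bush.items():
    print(name, hex(field))
```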

StrongARM SA110

Register 0 - Processor identification (read only)
The value returned for an SA110 processor should be &4401A10x.
```
  Bits  0 -  3  Processor revision number
```

Register 1 - Control
  Bit  0 - On-chip MMU turned off (0) or on (1)
  Bit  1 - Address alignment fault disabled (0) or enabled (1)
  Bit  2 - Data cache turned off (0) or on (1)
  Bit  3 - Write buffer turned off (0) or on (1)
  Bit  7 - Little-endian operation if 0, big-endian if 1
  Bit  8 - System bit - controls the MMU permission system
  Bit  9 - ROM bit - controls the MMU permission system
  Bit 12 - Instruction cache turned off (0) or on (1)

Register 2 - Translation Table Base (read/write)
Bits 14-31 hold the base of the currently active Level One page table.
Register 3 - Domain Access Control (read/write)
This register holds the current access control for domains 0 to 15.
The document I have contains no further details, though I would assume it would be similar to the ARM610/710/etc usage.
Register 4 - Reserved - do not attempt to access
Register 5 - Fault status (read/write)
When reading, this holds the status of the last data fault (not updated for pre-fetch fault). Only the bottom byte is of significance.
```
  Bits  0 -  3  Status
  Bits  4 -  7  Domain
  Bit        8  Zero
  Bits  9 - 31  Undefined on read, ignored on write
```
Register 6 - Fault address (read/write)
When reading this register, you can determine the virtual address of the last page fault.

Register 7 - Cache control (write only)
Any data written to this location will cause the selected cache to be flushed.

  The OPC_2 and CRm co-processor fields select which cache
  operation should occur:

    Function         OPC_2    CRm    Data

    Flush I + D      %0000    %0111  -
    Flush I          %0000    %0101  -
    Flush D          %0000    %0110  -
    Flush D single   %0001    %0110  Virtual address
    Clean D entry    %0001    %1010  Virtual address
    Drain write buf. %0100    %1010  -
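
To show how the table maps onto actual instructions, here is a Python sketch that turns each row into an MCR source line in the operand order used elsewhere in this document (co-processor, op, ARM register, co-processor register, second co-processor register, second op). The operation names and the choice of R0 are my own, and the C0 - C15 register naming is assumed to suit your assembler.

```python
# (OPC_2, CRm) pairs copied from the table above; the keys are hypothetical names.
CACHE_OPS = {
    "flush_i_and_d":      (0b000, 0b0111),
    "flush_i":            (0b000, 0b0101),
    "flush_d":            (0b000, 0b0110),
    "flush_d_single":     (0b001, 0b0110),
    "clean_d_entry":      (0b001, 0b1010),
    "drain_write_buffer": (0b100, 0b1010),
}

def cache_op_source(name, reg="R0"):
    """Return the MCR source line for a register 7 cache operation."""
    op2, crm = CACHE_OPS[name]
    return f"MCR CP15, 0, {reg}, C7, C{crm}, {op2}"

for name in CACHE_OPS:
    print(f"{name:20}{cache_op_source(name)}")
```

For the two operations that take data, the ARM register would hold the virtual address.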

Register 8 - TLB operations (write only)
Any data written to this location will trigger the selected TLB flush operation.

  The OPC_2 and CRm co-processor fields select which TLB
  operation should occur:

    Function         OPC_2    CRm    Data

    Flush I + D      %0000    %0111  -
    Flush I          %0000    %0101  -
    Flush D          %0000    %0110  -
    Flush D single   %0001    %0110  Virtual address

Registers 9 to 14 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.

Register 15 - Test, clock and idle control

  The OPC_2 and CRm co-processor fields select the following...

    Function         OPC_2    CRm

    Enable odd word  %0001    %0001
    loading of
    Icache LFSR

    Enable even word %0001    %0010
    loading of
    Icache LFSR

    Clear Icache     %0001    %0100
    LFSR

    Move LFSR to     %0001    %1000
    R14,Abort

    Enable clock     %0010    %0001
    switching

    Disable clock    %0010    %0010
    switching

    Disable nMCLK    %0010    %0100
    output

    Wait for         %0010    %1000
    interrupt

ARM9...XScale

Unfortunately I do not have details of these registers.
Try http://www.arm.com/.

How to read these registers

The code I knocked up for the Bush box processor ID was:

    10 DIM code% 32
    20 P% = code%
    30 [ OPT     3
    40   SWI     "OS_EnterOS"
    50   MRC     CP15, 0, R0, C0, C0
    60   TSTP    PC, #&F0000000
    70   MOV     R0, R0
    80   MOV     PC, R14
    90 ]
   100 PRINT ~USR(code%)

When run, this would print:

   >RUN
   00008FAC                    OPT     3
   00008FAC EF000016           SWI     "OS_EnterOS"
   00008FB0 EE100F10           MRC     CP15, 0, R0, C0, C0
   00008FB4 E31FF20F           TSTP    PC, #&F0000000
   00008FB8 E1A00000           MOV     R0, R0
   00008FBC E1A0F00E           MOV     PC, R14
     41077100
   >

Note that the MRC instruction must be executed in a privileged mode, which is why the code calls OS_EnterOS first.

Co-processors

There can be up to sixteen co-processors, numbered CP0 to CP15. Most desktop ARM systems do not have logic for external co-processors, so we may either use that which is built into the ARM itself, or an emulated co-processor.
CP15 is reserved on the ARM 3 and later processors for internal configuration, as described in this document.
CP0 and CP1 are used by the floating point system. It may either be an external floating point chip (as used with the ARM 3), hardware built into the processor (as in the ARM 7500FE), or a totally software-based emulation (as with the FPEmulator that we all know).

Here is a short exercise for you:

    10 DIM code% 16
    20 P% = code%
    30 [ OPT     3
    40   CDP     CP1, 0, C0, C1, C2, 0
    50   ADFS    F0, F1, F2
    60   MOV     PC, R14
    70 ]
   >RUN
   00008F78                    OPT     3
   00008F78 EE010102           CDP     CP1, 0, C0, C1, C2
   00008F7C EE010102           ADFS    F0, F1, F2
   00008F80 E1A0F00E           MOV     PC, R14
   >

What do you notice? :-)

When the ARM executes a co-processor instruction, or an undefined instruction, it will offer it to any co-processors which may be presently attached. If hardware is available to process the given instruction, then it is expected to do so. If it is busy at the time the instruction is offered, the ARM will wait for it.
If there is no co-processor capable of executing the instruction, the ARM will take its undefined instruction trap, in which case the following will happen:

The PSR and PC are both saved (the method differs for 26 bit and 32 bit ARMs)
SVC mode (26 bit) / UND mode (32 bit) is entered, and the I bit of the PSR is set
The instruction at address &00000004 is executed

This trap may be used to add instructions to the instruction set by emulation, or to implement a software emulation of hardware that isn't fitted. The Floating Point Emulator works by doing this.

To return, restore the saved PC and PSR (how depends on 26/32 bit) to the current PC and PSR, for example with MOVS PC, R14 on 26 bit systems. Execution will then pick up with the instruction following the one which caused the trap.

All of the co-processor instructions can be executed conditionally. Please note that the conditions relate to the status of the ARM processor, and not the status of any of the co-processors. This is because the ARM always evaluates the condition itself first, and only then offers the instruction around and maybe takes the undefined instruction trap.
To make this clearer:

    10 DIM code% 32
    20 P% = code%
    30 [ OPT     3
    40   FLTS    F0, R0
    50   FLTS    F1, R1
    60   FMLS    F2, F0, F1
    70   FIX     R0, F2
    80   MOVS    PC, R14
    90 ]
   100 INPUT "First number : "A%
   110 INPUT "Second number: "B%
   120 PRINT USR(code%)

This probably won't assemble without an enhanced BASIC assembler.

Anyway, you might think the ARM will hand over to the floating point co-processor to do the four FP instructions, then hand back afterwards.
If you did, you would be incorrect!

What actually is executed is:

   MCR     CP1, 0, R0, C0, C0
   MCR     CP1, 0, R1, C1, C0
   CDP     CP1, 9, C2, C0, C1
   MRC     CP1, 0, R0, C0, C2

It is worth pointing out that objasm specifies co-processor registers using the CR notation (ie, CR0 - CR15), which is first defined with the CN directive. It does not appear as if default co-processor instructions are defined in Nick Roberts' ASM, though I've only looked in the instructions at the "defined symbols" section...
Darren Salt's ExtBASICasm provides the register names C0 - C15 to refer to the co-processor registers. So if any of these examples fail when you try to assemble them, please check what format your assembler expects for these instructions.

MRC

The instruction MRC transfers a co-processor register to an ARM register. It takes the form:

   MRC    <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>

The co-processor is denoted in most assemblers by CPx.
The register <co-pro reg> is written to <ARM reg>, using operation <op>. This may, possibly, be further modified by <co-pro reg2> and <op2>. For an idea of the sorts of times when this might be necessary, consider instructions of the form LDR Ra, [Rb], #x.
The final <op2> may be omitted, as it is in the example, but the other parts of the MRC instruction must be supplied.

MCR

The instruction MCR transfers an ARM register to a co-processor register. It takes the form:

   MCR    <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>

The co-processor is free to interpret the fields as it desires, but the standard interpretation is that the contents of the ARM register are written to the co-processor register using the operation code given, which may be further modified by the second co-processor register and/or the second operation code.
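
MRC and MCR share one instruction layout and differ only in bit 20, which is set for MRC. The Python sketch below packs the fields into a 32 bit word using the standard ARM encoding; the function name is my own. It can be checked against the listing earlier in this document, where MRC CP15, 0, R0, C0, C0 assembled to EE100F10.

```python
def encode_reg_transfer(to_arm, cp, op, rd, crn, crm, op2=0, cond=0xE):
    """Encode MRC (to_arm=True) or MCR (to_arm=False) as a 32 bit word.

    Operand order follows the assembler form:
    <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>
    """
    return ((cond << 28)        # condition code, &E = always
            | (0b1110 << 24)    # co-processor register transfer class
            | (op << 21)        # <op>
            | (int(to_arm) << 20)  # 1 = MRC, 0 = MCR
            | (crn << 16)       # <co-pro reg>
            | (rd << 12)        # <ARM reg>
            | (cp << 8)         # co-processor number
            | (op2 << 5)        # <op2>
            | (1 << 4)          # set for register transfers
            | crm)              # <co-pro reg2>

print(hex(encode_reg_transfer(True, 15, 0, 0, 0, 0)))   # 0xee100f10
print(hex(encode_reg_transfer(False, 15, 0, 0, 7, 7)))  # 0xee070f17
```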

LDC and STC

The instruction LDC loads data from memory into the co-processor register, while STC saves data from a co-processor register to memory.
The ARM supplies the address, while the co-processor supplies or accepts the data and controls how much is transferred.

   LDC    <co-pro>, <co-pro reg>, <address>
   LDCL   <co-pro>, <co-pro reg>, <address>
   STC    <co-pro>, <co-pro reg>, <address>
   STCL   <co-pro>, <co-pro reg>, <address>

If the 'L' flag is specified, a long transfer is performed. Otherwise a short transfer is performed. The 'L' flag follows the condition code, as in LDCEQL.
The address is an expression which results in an address being generated, for example:

   [Rx]
   [Rx, #x]!
   [Rx], #x

These are like those used for the LDR instruction. However, the offsets are only eight bits wide and specify word offsets (the ARM types are 12 bit and byte offset).
What happens is that the 8 bit unsigned offset is shifted left two bits and added to or subtracted from the base register; this may be done before or after the base is used as the transfer address. The new base value can be written back, or left unmodified.
The next difference is that post-indexed addressing requires explicit setting of the W bit of the instruction (unlike LDR/STR which always does it when post-indexed). You set the 'W' bit with the '!' flag, like STC CP0, CR1, [R2, #16]!.
The base register is used for the first transfer. If there are any further transfers, the base will be incremented by one word for each of those additional transfers.
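
The address calculation described above can be sketched in Python like this. The function and parameter names are hypothetical, and write-back is modelled as an explicit choice in both the pre-indexed and post-indexed cases, following the description here. The example assumes the assembler takes #16 as a byte offset and stores it as the word offset 4.

```python
def coproc_transfer_address(base, offset8, up=True, pre=True, writeback=False):
    """Return (first transfer address, base register value afterwards)."""
    if not 0 <= offset8 <= 0xFF:
        raise ValueError("offset field is only eight bits wide")
    delta = offset8 << 2              # word offset to byte offset
    if not up:
        delta = -delta                # offset subtracted from the base
    indexed = (base + delta) & 0xFFFFFFFF
    address = indexed if pre else base        # pre- or post-indexed
    new_base = indexed if writeback else base
    return address, new_base

# [R2, #16]! with R2 = &8000: pre-indexed with write-back
print([hex(v) for v in coproc_transfer_address(0x8000, 4, writeback=True)])
# [R2], #16 with the W bit set: post-indexed with write-back
print([hex(v) for v in coproc_transfer_address(0x8000, 4, pre=False, writeback=True)])
```

The largest offset reachable this way is 255 words, or 1020 bytes, in either direction.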

CDP

The instruction CDP instructs the co-processor to do some processing. It takes the form:

   CDP    <co-pro>, <op>, <co-pro reg1>, <co-pro reg2>, <co-pro reg3>, <op2>

This tells the co-processor to do something. The ARM will not wait for it to finish, nor is any sort of status sent back to the ARM. It is possible for a co-processor to maintain a queue of instructions, allowing it and the ARM to process in parallel.
A variant of this may be obtained with the floating point hardware; while it does not (to my knowledge) support a queue of instructions, it is true that the ARM will wait for the FPU to finish an operation before providing the next. With careful coding, it would therefore be possible to get the ARM to do some sort of processing (a few instructions) in between sending an instruction to the FPU and reading its result back.
So instead of:

   FLTE    F0, R0
   FLTE    F1, R1
   MUFE    F2, F0, F1
   FIX     R0, F2
   MOV     R1, #0

you could save a small amount of time with:

   FLTE    F0, R0
   FLTE    F1, R1
   MUFE    F2, F0, F1
   MOV     R1, #0
   FIX     R0, F2

as the FPU could be finishing the MUF while you MOV. The hardware FPU (as in the 7500FE) runs asynchronously - you can switch to synchronous by setting a bit in the FPSR. The software emulation always runs synchronously, and as it uses the ARM in order to emulate the FP instructions, there is no possible advantage to be gained.
Obviously the above example is somewhat contrived. However it is only an example. Real life code, such as an MP3 decoder, could well benefit from careful arrangement of code.

There are no rules for the register types and/or the operation codes. These depend upon the co-processor.
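
As a rough check on how the CDP fields are packed, here is a Python sketch using the standard ARM encoding; the function name is my own. The operands follow the order used in the listings in this document (co-processor, op, three co-processor registers, optional second op), and the result can be compared with the assembled words shown in the short exercise earlier.

```python
def encode_cdp(cp, op, crd, crn, crm, op2=0, cond=0xE):
    """Encode CDP <co-pro>, <op>, <reg1>, <reg2>, <reg3>, <op2> as a 32 bit word."""
    return ((cond << 28)      # condition code, &E = always
            | (0b1110 << 24)  # co-processor data operation class
            | (op << 20)      # <op>
            | (crn << 16)     # second register operand
            | (crd << 12)     # first (destination) register operand
            | (cp << 8)       # co-processor number
            | (op2 << 5)      # <op2>; bit 4 stays clear for CDP
            | crm)            # third register operand

print(hex(encode_cdp(1, 0, 0, 1, 2)))  # CDP CP1, 0, C0, C1, C2 -> 0xee010102
print(hex(encode_cdp(1, 9, 2, 0, 1)))  # CDP CP1, 9, C2, C0, C1 -> 0xee902101
```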
