Processor setup via co-processor 15
and about co-processors



ARM processors from the ARM 3 onwards offer various ID and internal configuration facilities through an internal co-processor 15, which you can read from and write to.

The setup is controlled by co-processor 15 registers, accessed with MRC and MCR in non-user mode.

These registers are particular to the processor specified.




Register 2 is set to zero after power-up, and registers 3-5 are undefined. Registers 3-5 should be set up correctly before the cache is switched on. You should always check the processor identity before setting up the registers, unless you are completely certain your code will only ever be executed on an ARM3 processor.



ARM 610



ARM 710

This is similar to the ARM610.



ARM 7500

The registers are exactly the same as the ARM710, except the processor ID (register 0) will be different. The datasheet did not specify what should be expected.



ARM 7500FE

The registers are exactly the same as the ARM710, except the processor ID (register 0) will be different. The datasheet did not specify what should be expected; however, interrogation of the Bush set-top box reveals &41077100.



StrongARM SA110




Unfortunately I do not have details of these registers.



How to read these registers

The code I knocked up for the Bush box processor ID was:
    10 DIM code% 32
    20 P% = code%
    30 [ OPT     3
    40   SWI     "OS_EnterOS"
    50   MRC     CP15, 0, R0, C0, C0
    60   TSTP    PC, #&F0000000
    70   MOV     R0, R0
    80   MOV     PC, R14
    90 ]
   100 PRINT ~USR(code%)
When run, this would print:
   00008FAC                    OPT     3
   00008FAC EF000016           SWI     "OS_EnterOS"
   00008FB0 EE100F10           MRC     CP15, 0, R0, C0, C0
   00008FB4 E31FF20F           TSTP    PC, #&F0000000
   00008FB8 E1A00000           MOV     R0, R0
   00008FBC E1A0F00E           MOV     PC, R14
Note that this code must run in a privileged mode.
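
As a rough illustration of what the value read back contains, the sketch below splits the ID word into fields. It is written in Python rather than ARM code so it can be tried anywhere. The field boundaries (top byte as designer, bits 4 to 15 as part number, low four bits as revision) are an assumption based on the layout documented for later ARM cores, and the function name split_processor_id is made up for this example.

```python
def split_processor_id(ident):
    """Split a CP15 register 0 value into fields (assumed layout)."""
    return {
        "designer": chr((ident >> 24) & 0xFF),  # &41 is ASCII "A"
        "middle": (ident >> 16) & 0xFF,         # meaning varies by core, left raw
        "part": (ident >> 4) & 0xFFF,           # assumed part number field
        "revision": ident & 0xF,                # assumed revision field
    }


fields = split_processor_id(0x41077100)  # value reported by the Bush box
print(fields["designer"], "%03X" % fields["part"], fields["revision"])
```

For the Bush box value this prints "A 710 0", which at least agrees with the 7500FE being built around an ARM7 core.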




There are between zero and three possible co-processors. Most desktop ARM systems do not have logic for external co-processors, so we may either use that which is built into the ARM itself, or an emulated co-processor.
CP15 is reserved on the ARM 3 and later processors for internal configuration, as described in this document.
CP0 and CP1 are used by the floating point system. It may either be an external floating point chip (as used with the ARM 3), hardware built into the processor (as in the ARM 7500FE), or a totally software-based emulation (as with the FPEmulator that we all know).

Here is a short exercise for you:

    10 DIM code% 16
    20 P% = code%
    30 [ OPT     3
    40   CDP     CP1, 0, C0, C1, C2, 0
    50   ADFS    F0, F1, F2
    60   MOV     PC, R14
    70 ]
When run, this would print:
   00008F78                    OPT     3
   00008F78 EE010102           CDP     CP1, 0, C0, C1, C2
   00008F7C EE010102           ADFS    F0, F1, F2
   00008F80 E1A0F00E           MOV     PC, R14
What do you notice? :-)
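
One way to check the answer is to build the CDP word by hand. The sketch below, in Python rather than ARM code, packs the fields in the order the ARM data sheets give for a co-processor data operation. The function name encode_cdp and its parameter names are made up for this example.

```python
def encode_cdp(cp, op1, crd, crn, crm, op2=0, cond=0xE):
    """Build the 32 bit word for CDP <cp>, <op1>, <CRd>, <CRn>, <CRm>, <op2>."""
    return ((cond << 28)      # condition code, &E is "always"
            | (0xE << 24)     # bits 27-24 mark a co-processor operation
            | (op1 << 20)     # four bit operation code
            | (crn << 16)     # first operand register
            | (crd << 12)     # destination register
            | (cp << 8)       # co-processor number
            | (op2 << 5)      # three bit secondary operation code
            | crm)            # second operand register (bit 4 stays clear)


word = encode_cdp(cp=1, op1=0, crd=0, crn=1, crm=2)
print("%08X" % word)
```

This prints EE010102, the same word the assembler produced for both lines of the exercise.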


When the ARM executes a co-processor instruction, or an undefined instruction, it will offer it to any co-processors which may be presently attached. If hardware is available to process the given instruction, then it is expected to do so. If it is busy at the time the instruction is offered, the ARM will wait for it.
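If there is no co-processor capable of executing the instruction, the ARM will take its undefined instruction trap. The address of the following instruction is saved in R14 of the privileged mode that is entered, together with the PSR on 26 bit systems (32 bit systems save the PSR separately, in SPSR_und), interrupts are disabled, and the PC is set to the undefined instruction vector at &00000004.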

This trap may be used to add instructions to the instruction set by emulation, or to implement a software emulation of hardware that isn't fitted. The Floating Point Emulator works by doing this.

To return, simply restore the saved PC and PSR (how depends on whether the system is 26 or 32 bit) to the current PC and PSR, for example with MOVS PC, R14 on 26 bit systems. Execution will pick up with the instruction following the one which caused the trap.

All of the co-processor instructions can be executed conditionally. Please note that the conditionals relate to the status of the ARM processor, and not the status of any of the co-processors. This is because the ARM always evaluates the instruction first; only then does it offer it around and, maybe, take the undefined instruction trap. So the conditions are ARM related.
To make this clearer:

    10 DIM code% 32
    20 P% = code%
    30 [ OPT     3
    40   FLTS    F0, R0
    50   FLTS    F1, R1
    60   FMLS    F2, F0, F1
    70   FIX     R0, F2
    80   MOVS    PC, R14
    90 ]
   100 INPUT "First number : "A%
   110 INPUT "Second number: "B%
   120 PRINT USR(code%)
This probably won't assemble without an enhanced BASIC assembler.

Anyway, you might think the ARM will hand over to the floating point co-processor to do the four FP instructions, then hand back afterwards.
If you did, you would be incorrect!

What actually is executed is:

   MCR     CP1, 0, R0, C0, C0
   MCR     CP1, 0, R1, C1, C0
   CDP     CP1, 9, C2, C0, C1
   MRC     CP1, 0, R0, C0, C2
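
To see that these really are ordinary co-processor instructions, the sketch below takes raw instruction words apart again. It is Python, not ARM code, and the function name decode_coproc is made up. The first two test words come from the assembler listings earlier in this document; the other four were encoded by hand from the MCR, CDP and MRC forms above, so treat them as an assumption rather than assembler output.

```python
def decode_coproc(word):
    """Classify a word as CDP, MRC or MCR and show its fields."""
    if (word >> 24) & 0xF != 0xE:
        return None                       # not a CDP, MRC or MCR
    cp = (word >> 8) & 0xF                # co-processor number
    crn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF               # ARM register, or CRd for CDP
    crm = word & 0xF
    if word & 0x10 == 0:                  # bit 4 clear: data operation
        op1 = (word >> 20) & 0xF
        return "CDP CP%d, %d, C%d, C%d, C%d" % (cp, op1, rd, crn, crm)
    name = "MRC" if word & (1 << 20) else "MCR"   # bit 20 gives the direction
    op1 = (word >> 21) & 0x7
    return "%s CP%d, %d, R%d, C%d, C%d" % (name, cp, op1, rd, crn, crm)


for w in (0xEE100F10, 0xEE010102,
          0xEE000110, 0xEE011110, 0xEE902101, 0xEE100112):
    print("%08X  %s" % (w, decode_coproc(w)))
```

The secondary operation code in bits 5 to 7 is ignored here to keep the sketch short.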

It is worth pointing out that objasm specifies co-processor registers using the CR notation (ie, CR0 - CR15), which is first defined with the CN directive. It does not appear as if default co-processor instructions are defined in Nick Roberts' ASM, though I've only looked in the instructions at the "defined symbols" section...
Darren Salt's ExtBASICasm provides the register names C0 - C15 to refer to the co-processor registers. So if any of these examples fail when you try to assemble them, please check what format your assembler expects for these instructions.




The instruction MRC transfers a co-processor register to an ARM register. It takes the form:
   MRC    <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>
The co-processor is denoted in most assemblers by CPx.
The register <co-pro reg> is written to <ARM reg>, using operation <op>. This may, possibly, be further modified by <co-pro reg2> and <op2>. For an idea of the sorts of times when this might be necessary, consider instructions of the form LDR Ra, [Rb], #x.
The final <op2> may be omitted, as it is in the example, but the other parts of the MRC instruction must be supplied.



The instruction MCR transfers an ARM register to a co-processor register. It takes the form:
   MCR    <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>
The co-processor is free to interpret the fields as it desires, but the standard interpretation is that the contents of the ARM register are written to the co-processor register using the operation code given, which may be further modified by the second co-processor register and/or the second operation code.
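
MRC and MCR share one instruction layout and differ only in the direction bit. The sketch below, in Python rather than ARM code, packs the fields in the order given in the ARM data sheets; the function name encode_transfer and its parameter names are made up for this example.

```python
def encode_transfer(to_arm, cp, op1, rd, crn, crm, op2=0, cond=0xE):
    """Build the word for MRC (to_arm=True) or MCR (to_arm=False)."""
    return ((cond << 28)            # condition code, &E is "always"
            | (0xE << 24)           # bits 27-24 mark a co-processor operation
            | (op1 << 21)           # three bit operation code
            | (int(to_arm) << 20)   # 1 = co-processor to ARM (MRC)
            | (crn << 16)           # co-processor register
            | (rd << 12)            # ARM register
            | (cp << 8)             # co-processor number
            | (op2 << 5)            # secondary operation code
            | (1 << 4)              # bit 4 set: register transfer, not CDP
            | crm)                  # second co-processor register


# MRC CP15, 0, R0, C0, C0 from the processor ID example
print("%08X" % encode_transfer(True, cp=15, op1=0, rd=0, crn=0, crm=0))
```

This prints EE100F10, matching the word shown in the processor ID listing.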



The instruction LDC loads data from memory into the co-processor register, while STC saves data from a co-processor register to memory.
The ARM should supply the address, the co-processor accepts the data and controls how much is transferred.
   LDC    <co-pro>, <co-pro reg>, <address>
   LDCL   <co-pro>, <co-pro reg>, <address>
   STC    <co-pro>, <co-pro reg>, <address>
   STCL   <co-pro>, <co-pro reg>, <address>
If the 'L' flag is specified, a long transfer is performed. Otherwise a short transfer is performed. The 'L' flag follows the condition code, like LDCEQL.
The address is an expression from which an address is generated, for example:
   [Rx, #x] !
   [Rx], #x
These are like those used for the LDR instruction. However, the offsets are only eight bits wide and specify word offsets (the ARM types are 12 bits wide and specify byte offsets).
What happens is that the 8 bit unsigned offset is shifted left two bits and added to or subtracted from the base register. This may be done before or after the base is used as the transfer address. The new base value can be written back, or left unmodified.
The next difference is that post-indexed addressing requires explicit setting of the W bit of the instruction (unlike LDR/STR which always does it when post-indexed). You set the 'W' bit with the '!' flag, like STC CP0, CR1, [R2, #16]!.
The base register is used for the first transfer. If there are any further transfers, the base will be incremented by one word for each of those additional transfers.
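
The address arithmetic described above can be modelled as follows. This is a Python sketch, not ARM code; the function name coproc_address and its parameters are made up, and it ignores wrap-around at the top of the address space.

```python
def coproc_address(base, offset8, up=True, pre=True, writeback=False):
    """Return (first transfer address, new base) for an LDC/STC access."""
    if not 0 <= offset8 <= 0xFF:
        raise ValueError("offset field is only eight bits wide")
    delta = offset8 << 2                  # word offset becomes a byte offset
    if not up:
        delta = -delta                    # offset subtracted instead of added
    indexed = base + delta
    address = indexed if pre else base    # post-indexed uses the old base
    new_base = indexed if writeback else base
    return address, new_base


# [R2, #16]! with R2 = &8000: the offset field holds 16 / 4 = 4
print(["%X" % v for v in coproc_address(0x8000, 4, pre=True, writeback=True)])
# [R2], #16 with the W bit set
print(["%X" % v for v in coproc_address(0x8000, 4, pre=False, writeback=True)])
```

The largest offset that can be expressed is 255 words, which is 1020 bytes.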



The instruction CDP instructs the co-processor to do some processing. It takes the form:
   CDP    <co-pro>, <op>, <co-pro reg1>, <co-pro reg2>, <co-pro reg3>, <op2>
This tells the co-processor to do something. The ARM will not wait for it to finish, nor is any sort of status sent back to the ARM. It is possible for a co-processor to maintain a queue of instructions, allowing it and the ARM to process in parallel.
A variant of this may be obtained with the floating point hardware; while it does not (to my knowledge) support a queue of instructions, it is true that the ARM will wait for the FPU to finish an operation before providing the next. With careful coding, it would therefore be possible to get the ARM to do some sort of processing (a few instructions) in between sending an instruction to the FPU and reading its result back.
So instead of:
   FLTE    F0, R0
   FLTE    F1, R1
   MUFE    F2, F0, F1
   FIX     R0, F2
   MOV     R1, #0
you could save a small amount of time with:
   FLTE    F0, R0
   FLTE    F1, R1
   MUFE    F2, F0, F1
   MOV     R1, #0
   FIX     R0, F2
as the FPU could be finishing the MUF while you MOV. The hardware FPU (as in the 7500FE) runs asynchronously - you can switch to synchronous by setting a bit in the FPSR. The software emulation always runs synchronously, and as it uses the ARM in order to emulate the FP instructions, there is no possible advantage to be gained.
Obviously the above example is somewhat contrived. However it is only an example. Real life code, such as an MP3 decoder, could well benefit from careful arrangement of code.

There are no rules for the register types and/or the operation codes. These depend upon the co-processor.



Copyright © 2004 Richard Murray