LDR

From ARMwiki

Latest revision as of 05:29, 7 March 2012

LDR
Instruction: LDR[B][T]
Function: Load Register
Category: Load and Store
ARM family: All
Notes: -


LDR[B][T] : Load Register

Take a deep breath - this is one of the most flexible, and complicated, of the ARM instructions.

LDR provides a way to load a 32 bit word (LDR) or an unsigned byte (LDRB) into a register, using a variety of addressing modes (offset, pre-indexed and post-indexed), with optional address translation (LDRT/LDRBT) that makes an access from a privileged mode behave as a User mode memory access. By using PC as the base register, position independent code and jump tables (see the examples) can be created, and the indexed modes give easy access to arrays in memory.

There are nine possible addressing modes:

  • Immediate offset
  • Register offset
  • Scaled register offset
  • Immediate pre-indexed
  • Register pre-indexed
  • Scaled register pre-indexed
  • Immediate post-indexed
  • Register post-indexed
  • Scaled register post-indexed

These are described in more detail below.

LDR[B][T] is available in all versions of the ARM architecture. Architecture v4 and later (the ARM8/StrongARM generation) additionally offer LDRH/LDRSH to load unsigned or signed 16 bit halfwords, and LDRSB to load signed bytes.

Syntax

There are many forms. Notice that the condition code comes before the B or T specifier, and that T is only available with post-indexed addressing.
Optional parts are described in curly braces {} because square brackets [] are a part of the instruction syntax.

Immediate offset:

  LDR{cond}{B}     Rd, [Rn{, #{+|-}<12 bit offset>}]

The offset is optional, as LDR Rd, [Rn] is a valid instruction, which will be assembled with an offset of zero to mean, quite simply, "load Rd with the data at Rn".

Register offset:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]

Scaled register offset:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, <shift> #<shift immediate>]

Immediate pre-indexed:

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]!

Register pre-indexed:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]!

Scaled register pre-indexed:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, <shift> #<shift immediate>]!

Immediate post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], #{+|-}<12 bit offset>

Register post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm

Scaled register post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, <shift> #<shift immediate>

Briefly, a ! suffix means pre-indexed (the calculated address is written back to Rn), and square brackets around only Rn mean post-indexed.

Function

Reads a word or byte from memory into a register, with optional writeback of the calculated address to the base register.

Addressing modes

Note that the first three forms are, strictly speaking, pre-indexed. However, the ARM Architecture Reference Manual uses the term "pre-indexed" only for the versions with writeback, so that is the usage followed here. You can, if you wish, think of the first three as "pre-indexed without writeback".

With normal (offset) addressing, the calculated address is used for the access and the base register is left unchanged.

Immediate offset

  LDR{cond}{B}     Rd, [Rn{, #{+|-}<12 bit offset>}]

The register Rd will contain the word or byte loaded from the address Rn plus or minus the specified offset (0 to 4095 bytes).

This addressing mode is useful for accessing structures and data fields. For example, if R3 points to the structure base, you can load the third element with:

  LDR   R0, [R3, #8]   ; R0 = word at [R3 + 8]

It is worth taking a moment to understand a small source of potential confusion. Because LDR and LDRB share much of their implementation, and because some ARMs are capable of unaligned loads (that is, loading words that are not on a word boundary), the offset specified (the #8) is in bytes, even though LDR loads words.
Therefore, when you are loading elements from an array of words, the first word is at offset #0. As a word is four bytes, the second is at offset #4, and the third, which the example above loads, is at #8.

A basic LDR with no offset is simply this form with an offset of zero. It looks like:

  LDR   R0, [R3]       ; R0 = word at R3
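Here is a minimal C sketch of the byte-offset rule, assuming memory is modelled as a host byte array and Rn as a byte address within it. The function name ldr_imm_offset is hypothetical, and words are read in host byte order. The offset is added in bytes, so #8 lands on the third 32 bit word.

```c
#include <stdint.h>
#include <string.h>

/* Model of LDR Rd, [Rn, #offset] (no writeback). 'mem' stands in for
 * memory, 'rn' is a byte address within it, and 'offset' is in bytes,
 * just as in the ARM instruction. Words use host byte order. */
static uint32_t ldr_imm_offset(const uint8_t *mem, uint32_t rn, int32_t offset)
{
    uint32_t word;
    uint32_t addr = rn + (uint32_t)offset;   /* 32 bit wraparound, as on ARM */
    memcpy(&word, mem + addr, sizeof word);
    return word;
}
```

With an array of three words at address 0, offsets #0, #4 and #8 give the first, second and third elements.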

Register offset

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]

The register Rd will contain the word or byte loaded from the address Rn plus or minus the offset held in Rm.

This addressing mode is similar to immediate offset, except that the offset is held in a register, so it can easily step through array elements. Assuming R3 points to the structure base and R4 holds the value 8, this will load the third element of a word array:

  LDR   R0, [R3, R4]   ; R0 = word at [R3+R4]

Scaled register offset

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSL #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ASR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ROR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, RRX]

This is useful when the index register Rm holds a counter or index rather than a byte offset, since the shift converts the counter into an offset. Assuming R0 is a function number, we can shift it into a word offset and load the handler address from a table straight into PC:

  LDR   PC, [PC, R0, LSL #2]   ; PC = word at [PC + (R0 << 2)]

Refer to the jump table example to see this in full.
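The shift is applied to Rm before it is added to or subtracted from Rn. The sketch below models that calculation in C, following the instruction encoding, where a shift immediate of 0 is special: LSR #0 and ASR #0 encode shifts of 32, and ROR #0 encodes RRX (a one-bit rotate through the carry flag). The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Shift types as encoded in bits 6-5 (hypothetical names). */
typedef enum { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 } shift_type;

/* Compute the scaled offset from Rm, a shift type and the 5 bit shift
 * immediate (bits 11-7). carry_in is the C flag, used only by RRX. */
static uint32_t scaled_offset(uint32_t rm, shift_type type, unsigned imm5, int carry_in)
{
    imm5 &= 31u;
    switch (type) {
    case SHIFT_LSL:
        return rm << imm5;                          /* LSL #0 leaves Rm unchanged */
    case SHIFT_LSR:
        return imm5 == 0 ? 0u : rm >> imm5;         /* LSR #0 encodes LSR #32 */
    case SHIFT_ASR:
        if (imm5 == 0)                              /* ASR #0 encodes ASR #32 */
            return (rm & 0x80000000u) ? 0xFFFFFFFFu : 0u;
        if (rm & 0x80000000u)                       /* copy the sign bit down */
            return (rm >> imm5) | ~(0xFFFFFFFFu >> imm5);
        return rm >> imm5;
    case SHIFT_ROR:
        if (imm5 == 0)                              /* ROR #0 encodes RRX */
            return ((uint32_t)(carry_in != 0) << 31) | (rm >> 1);
        return (rm >> imm5) | (rm << (32u - imm5));
    }
    return rm;                                      /* not reached */
}
```

For the jump table load, LSL #2 simply multiplies the function number in R0 by four, the size of one table entry.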


With pre-indexed addressing, the calculated address is used for the access and is also written back to the base register.

Immediate pre-indexed

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]!

This functions as for immediate offset, except that the calculated address is written back to the base register Rn. This permits pointer access to arrays with automatic update of the pointer. We can read each byte from a string, one by one, using something such as:

  LDRB  R0, [R1, #1]!   ; R0 = byte from [R1+1], then R1 updated

This addressing mode, when used with STR, is useful for pushing single registers onto the RISC OS/ARMLinux full descending (FD) stack:

  STR   R14, [R13, #-4]!
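A small C model of the LDRB string example above, with memory as a byte array and R1 as a byte address into it (both hypothetical stand-ins, as are the function names). It brings out one detail: because the increment happens before the load, R1 must start one byte before the first character.

```c
#include <stdint.h>

/* Model of LDRB R0, [R1, #1]! : R1 + 1 is calculated and written back
 * to R1, then the byte at that address is loaded, zero-extended, and
 * returned as R0. */
static uint32_t ldrb_pre_inc(const uint8_t *mem, uint32_t *r1)
{
    *r1 += 1;
    return mem[*r1];
}

/* Hypothetical helper: count the characters of a NUL-terminated string
 * at byte address 'start'. R1 starts one byte early, because the
 * pre-indexed load increments before it reads. */
static uint32_t string_length(const uint8_t *mem, uint32_t start)
{
    uint32_t r1 = start - 1;    /* unsigned, so this wraps harmlessly when start is 0 */
    uint32_t count = 0;
    while (ldrb_pre_inc(mem, &r1) != 0)
        count++;
    return count;
}
```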

Register pre-indexed

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]!

This functions as for immediate pre-indexed, except that the offset comes from a register. This is useful for walking an array, perhaps one with variable sized elements, where each element's size can be written to the offset register.

  LDR   R0, [R1, R2]!   ; R0 = word from [R1+R2], then R1 updated
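As a sketch of such a variable-sized walk, suppose (hypothetically) that each record starts with a word holding its own size in bytes, and that a size of zero ends the list. Each LDR R0, [R1, R2]! then steps R1 over the current record and loads the next record's size in one go. Memory is modelled as a host byte array and the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Model of LDR R0, [R1, R2]! : R1 + R2 is written back to R1, then the
 * word at that byte address is loaded (host byte order). */
static uint32_t ldr_reg_pre(const uint8_t *mem, uint32_t *r1, uint32_t r2)
{
    uint32_t word;
    *r1 += r2;
    memcpy(&word, mem + *r1, sizeof word);
    return word;
}

/* Count the records in the hypothetical size-prefixed list at 'first'. */
static uint32_t count_records(const uint8_t *mem, uint32_t first)
{
    uint32_t r1 = first;
    uint32_t r0 = ldr_reg_pre(mem, &r1, 0);   /* size of the first record */
    uint32_t count = 0;
    while (r0 != 0) {
        count++;
        r0 = ldr_reg_pre(mem, &r1, r0);       /* skip this record, load next size */
    }
    return count;
}
```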

Scaled register pre-indexed

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSL #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ASR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ROR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, RRX]!

As for scaled register offset, with the difference that the calculated address is written back to Rn. The actual address is Rn plus the shifted value of Rm.

  LDR   R0, [R1, R2, LSL #2]!   ; R0 = [R1 + (R2 << 2)], then R1 updated
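One use for the scaled pre-indexed form is stepping through memory with a stride counted in words. The C sketch below (a hypothetical example, with memory as a host byte array) sums one column of a row-major matrix of words: R2 holds the row length in words, LSL #2 turns that into a byte stride, and R1 starts one row above the wanted element because the address is advanced before each load.

```c
#include <stdint.h>
#include <string.h>

/* Model of LDR R0, [R1, R2, LSL #2]! : R1 + (R2 << 2) is written back to
 * R1, then the word at that byte address is loaded (host byte order). */
static uint32_t ldr_scaled_pre(const uint8_t *mem, uint32_t *r1, uint32_t r2)
{
    uint32_t word;
    *r1 += r2 << 2;
    memcpy(&word, mem + *r1, sizeof word);
    return word;
}

/* Sum column 'col' of a rows x cols matrix of words starting at 'base'. */
static uint32_t sum_column(const uint8_t *mem, uint32_t base, uint32_t cols,
                           uint32_t rows, uint32_t col)
{
    uint32_t r1 = base + (col << 2) - (cols << 2);   /* one row early; unsigned wrap is fine */
    uint32_t sum = 0;
    for (uint32_t i = 0; i < rows; i++)
        sum += ldr_scaled_pre(mem, &r1, cols);
    return sum;
}
```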


With post-indexed addressing, the address used is the address held in the base register (Rn). Once the data has been loaded, a new address is calculated from Rn and the offset, and this is written back to the base register.
Post-indexed addressing always writes back. You can use the T instruction suffix so that, when executed in a privileged mode, the access is treated by the memory system as a User mode access.

Immediate post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], #{+|-}<12 bit offset>

This is used for access to arrays with automatic update of the base register. The word or byte is loaded from the address in Rn, and then Rn is updated to Rn plus (or minus) the offset.
This addressing mode can be used to pop single words off the RISC OS/ARMLinux (FD) stack:

  LDR   R14, [R13], #4
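This pop is the mirror image of the STR R14, [R13, #-4]! push shown under immediate pre-indexed. A minimal C model of the pair, with a hypothetical 64 byte memory array and R13 as a byte address into it, shows why the stack stays balanced: the push decrements before storing, and the pop loads before incrementing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state: memory as a byte array, R13 as a byte
 * address into it. The stack is full descending from the top. */
typedef struct {
    uint8_t mem[64];
    uint32_t r13;
} machine;

/* STR Rd, [R13, #-4]! : pre-indexed, so decrement first, then store. */
static void push(machine *m, uint32_t rd)
{
    m->r13 -= 4;
    memcpy(&m->mem[m->r13], &rd, sizeof rd);
}

/* LDR Rd, [R13], #4 : post-indexed, so load from R13, then increment. */
static uint32_t pop(machine *m)
{
    uint32_t rd;
    memcpy(&rd, &m->mem[m->r13], sizeof rd);
    m->r13 += 4;
    return rd;
}
```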

Register post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm

As for immediate post-indexed, with the difference that the offset is taken from a register. The base register Rn is updated following the read (from the address held in Rn) to be Rn plus or minus Rm.

Scaled register post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, LSL #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, LSR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, ASR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, ROR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, RRX

As for register post-indexed, with the addition that the contents of Rm are shifted before being added to (or subtracted from) Rn.

Equivalence in C

This shows an equivalence between each ARM instruction and behave-alike C code. Assume that r0 is a 32 bit unsigned integer (uint32_t), r1 is a pointer to an array of such integers (uint32_t *), and r2 is a 32 bit unsigned integer; byte offsets are assumed to be multiples of four. ARM offsets are counted in bytes, whereas C pointer arithmetic counts elements, so each byte offset is divided by four (the size of a word) in the C versions.
The upper line in each example is ARM code, the lower line is equivalent C.

Immediate offset:

  LDR    R0, [R1, #4]
  r0 = r1[1];
Remember, again, the offset is in bytes, so #4 points to the second word in our array; in C, the array index [1] provides the same data.

Register offset:

  LDR    R0, [R1, R2]
  r0 = r1[r2 / 4];

Scaled register offset:

  LDR    R0, [R1, R2, LSL #4]
  r0 = r1[(r2 << 4) / 4];

Immediate pre-indexed:

  LDR    R0, [R1, #4]!
  r1 += 1; r0 = *r1;

Register pre-indexed:

  LDR    R0, [R1, R2]!
  r1 += r2 / 4; r0 = *r1;

Scaled register pre-indexed:

  LDR    R0, [R1, R2, LSL #2]!
  r1 += (r2 << 2) / 4; r0 = *r1;

Immediate post-indexed:

  LDR    R0, [R1], #4
  r0 = *r1; r1 += 1;

Register post-indexed:

  LDR    R0, [R1], R2
  r0 = *r1; r1 += r2 / 4;

Scaled register post-indexed:

  LDR    R0, [R1], R2, LSL #2
  r0 = *r1; r1 += (r2 << 2) / 4;
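On a host machine, the sketch below runs byte-addressed models of two of these instructions alongside element-based C pointer code, to check that a 4 byte step in ARM terms is a 1 element step in C. The model functions are hypothetical, words are in host byte order, and only two of the nine forms are covered.

```c
#include <stdint.h>
#include <string.h>

/* Byte-addressed model of LDR R0, [R1, #offset]! : R1 is a byte address
 * (an offset into 'mem'). Add the offset, write back, then load. */
static uint32_t ldr_imm_pre(const uint8_t *mem, uint32_t *r1, int32_t offset)
{
    uint32_t word;
    *r1 += (uint32_t)offset;
    memcpy(&word, mem + *r1, sizeof word);
    return word;
}

/* Byte-addressed model of LDR R0, [R1], R2, LSL #2 : load, then add R2 * 4. */
static uint32_t ldr_scaled_post(const uint8_t *mem, uint32_t *r1, uint32_t r2)
{
    uint32_t word;
    memcpy(&word, mem + *r1, sizeof word);
    *r1 += r2 << 2;
    return word;
}

/* Run both models next to element-based C pointer code on the same
 * array; return 1 if the loaded values and final positions agree. */
static int equivalences_hold(void)
{
    static const uint32_t array[4] = { 10, 20, 30, 40 };
    const uint8_t *mem = (const uint8_t *)array;
    uint32_t r1_arm = 0;                 /* byte address of array[0] */
    const uint32_t *r1_c = array;
    uint32_t r0_arm, r0_c;

    r0_arm = ldr_imm_pre(mem, &r1_arm, 4);
    r1_c += 1; r0_c = *r1_c;             /* C form of LDR R0, [R1, #4]! */
    if (r0_arm != r0_c || r1_arm != (uint32_t)(r1_c - array) * 4u)
        return 0;

    r0_arm = ldr_scaled_post(mem, &r1_arm, 2);
    r0_c = *r1_c; r1_c += 2;             /* C form of LDR R0, [R1], R2, LSL #2 with R2 = 2 */
    return r0_arm == r0_c && r1_arm == (uint32_t)(r1_c - array) * 4u;
}
```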

Example

Simple I/O to memory transfer. In this example, R8 points to the I/O device base address, R9 points to the memory buffer where we will be writing our data, R10 holds the end address of the buffer (the loop stops when R9 reaches it), and R11 is used as workspace. These registers were chosen because they are banked in FIQ mode. Finally, DataWord is a constant giving the offset of the actual I/O register that we will be reading. The hardware clears the interrupt when the data is read.

 .read
  LDR     R11, [R8, #DataWord]  ; read word (fixed address)
  STR     R11, [R9], #4         ; write to memory (address updates)
  CMP     R9, R10               ; done?
  BLO     read                  ; no (unsigned address compare), go back for more


Jump table. A jump table (also called a multi-way branch) is a useful concept for dispatching according to an input selector. By way of an example, imagine an emulation of the 6502 processor. The 6502 presents up to 256 possible instructions (not all are used) followed by zero, one, or two data bytes. We could sort out which instruction we are executing with code such as the following:

  ; 6502 opcode in R0
  CMP   R0, #0
  BEQ   opcode_brk
  CMP   R0, #1
  BEQ   opcode_ora
  CMP   R0, #2
  ...etc...

However, anybody who writes code like this is either a total newbie or doesn't have a future in programming. To execute the instruction INC <absolute>, X, which is opcode &FE, you would need to work through some two hundred comparisons and skip over the corresponding number of branches (the exact number depends on how you handle invalid opcodes). It is possible that you might reach your INC handler branch only after five hundred and six instructions! As you can imagine, such code would be tedious in every possible sense of the word.

A much better solution, which reduces the dispatch of any 6502 opcode to three instructions, is the jump table. Here it is:

  CMP    R0, #((dispatch_endoftable - dispatch_table) / 4)
  ADDCC  PC, PC, R0, LSL #2
  B      opcode_inv
  
 .dispatch_table
  ; row 0
  B      opcode_brk
  B      opcode_ora
  B      opcode_err
  ...all of the rest of the instructions...
 .dispatch_endoftable

The complicated expression in the CMP checks that R0 is within the range (end of table - start of table) divided by four, which is the number of table entries, since each branch instruction is four bytes long. For 1024 bytes representing 256 branch instructions, the result is 256, one for each possible 6502 opcode. Because the condition is CC (unsigned lower), any value of 256 or more fails the test.
We then calculate a relative address (PC plus the shifted R0 offset) and write it into PC; that address holds the branch for our desired opcode. If the input value is out of range, we instead fall through to the branch to the invalid opcode handler. This works because PC, when read, is ahead of the current instruction: due to the ARM pipeline it is actually the address of the current instruction plus 8, so it points past the fall-through branch to the first table entry. Nifty, huh?
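The PC+8 arithmetic can be checked with a small C model. It assumes the layout of the example above: the CMP at some address A, the ADDCC at A+4, the fall-through B opcode_inv at A+8, and the first table entry at A+12. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Return the address executed after the ADDCC in the three-instruction
 * dispatch, given the address of the CMP, the selector in R0 and the
 * number of table entries. */
static uint32_t dispatch_target(uint32_t cmp_addr, uint32_t r0, uint32_t entries)
{
    uint32_t add_addr = cmp_addr + 4;
    uint32_t pc_read = add_addr + 8;    /* PC reads as the instruction address + 8 */
    if (r0 < entries)                   /* CC: unsigned lower */
        return pc_read + (r0 << 2);     /* entry r0 of the table */
    return add_addr + 4;                /* fall through to B opcode_inv */
}
```

Because the comparison is unsigned, a "negative" selector such as &FFFFFFFF is rejected along with values of 256 and above.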


Better jump table. You might be thinking: where was the LDR? The above example demonstrates how a jump table functions. An alternative jump table can be created by loading a handler address from a table and stuffing it directly into PC, instead of jumping to a branch. For example, if we assume that R0 holds the operation index, and MaxOp is a constant giving the number of supported operations, we can perform our branch as follows:

  CMP    R0, #MaxOp
  LDRCC  PC, [PC, R0, LSL #2]   ; unsigned lower, so negative indices are rejected too
  B      BadIndexValue
  
  DCD    Op_0_Handler
  DCD    Op_1_Handler
  DCD    Op_2_Handler
  ...etc...

This version, by using a direct load instead of a branch to a branch, is even better, handling dispatch in only two instructions (compare, then load). This is useful for function wrappers, SWI handlers and the like, where an input value selects the operation desired. For instance, instead of a dozen functions, there may (at API level) be a single "Misc Filesystem Function" where operation #0 is Read Size, #1 is Read Date, #2 is Read Permissions, and so on.
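In C, the same direct-load dispatch is an array of function pointers indexed by the operation number. This is only a sketch: the handler names and return values are hypothetical stand-ins for the "Misc Filesystem Function" operations, and an unsigned bounds check plays the part of the compare.

```c
#include <stdint.h>

/* Hypothetical handlers; the return values just identify which ran. */
static int read_size(void)        { return 100; }
static int read_date(void)        { return 101; }
static int read_permissions(void) { return 102; }
static int bad_index_value(void)  { return -1; }

/* Table of handler addresses, the C counterpart of the DCD list. */
static int (*const op_table[])(void) = { read_size, read_date, read_permissions };

#define MAX_OP (sizeof op_table / sizeof op_table[0])

/* Bounds-check the operation number (unsigned, so "negative" values
 * fail as well), then call through the table. */
static int misc_fs_function(uint32_t op)
{
    if (op < MAX_OP)
        return op_table[op]();
    return bad_index_value();
}
```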

Notes

  • Immediate offset, register offset or scaled register offset: specifying PC as Rn uses the address of the current instruction plus 8.
  • Pre-indexed (any) / post-indexed (any): specifying PC as either Rn or Rm is unpredictable.
  • Register or scaled register (pre- or post-indexed): using the same register as Rn and Rm is unpredictable.
  • Pre-indexed (any) / post-indexed (any): using the same register as Rd and Rn is unpredictable.
  • If translation is used (the T suffix, post-indexed only) in a privileged mode, the memory access is treated as a User mode access, so the memory system applies User mode permissions. The registers used are still those of the current processor mode.
  • If a word read is not word aligned, the data read is rotated so that the addressed byte is the least significant byte of the register.
  • For byte loads, the byte is zero-extended to 32 bits, so bits 8-31 of Rd are cleared and the register holds only that byte.
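The last two notes can be modelled in C as follows. This sketch assumes little-endian memory held in a byte array and the rotating behaviour described above for unaligned word loads (as on ARMs without unaligned load support); the function names are hypothetical.

```c
#include <stdint.h>

/* Model of an unaligned LDR: read the aligned word containing 'addr',
 * then rotate it right by 8 * (addr & 3) so that the addressed byte
 * ends up in bits 0-7. */
static uint32_t ldr_rotated(const uint8_t *mem, uint32_t addr)
{
    uint32_t aligned = addr & ~3u;
    uint32_t word = (uint32_t)mem[aligned]
                  | (uint32_t)mem[aligned + 1] << 8
                  | (uint32_t)mem[aligned + 2] << 16
                  | (uint32_t)mem[aligned + 3] << 24;
    unsigned rot = 8u * (addr & 3u);
    return rot == 0 ? word : (word >> rot) | (word << (32u - rot));
}

/* Model of LDRB: the byte is zero-extended, so bits 8-31 are clear. */
static uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}
```

So a word load from address 1 of the bytes 11 22 33 44 returns &11443322, not the &44332211 an aligned load would give.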

Technical

The instruction bit patterns are as follows.

  • I - Register (set) or Immediate (unset)
  • P - Pre-indexed (set) or Post-indexed (unset)
  • U - Offset added to base (set) or subtracted from base (unset)
  • B - Unsigned byte (set) or word (unset) access
  • W - Depends on the P bit:
    • [P = 1] - the calculated address will be written back if W set.
    • [P = 0] - the access is treated as a User mode access if W set (has no effect if processor in User mode).
  • L - operation is a Load (set) or a Store (unset)

Immediate offset/index:

  31-28     | 27-26 | 25 | 24 | 23 | 22 | 21 | 20 | 19-16     | 15-12 | 11-0
  condition | 0 1   | 0  | P  | U  | B  | W  | 1  | Rn (base) | Rd    | 12 bit offset

Register offset/index:

  31-28     | 27-26 | 25 | 24 | 23 | 22 | 21 | 20 | 19-16     | 15-12 | 11-4            | 3-0
  condition | 0 1   | 1  | P  | U  | B  | W  | 1  | Rn (base) | Rd    | 0 0 0 0 0 0 0 0 | Rm

Scaled Register offset/index:

  31-28     | 27-26 | 25 | 24 | 23 | 22 | 21 | 20 | 19-16     | 15-12 | 11-7            | 6-5   | 4 | 3-0
  condition | 0 1   | 1  | P  | U  | B  | W  | 1  | Rn (base) | Rd    | shift immediate | shift | 0 | Rm
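To make the immediate layout concrete, the following C sketch packs its fields into a 32 bit word. The helper name is hypothetical and it does no range checking beyond masking each field; the test values are the encodings of LDR R0, [R3, #8], LDR R14, [R13], #4 and STR R14, [R13, #-4]! with the AL (always, 1110) condition.

```c
#include <stdint.h>

/* Pack an immediate offset/index single data transfer. cond: 4 bits;
 * p, u, b, w, l: single bits; rn, rd: register numbers 0-15;
 * offset12: unsigned byte offset (its sign is carried by U).
 * Bit 25 (I) is 0 for the immediate forms. */
static uint32_t encode_ldr_str_imm(uint32_t cond, uint32_t p, uint32_t u,
                                   uint32_t b, uint32_t w, uint32_t l,
                                   uint32_t rn, uint32_t rd, uint32_t offset12)
{
    return (cond & 0xFu) << 28
         | 1u << 26                 /* bits 27-26 = 0 1 */
         | (p & 1u) << 24
         | (u & 1u) << 23
         | (b & 1u) << 22
         | (w & 1u) << 21
         | (l & 1u) << 20
         | (rn & 0xFu) << 16
         | (rd & 0xFu) << 12
         | (offset12 & 0xFFFu);
}
```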


For each addressing mode, bits 27-20 are as follows:

  Addressing mode                | 27-26 | 25 | 24 | 23 | 22 | 21 | 20
  Immediate offset               | 0 1   | 0  | 1  | U  | B  | 0  | L
  (Scaled) Register offset       | 0 1   | 1  | 1  | U  | B  | 0  | L
  Immediate pre-indexed          | 0 1   | 0  | 1  | U  | B  | 1  | L
  (Scaled) Register pre-indexed  | 0 1   | 1  | 1  | U  | B  | 1  | L
  Immediate post-indexed         | 0 1   | 0  | 0  | U  | B  | 0  | L
  (Scaled) Register post-indexed | 0 1   | 1  | 0  | U  | B  | 0  | L

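You can differentiate between Register and Scaled Register by looking at bits 11-4, which are all zero for Register (equivalent to a shift of LSL #0) and hold the shift immediate and shift type for Scaled Register. For the post-indexed forms, setting W (bit 21) selects the translated (T) variant.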
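The table can be turned round into a decoder. This C sketch classifies an instruction (assumed already known to have bits 27-26 = 0 1) into one of the nine addressing modes using I, P and W, plus bits 11-4 for register versus scaled register. The enum and function names are hypothetical; a post-indexed form with W set is reported as its plain post-indexed mode.

```c
#include <stdint.h>

/* The nine addressing modes, in the order listed at the top of the page. */
typedef enum {
    IMM_OFFSET, REG_OFFSET, SCALED_OFFSET,
    IMM_PRE, REG_PRE, SCALED_PRE,
    IMM_POST, REG_POST, SCALED_POST
} addr_mode;

static addr_mode addressing_mode(uint32_t instr)
{
    uint32_t i = (instr >> 25) & 1u;   /* register (1) or immediate (0) */
    uint32_t p = (instr >> 24) & 1u;   /* pre (1) or post (0) indexed */
    uint32_t w = (instr >> 21) & 1u;   /* writeback when P = 1 */
    /* 0 = immediate, 1 = register (bits 11-4 zero), 2 = scaled register */
    uint32_t kind = !i ? 0u : (((instr >> 4) & 0xFFu) != 0 ? 2u : 1u);
    /* 0 = offset, 1 = pre-indexed, 2 = post-indexed */
    uint32_t group = p ? (w ? 1u : 0u) : 2u;
    return (addr_mode)(group * 3u + kind);
}
```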
