From ARMwiki
Revision as of 22:47, 26 December 2011 by Admin

Instruction LDR[B][T]
Function Load Register
Category Load and Store
ARM family All
Notes -


LDR[B][T] : Load Register

LDR provides a flexible way to load a 32-bit word (LDR) or an unsigned byte (LDRB) into a register, optionally with translation (LDRT/LDRBT). By using PC as the base register, you can write position-independent code and build jump tables (see example).

There are nine addressing modes:

  • Immediate offset
  • Register offset
  • Scaled register offset
  • Immediate pre-indexed
  • Register pre-indexed
  • Scaled register pre-indexed
  • Immediate post-indexed
  • Register post-indexed
  • Scaled register post-indexed

These are described in more detail below.

LDR[B][T] is available in all versions of the ARM architecture. Later versions additionally offer LDRH and LDRSH to load unsigned or signed 16-bit halfwords, and LDRSB to load signed bytes. These instructions are available in architecture v4 or later (the ARM8/StrongARM generation).


The syntax has many forms. Note that the condition code comes before the B or T specifier, and that T is only available with post-indexed addressing.
Optional parts are shown in curly braces {} and placeholders in angle brackets <>, because square brackets [] are part of the instruction syntax.

Immediate offset:

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]

Register offset:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]

Scaled register offset:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, <shift> #<shift immediate>]

Immediate pre-indexed:

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]!

Register pre-indexed:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]!

Scaled register pre-indexed:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, <shift> #<shift immediate>]!

Immediate post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], #{+|-}<12 bit offset>

Register post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm

Scaled register post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, <shift> #<shift immediate>

Briefly, an ! suffix means pre-indexed, and having square brackets around only Rn specifies post-indexed.


 Read words or bytes from memory, with optional write-back.

Addressing modes

Note that the first three forms are, strictly speaking, pre-indexed. However, the ARM Architecture Reference Manual uses the term "pre-indexed" only for the versions with writeback, so that is the terminology used here. You can, if you wish, think of the first three as "pre-indexed without writeback".

Immediate offset

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]

The register Rd will contain the word or byte loaded from the address Rn plus or minus the specified offset.

This addressing mode is useful for accessing structures and data fields. For example, if R3 points to the base of a structure of 32-bit words, you can load the third word with:

  LDR   R0, [R3, #8]   ; R0 = word at [R3 + 8]
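In C terms, the #8 is simply the byte offset of the third 32-bit field of the structure. A minimal sketch, assuming a hypothetical structure of three 32-bit words (which has no padding on typical ARM ABIs):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure; R3 would hold its address. */
struct record {
    uint32_t first;   /* byte offset 0 */
    uint32_t second;  /* byte offset 4 */
    uint32_t third;   /* byte offset 8, as in LDR R0, [R3, #8] */
};

/* Behaves like LDR R0, [R3, #8] when R3 points to a struct record. */
static uint32_t load_third(const struct record *r3)
{
    return r3->third;
}
```

The assembler offset and `offsetof(struct record, third)` are the same number, which is why immediate offsets map so directly onto structure fields.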

Register offset

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]

The register Rd will contain the word or byte loaded from the address Rn plus or minus the offset held in Rm.

This addressing mode is similar to immediate offset, except that the offset is held in a register, so it can easily be used to cycle through array elements. Assuming R3 points to the base of a word array, and R4 holds the value 8, this will load the third element:

  LDR   R0, [R3, R4]   ; R0 = word at [R3+R4]

Scaled register offset

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSL #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ASR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ROR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, RRX]

This is useful for cases where the index register Rm is a counter rather than an offset, thus permitting a shift to be applied to change the counter into an offset. Assuming R0 is a function number, we can shift this into an address offset and branch to handler code as follows:

  LDR   PC, [PC, R0, LSL #2]   ; PC = word at [PC + (R0 << 2)]

Refer to the jump table example to see this in full.
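The five shift types listed above transform Rm before it is added to or subtracted from Rn. A minimal C sketch of that offset calculation, assuming shift amounts of 0 to 31 for LSL and 1 to 31 for LSR, ASR and ROR (the special LSR #32 and ASR #32 encodings are not modelled). RRX needs the current carry flag, which is passed in here as a parameter; the function and enum names are illustrative only.

```c
#include <stdint.h>

/* Illustrative names for the shift types usable in scaled register offsets. */
enum shift_type { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR, SHIFT_RRX };

/* Compute Rm <shift> #amount. Assumed ranges: LSL 0..31, LSR/ASR/ROR 1..31;
   amount is ignored for RRX. */
static uint32_t scaled_offset(uint32_t rm, enum shift_type type,
                              unsigned amount, int carry_in)
{
    switch (type) {
    case SHIFT_LSL:
        return rm << amount;
    case SHIFT_LSR:
        return rm >> amount;                       /* zeros shifted in */
    case SHIFT_ASR:                                /* copies of bit 31 shifted in */
        return (rm >> amount) |
               ((rm & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SHIFT_ROR:
        return (rm >> amount) | (rm << (32u - amount));
    case SHIFT_RRX:                                /* rotate right by one through carry */
        return (rm >> 1) | ((uint32_t)(carry_in != 0) << 31);
    }
    return 0;
}

/* Address for [Rn, {+|-}Rm, <shift>]: up != 0 adds the offset, else subtracts. */
static uint32_t scaled_address(uint32_t rn, uint32_t rm, enum shift_type type,
                               unsigned amount, int carry_in, int up)
{
    uint32_t off = scaled_offset(rm, type, amount, carry_in);
    return up ? rn + off : rn - off;
}
```

LSL #2 is the common case: it turns a word index into a byte offset, exactly as in the PC-relative jump table load above.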

Immediate pre-indexed

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]!

This works like immediate offset, except that the calculated address is written back to the base register Rn. This permits pointer access to arrays with automatic update of the pointer. For example, we can read each byte of a string in turn (with R1 initially pointing one byte before the string) using something like:

  LDRB  R0, [R1, #1]!   ; R0 = byte from [R1+1], then R1 updated
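A C sketch of that byte-by-byte loop, assuming a hypothetical flat byte array `mem` standing in for memory, an index `r1` standing in for the register, and a NUL-terminated string. Because the offset is applied before the load, the index starts one byte early.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the characters of the NUL-terminated string at mem[start] using the
   access pattern of LDRB R0, [R1, #1]! (mem and r1 are stand-ins, not real
   registers). Unsigned wrap-around makes start == 0 work too. */
static size_t strlen_preindexed(const uint8_t *mem, uint32_t start)
{
    uint32_t r1 = start - 1u;   /* R1 points one byte before the string */
    size_t count = 0;
    for (;;) {
        r1 += 1u;               /* address = R1 + 1, written back to R1 */
        uint8_t r0 = mem[r1];   /* R0 = byte at the updated address */
        if (r0 == 0)
            break;              /* stop at the terminating NUL */
        count++;
    }
    return count;
}
```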

When used with STR, this addressing mode is useful for pushing a single register onto the RISC OS/ARMLinux full descending (FD) stack:

  STR   R14, [R13, #-4]!

Scaled register pre-indexed

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSL #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ASR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ROR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, RRX]!

As for scaled register offset, with the difference that the calculated address is written back to Rn. The actual address is Rn plus the shifted value of Rm.

  LDR   R0, [R1, R2, LSL #2]!   ; R0 = word at [R1 + (R2 << 2)], then R1 updated

Immediate post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], #{+|-}<12 bit offset>

This is used to access arrays with automatic update of the base register. The word or byte is read from the address in Rn, and then Rn is updated to Rn plus (or minus) the offset.
This addressing mode can be used to pop a single word off the RISC OS/ARMLinux (FD) stack:

  LDR   R14, [R13], #4
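The push (STR R14, [R13, #-4]!) and pop (LDR R14, [R13], #4) pair pre-indexing for the push and post-indexing for the pop because, on a full descending stack, R13 always points at the last item stored. A C sketch of that pairing, assuming a hypothetical word array standing in for stack memory, with the stack pointer held as a byte address into it:

```c
#include <stdint.h>

#define STACK_BYTES 64u

/* Hypothetical stack memory: element index = byte address / 4. */
static uint32_t stack_mem[STACK_BYTES / 4];

/* STR Rd, [R13, #-4]!  : decrement first, then store (pre-indexed). */
static void push_word(uint32_t *r13, uint32_t value)
{
    *r13 -= 4u;
    stack_mem[*r13 / 4u] = value;
}

/* LDR Rd, [R13], #4    : load first, then increment (post-indexed). */
static uint32_t pop_word(uint32_t *r13)
{
    uint32_t value = stack_mem[*r13 / 4u];
    *r13 += 4u;
    return value;
}
```

Starting with the stack pointer at STACK_BYTES (one past the top, as a full descending stack expects when empty), pushes and pops mirror each other exactly.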

Register post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm

As for immediate post-indexed, except that the offset is taken from a register. After the read (from the address in Rn), the base register Rn is updated to Rn plus (or minus) Rm.
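Holding the step in a register is handy when the stride is only known at run time, for example walking down one column of a row-major matrix, where Rm holds the row length in bytes. A C sketch of that access pattern, assuming a hypothetical row-major matrix of 32-bit words; byte offsets are used instead of pointers so the final post-increment never forms an out-of-range pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum column `col` of a rows x cols row-major matrix (illustrative helper).
   r1 is a byte offset playing the role of R1; r2 plays the role of R2. */
static uint32_t sum_column(const uint32_t *matrix, size_t rows, size_t cols,
                           size_t col)
{
    size_t r1 = col * sizeof(uint32_t);   /* byte offset of the first element */
    size_t r2 = cols * sizeof(uint32_t);  /* row length in bytes */
    uint32_t sum = 0;
    for (size_t i = 0; i < rows; i++) {
        uint32_t r0 = matrix[r1 / sizeof(uint32_t)]; /* LDR R0, [R1] ... */
        r1 += r2;                                    /* ... , R2: R1 += R2 */
        sum += r0;
    }
    return sum;
}
```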

Scaled register post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, LSL #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, LSR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, ASR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, ROR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, RRX

As for register post-indexed, with the addition that the contents of Rm are shifted.

Equivalence in C

This shows an equivalence between an ARM instruction and behave-alike C code. Assume that r0 is a 32-bit unsigned integer (unsigned long on 32-bit ARM), r2 is an unsigned integer, and r1 is a pointer into an array of 32-bit unsigned integers. Because C pointer arithmetic counts in elements rather than bytes, a byte offset of 4 in the ARM code is one element in the C code; where r2 is used as a plain byte offset, it is assumed to be a multiple of 4.
The upper line in each example is ARM code, the lower line is equivalent C.

Immediate offset:

  LDR    R0, [R1, #4]
  r0 = r1[1];

Register offset:

  LDR    R0, [R1, R2]
  r0 = r1[r2 / 4];

Scaled register offset:

  LDR    R0, [R1, R2, LSL #4]
  r0 = r1[r2 << 2];

Immediate pre-indexed:

  LDR    R0, [R1, #4]!
  r1 += 1; r0 = *r1;

Register pre-indexed:

  LDR    R0, [R1, R2]!
  r1 += r2 / 4; r0 = *r1;

Scaled register pre-indexed:

  LDR    R0, [R1, R2, LSL #2]!
  r1 += r2; r0 = *r1;

Immediate post-indexed:

  LDR    R0, [R1], #4
  r0 = *r1; r1 += 1;

Register post-indexed:

  LDR    R0, [R1], R2
  r0 = *r1; r1 += r2 / 4;

Scaled register post-indexed:

  LDR    R0, [R1], R2, LSL #2
  r0 = *r1; r1 += r2;


Simple I/O to memory transfer. In this example, R8 points to the I/O device base address, R9 points to the memory buffer where we will be writing our data, R10 holds the end address of the buffer (one word past the last word to be written), and R11 is used as workspace. These registers were chosen because they are banked in FIQ mode. Finally, DataWord is a constant giving the offset of the I/O register that we read. The hardware clears the interrupt when the data is read.

  read  LDR     R11, [R8, #DataWord]  ; read word (fixed address)
        STR     R11, [R9], #4         ; write to memory (address updates)
        CMP     R9, R10               ; done?
        BLO     read                  ; no (R9 still below R10, unsigned), go back for more
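For readers more at home in C, the loop has roughly this shape: the device register is read through a fixed volatile address, and the buffer pointer is post-incremented as STR R11, [R9], #4 does. This is a sketch only; `data_reg`, `dst` and `end` are hypothetical names for R8 + DataWord, R9 and R10, and a real FIQ handler would normally stay in assembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy words from a memory-mapped data register into [dst, end).
   Like the assembler loop, it transfers at least one word before testing.
   Returns the number of words copied. */
static size_t copy_from_device(const volatile uint32_t *data_reg,
                               uint32_t *dst, const uint32_t *end)
{
    size_t copied = 0;
    do {
        *dst++ = *data_reg;   /* read fixed address, store with post-increment */
        copied++;
    } while (dst < end);      /* compare with the end address and loop */
    return copied;
}
```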

Jump table. A jump table (also called a multi-way branch) is a useful concept for dispatching according to an input selector. By way of an example, imagine an emulation of the 6502 processor. The 6502 presents up to 256 possible instructions (not all are used) followed by zero, one, or two data bytes. We could sort out which instruction we are executing with code such as the following:

  ; 6502 opcode in R0
  CMP   R0, #0
  BEQ   opcode_brk
  CMP   R0, #1
  BEQ   opcode_ora
  CMP   R0, #2
  ; ...and so on, for up to 256 opcodes

However, anybody who writes code like this is either a total newbie or doesn't have a future in programming: finding the handler for a high-numbered opcode takes up to 256 compare-and-branch pairs.
A much better solution, which reduces the dispatch of every 6502 opcode to three instructions, is the jump table. Here is the start:

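  CMP    R0, #((dispatch_endoftable - dispatch_table) / 4)
  ADDCC  PC, PC, R0, LSL #2   ; in range: PC = dispatch_table + R0 * 4
  B      opcode_inv           ; out of range: invalid opcode handler
  ; row 0 (dispatch_table starts here, at the ADDCC + 8)
  B      opcode_brk           ; opcode &00
  B      opcode_ora           ; opcode &01
  B      opcode_err           ; opcode &02
  ...all of the rest of the instructions...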

The expression on the CMP line computes the number of entries in the table: (end of table - start of table) divided by four, because each branch instruction is four bytes long. For 256 branch instructions occupying 1024 bytes the result is 256, one for each possible 6502 opcode.
If R0 is below that limit (carry clear after the unsigned comparison), ADDCC adds four times R0 to PC, so execution continues at the matching branch in the table. If the input value is out of range, the ADDCC is not executed and we fall through to the branch to the invalid opcode handler. This works because PC, when read, is 8 bytes ahead of the current instruction due to pipelining, which leaves exactly one slot for the fall-through branch before the table starts. Nifty, huh?

You might be wondering where the LDR is. Okay then... An alternative jump table can be created by loading an address directly into PC instead of jumping to a branch. For example, if we assume that R0 holds the operation index, and MaxOp is a constant giving the number of supported operations, we can perform our branch as follows:

  CMP    R0, #MaxOp
  LDRLO  PC, [PC, R0, LSL #2]   ; unsigned test, so negative indices are rejected too
  B      BadIndexValue          ; R0 >= MaxOp
  DCD    Op_0_Handler
  DCD    Op_1_Handler
  DCD    Op_2_Handler

This version, by using a direct load instead of a branch to a branch, is even faster, handling dispatch in only two instructions. This is useful for function wrappers, SWI handlers, and the like, where an input value selects the operation desired. For instance, instead of a dozen functions, an API might provide a single "Misc Filesystem Function" where operation #0 is Read Size, #1 is Read Date, #2 is Read Permissions, and so on.
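The same dispatch-by-index idea in C is a bounds-checked table of function pointers; compilers often turn a dense switch statement into a similar table. A sketch of the hypothetical "Misc Filesystem Function" described above; the handler names and return values are placeholders, not a real API.

```c
#include <stddef.h>

typedef int (*op_handler)(void);

/* Hypothetical operation handlers; real ones would do actual work. */
static int read_size(void)        { return 100; }
static int read_date(void)        { return 200; }
static int read_permissions(void) { return 300; }
static int bad_index_value(void)  { return -1; }

/* Plays the role of the DCD table that follows the LDR PC instruction. */
static const op_handler op_table[] = { read_size, read_date, read_permissions };
#define MAX_OP (sizeof op_table / sizeof op_table[0])

static int misc_fs_function(unsigned op)
{
    if (op < MAX_OP)             /* unsigned range check, like CMP R0, #MaxOp */
        return op_table[op]();   /* fetch the handler address and jump to it */
    return bad_index_value();    /* out-of-range index */
}
```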


  • Immediate offset, Register offset, or Scaled register offset: specifying PC as Rn uses the value (address of this instruction + 8).
  • Pre-indexed (any) / Post-indexed (any): specifying PC as either Rn or Rm is unpredictable.
  • Register or Scaled register (pre- or post-indexed): using the same register as Rn and Rm is unpredictable.
  • Pre-indexed (any) / Post-indexed (any): using the same register as Rd and Rn is unpredictable.
  • If translation is used (LDRT/LDRBT, post-indexed only), the memory access is performed as if the processor were in User mode, regardless of the current processor mode, so it is checked against User mode access permissions.
  • If a word read is not word aligned, the data read is rotated so that the addressed byte is the least significant byte of the register.
  • For byte loads, the byte is zero-extended to 32 bits, so bits 31 to 8 of the destination register are cleared.
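The last two notes can be made concrete with a small model of memory reads. A sketch assuming little-endian memory held in a hypothetical byte array (at least the aligned word containing each address must be present), and the classic pre-ARMv6 behaviour for unaligned word loads: the aligned word is read, then rotated right by 8 times the low two address bits.

```c
#include <stdint.h>

/* Read the aligned little-endian word containing byte address addr. */
static uint32_t read_aligned_word(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;
    return (uint32_t)mem[base]
         | ((uint32_t)mem[base + 1] << 8)
         | ((uint32_t)mem[base + 2] << 16)
         | ((uint32_t)mem[base + 3] << 24);
}

/* LDR model: an unaligned address gives the rotated aligned word, so the
   byte at addr ends up in bits 7..0 of the result. */
static uint32_t ldr_word(const uint8_t *mem, uint32_t addr)
{
    uint32_t word = read_aligned_word(mem, addr);
    unsigned rot = 8u * (addr & 3u);
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* LDRB model: the byte is zero-extended, so bits 31..8 are always clear. */
static uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr];
}
```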


The instruction bit pattern is as follows:

  Bits   31-28      27-26  25  24  23  22  21  20  19-16      15-12      11-0
  Field  condition  0 1    I   P   U   B   W   1   Rn (base)  Rd (dest)  offset (depends on addressing mode)


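  • I specifies whether the offset is a 12-bit immediate (I=0) or a register, optionally shifted (I=1).
  • P specifies whether the offset is applied before the data is read (P=1, offset or pre-indexed addressing) or after it (P=0, post-indexed addressing).
  • U specifies whether the offset is added to the base (U=1) or subtracted from it (U=0).
  • B specifies a byte load (B=1) or a word load (B=0).
  • W, when P=1, specifies whether the calculated address is written back to the base register (the ! suffix). When P=0 the base register is always updated, and W=1 selects the translated form (LDRT/LDRBT).
  • Bit 20 (L) is 1 for loads; the same encoding with this bit clear is STR.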
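Taken together, P, U and W decide which address is read and what, if anything, is written back to Rn. A sketch of that decision in C, following the classic ARM encoding (P=0 always writes back, and P=0 with W=1 selects the T form); `offset` stands for the already-computed immediate or shifted-register value, and the struct and function names are illustrative only. Unpredictable cases such as Rd equal to Rn are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative result of decoding the addressing bits. */
struct ldr_access {
    uint32_t address;   /* address actually read */
    uint32_t new_rn;    /* value of Rn after the instruction */
    bool translated;    /* true for the LDRT/LDRBT form */
};

static struct ldr_access ldr_addressing(bool p, bool u, bool w,
                                        uint32_t rn, uint32_t offset)
{
    struct ldr_access a;
    uint32_t indexed = u ? rn + offset : rn - offset;  /* U: add or subtract */

    if (p) {                          /* offset or pre-indexed */
        a.address = indexed;
        a.new_rn = w ? indexed : rn;  /* W=1: write back (the ! suffix) */
        a.translated = false;
    } else {                          /* post-indexed: always writes back */
        a.address = rn;
        a.new_rn = indexed;
        a.translated = w;             /* W=1 here selects the T form */
    }
    return a;
}
```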