Unaligned data access
Suppose you had the following block of memory (bytes shown in ascending address order):
| (little endian) | &8000 | AA05FE02 | 93749132 | 39E000FF | 00EA00EA | (traditional ARM) |
Loading a word from &8000 would give the value &02FE05AA. Likewise, loading a word from &8008 would give the value &FF00E039.
What about if you loaded a word from &8002?
The result depends not only on the ARM processor family, but also on the specific implementation. The most likely result of loading from &8002 on older (pre-Cortex) processors is the value &05AA02FE (the aligned word is loaded, and the result is rotated according to the value of the bottom two bits of the address). On later ARMs (Cortex), the request is broken down into non-atomic byte loads which produce the expected result (&749302FE on a little endian system).
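The two outcomes can be checked with a small host-side model. This is only a sketch in Python, not ARM code: the names `MEM`, `load_word_rotated` and `load_word_bytewise` are invented for illustration, and the model assumes a little endian system and the memory block above.

```python
# Bytes at &8000..&800F, in ascending address order.
BASE = 0x8000
MEM = bytes.fromhex("AA05FE02" "93749132" "39E000FF" "00EA00EA")

def load_word_aligned(addr):
    """Little endian word load from a word-aligned address."""
    off = addr - BASE
    return int.from_bytes(MEM[off:off + 4], "little")

def load_word_rotated(addr):
    """Older-style model: load the containing aligned word,
    then rotate it right by 8 bits per unit of (addr AND 3)."""
    word = load_word_aligned(addr & ~3)
    rot = (addr & 3) * 8
    return ((word >> rot) | (word << (32 - rot))) & 0xFFFFFFFF

def load_word_bytewise(addr):
    """Later-style model: assemble the four bytes starting at addr."""
    off = addr - BASE
    return int.from_bytes(MEM[off:off + 4], "little")

print(f"&8000 aligned : &{load_word_aligned(0x8000):08X}")   # &02FE05AA
print(f"&8008 aligned : &{load_word_aligned(0x8008):08X}")   # &FF00E039
print(f"&8002 rotated : &{load_word_rotated(0x8002):08X}")   # &05AA02FE
print(f"&8002 bytewise: &{load_word_bytewise(0x8002):08X}")  # &749302FE
```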
However, the situation is considerably more complicated than this.
ARM family differences
ARM v5 and earlier
The Load Register instruction performs a rotated load where the lower two bits of the address specified are used as an input to the shifter: if the address is %xxx00 the loaded word is not rotated, if it is %xxx01 the result will be ROR #8, if %xxx10 the result will be ROR #16, and if it is %xxx11 the result is ROR #24.
This can be synthesised with the following code:
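    ; This is the same as:
    ;   LDR   R0, [R1]
    ; for when R1 is a non-aligned address
    BIC   Rx, R1, #3          ; Rx = word-aligned address
    LDR   R0, [Rx]            ; load the aligned word
    AND   Rx, R1, #3          ; Rx = byte offset within the word (0-3)
    MOV   Rx, Rx, LSL #3      ; Rx = rotation in bits (0, 8, 16 or 24)
    MOV   R0, R0, ROR Rx      ; rotate the loaded word into place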
(where 'Rx' is a temporary register)
This is of some importance in legacy code as some compilers were known to use this behaviour in order to synthesise halfword loads.
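As a rough illustration of why the rotation is handy for halfwords, here is a Python model of the sequence above, followed by one way a halfword load could be built on top of it (shift left 16, then right 16). This is a sketch under assumptions: it is not taken from any particular compiler's output, the names `ldr_v5` and `load_halfword` are made up, and the memory contents are the first eight bytes of the earlier example.

```python
BASE = 0x8000
MEM = bytes.fromhex("AA05FE0293749132")   # bytes at &8000..&8007

def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount &= 31
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def ldr_v5(r1):
    """Model of LDR R0, [R1] on ARM v5 and earlier (little endian)."""
    rx = r1 & ~3                                      # BIC Rx, R1, #3
    off = rx - BASE
    r0 = int.from_bytes(MEM[off:off + 4], "little")   # LDR R0, [Rx]
    rx = r1 & 3                                       # AND Rx, R1, #3
    rx = rx << 3                                      # MOV Rx, Rx, LSL #3
    return ror32(r0, rx)                              # MOV R0, R0, ROR Rx

def load_halfword(addr):
    """Halfword load from a halfword-aligned address, using only LDR."""
    r0 = ldr_v5(addr)                 # wanted halfword lands in bits 0-15
    r0 = (r0 << 16) & 0xFFFFFFFF      # MOV R0, R0, LSL #16
    return r0 >> 16                   # MOV R0, R0, LSR #16

for addr in (0x8000, 0x8002, 0x8004, 0x8006):
    print(f"&{addr:04X}: LDR gives &{ldr_v5(addr):08X}, "
          f"halfword is &{load_halfword(addr):04X}")
```

Code relying on this will load different values on a core that performs true unaligned loads, which is why the behaviour matters when moving legacy code to newer processors.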
The Store Register instruction treats the bottom two bits as zero, so writing a word to &8002 will actually write it to &8000.
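A minimal Python model of that store behaviour, assuming a little endian system; `str_v5` and the value written are hypothetical names and data chosen for the example.

```python
BASE = 0x8000
mem = bytearray(8)                    # eight zeroed bytes at &8000..&8007

def str_v5(value, addr):
    """Model of STR on ARM v5 and earlier: bits 0-1 of the address are ignored."""
    off = (addr & ~3) - BASE
    mem[off:off + 4] = value.to_bytes(4, "little")

str_v5(0x11223344, 0x8002)            # asks for &8002, lands at &8000
print(mem.hex())                      # 4433221100000000
```

Note that the data is not rotated on the way out, so a store followed by a load from the same unaligned address does not round-trip in this model.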
Using the Load Halfword instruction (ARMv4 or later) with an address where the bottom bit is set is considered unpredictable.
You will need to refer to the relevant descriptions for instructions such as Load Coprocessor or Load Double; in general, however, either the lower bits are ignored, or the instruction must work on correctly aligned data.
In the case of LDRD, the specified register must be even numbered, and the address must be doubleword aligned; every other permutation is unpredictable or undefined.
ARM v6
The ARM v6 family supports the ARM v7 behaviour, but incorporates a configuration option in the system control register to select between this and older-style behaviour.
ARM v7
In general, the ARM v7 will perform the logically expected operation by breaking the memory access down into multiple memory accesses in order to load the expected data. This carries time penalties, not only in accessing multiple memory locations, but also in the case where cache reloads are required, page boundaries are crossed, etc.
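To get a feel for which accesses are likely to be the expensive ones, the sketch below flags word loads that straddle a word, cache line or page boundary. It says nothing about actual cycle counts, which depend on the implementation. The 32-byte line and 4096-byte page are assumed values for illustration only, and `crosses` is a made-up helper.

```python
def crosses(addr, size, granule):
    """True if the bytes addr .. addr+size-1 straddle a granule-aligned boundary."""
    return addr // granule != (addr + size - 1) // granule

WORD = 4
CACHE_LINE = 32    # assumed line size, check your core's documentation
PAGE = 4096        # assumed page size, check your MMU configuration

for addr in (0x8000, 0x8002, 0x801E, 0x8FFE):
    print(f"&{addr:04X}: "
          f"word boundary={crosses(addr, 4, WORD)} "
          f"cache line={crosses(addr, 4, CACHE_LINE)} "
          f"page={crosses(addr, 4, PAGE)}")
```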
- On older ARMs (ARM v5 and earlier), when a Data Abort happened as a result of the unaligned access being faulted, it was implementation defined what exactly would be in the "base register". This means that if the register supplying the address had writeback enabled, it was up to the specific implementation whether the register, at the point of the abort, would contain the original contents or the updated contents.
- The above does not happen on later ARMs (v6+), as it is specified that the base register will not have been updated if an abort is raised.
- While the ARM's behaviour is (varied but) predictable, whether unaligned accesses 'work' or get faulted is not up to the ARM. It is up to the MMU. The standard ARM MMU has an option to trap unaligned accesses. Some ARM implementations may contain simpler MMUs which do not raise an abort for an (unsupported) unaligned memory access. You should be fairly safe if you are using an ARM SoC such as one of the OMAP family; however, if you are using a simpler, lower-specified ARM-based microcontroller, you will need to consult the datasheet, assuming it tells you at all. The (Cortex-M0 based) LPC11U2X datasheet does not mention the word "abort" even once, and neither does that of the ARM9 Mini-SoC LPC2939.
- The multiple memory accesses that make up an unaligned access on ARM v6+ are not guaranteed to be atomic. This means interrupts can occur between the accesses, and data aborts can occur on either part of the access.
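The non-atomic point is easy to underestimate, so here is a Python simulation of a "torn" read. It is a sketch under assumptions: the access is modelled as splitting at the word boundary (a real core may split it differently), and the `interrupt` callback stands in for whatever runs between the two parts. All names are invented for the example.

```python
BASE = 0x8000
mem = bytearray.fromhex("AA05FE0293749132")   # bytes at &8000..&8007

def read_word_in_two_parts(addr, interrupt=None):
    """Read a little endian word as two separate accesses, split at the word boundary."""
    off = addr - BASE
    split = 4 - (addr & 3)                    # bytes left in the first aligned word
    first = bytes(mem[off:off + split])
    if interrupt is not None:
        interrupt()                           # something else runs between the parts
    second = bytes(mem[off + split:off + 4])
    return int.from_bytes(first + second, "little")

def interrupt_handler():
    """Rewrites the whole word at &8002 with a new value."""
    mem[2:6] = (0x11223344).to_bytes(4, "little")

old = read_word_in_two_parts(0x8002)
torn = read_word_in_two_parts(0x8002, interrupt_handler)
new = read_word_in_two_parts(0x8002)
print(f"old=&{old:08X} torn=&{torn:08X} new=&{new:08X}")
```

The torn value mixes the low half of the old word with the high half of the new one, so it matches neither.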
In short, it is safer to just not use unaligned memory accesses. The pain of needing to know the particular cases of all the different types of ARM your code may run upon is surely greater than the cost of using multiple LDRBs for cases where data cannot be held on word aligned boundaries...
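For completeness, this is the byte-by-byte approach modelled in Python, with a plausible ARM equivalent shown in the comments. It assumes little endian data; the register usage and the names `ldrb` and `load_word_from_bytes` are illustrative rather than a prescribed sequence.

```python
BASE = 0x8000
MEM = bytes.fromhex("AA05FE0293749132")   # bytes at &8000..&8007

def ldrb(addr):
    """Model of LDRB: a single byte load, which has no alignment requirement."""
    return MEM[addr - BASE]

def load_word_from_bytes(r1):
    """Build a little endian word from four byte loads."""
    r0 = ldrb(r1)                  # LDRB R0, [R1]
    rx = ldrb(r1 + 1)              # LDRB Rx, [R1, #1]
    r0 |= rx << 8                  # ORR  R0, R0, Rx, LSL #8
    rx = ldrb(r1 + 2)              # LDRB Rx, [R1, #2]
    r0 |= rx << 16                 # ORR  R0, R0, Rx, LSL #16
    rx = ldrb(r1 + 3)              # LDRB Rx, [R1, #3]
    r0 |= rx << 24                 # ORR  R0, R0, Rx, LSL #24
    return r0

for addr in (0x8000, 0x8001, 0x8002, 0x8003):
    print(f"&{addr:04X}: &{load_word_from_bytes(addr):08X}")
```

The result is the same on every ARM, whatever the core does with an unaligned LDR, at the cost of a few extra instructions.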