Floating point



The BASIC assembler, as standard, has no support for true floating point instructions. You can convert integers to your own implementation-defined 'floating point' (most usually fixed point) and perform basic mathematics with them, but you cannot interact with a floating point co-processor and do things the 'native' way.
There are, however, patches which extend the things that the assembler can do - which include FP instructions.

Parts of this documentation have been taken from the ARM Assembler manual.

Please note that this describes the floating point implementation used with RISC OS and not the VFP provided with the more recent ARM processors (ARMv5 etc).


How it works with the ARM

The ARM processor can interface with up to sixteen co-processors. The ARM3 and later have virtual co-processors within the ARM to handle internal control functions. But the first co-processor that was available was the floating point processor. This chip handles floating point maths to the IEEE standard.

A standard ARM floating point instruction set has been defined, so that the code may be used across all RISC OS machines. If the actual hardware does not exist, then the instructions are trapped and executed by the floating point emulator module (FPEmulator). The program does not need to know whether or not the FP co-processor is present. The only real difference will be speed of execution.

If you are interested in the co-processor aspect, read the document on co-processor access.

Note: If a real FPU is attached, it will pick up the unrecognised instructions and do the work itself. However, in the case of systems such as the ARM7500FE, the work of the floating point unit is shared between hardware (for instructions like MUF (multiply)) and software (for instructions like LGN (log. to base e)).


FP registers

The ARM IEEE FP system has eight high precision FP registers (F0 to F7). The register format is irrelevant as you cannot access those registers directly; the contents only become 'visible' when transferred to memory or to an ARM register. In memory, an FP register occupies three words, but as the FP system will be reloading its own register, the format of these three words is considered irrelevant.

There is also an FPSR (floating point status register) which, similar to the ARM's own PSR, holds the status information that an application might require. Each of the flags available has a 'trap' which allows the application to enable or disable traps associated with the given error.

The FPSR also allows you to distinguish between different implementations of the FP system.
There may also be an FPCR (floating point control register). This holds information that the application should not access, such as flags to turn the FP unit on and off. Typically, hardware will have an FPCR, software will not. Do not attempt to use the FPCR - some parts of it are read-sensitive, so your reading it will affect the rest of the system (like FPE!).

FP units can be software implementations such as the FPEmulator modules, hardware implementations such as the FP chip (and support code), or a combination of both.
The "most original" example of a 'both' that I can think of is the Warm Silence Software patch that will utilise the 80x87 chip on suitably equipped PC co-processor cards as a floating point processor for ARM FP operations. Talk about resource sharing...!

The results are calculated as though to infinite precision, then rounded to the length required. The rounding may be to nearest, towards +infinity (P), towards -infinity (M), or towards zero (Z). The default is rounding to nearest; in the event of a tie, the result is rounded to even.
The working precision is 80 bits, comprising a 64 bit mantissa, a 15 bit exponent, and a sign bit. Specific instructions that work with single precision may provide better performance in some implementations - notably fully-software-based ones.
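The round-to-nearest-even tie rule can be seen on any IEEE system. A quick sketch in Python, whose floats are IEEE doubles (the ARM working format is 80-bit, but the tie-breaking behaviour is the same):

```python
# 1 + 2^-53 is exactly halfway between 1.0 (even mantissa) and the
# next double up, 1 + 2^-52 (odd mantissa): the tie goes to the even one
tie_down = 1.0 + 2.0**-53
print(tie_down == 1.0)                         # True - rounded down to even

# 1 + 2^-52 + 2^-53 is halfway between 1 + 2^-52 (odd mantissa)
# and 1 + 2^-51 (even mantissa): the tie goes up this time
tie_up = (1.0 + 2.0**-52) + 2.0**-53
print(tie_up == 1.0 + 2.0**-51)                # True - rounded up to even
```

Note that the tie does not always round down or always round up - it rounds to whichever neighbour has an even mantissa, which avoids a systematic bias.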

The FPSR contains the necessary status for the FP system. The IEEE flags are always present, but the result flags are only available after an FP compare operation.

Floating point instructions should not be used from SVC mode.

The FPSR is laid out as follows:

31           24 23           16 15             8 7              0
System ID      | Trap enable   | System control | Exception flags

The defined system IDs are:

&00   Old FPE  - FPE module prior to v4.00
&80   FPPC     - Interface between ARM and WE32206 (AT&T MAU)
&01   FPE 400  - FPE module v4.00 or later
&81   FPA      - ARM FPU
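As a sketch, the four byte fields of the FPSR described above can be unpacked with simple shifts and masks (the field names here are mine):

```python
def unpack_fpsr(fpsr):
    """Split a 32-bit FPSR value into its four byte fields."""
    return {
        "system_id":   (fpsr >> 24) & 0xFF,   # bits 31-24
        "trap_enable": (fpsr >> 16) & 0xFF,   # bits 23-16
        "control":     (fpsr >> 8)  & 0xFF,   # bits 15-8
        "exceptions":  fpsr         & 0xFF,   # bits 7-0
    }

# An FPE 400 system (&01) with the IVO trap enabled (bit 16)
# and the cumulative INX flag set (bit 4):
print(unpack_fpsr(0x01010010))
# {'system_id': 1, 'trap_enable': 1, 'control': 0, 'exceptions': 16}
```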

The Trap enable byte is:

23 22 21   20     19     18     17     16
Reserved   INX    UFL    OFL    DVZ    IVO
If an exception flag bit is set following an operation, and the corresponding trap enable bit is set, then the exception trap will be taken.

The System control byte is:

15 14 13   12     11     10     9      8
Reserved   AC     EP     SO     NE     ND
(these bits have no meaning on Old FPE and FPPC systems)

AC Use Alternative definition for C flag on compares
When set, the ARM 'C' flag is set after a FP comparison to mean Greater Than or Equal or Unordered. When clear, 'C' is set for Greater Than or Equal.
EP Use Expanded Packed decimal format
When set, choosing Packed Decimal format will result in Expanded Packed Decimal being used.
The PRM says "Use of this expanded format allows conversion from extended precision to packed decimal and back again without loss of accuracy." Given that Double Extended Precision uses three words, while Extended Packed Decimal uses four, why not use Double Extended if you need to keep precision?
I'm sure there's a reason for EPD - I just don't get it...
SO Select Synchronous Operation of the FPU
When set, all FP operations are executed synchronously and the ARM will be forced to busy-wait until the FPU is done.
There is a trade off. With synchronous operation, exceptions will be raised at the expected time and place, and the addresses of such exceptions will be correct. However, with asynchronous operation, the ARM can actually get on and do stuff while the FPU does its own stuff (provided, of course, the ARM isn't waiting on an FP result).
Obviously, this bit has no meaning on the software FP systems!
NE NaN Exception
The PRM says: "If this bit is set, then an attempt to store a signalling NaN that involves a change of format will cause an exception (for full IEEE compatibility)."
I have no idea what that actually means (to me, a 'signalling NaN' brings to mind an old lady with big glasses waving a red lantern out the back of a steam train...).
The alternative? If this bit is clear, then no exception will occur - for compatibility with 'old FPE'.
ND No Denormalised numbers
The PRM says "If this bit is set, then the software will force all denormalised numbers to zero to prevent lengthy execution times when dealing with denormalised numbers."
Unfortunately, "the software" is ambiguous - I assume this means the FPE module does this. Would a hardware FPU do this as well?
I looked in my dictionary for a definition of 'denormalised' - it went from 'denominator' to 'denotation'. (For the record: they are the very tiny numbers, smaller than the smallest 'normal' value, which are stored without the usual implicit leading 1 bit and so take longer to process.) Oh well...
Exception flags, the lower byte of the FPSR:
7  6   5   4      3      2      1      0
Reserved   INX    UFL    OFL    DVZ    IVO
Whenever an exception condition arises, the appropriate cumulative exception flag in bits 0 to 4 will be set to 1. If the relevant trap enable bit is set, then an exception is also delivered to the user's program in a manner specific to the operating system. (Note that in the case of underflow, the state of the trap enable bit determines under which conditions the underflow flag will be set.) These flags can only be cleared by a WFS instruction.

IVO InValid Operation
The IVO flag is set when an operand is invalid for the operation to be performed. Invalid operations are those with no meaningful result, such as infinity minus infinity, zero multiplied by infinity, or the square root of a negative number.
DVZ DiVision by Zero
The DVZ flag is set if the divisor is zero and the dividend a finite, non-zero number. A correctly signed infinity is returned if the trap is disabled. The flag is also set for LOG(0) and for LGN(0). Negative infinity is returned if the trap is disabled.
OFL OverFLow
The OFL flag is set whenever the destination format's largest number is exceeded in magnitude by what the rounded result would have been, had the exponent range been unbounded. As overflow is detected after rounding a result, whether overflow occurs after some operations depends on the rounding mode.
If the trap is disabled either a correctly signed infinity is returned, or the format's largest finite number. This depends on the rounding mode and floating point system used.
UFL UnderFLow
Two correlated events contribute to underflow: tininess and loss of accuracy. The UFL flag is set in different ways depending on the value of the UFL trap enable bit. If the trap is enabled, then the UFL flag is set when tininess is detected, regardless of loss of accuracy. If the trap is disabled, then the UFL flag is set when both tininess and loss of accuracy are detected (in which case the INX flag is also set); otherwise a correctly signed zero is returned.
As underflow is detected after rounding a result, whether underflow occurs or not after some operations depends on the rounding mode.
The INX flag is set if the rounded result of an operation is not exact (different from the value computable with infinite precision), or overflow has occurred while the OFL trap was disabled, or underflow has occurred while the UFL trap was disabled. OFL or UFL traps take precedence over INX.
The INX flag is also set when computing SIN or COS, with the exceptions of SIN(0) and COS(0), whose results are exact.
The old FPE and the FPPC system may differ in their handling of the INX flag. Because of this inconsistency, it is recommended that you do not enable the INX trap.
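As an analogy (not ARM FP), Python's decimal module happens to model the same IEEE ideas described above - cumulative exception flags plus per-flag trap enables - so it can illustrate the behaviour:

```python
from decimal import Decimal, getcontext, DivisionByZero

ctx = getcontext()

# Trap disabled: the operation returns infinity and the
# cumulative flag is set, just like DVZ with its trap clear.
ctx.traps[DivisionByZero] = False
r = Decimal(1) / Decimal(0)
print(r, ctx.flags[DivisionByZero])   # Infinity True

# Trap enabled: an exception is delivered to the program instead.
ctx.traps[DivisionByZero] = True
try:
    Decimal(1) / Decimal(0)
    trapped = False
except DivisionByZero:
    trapped = True
print(trapped)                        # True
```

As with the FPSR, the flag is cumulative - it stays set across later operations until explicitly cleared (WFS on the ARM, ctx.clear_flags() here).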


Precision and rounding

Precision is one of: S (single), D (double), E (double extended), or P (packed decimal). A precision must be specified.


Rounding modes are: nothing (round to nearest), P (round towards +infinity), M (round towards -infinity), and Z (round towards zero).

If no rounding mode is specified, 'nearest' will be assumed.


Data format

We shall briefly describe the Single and Double precision numbers as they appear in memory. If you require details of the other three formats, you are referred to appropriate documentation, such as the RISC OS 3 Programmer's Reference Manuals volume 4, pages 167 to 169.

Because the floating point system is a little complex, we shall quickly look at how floating point typically operates.
As we all know, a computer can use bit patterns to represent numbers. Older machines could easily handle between 0 and 255, or 0 and 65535. The ARM processor can easily handle between zero and 4294967295. A 64 bit processor can easily handle between 0 and 18446744073709551615.
Obviously, by using other techniques, any system can store all sorts of numbers - BBC BASIC running on a 6502 didn't crash if you told it to count to 257, for example.
As the data widths get larger, and the numbers that can be handled in one go get larger, this still does not help with the simplest case of PI...
But... there's a solution. An eight-bit processor can handle numbers larger than 255 by using the simple formula ( (high_byte x 256) + low_byte ). Okay, things are more complex, but you get the idea.
So, if we want to store PI (we'll take PI as being 3.14159265), then why don't we simply store the number 314159265, and alongside it we'll store also a value saying that the 'real' decimal point should shift eight places to the left.

PI = 314159265.0 [<-8] = 31415926.5 [<-7] = ... = 3.14159265
In the above example, I've shown it after the first shift of the decimal point, just to give you a visual idea of how it works.
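The decimal-point-shifting trick is, in essence, scaled integer arithmetic. A toy sketch in Python (an illustration of the idea only - this is not the IEEE format):

```python
# store PI as an integer plus a count of decimal places to shift left
digits, shift = 314159265, 8          # meaning 314159265 x 10^-8

value = digits / 10**shift
print(value)                          # 3.14159265

# multiplying two such numbers: multiply the digits, add the shifts
d2, s2 = 20, 1                        # 20 x 10^-1, i.e. 2.0
product_digits = digits * d2          # 6283185300
product_shift = shift + s2            # 9
print(product_digits / 10**product_shift)   # 6.2831853
```

IEEE floating point works the same way in spirit - a stored fraction plus a stored shift - except that the shift is a binary exponent rather than a decimal one.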
Unfortunately, the IEEE system isn't anywhere near this easy to understand. In fact, I've written it up below and given some examples, but I still don't get it...


IEEE Single Precision

 31   30      23 22                  0
Sign | Exponent | (msb) Fraction (lsb)

IEEE Double Precision

              31   30      20 19                  0
First word:  Sign | Exponent | (msb) Fraction (lsb)
Second word: (msb)          Fraction          (lsb)
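Python's struct packs in the same IEEE 754 double format, so we can peek at the two words of the diagram (the split into 'first' and 'second' word here is mine, matching the layout above):

```python
import math
import struct

# pack PI as an IEEE double, then split the 64 bits into the two words
bits = struct.unpack("<Q", struct.pack("<d", math.pi))[0]
first, second = bits >> 32, bits & 0xFFFFFFFF

sign     = first >> 31            # bit 31 of the first word
exponent = (first >> 20) & 0x7FF  # bits 30-20 of the first word

print(hex(first), hex(second))    # 0x400921fb 0x54442d18
print(sign, exponent)             # 0 1024
```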

To a non-mathematician (such as myself!), the system does not appear to make an awful lot of sense, for example:

DIM code% 64
FOR opt% = 0 TO 2 STEP 2
  P% = code%
  [  OPT  opt%
     EXT  1
     MVFS F0, #0.5      ; or whatever other value...
     STFS F0, store
     MOV  PC, R14
.store
     DCD  0   ; only one word - we're using single precision
  ]
NEXT
CALL code%
Examining the memory location gives:
   0     &00000000   exponent = 0    fraction = 0        sign = 0
   0.5   &3F000000   exponent = 126  fraction = 0        sign = 0
   1     &3F800000   exponent = 127  fraction = 0        sign = 0
   2     &40000000   exponent = 128  fraction = 0        sign = 0
   5     &40A00000   exponent = 129  fraction = 2097152  sign = 0
   10    &41200000   exponent = 130  fraction = 2097152  sign = 0

Continuing, working up some code with the 'printlib' example supplied with the C/assembler development software, we arrive at:

 -0.123456 is -1.2346E-1
   9.99996 is 1.0000E1
-0.0999998 is -1.0000E-1
  0.999997 is 1.0000E0
      -0.0 is 0
9.99999E99 is 1.0000E100
This apparently does make sense - only not to me - so I cannot explain it! :-)
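There are two details the diagrams above don't state: the stored exponent is biased by 127, and a leading 1 bit is implicit before the fraction, so the value is (-1)^sign x 1.fraction x 2^(exponent-127). A sketch in Python (struct uses the same IEEE 754 single format):

```python
import struct

def decode_single(x):
    """Split an IEEE single into its sign, exponent and fraction fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1
    return hex(bits), sign, exponent, fraction

print(decode_single(5.0))   # ('0x40a00000', 0, 129, 2097152)
# i.e. +1.25 x 2^(129-127) = 1.25 x 4 = 5
```

(2097152 is &200000, which as a 23 bit fraction means 0.25; with the implicit leading bit that gives the 1.25.)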


The FP instruction set

LDF<condition><precision> <fp register>, <address>
Load Floating Point value.
This call is similar to LDR, and the address can take the same pre-indexed and post-indexed forms (though the offset is a word offset with a smaller range).
Your assembler may allow literals to be used, such as LDFS F0, [float_value]


STF<condition><precision> <fp register>, <address>
Store floating point value.
This call is similar to STR, and the address can take the same pre-indexed and post-indexed forms (though the offset is a word offset with a smaller range).
Your assembler may allow literals to be used, such as STFED F0, [float_value]


LFM<condition> <fp register>, <count>, <address>
SFM<condition> <fp register>, <count>, <address>
Load/Store multiple floating point registers.
These are similar in idea to LDM and STM, but they will not be described because some versions of FPEmulator do not support them. The FP module in RISC OS 3.1x (2.87) does, as do (I think) later versions. If you know that your software will only operate on a system that supports SFM, then use it. Otherwise you'll need to 'fake' it with a sequence of STFs. Likewise for LFM/LDF.


FLT<condition><precision><rounding> <fp register>, <register>
FLT<condition><precision><rounding> <fp register>, #<value>

Convert integer to floating point, either an ARM register or an absolute value.


FIX<condition><rounding> <register>, <fp register>
Convert floating point to integer.
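The rounding suffix matters when FIXing. The four modes map onto familiar functions - a Python sketch (the mapping to the P/M/Z letters follows the rounding section above):

```python
import math

x = -2.5
nearest = round(x)        # -2 : round to nearest, ties to even (the default)
up      = math.ceil(x)    # -2 : P, round towards +infinity
down    = math.floor(x)   # -3 : M, round towards -infinity
chop    = math.trunc(x)   # -2 : Z, round towards zero

print(nearest, up, down, chop)   # -2 -2 -3 -2
```

Note round(-2.5) gives -2, not -3 - that is the ties-to-even rule at work.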


WFS<condition> <register>
Write floating point status register with the contents of the ARM register specified.


RFS<condition> <register>
Read floating point status register into the ARM register specified.


WFC<condition> <register>
Write floating point control register with the contents of the ARM register specified.
Supervisor mode only, and only on hardware that supports it.


RFC<condition> <register>
Read floating point control register into the ARM register specified.
Supervisor mode only, and only on hardware that supports it.


Floating point co-processor data operations:
The formats of these instructions are:

<binary op><condition><precision><rounding> <dest fp register>, <fp register>, <fp register or #value>
<unary op><condition><precision><rounding> <dest fp register>, <fp register or #value>

The #value constants should be 0, 1, 2, 3, 4, 5, 10, or 0.5.

The binary operations are...
ADF - Add
DVF - Divide
FDV - Fast Divide - only defined to work with single precision
FML - Fast Multiply - only defined to work with single precision
FRD - Fast Reverse Divide - only defined to work with single precision
MUF - Multiply
POL - Polar Angle
POW - Power
RDF - Reverse Divide
RMF - Remainder
RPW - Reverse Power
RSF - Reverse Subtract
SUF - Subtract

The unary operations are...
ABS - Absolute Value
ACS - Arc Cosine
ASN - Arc Sine
ATN - Arc Tangent
COS - Cosine
EXP - Exponential (e to the power of the operand)
LOG - Logarithm to base 10
LGN - Logarithm to base e
MVF - Move
MNF - Move Negated
NRM - Normalise
RND - Round to integral value
SIN - Sine
SQT - Square Root
TAN - Tangent
URD - Unnormalised Round

CMF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 2 with FP register 1.
The variant CMFE compares with exception.

CNF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 2 with the negative of FP register 1.
The variant CNFE compares with exception.

Compares are provided with and without the exception that could arise if the numbers are unordered (ie one or both of them is not-a-number). To comply with IEEE 754, the CMF instruction should be used to test for equality (ie when a BEQ or BNE is used afterwards) or to test for unorderedness (in the V flag). The CMFE instruction should be used for all other tests (BGT, BGE, BLT, BLE afterwards).


When the AC bit in the FPSR is clear, the ARM flags N, Z, C, V refer to the following after compares:
N = Less than
Z = Equal
C = Greater than, or equal
V = Unordered


And when the AC bit is set, the flags refer to:
N = Less than
Z = Equal
C = Greater than, or equal, or unordered
V = Unordered
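The two tables can be sketched as code (my model, in Python - NaN comparisons are the 'unordered' case):

```python
import math

def cmf_flags(a, b, ac=False):
    """Model the ARM N, Z, C, V flags after a CMF, per the tables above."""
    unordered = math.isnan(a) or math.isnan(b)
    n = not unordered and a < b
    z = not unordered and a == b
    c = (not unordered and a >= b) or (unordered and ac)
    v = unordered
    return n, z, c, v

print(cmf_flags(1.0, 2.0))                    # (True, False, False, False)
print(cmf_flags(float("nan"), 1.0))           # (False, False, False, True)
print(cmf_flags(float("nan"), 1.0, ac=True))  # (False, False, True, True)
```

The last two lines show the AC bit's only effect: whether C is also set when the operands are unordered.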


In APCS code with objasm, to store a floating point value, you would use the directive DCF. You append 'S' for single precision, and 'D' for double.



Here is a brief example. We MUL two numbers, but use the floating point unit instead of the ARM's multiplication instruction. This could be modified to multiply two floating point numbers, and give a floating point response, but as it is only a short example, it will simply use two integers.

REM >fpmul
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...

DIM code% 20

FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [  OPT loop%

.multiply
     FLTS   F0, R0
     FLTS   F1, R1
     FMLS   F2, F0, F1
     FIXS   R0, F2

     MOVS   PC, R14
  ]
NEXT

INPUT "First number  : "one%
INPUT "Second number : "two%

A% = one%
B% = two%
result% = USR(multiply)

PRINT "The result is "+STR$(result%)
There is no option to download this program, as standard BASIC won't touch it. However, you can include FP statements if you can 'build' the instructions.
Alternatively, you could use ExtBASasm by Darren Salt.

This version will work in BASIC:

REM >fpmul
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...

DIM code% 20

FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [  OPT loop%

.multiply
     EQUD   &EE000110   ; FLTS F0, R0
     EQUD   &EE011110   ; FLTS F1, R1
     EQUD   &EE902101   ; FMLS F2, F0, F1
     EQUD   &EE100112   ; FIXS R0, F2

     MOVS   PC, R14
  ]
NEXT

INPUT "First number  : "one%
INPUT "Second number : "two%

A% = one%
B% = two%
result% = USR(multiply)

PRINT "The result is "+STR$(result%)


Remember to use the appropriate precision for what you are doing.

REM >precision
REM Short example to show how data can be 'lost' due
REM to using incorrect precision.


DIM code% 64

FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [  OPT loop%

     EXT 1

.single_precision
     FLTS   F0, R0
     FIX    R0, F0
     MOV    PC, R14

.double_precision
     FLTD   F0, R0
     FIX    R0, F0
     MOV    PC, R14

.doubleext_precision
     FLTE   F0, R0
     FIX    R0, F0
     MOV    PC, R14
  ]
NEXT

A% = &1ffffff

PRINT "Original input is " + STR$~A%
PRINT "Single precision  " + STR$~(USR(single_precision))
PRINT "Double precision  " + STR$~(USR(double_precision))
PRINT "Double extended   " + STR$~(USR(doubleext_precision))
The result of this program is:
Original input is 1FFFFFF
Single precision  2000000
Double precision  1FFFFFF
Double extended   1FFFFFF
You don't need to use double precision everywhere, though, as it will be that much slower. Simply keep this in mind if you are dealing with large numbers.
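The same loss shows up in any IEEE single: &1FFFFFF needs 25 significant bits, but single precision only keeps 24 (23 stored plus the implicit leading 1). A Python sketch:

```python
import struct

x = 0x1FFFFFF                    # 2^25 - 1: needs 25 significant bits

# round-trip through a 32-bit single; Python floats are IEEE doubles,
# whose 53-bit mantissa holds the value easily
as_single = struct.unpack("<f", struct.pack("<f", float(x)))[0]
as_double = float(x)

print(hex(int(as_single)))       # 0x2000000 - rounded up, the low bit is lost
print(hex(int(as_double)))       # 0x1ffffff - preserved exactly
```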


In order to test the actual speed differences, I wrote a test program:

DIM code% 64

FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [  OPT loop%

     MOV    R0, #23
     MOV    R1, #1<<16

.timetest
     FLTD   F0, R0
     FLTD   F1, R0
     MUFD   F2, F0, F1
     SUBS   R1, R1, #1
     BNE    timetest

     MOV    PC, R14
  ]
NEXT

t% = TIME
CALL code%
PRINT "That took "+STR$(TIME - t%)+" centiseconds."
I tried various precisions, and also the fast multiply. It showed something interesting. So I tried multiplication, and addition. All with the same data (input 23).


Here are my results for 65,536 convert-and-process operations (1<<16, as in the listing). I've just timed my RiscPC and the times were MUCH slower - so I'm not entirely sure which system the timings below relate to - it did say "ARM710 processor, FPEmulator 4.14" but I doubt that...

   Operation        Fast single   Single        Double        Double extended

   Multiplication   1731cs        1755cs        1965cs        1712cs

   Division         2169cs        2169cs        2618cs        2479cs

   Addition         n/a           1684cs        1899cs        1646cs
This seems to show that double extended precision is the fastest on my machine for a selection of operations. Thus, it is incorrect to simply assume that more complexity takes longer. My personal suspicion is that the internal working format is double extended, so working directly with it avoids the cost of converting the value to a different precision.

Why do I doubt the above experiments? Simple. Here are the results for an ARM710 RiscPC using FPEmulator 4.14 (1.07Mz):

   Operation        Fast single   Single        Double        Double extended

   Multiplication   112cs         112cs         110cs         111cs

   Division         138cs         139cs         153cs         159cs

   Addition         n/a           108cs         107cs         106cs
These results seem more consistent, so... :-)

The moral here? Don't be afraid to experiment...

Return to assembler index
Copyright © 2004 Richard Murray