Floating point

Parts of this documentation have been taken from the ARM Assembler manual.
Please note that this describes the floating point implementation used with RISC OS, and not the VFP provided with more recent ARM processors (ARMv5 etc.).
A standard ARM floating point instruction set has been defined, so that the code may be used across all RISC OS machines. If the actual hardware does not exist, then the instructions are trapped and executed by the floating point emulator module (FPEmulator). The program does not need to know whether or not the FP coprocessor is present. The only real difference will be speed of execution.
If you are interested in the coprocessor aspect, read the document on coprocessor access.
Note: If a real FPU is attached, it will pick up on the unrecognised instructions and do the work itself. However, on systems such as the ARM7500FE, the work of the floating point unit is shared between hardware (for instructions like MUF (multiply)) and software (for instructions like LGN (logarithm to base e)).
There is also an FPSR (floating point status register) which, similar to the ARM's own PSR, holds the status information that an application might require. Each of the flags available has a 'trap' which allows the application to enable or disable traps associated with the given error.
The FPSR also allows you to distinguish between different implementations of the FP system.
There may also be an FPCR (floating point control register). This holds information that the application should not access, such as flags to turn the FP unit on and off. Typically, hardware implementations will have an FPCR; software implementations will not. Do not attempt to use the FPCR: some parts of it are read-sensitive, so merely reading it can affect the rest of the system (the FPE, for example!).
FP units can be software implementations such as the FPEmulator modules, hardware implementations such as the FP chip (and its support code), or a combination of both. The "most original" example of a 'both' that I can think of is the Warm Silence Software patch that will utilise the 80x87 chip on suitably equipped PC coprocessor cards as a floating point processor for ARM FP operations. Talk about resource sharing...!
Results are calculated as though to infinite precision, then rounded to the length required. The rounding may be to nearest, to plus infinity (P), to minus infinity (M), or to zero (Z). The default is round to nearest; in the event of a tie, the result is rounded to even.
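As an aside, Python's built-in round() happens to use the same round-half-to-even tie rule, which gives a quick feel for it (a sketch for illustration only; it rounds decimal values, not FP results):

```python
# Round-half-to-even ("banker's rounding"): an exact tie goes
# to the nearest EVEN integer, not always upwards.
print(round(0.5))  # 0 (tie: 0 is even)
print(round(1.5))  # 2 (tie: 2 is even)
print(round(2.5))  # 2 (tie: 2 is even)
print(round(3.5))  # 4 (tie: 4 is even)
```

This rule avoids the slight upward bias that "always round halves up" would introduce over many operations.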
The working precision is 80 bits, comprising a 64 bit mantissa, a 15 bit exponent, and a sign bit. Specific instructions that work with single precision may provide better performance in some implementations, notably fully software based ones.
The FPSR contains the necessary status for the FP system. The IEEE flags are always present, but the result flags are only available after an FP compare operation.
Floating point instructions should not be used from SVC mode.
The FPSR is laid out as follows:
Bits 31-24  System ID
Bits 23-16  Trap enable
Bits 15-8   System control
Bits 7-0    Exception flags
The defined system IDs are:
&00  Old FPE  - FPE module prior to v4.00
&01  FPE 400  - FPE module v4.00 or later
&80  FPPC     - Interface between ARM and WE32206 (AT&T MAU)
&81  FPA      - ARM FPU
The Trap enable byte is:
Bits 23-21  Reserved
Bit 20      INX
Bit 19      UFL
Bit 18      OFL
Bit 17      DVZ
Bit 16      IVO

If an exception flag bit is set following an operation, and the corresponding trap enable bit is set, then the exception trap will be taken.
The System control byte is:
Bits 15-13  Reserved
Bit 12      AC
Bit 11      EP
Bit 10      SO
Bit 9       NE
Bit 8       ND

(these bits have no meaning on Old FPE and FPPC systems)
The Exception flags byte is:

Bits 7-5  Reserved
Bit 4     INX
Bit 3     UFL
Bit 2     OFL
Bit 1     DVZ
Bit 0     IVO

Whenever an exception condition arises, the appropriate cumulative exception flag in bits 0 to 4 will be set to 1. If the relevant trap enable bit is set, then an exception is also delivered to the user's program in a manner specific to the operating system. (Note that in the case of underflow, the state of the trap enable bit determines under which conditions the underflow flag will be set.) These flags can only be cleared by a WFS instruction.
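Picking those four bytes apart is simple masking and shifting. Here is a sketch, assuming the FPSR layout described above (decode_fpsr is a hypothetical helper, not part of any real API):

```python
def decode_fpsr(fpsr):
    """Split a 32-bit FPSR value into its four bytes,
    per the layout described above."""
    return {
        "system_id":       (fpsr >> 24) & 0xFF,  # bits 31-24
        "trap_enable":     (fpsr >> 16) & 0xFF,  # bits 23-16
        "system_control":  (fpsr >> 8)  & 0xFF,  # bits 15-8
        "exception_flags":  fpsr        & 0xFF,  # bits 7-0
    }

# &01000010 would mean: system ID &01 (FPE 400), no traps
# enabled, default control bits, OFL exception flagged (bit 2... 
# here bit 4 of the low byte is INX, so &10 flags INX).
print(decode_fpsr(0x01000010))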
The precision letters are:

S   single
D   double
E   double extended
P   packed decimal
EP  extended packed decimal (if enabled)
Rounding modes are:

(none)  nearest (no letter required)
P       plus infinity
M       minus infinity
Z       zero
Because the floating point system is a little complex, we shall quickly look at how floating
point typically operates.
As we all know, a computer uses bit patterns to represent numbers. Older machines could easily handle numbers between 0 and 255, or 0 and 65535. The ARM processor can easily handle numbers between zero and 4294967295, and a 64 bit processor between 0 and 18446744073709551615.
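Those limits are simply 2 to the power of the word size, minus one, which is easy to check:

```python
# Largest unsigned value representable in an n-bit word is 2**n - 1.
for bits in (8, 16, 32, 64):
    print(bits, 2**bits - 1)
```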
Obviously, by using other techniques, any system can store all sorts of numbers; BBC BASIC running on a 6502 didn't crash if you told it to count to 257, for example.
As the data widths get larger, and the numbers that can be handled in one go get larger, this
still does not help with the simplest case of PI...
But... there's a solution. An eight-bit processor can handle numbers larger than 255 by using the simple formula ((high_byte x 256) + low_byte). Okay, things are more complex than that, but you get the idea.
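The same high-byte/low-byte trick, sketched in Python for illustration (the values are arbitrary):

```python
high_byte, low_byte = 0x12, 0x34        # arbitrary example bytes
value = (high_byte * 256) + low_byte    # the eight-bit trick
print(hex(value))  # 0x1234
```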
So, if we want to store PI (we'll take PI as being 3.14159265), then why don't we simply store the number 314159265, and alongside it we'll also store a value saying that the 'real' decimal point should shift eight places to the left.

PI = 314159265.0 [<8] = 31415926.5 [<7] = ... = 3.14159265

In the above example, I've shown it after the first shift of the decimal point, just to give you a visual idea of how it works.
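That scaled-integer idea can be sketched directly (mantissa and shift are just illustrative names here):

```python
# Store PI as an integer "mantissa" plus a count of decimal
# places to shift the point left.
mantissa = 314159265
shift = 8
value = mantissa / 10**shift
print(value)  # 3.14159265
```

Real floating point works the same way, except the mantissa and the shift are binary rather than decimal.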
IEEE Single Precision
Bit 31      Sign
Bits 30-23  Exponent
Bits 22-0   (msb) Fraction (lsb)
IEEE Double Precision
First word:
  Bit 31      Sign
  Bits 30-20  Exponent
  Bits 19-0   (msb) Fraction
Second word:
  Bits 31-0   Fraction (lsb)
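Python's struct module can pack a value into IEEE single precision format, which lets you pull those fields out and check the layout (a sketch; single_fields is just an illustrative helper, not a library function):

```python
import struct

def single_fields(x):
    """Return (sign, exponent, fraction) of x packed as an
    IEEE single, per the bit layout above."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 bit biased exponent
    fraction = bits & 0x7FFFFF       # 23 bit fraction
    return sign, exponent, fraction

print(single_fields(1.0))   # (0, 127, 0)
print(single_fields(-2.0))  # (1, 128, 0)
```

Note the stored exponent is biased by 127, so an exponent field of 127 means two-to-the-power-zero.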
To a non-mathematician (such as myself!), the system does not appear to make an awful lot of sense. For example:
DIM code% 64
FOR opt% = 0 TO 2 STEP 2
  P% = code%
  [ OPT opt%
    EXT 1
    MVFS  F0, #0.5   ; or whatever other value...
    STFS  F0, store
    MOV   PC, R14
  .store
    DCD   0          ; only one word - we're using single precision
  ]
NEXT
CALL code%

Examining the memory location gives:
0    &00000000  exponent = 0    fraction = 0        sign = 0
0.5  &40000000  exponent = 9    fraction = 0        sign = 0
1    &3F800000  exponent = 254  fraction = 0        sign = 0
2    &08B4000D  exponent = 34   fraction = 13       sign = 0
5    &40A00000  exponent = 2    fraction = 2097152  sign = 1
10   &41200000  exponent = 4    fraction = 2097152  sign = 1
Continuing, working up some code with the 'printlib' example supplied with the C/assembler development software, we arrive at:
0.123456    is 1.2346E-1
9.99996     is 1.0000E1
0.0999998   is 1.0000E-1
0.999997    is 1.0000E0
0.0         is 0
9.99999E99  is 1.0000E100

This apparently does make sense (only not to me), so I cannot explain it! :)
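For what it's worth, packing the same test values as IEEE singles in Python gives the encodings below. The stored exponent is biased by 127 and the fraction omits an implicit leading 1, which is why the raw field values can look baffling at first glance (this is a cross-check sketch, not a claim about what the BASIC program above actually printed):

```python
import struct

for v in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    bits = struct.unpack(">I", struct.pack(">f", v))[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Normal (non-zero) values decode as:
    #   (1 + fraction/2**23) * 2**(exponent - 127)
    print(f"{v:<5} &{bits:08X}  exponent={exponent}  fraction={fraction}")
```

So, for example, 0.5 packs as &3F000000: exponent field 126 (i.e. minus one after removing the bias) and fraction 0.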
LDF<condition><precision> <fp register>, <address>
Load floating point value.

STF<condition><precision> <fp register>, <address>
Store floating point value.
The address can be in the forms:
LFM and SFM

These are similar in idea to LDM and STM, but they will not be described here because some versions of FPEmulator do not support them. The FP module in RISC OS 3.1x (2.87) does, as do (I think) later versions. If you know that your software will only operate on a system that supports SFM, then use it. Otherwise you'll need to 'fake' it with a sequence of STFs; likewise, fake LFM with a sequence of LDFs.
FLT<condition><precision><rounding> <fp register>, <register>
FLT<condition><precision><rounding> <fp register>, #<value>
Convert integer to floating point, either an ARM register or an absolute value.
FIX<condition><rounding> <register>, <fp register>
Convert floating point to integer.
WFS<condition> <register>
Write floating point status register with the contents of the ARM register specified.
RFS<condition> <register>
Read floating point status register into the ARM register specified.
WFC<condition> <register>
Write floating point control register with the contents of the ARM register specified.
Supervisor mode only, and only on hardware that supports it.
RFC<condition> <register>
Read floating point control register into the ARM register specified.
Supervisor mode only, and only on hardware that supports it.
Floating point coprocessor data operations:
The formats of these instructions are:
The binary operations are...

ADF  Add
DVF  Divide
FDV  Fast Divide (only defined to work with single precision)
FML  Fast Multiply (only defined to work with single precision)
FRD  Fast Reverse Divide (only defined to work with single precision)
MUF  Multiply
POL  Polar Angle
POW  Power
RDF  Reverse Divide
RMF  Remainder
RPW  Reverse Power
RSF  Reverse Subtract
SUF  Subtract
The unary operations are...

ABS  Absolute Value
ACS  Arc Cosine
ASN  Arc Sine
ATN  Arc Tangent
COS  Cosine
EXP  Exponent
LOG  Logarithm to base 10
LGN  Logarithm to base e
MVF  Move
MNF  Move Negated
NRM  Normalise
RND  Round to integral value
SIN  Sine
SQT  Square Root
TAN  Tangent
URD  Unnormalised Round
CMF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 2 with FP register 1.
The variant CMFE compares with exception.
CNF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 2 with the negative of FP register 1.
The variant CNFE compares with exception.
Compares are provided with and without the exception that could arise if the numbers are unordered (ie one or both of them is not-a-number). To comply with IEEE 754, the CMF instruction should be used to test for equality (ie when a BEQ or BNE is used afterwards) or to test for unorderedness (in the V flag). The CMFE instruction should be used for all other tests (BGT, BGE, BLT, BLE afterwards).
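The 'unordered' case is easy to see in any IEEE system. In Python, for instance, a NaN compares false against everything, including itself:

```python
import math

nan = float("nan")

# All ordered comparisons involving a NaN are false...
print(nan == nan, nan < 1.0, nan > 1.0)  # False False False

# ...which makes != the one comparison that succeeds.
print(nan != nan)       # True
print(math.isnan(nan))  # True
```

This is why the FP compare instructions need a separate "unordered" result (the V flag) rather than folding NaNs into less-than or greater-than.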
When the AC bit in the FPSR is clear, the ARM flags N, Z, C, V refer to the following after compares:

N = Less than
Z = Equal
C = Greater than, or equal
V = Unordered
And when the AC bit is set, the flags refer to:

N = Less than
Z = Equal
C = Greater than, or equal, or unordered
V = Unordered
In APCS code with objasm, to store a floating point value you would use the directive DCF, appending 'S' for single precision and 'D' for double.
REM >fpmul
REM
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...

DIM code% 20
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  .multiply
    FLTS  F0, R0
    FLTS  F1, R1
    FMLS  F2, F0, F1
    FIXS  R0, F2
    MOVS  PC, R14
  ]
NEXT
INPUT "First number  : "one%
INPUT "Second number : "two%
A% = one%
B% = two%
result% = USR(multiply)
PRINT "The result is "+STR$(result%)
END

There is no option to download this program, as standard BASIC won't touch it. However, you can include FP statements if you can 'build' the instructions.
This version will work in BASIC:
REM >fpmul
REM
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...

DIM code% 20
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  .multiply
    EQUD &EE000110  ; FLTS F0, R0
    EQUD &EE011110  ; FLTS F1, R1
    EQUD &EE902101  ; FMLS F2, F0, F1
    EQUD &EE100112  ; FIXS R0, F2
    MOVS PC, R14
  ]
NEXT
INPUT "First number  : "one%
INPUT "Second number : "two%
A% = one%
B% = two%
result% = USR(multiply)
PRINT "The result is "+STR$(result%)
END
Remember to use the appropriate precision for what you are doing.
REM >precision
REM
REM Short example to show how data can be 'lost' due
REM to using incorrect precision.

ON ERROR PRINT REPORT$ + " at " + STR$(ERL/10) : END

DIM code% 64
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
    EXT 1
  .single_precision
    FLTS  F0, R0
    FIX   R0, F0
    MOV   PC, R14
  .double_precision
    FLTD  F0, R0
    FIX   R0, F0
    MOV   PC, R14
  .doubleext_precision
    FLTE  F0, R0
    FIX   R0, F0
    MOV   PC, R14
  ]
NEXT
A% = &1FFFFFF
PRINT "Original input is " + STR$~A%
PRINT "Single precision  " + STR$~(USR(single_precision))
PRINT "Double precision  " + STR$~(USR(double_precision))
PRINT "Double extended   " + STR$~(USR(doubleext_precision))
PRINT
END

The result of this program is:
Original input is 1FFFFFF
Single precision  2000000
Double precision  1FFFFFF
Double extended   1FFFFFF

You don't need to use double precision everywhere, though, as it will be that much slower. Simply keep this in mind if you are dealing with large numbers.
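The same loss can be reproduced with Python's struct module, which can pack to IEEE single precision. &1FFFFFF needs 25 significant bits, one more than single precision's 24 (counting the implicit leading bit), and it falls exactly between two representable values, so round-to-even pushes it up to &2000000:

```python
import struct

v = 0x1FFFFFF  # 2**25 - 1: needs 25 significant bits

# Round-trip through single precision (4-byte float)...
as_single = struct.unpack(">f", struct.pack(">f", v))[0]
print(hex(int(as_single)))  # 0x2000000 - the low bit is lost

# ...whereas a double (Python's native float, 53-bit
# significand) holds the value exactly.
print(float(v) == v)  # True
```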
In order to test the actual speed differences, I wrote a test program:
DIM code% 64
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
    MOV   R0, #23
    MOV   R1, #1<<16
  .timetest
    FLTD  F0, R0
    FLTD  F1, R0
    MUFD  F2, F0, F1
    SUBS  R1, R1, #1
    BNE   timetest
    MOV   PC, R14
  ]
NEXT
t% = TIME
CALL code%
PRINT "That took "+STR$(TIME - t%)+" centiseconds."
END

I tried various precisions, and also the fast multiply. It showed something interesting, so I tried multiplication, division, and addition, all with the same data (input 23).
Here are my results for a million (roughly) convert-and-process operations. I've just timed my RiscPC and the times were MUCH slower, so I'm not entirely sure which system the timings below relate to. It did say "ARM710 processor, FPEmulator 4.14", but I doubt that...
Operation       Fast single  Single  Double  Double extended
Multiplication  1731cs       1755cs  1965cs  1712cs
Division        2169cs       2169cs  2618cs  2479cs
Addition        n/a          1684cs  1899cs  1646cs

This seems to show that double extended precision is the fastest on my machine for a selection of operations. Thus, it is incorrect to simply assume that more complexity takes longer. My personal suspicion is that the internal working format is double extended, so working directly with it entails no loss due to converting the value to a different precision.
Why do I doubt the above experiments? Simple. Here are the results for an ARM710 RiscPC using FPEmulator 4.14 (1.07Mz):
Operation       Fast single  Single  Double  Double extended
Multiplication  112cs        112cs   110cs   111cs
Division        138cs        139cs   153cs   159cs
Addition        n/a          108cs   107cs   106cs

These results seem more consistent, so... :)
The moral here? Don't be afraid to experiment...