The views expressed in this article are purely the opinions of the author, Richard Murray, and should not be taken as truth, fact, or resembling anything whatsoever. So there.
Nick Roberts (author of ASM) contacted me to tell me that "your entire argument is fatally flawed as far as timing is concerned, as you are not comparing the same thing in C and assembler."
That is more or less correct. As far as I was concerned, it was pretty much the same thing. A routine to convert strings to lowercase. One implemented using C's standard string handling facilities, the other knocked out in assembler.
Nick says "The assembler version scores primarily because the characters are converted
after using 2 SWIs to set up conversion tables;" (though note that it sets up these
same tables on every call, so it could be optimised further).
He continues, "in your C example, each character invokes the penalty of an APCS-compliant procedure call, with all that implies as far as stack management is concerned."
Nick informs me that the C compiler has a brain (my words, not his) and if I wrote lowercase() using the same method in C, the timing difference would go away. I've not tried this, but I suspect any differences would be minimal.
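For the curious, here is a rough sketch of what "the same method in C" might look like. It is not the code from the original article and has not been timed. The names lowercase_table(), lowercase_calls() and lower_table are made up for illustration. It also assumes the table can be filled in with tolower() once, where the assembler version used two SWIs to obtain its tables.

```c
#include <ctype.h>

/* Hypothetical 256-entry conversion table, filled in on first use. */
static unsigned char lower_table[256];
static int lower_table_ready = 0;

static void build_lower_table(void)
{
    int i;

    /* Assumption: tolower() stands in for the SWIs used by the assembler. */
    for (i = 0; i < 256; i++)
        lower_table[i] = (unsigned char)tolower(i);
    lower_table_ready = 1;
}

/* Table method: one array lookup per character, no call inside the loop. */
void lowercase_table(char *s)
{
    if (!lower_table_ready)
        build_lower_table();

    for (; *s != '\0'; s++)
        *s = (char)lower_table[(unsigned char)*s];
}

/* Call method: asks tolower() about every character, for comparison. */
void lowercase_calls(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```

Both routines should give the same result for any string. The only intended difference is that the table version pays for its setup once instead of making a library call per character. Whether that shows up in the timings will depend on the compiler and library in use.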
Which teaches us one important thing. Don't be overzealous. Unless you are writing device
drivers, FP code, intensive stuff that doesn't optimise well, or working in an interpreted
language, you won't have an awful lot of need to code stuff in assembler. If something is
being really slow, then maybe it is your implementation at fault?
The moral... Don't write everything in assembler just because you can. By now you should know that assembler is harder to maintain (that comment will probably get me some flames, but you gotta admit that a sequence of opcodes is harder to follow than a line of code to generate a percentage, say) and has a steeper learning curve, and while it offers a lot, there is even more to learn. I'm doing exactly that right now!
Many thanks to Nick for writing to me.
Addendum (December 2003):
I decided to do this to death, to find out exactly how the timings were affected by different things, and exactly what tolower() actually did. Read all about it here....