Rick's b.log - entry 2020/08/09

mailto: blog -at- heyrick -dot- eu

Google Docs sucks donkey balls

I want to like Google Docs.

I think it is a great idea to have a centralised place to store documents, and an accessible word processor that can be used on pretty much any capable system.
Your phone? Your tablet? Your desktop PC? Doable.

The problem is, with every iteration and update to Docs, it just gets more and more rubbish.

I wrote over 90 pages of a fiction story last summer holiday, and Docs worked fairly well (so long as one juggles the app and the browser version for capabilities). The app now, on the same hardware (my freebie tablet) needs to be left alone for several minutes at start so it can sync stuff "in the background", otherwise it will simply end up non-responsive until Android offers to kill it.
Google seems to have a serious issue in understanding the difference between background stuff (which should be low priority) and foreground tasks the user wants to do (which should always take precedence). I wish I had a copy of last summer's APK file for Docs. I make backups before upgrading now, but back then I didn't imagine Google could break it so much.

Today, I had a request for Google Docs. I wanted to print out some source code running to around 270K, simply to browse it and try to work out how it does what it does (the point being to learn for myself, not to simply copy-paste stuff StackOverflow style). So, set it up as a two column format, drop the text size to something like 4 point (normal text that you see in books and magazines is somewhere between 10 and 12 point), and then make it Courier (monospaced), because it's code. I don't think this is an unreasonable request to make in this day and age.

The app started, imported the text, and then had a nervous breakdown and crashed when I wanted to select all of the text.

So I had the app start and import the text, and then I switched to the web version of Docs. Now the depressing thing is that the web version of Docs is insanely more capable than the dedicated app, which is sort of "Docs Lite" at best.
Unfortunately, running a full word processor in a browser is a heavy task for it. I was using Firefox and I think my session can be evenly divided into thirds. A third of the time, the operation worked after a brief delay. Another third, Firefox popped up a warning about an unresponsive script. The final third? The browser simply died.
Now, bear in mind I'm using my Samsung S9 and not the tablet, so it's hardly a slouch.

The browser version of Docs allowed me to set up the number of columns in the page (the app cannot), and also to add a header (the app cannot) and a page number at the bottom (the app can do this). So I did.
Remember, when the browser crashes, it tosses session cookies so one needs to sign in all over again.
Eventually I got to the stage where I had two columns of 4 point Courier. It ran to about 28 pages.

When I went back to the app version to check that it looked okay in the print preview, it was insanely slow. So I asked it to export the document as a PDF.
After thinking about it for a while, it told me that it was unable to create a copy. Repeatedly.

So, back to the browser version. It did it, and fairly quickly too.

Something that I noticed was that there were numerous lines in the code that looked like // ************** only with many more asterisks. So I thought it would be useful to search for six asterisks and replace them with three to make these lines shorter and thus not wrap over. Sorting that out should drop the page count.

The browser version simply froze when asked to do that. Okay. Fair enough, doing that in JavaScript some five thousand times is probably a nightmare.
So to the app. I mean, it's just a search and replace, right?

Wrong. It froze too, leading to this after about two minutes of apparent inactivity:

Google Docs - Application Not Responding

After the fourth time (and something like eight goddamn minutes), I gave up. I called up the task manager and swiped the task away.
Enough is enough!

On to plan B. Which, for reference, is what I should have done right at the beginning.

I turned on the laser printer and put some paper into it.
Then I started up my Pi2. It is normally always on, but I turned it off because of a nasty thunderstorm at 4am on Saturday morning, and as more were forecast (that didn't happen!), I just left it off until today.

I started OvationPro. Now, this is a little (but capable!) desktop publisher written by one man (David Pilling) for use on the RISC OS operating system. RISC OS isn't capable of using multiple cores, being heavily based on the methodology of the operating system of the 8 bit 6502 based BBC Micro (early '80s). So it is not a particularly fast system when you're comparing a single 900MHz 32 bit core against the eight cores (64 bit, 4 fast and 4 low-power, 455-2704MHz) on my phone, which was once upon a time a flagship (but technology moves on). Indeed, the Pi2 under RISC OS has even less grunt than my freebie tablet.

OvationPro accepted and imported the text in a flash, not even remotely fazed that I just handed it a quarter of a megabyte of text. It was completely fluid in selecting all of the text and modifying it. As a desktop publisher tends to be a little more capable than a word processor, I was able to set the text to 5 point with 85% width (to make it a little larger for ease of reading, but reduce the width to have fewer line wraps). I set the leading (said like "led-ing", it's the space between the lines) to 10% (about half of the normal inter-line spacing) again to reduce the page count. Being a desktop publisher, this was all fully WYSIWYG and updating on-screen in real-time. And unlike Google Docs, what was on the screen was what would appear on the printer, and not a vague approximation ... use Docs for a while, you'll soon learn that Print Preview in the app, the website, and the eventual PDF/print are all different in subtle (and sometimes not so subtle) ways.

Then came the hard part. Find every "******" and replace with "***". How fast would OvationPro on the Pi manage? The answer is "too fast to measure without videoing the screen and counting frames". Yep, what slaughtered Google Docs was a breeze in OvationPro.

Why is this?

Well, David Pilling has been around the Acorn scene for a long time. It wouldn't surprise me if he cut his teeth on writing stuff for the BBC Micro in 6502 assembler. His later (RISC OS) software is written in C with bits of assembler for stuff that needed speed on the 8MHz systems.
In other words, like most old-school programmers, he knows what he is doing.

I don't think the same can be said about Google's programmers. For sure they turn out some amazing products, but they also turn out some utter crap. A piece of software that utterly fails on a search and replace of this sort? Seriously?

Maybe, Google, you ought to have less bad-ass notoriously difficult questions on your recruitment test, and instead simply hire people who can tell the difference between a good algorithm and a bad one.

To ram home the point, I threw together a remarkably crap bit of code to do the same thing. It is written in BASIC and compiled using the ABC BASIC compiler. ABC isn't smart, it pretty much just does a 1:1 translation between BASIC and output. I do this because running the pure BASIC version would take forever, with most of that being overheads in the interpreter. Android apps get built on installation, so why should this be any different?

REM >starsearch
REM
REM Search for something, replace it with another
REM

SYS "Hourglass_On"
t% = TIME

REM Input the file
f% = OPENIN("input_file")
s% = EXT#f%
DIM b% s%

REM Do it the slow way
FOR l% = 1 TO s%
  b%?l% = BGET#f%
NEXT
CLOSE#f%

REM What to look for
find$ = "******"
REM What to replace it with (must be shorter)
repl$ = "***"

REM Now do the search and replace
REM This is a bit of a nightmare because BASIC does not
REM support arbitrary length strings, so we must pull data
REM out of the buffer and compare it.
REM
REM Since Google Docs is so rubbish, let's use a lame
REM algorithm that isn't even remotely optimal. ;-)

o% = 0           : REM Offset into data
sl% = LEN(find$) : REM Length of what we're looking for
p% = 0

REPEAT
  REM Read <search-length> characters from the buffer
  cmp$ = ""
  FOR l% = 1 TO sl%
    cmp$ += CHR$(b%?(o% + l%))
  NEXT

  REM Does it match?
  IF ( cmp$ = find$ ) THEN
    o% += 1

    REM Paste the replacement string there
    $(b%+o%) = repl$ : REM NOTE - WILL ADD A CARRIAGE RETURN :-/

    REM Now block-copy everything down to fill the gap
    REM This is painful, this will take a lot of time
    x% = o% + LEN(repl$)
    y% = o% + LEN(find$)
    REPEAT
      b%?x% = b%?y%
      x% += 1
      y% += 1
    UNTIL (y% > s%) : REM Data runs 1..s%, so copy the last byte too

    REM Fudge offsets to skip over this
    o% += LEN(repl$) - 1
    s% -= (LEN(find$) - LEN(repl$)) : REM Don't forget the length!

    REM Running statistics
    p% = (o% * 100) / s%
    SYS "Hourglass_Percentage", p%
  ELSE
    REM Not a match, advance the pointer
    o% += 1
  ENDIF

UNTIL ( (s% - o%) <= sl% )

SYS "Hourglass_Percentage", 0

REM Now save the result so we can look at it
f% = OPENOUT("output_file")
FOR l% = 1 TO s%
  BPUT#f%, b%?l%
NEXT
CLOSE#f%

SYS "Hourglass_Off"
PRINT "Lame-ass process completed in "+STR$(TIME-t%)+" centiseconds"

When run in a TaskWindow (multitasking), it does the job in 8,098 centiseconds. Or a shade over 80 seconds.

There's a lot that could be done to improve this. One of the main ones - given that the memory shuffling is the part that takes time - is to optimise that. One way, at the cost of more memory, is to maintain two buffers. One is the original, the other the replaced; you simply step through the original text and write the results into the replaced buffer. This means that, essentially, you're trading memory for time. The copying will only be done the one time, but you'll need two buffers. Would probably run in about three or four seconds?
If using only the one buffer, then a real program will probably call out to an optimised function like memmove() that ought to try to align what is being copied so it can do as much as possible using word accesses rather than byte accesses. Would need to be done by hand in BASIC, fun! ☺
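
For the curious, here is roughly what the two-buffer idea could look like. This is only a minimal sketch, in C rather than BASIC, and the function name replace_two_buffers and its parameters are made up for illustration. Like the BASIC above, it assumes the replacement is no longer than the search string, so an output buffer the same size as the input is always big enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, replacing each occurrence of find with repl.
 * dst must not overlap src and must hold at least src_len bytes,
 * which suffices because repl is assumed no longer than find.
 * Returns the number of bytes written to dst. */
size_t replace_two_buffers(const char *src, size_t src_len, char *dst,
                           const char *find, size_t find_len,
                           const char *repl, size_t repl_len)
{
    size_t in = 0;   /* read offset into the original buffer */
    size_t out = 0;  /* write offset into the result buffer */

    assert(find_len > 0 && repl_len <= find_len);

    while (in < src_len) {
        if (src_len - in >= find_len &&
            memcmp(src + in, find, find_len) == 0) {
            /* Match: emit the replacement, skip the original text */
            memcpy(dst + out, repl, repl_len);
            out += repl_len;
            in += find_len;
        } else {
            /* No match: each byte is copied exactly once */
            dst[out++] = src[in++];
        }
    }
    return out;
}
```

The point is that nothing is ever shuffled down: every byte of the original is looked at and written once, so the work grows with the size of the file and not with the number of replacements.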

The limitation of the replacement being smaller than the original is because BASIC's memory handling is really rudimentary and I didn't want to get bogged down with clever memory handling routines just to prove a point (remember, you'd need to determine how big the buffer needs to be at the end - not so hard in C where you can realloc(), but BASIC lacks that).
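
As a hedged illustration of what "determine how big the buffer needs to be" might mean in C: rather than calling realloc() as the text grows, this sketch counts the matches first and then allocates the result once with malloc(). The names count_matches and replace_alloc are hypothetical, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping matches of find in buf, scanning left to right. */
size_t count_matches(const char *buf, size_t len,
                     const char *find, size_t find_len)
{
    size_t count = 0;
    size_t i = 0;

    assert(find_len > 0);
    while (i + find_len <= len) {
        if (memcmp(buf + i, find, find_len) == 0) {
            count++;
            i += find_len;
        } else {
            i++;
        }
    }
    return count;
}

/* Return a newly allocated copy of buf with find replaced by repl,
 * where repl may be longer or shorter than find. The caller frees it.
 * Returns NULL if the allocation fails. */
char *replace_alloc(const char *buf, size_t len,
                    const char *find, size_t find_len,
                    const char *repl, size_t repl_len,
                    size_t *out_len)
{
    size_t matches = count_matches(buf, len, find, find_len);
    size_t new_len = len - matches * find_len + matches * repl_len;
    char *out = malloc(new_len > 0 ? new_len : 1);
    size_t in = 0;
    size_t o = 0;

    if (out == NULL)
        return NULL;

    while (in < len) {
        if (in + find_len <= len &&
            memcmp(buf + in, find, find_len) == 0) {
            memcpy(out + o, repl, repl_len);
            o += repl_len;
            in += find_len;
        } else {
            out[o++] = buf[in++];
        }
    }
    assert(o == new_len);  /* the size worked out up front was exact */
    *out_len = new_len;
    return out;
}
```

This costs an extra pass over the text to do the counting, but the buffer is allocated once at exactly the right size.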

However, since my piece of rubbish on a much slower machine manages to outperform Docs' search and replace, I think I'll let it stand as-is without any further optimising.

The entire job, between dropping the file into the RISC OS filesystem and having a wodge of 21 pieces of paper in my hand, probably took a quarter of an hour. A lot less than the time I'd wasted on Google Docs to get to the point of no-go-further.
Speaking of wasting time, between trying to do this, doing it, and then whinging about it on my blog, it's taken most of the afternoon. Okay, I wasn't doing anything else on this overheated Sunday, but it does make one wonder, when things that are supposed to help end up having the opposite effect...

Really, Google, how did it manage to go so horribly wrong?

PS: Don't even get me started on how on the tablet (quad core 32 bit 1.2GHz), the app in Print Preview mode can't even keep up with my typing (about 55 words per minute). That's deplorable.

Your comments:

David Pilling, 9th August 2020, 17:03

Ancient programmer here - long ago, there was WordWise for the BBC Micro and someone mentioned that it worked by holding all the text in memory with a gap where the cursor is. So 16K memory with an 8K document held as 4K text, an 8K gap and then the rest of the text. The result is that you can type at the cursor without having to shift up to 8K of memory after each keystroke.
Dunno what the technical term is - heap with a gap.
Anyway someone mentioned this - magazine, BBS, can't recall and having written a simple text editor for my own use at the time, which did shift memory around, I took the hint a few years later when writing Ovation.

They do not want to know how little time there was between turning on a BBC Micro and starting typing text into WordWise (granted it would be weeks later if loading from cassette tape was involved).
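
For anyone who hasn't met the arrangement described in the comment above, it is usually called a gap buffer. What follows is a minimal sketch in C under stated assumptions: a tiny fixed CAPACITY, and made-up names (gap_buffer, gb_init, gb_move_cursor, gb_insert, gb_text). It is not how WordWise or Ovation actually implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAPACITY 64  /* hypothetical fixed size, kept small for illustration */

struct gap_buffer {
    char data[CAPACITY];
    size_t gap_start;  /* cursor position: first byte of the gap */
    size_t gap_end;    /* first byte of the text after the gap */
};

/* Load text with the cursor (and so the gap) at the end. */
void gb_init(struct gap_buffer *gb, const char *text)
{
    size_t len = strlen(text);
    assert(len <= CAPACITY);
    memcpy(gb->data, text, len);
    gb->gap_start = len;
    gb->gap_end = CAPACITY;
}

/* Move the cursor: only the bytes between the old and new
 * positions are shifted across the gap. */
void gb_move_cursor(struct gap_buffer *gb, size_t pos)
{
    size_t text_len = gb->gap_start + (CAPACITY - gb->gap_end);
    assert(pos <= text_len);
    if (pos < gb->gap_start) {
        size_t n = gb->gap_start - pos;
        memmove(gb->data + gb->gap_end - n, gb->data + pos, n);
        gb->gap_start -= n;
        gb->gap_end -= n;
    } else if (pos > gb->gap_start) {
        size_t n = pos - gb->gap_start;
        memmove(gb->data + gb->gap_start, gb->data + gb->gap_end, n);
        gb->gap_start += n;
        gb->gap_end += n;
    }
}

/* Typing at the cursor just fills one byte of the gap.
 * Returns 1 on success, 0 if the buffer is full. */
int gb_insert(struct gap_buffer *gb, char c)
{
    if (gb->gap_start == gb->gap_end)
        return 0;
    gb->data[gb->gap_start++] = c;
    return 1;
}

/* Copy the text, minus the gap, into out (at least CAPACITY + 1
 * bytes) as a C string. Returns its length. */
size_t gb_text(const struct gap_buffer *gb, char *out)
{
    size_t tail = CAPACITY - gb->gap_end;
    memcpy(out, gb->data, gb->gap_start);
    memcpy(out + gb->gap_start, gb->data + gb->gap_end, tail);
    out[gb->gap_start + tail] = '\0';
    return gb->gap_start + tail;
}
```

Inserting a character moves no text at all; the only shuffling happens when the cursor moves, and then only over the distance moved.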

Gavin Wraith, 10th August 2020, 10:28

I liked Wordwise. What is the best datatype for storing text in an editor, if byte shuffling is to be minimized? The trouble is that the notion of "string" can vary from one system to another, with RISC OS doing everything using memory buffers, rather than, say, hash tables. As usual, choice of datatype depends on what you need to do with the data.

Rick, 10th August 2020, 10:40

The problem here is that the data format used should also be flexible.
The best example is the one given in the article - a simple search and replace was handled by OPro in the blink of an eye, while Docs failed to do it in some eight minutes (with the underlying OS offering to terminate the unresponsive task at least four times). Would it ever have done it? Or did it just get utterly confused? I don't know, but I think it's worth remembering that no amount of fancy and gloss is going to overcome bad programming and/or a poor algorithm.

Jeff Doggett, 10th August 2020, 13:21

I also read about the gap in memory for the text and actually used that method in a text editor that I wrote in the early days of the Archimedes - before the wimp came along. A module all written in assembly using the BASIC assembler.
I too despair at the appalling quality of modern software and the utterly imcompetant generation of programmers.

Jeff Doggett, 10th August 2020, 13:22

And I'm so incompetant that I can't spell incompetant!

David Pilling, 11th August 2020, 02:07

In the past I did see editors that use a linked list of chunks of memory, one for each line in the text. That's not a bad idea because an insertion is typically into a line, and you've only got to shuffle a small amount of memory.

I guess all these techniques would be invented in the 50s or 60s when the first editors were being written on systems lacking resources.
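
The per-line idea mentioned above can be sketched in a few lines of C. This is an illustration only, assuming a hypothetical fixed LINE_CAP per line and made-up names (struct line, line_insert); a real editor would allocate lines dynamically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_CAP 80  /* hypothetical maximum line length */

/* One node per line of text, chained together in document order. */
struct line {
    char text[LINE_CAP];
    size_t len;
    struct line *next;
};

/* Insert c at column col of one line. Only the rest of that line is
 * shuffled along; every other line in the list is left untouched.
 * Returns 1 on success, 0 if the line is already full. */
int line_insert(struct line *ln, size_t col, char c)
{
    assert(col <= ln->len);
    if (ln->len == LINE_CAP)
        return 0;
    memmove(ln->text + col + 1, ln->text + col, ln->len - col);
    ln->text[col] = c;
    ln->len++;
    return 1;
}
```

The worst case for a keystroke is then shifting one line's worth of bytes, however long the document is.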

Retrieved from https://www.heyrick.co.uk/blog/index.php?diary=20200809 on 24th April 2024