Video compression


Introduction

The primary benefit of digital television, from the consumer's point of view, is the quality of the picture. While you might sometimes see MPEG compression artefacts, in general the image will be clear and sharp. No ghosting, no snow, no sparklies, and none of the other image disturbances that afflict the older analogue systems.
It comes at a price, though. While an analogue picture can 'fade out' as the quality of the signal deteriorates, a digital picture lasts only as long as its error correction can cope. When the signal is too weak, the error correction will fail, and so will the picture.
Note, also, that what constitutes a 'failure' in digital terms may be a reasonably watchable picture in analogue terms.


Resolution

The proper PAL transmission system uses 625 horizontal scan lines; however, for historical reasons a chunk of these are not used for the picture information - teletext and sync signals live in the unseen part.
For reference, SÉCAM uses 625 lines as well, while NTSC uses 525 lines.

With PAL (or SÉCAM) you should be able to see 576 lines on your television. To give you an idea of how this compares, a VHS recording can only resolve around 240 lines.

It is hard to put a resolution on an off-air UHF broadcast because the primary cause of signal degradation is the broadcast itself!

The digital satellite system attempts to generate an image that is 576 lines in height, and 720 pixels wide, for a normal 4:3 broadcast. Widescreen is handled slightly differently.
The Digibox can output UHF, composite video, and RGB. Some models apparently have S-Video too. Of these, you'll never get quality from the UHF connection, as the signal is mucked around with too much. Most televisions will have a composite video input. It is good, but your Digibox won't fly. To make your Digibox fly, you need to feed the RGB directly into your television.

Please understand that when we talk about 'lines', we mean the resolvable quality of the system in use. Every full-frame picture will fill a TV screen, whether the system resolves 576 lines or 275. This is why test engineers use a test pattern consisting of black and white bars of differing widths. The narrower the bars they can distinguish, the more 'lines', and the better the quality.
I don't want you to think, as did one person who contacted me, that VHS - being of a low quality - is only capable of making a little picture in the middle of the screen!

Here is an example:

[Image: the example picture at original quality (left) and at reduced quality (right).]
If we assume that the picture on the left is what we can obtain from a satellite receiver, then the picture on the right represents a VHS recording of the same thing.
In actuality, the picture on the right was created entirely digitally (and not by mucking with JPEG's quality options!).
The image was scaled down (50% vertically, 80% horizontally) and then scaled back up to the original size.
The difference, the loss in quality, reflects the loss of resolution you would experience with a system that is limited in how many 'lines' it can resolve. I'm sure anybody who has used video tape will be more than aware of this sort of thing...
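
If you fancy recreating this degradation yourself, here is a rough sketch in Python using the Pillow imaging library. The filename is made up, and the scale factors are the ones described above:

    from PIL import Image

    # Open the original picture (hypothetical filename).
    original = Image.open("example.png")
    w, h = original.size

    # Throw detail away: scale to 80% of the width, 50% of the height...
    small = original.resize((int(w * 0.8), int(h * 0.5)), Image.BILINEAR)

    # ...then scale back up. The discarded detail does not come back.
    degraded = small.resize((w, h), Image.BILINEAR)
    degraded.save("example-degraded.png")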

For what it is worth, I only stopped on The Game Network long enough to take this one picture. I do not know the result of the competition shown in the image, but my guess as to the Famous Face would be Mr. Kofi Annan.


YUV

In analogue systems, the signal is broken into two parts. These are the chrominance (C) and the luminance (Y). Basically, this means the 'brightness' is separated from the 'colour' and they are transmitted separately. This has its origins in the birth of colour television, which had to stay compatible with the huge number of monochrome receivers already in use. Even today, an old black-and-white valve television will quite happily show you Anna Ford on the BBC, even though the set was manufactured before colour television existed!

The colour signals, in order to consume less bandwidth, are not the Red, Green, and Blue that you might expect. Instead they are the difference between the luminance and Blue (called 'U') and the difference between the luminance and Red (called 'V'). Because the luminance is itself a weighted mix of Red, Green, and Blue, knowing it along with these two differences makes it pretty simple to figure out what the value for the Green should be. It is the difference of all the other differences!
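
If you like to see such things as numbers, here is a minimal Python sketch using the standard-definition (BT.601) luminance weights. Real broadcast systems scale U and V by additional factors, so treat this as the principle rather than the full specification:

    def rgb_to_yuv(r, g, b):
        # Luminance is a weighted sum of the three colours (BT.601 weights).
        y = 0.299 * r + 0.587 * g + 0.114 * b
        u = b - y   # the blue difference
        v = r - y   # the red difference
        return y, u, v

    print(rgb_to_yuv(200, 100, 50))   # a warm orange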

Let's see this in practice now...

We start with a picture:

Now we extract the luminance (Y):

This is what a black and white TV responds to.

By simple mathematics, we can take the luminance away from the original picture. This leaves us with the chrominance (C), or the colour information. The version on the right has been digitally inverted so you can see the detail.

[Image: the chrominance; on the right, a digitally inverted copy.]

From this point onwards, the colour information shown is in black and white. It is better if you view this as 'amounts of colour' rather than the colours themselves.
As before, on the right is an inverted copy of the image. This is to aid you in seeing detail in what might otherwise look like a black rectangle!
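
If you want to repeat this extraction at home, Pillow will do most of the work. Converting an image to greyscale uses much the same weighted sum described above; the filenames are, of course, made up:

    from PIL import Image, ImageChops

    picture = Image.open("example.png").convert("RGB")

    # The luminance - what a black and white set responds to.
    luminance = picture.convert("L")
    luminance.save("example-y.png")

    # Take the luminance away from the original; what remains is the
    # chrominance (clipped at zero, so this is only an approximation).
    chroma = ImageChops.subtract(picture, luminance.convert("RGB"))
    chroma.save("example-c.png")

    # And an inverted copy, to help see detail in the dark result.
    ImageChops.invert(chroma).save("example-c-inverted.png")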

We must break the colour signal down yet further. If we extract the red component, we can look at the difference between luminance and red, or 'V'. It looks like this (again, inverted on the right):

[Image: the red difference signal; inverted copy on the right.]

The final part that is broadcast is the difference between luminance and blue, or 'U'. It looks like:

[Image: the blue difference signal; inverted copy on the right.]

By knowing the overall brightness, and the amount of red and the amount of blue, it is not difficult to calculate the amount of green...

[Image: the recovered green component.]

So colour television receives the signals YUV (brightness, blue difference, red difference) and from this creates the RGB for the three electron guns in the television tube.
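
In code, the receiver's sums look something like this - a sketch only, using the same illustrative BT.601 weights as before:

    def yuv_to_rgb(y, u, v):
        # Undo the differences to recover blue and red...
        b = y + u
        r = y + v
        # ...then green falls out of the luminance equation
        # y = 0.299*r + 0.587*g + 0.114*b, solved for g.
        g = (y - 0.299 * r - 0.114 * b) / 0.587
        return r, g, b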

Likewise, the encoding used to send picture information to your Digibox, MPEG, uses YUV.


Data rate

We all know that television frames are received twenty-five times per second (actually, it is fifty half-frames, called 'fields', per second, interlaced; doing it this way reduces flicker).
If we do a little bit of multiplication, we can gasp in amazement at the data rate required to keep a raw digital signal intact second by second.

No? Okay, let's do it together. We have 720 pixels across the screen, 576 lines down the screen, 25 frames per second, three components of data, and eight bits per component.
720 × 576 × 25 × 3 × 8 = 248832000 bits
One second of image data requires just over 237 megabits. Divide by eight to turn this into something we can understand: 31104000 bytes, 30375 kilobytes, or just short of 30 megabytes, every second. You'd be able to hold around twenty seconds of raw video on a standard CD-R, though you'd have trouble reading it back, as extracting 30MB/sec is far beyond any CD drive; a 52X reader operates at around 8MB/sec flat out.
This, very obviously, isn't a workable arrangement!
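
Or, if you would rather the computer did the multiplication:

    width, height = 720, 576      # pixels per line, lines per frame
    frames = 25                   # frames per second
    components = 3                # Y, U, and V
    bits = 8                      # bits per component

    bps = width * height * frames * components * bits
    print(bps)                    # 248832000 bits per second
    print(bps // 8)               # 31104000 bytes per second
    print(bps / 8 / 1024 / 1024)  # roughly 29.7 megabytes per second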


4:2:2

I first came across this strange marking on a digital Beta camera. Here's what it means...
For each 'frame', we have 720 pixels and 576 lines. A frame is actually sent as two 288-line 'fields': an odd field (lines 1,3,5,7...) and an even field (lines 2,4,6,8...), transmitted in alternating sequence.

In the 4:2:2 system, the luminance (Y) part is sent out for each pixel (an effective resolution of 720 by 576). The chrominance (U and V) is only sent out for every other pixel along each line (an effective resolution of 360 by 576). This works because the eye is much more sensitive to changes in luminance than to changes in chrominance; you can double up every other chroma sample without affecting the picture too greatly.

[Diagram: 4:2:2 subsampling.]
This example diagram is, itself, subsampled at 4:2:2.
Using 4:2:2 coding, we can reduce the data rate to around 20MB/sec.
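
The saving is easy to check: Y at full resolution plus U and V at half the horizontal resolution comes to exactly two-thirds of the original data.

    luma   = 720 * 576             # Y for every pixel
    chroma = 2 * (360 * 576)       # U and V for every other pixel
    bps = (luma + chroma) * 25 * 8
    print(bps / 8 / 1024 / 1024)   # roughly 19.8 megabytes per second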


MPEG

We hit a problem in that the maximum data rate from a transponder is around 34 megabits per second. Even our trimmed-down 4:2:2 stream needs nearly five times that, so we couldn't fit in even one channel, never mind several. And just where do we think we'd put all those interactive thingies?

This is where MPEG comes in. It is a compression system that works in a number of ways. The first is the obvious data compression: things such as the text of this document can compress extremely well.

[Image: a JPEG of Elena Sokolova, created with '80% quality', resulting in an image file that is 11K in size.]
The picture above is a Russian ice skater called Elena Sokolova. Taken from EuroSport, this picture is 11 kilobytes long when JPEG compressed - about the length of this document.
In uncompressed form, this exact same picture is 159 and a half kilobytes; over fourteen times bigger!
If you look at the picture carefully, you will see that while large parts of it use similar colours, the picture actually doesn't have any areas of flat colour as it is a real-life image and not a cartoon.
[Image: the same picture of Elena Sokolova, this time created with '100% quality', resulting in an image file that is 41K in size.]
The image above is exactly the same, only this time it is compressed at 100% quality and not 80% as in the first. Can you spot the differences? If you know what you are looking for, then the answer is probably yes. Otherwise, they'll probably look the same.
This second picture of Elena is four times larger than the first.
But, wait...
[Image: the same picture of Elena Sokolova, created with '40% quality', resulting in an image file that is a mere 5K in size.]
This third picture of Elena is 4732 bytes in size (a tenth of the size of the second picture, above). I regularly write emails larger than that!
But, here's the thing - can you see much difference? This time the picture is at 40% quality.
JPEG, like MPEG, is a 'lossy' system. When things are compressed using JPEG or MPEG, what you get back is not the same as what you put in. This is fundamentally different to things like zip archives, where the output is exactly the input. MPEG (and JPEG) is smart in that it sort-of knows what it can best get away with. Comparing the highest and lowest quality images of Elena, you might notice that the background (upper left) suffers quite a lot under the extra compression, while her eyes, nose, and mouth suffer a lot less.
I don't actually know how JPEG works (the maths does my head in!), however the gist is that fine detail gets thrown away first, so areas of low contrast (similar colours) lose their resolution before areas of higher contrast.

You can get away with an awful lot of additional quality loss when you're flashing the images past the viewer twenty-five times a second. You might have scrolled the screen up and down a few times to see exactly what the differences are. They're pretty easy to spot when you have the pictures next to each other and lots of time to ponder over them. At a television refresh rate, all the lossage blurs together into a watchable picture. Freeze-frame a VHS tape some time and you'll see what I mean. The VHS still picture is quite horrible, but it isn't bad to watch playing, because all the yuck rolls together to become a usable picture.

Simply by compressing the image, we make it a fourteenth of the size of the original. By dropping the quality, we can easily make it a quarter of that (a 56th overall). We could push the quality down further still, shrinking the picture to as little as a hundred-and-fortieth of the size of the original. Some things, such as the picture of Elena, cope quite well with the extra compression. Other things, like advert logos, will start to break into very visible blocks if compressed too far.
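
You can repeat the Elena experiment with any picture you have lying around. Here is a sketch using Pillow; different JPEG encoders interpret the quality number differently, so your file sizes won't exactly match mine (the filename is made up):

    from PIL import Image
    import os

    picture = Image.open("elena.png").convert("RGB")

    # Save the same picture at three quality settings and compare sizes.
    for quality in (100, 80, 40):
        name = f"elena-q{quality}.jpg"
        picture.save(name, quality=quality)
        print(name, os.path.getsize(name), "bytes")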

That isn't the end of the story, as MPEG also works on a key-frame/difference principle. Every so often, say once a second, an entire frame is broadcast; the rest of the time, only the differences from the previous frame are sent.
How well this works depends a hell of a lot on the material being sent. For example, Fionnula Sweeney on CNN will be pretty good for a low data rate - it is a woman, at a desk, reading the news. The background of "CNN Center" is fairly static. Some frames might be compressible down to simply a sequence of 'lips moved', 'she blinked', 'lips moved', 'sombre little nod - this is serious news', 'lips moved'.
For an interesting look at this, please read the document about digital quality.
Other things, such as a James Bond movie or Alizée dancing around on TOTP, will clobber the system. TOTP is a good example, as the production often cuts between three or four cameras at an almost frenetic rate, and the cameras are invariably moving, as is the subject (the singer, and the backing dancers too!). With this arrangement, key frames will be required whenever the changes outnumber what the differences can reasonably describe; for a fast-paced music show, they'll come a lot faster than once a second, unless we're listening to 'Mad World'.
The same goes for sports events. In things like football or ice skating, there will be periods where practically nothing happens, and periods where it is nothing but movement (and, hence, key frames).
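
Real MPEG is a great deal cleverer than this, but the key-frame/difference idea itself can be sketched in a few lines of Python. The interval and threshold are invented for illustration, and the frames are assumed to be numpy greyscale arrays:

    import numpy as np

    KEY_INTERVAL = 25   # one full frame per second, at 25fps
    THRESHOLD = 8       # ignore changes smaller than this

    def encode(frames):
        previous = None
        for n, frame in enumerate(frames):
            if previous is None or n % KEY_INTERVAL == 0:
                yield ("key", frame)                   # the whole picture
            else:
                diff = frame.astype(np.int16) - previous.astype(np.int16)
                diff[np.abs(diff) < THRESHOLD] = 0     # 'nothing moved here'
                yield ("diff", diff)                   # mostly zeros for a newsreader
            previous = frame

For the newsreader, the difference frames are mostly zeros and squash down to almost nothing; for the frenetic music show, they are anything but.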

The decoder itself pays attention to the movement, and there is actually some intelligence going on here. If the data stream is interrupted during a small movement, there is a chance the Digibox will assume the logical thing and continue the movement along the same trajectory. So if a guy points at Milano on a weather map and the data stream is interrupted, his hand might carry on moving down until he ends up pointing at Roma!

More bizarrely, I was watching a kids' programme on TV Andalucía the other day (I was bored, okay?) when the data got interrupted. These kids were painting, and we must have cut to something that wasn't received. The picture (of the girls sitting at a big desk) started sliding around in a most peculiar way. It took a good second or so before the picture correctly updated to show a close-up of a brush on a white canvas. The Digibox was obviously receiving the "move this block here" commands, but never having received the new key frame, it did the next logical thing and started rearranging the previous frame!
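
That 'rearranging' is, roughly speaking, motion compensation: the stream says 'this block of the new frame is that block of the previous frame, shifted a little'. Here is a toy sketch; the block size and the vectors dictionary are invented, and the frame dimensions are assumed to be multiples of the block size (as 720 by 576 is):

    import numpy as np

    BLOCK = 16

    def apply_motion(previous, vectors):
        # Build a new frame purely by copying blocks of the old one.
        # 'vectors' maps a block's (y, x) position to a (dy, dx) offset.
        new = np.empty_like(previous)
        h, w = previous.shape
        for y in range(0, h, BLOCK):
            for x in range(0, w, BLOCK):
                dy, dx = vectors.get((y, x), (0, 0))
                sy = min(max(y + dy, 0), h - BLOCK)   # stay inside the frame
                sx = min(max(x + dx, 0), w - BLOCK)
                new[y:y+BLOCK, x:x+BLOCK] = previous[sy:sy+BLOCK, sx:sx+BLOCK]
        return new

With no key frame to correct it, a decoder fed only these 'move block' commands will happily slide pieces of the old picture around - exactly what my Digibox was doing.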


Conclusion

Altogether, the data stream that started at well over thirty megabytes (around 250 megabits) per second can be compressed down to just a few megabits per second, without us viewers really noticing any difference!





Copyright © 2006 Richard Murray