Interlaced and progressive "scan" video

Doug Kerr

In connection with modern video technology, we have a new wave of encounters with the terms "interlaced scan" and "progressive scan". Not surprisingly, there can be considerable confusion arising in this regard. I thought I would speak a little about the concept involved and some of the notation we currently encounter.

Two separate issues

First, let me alert the reader to the distinction between these two matters:

• The format in which a signal is presented in storage or in transmission.
• The operation of the display mechanism.

In the basic analog broadcast television format, and with traditional (normally CRT-based) receiver displays, these were the same thing. That is, the signal was developed in real time in the camera (by a raster scan process), transmitted on-the-fly, and the received video signal was laid on the screen in real time with the same process.

Flicker

Suppose we were trying to establish a transmission format for broadcast television (as the NTSC was doing in about 1940). Suppose we had decided to use a certain number of lines in our image and a certain frame repetition rate (these having been chosen in the light of bandwidth considerations). Suppose the frame rate chosen was 30 frames/s.

Firstly, we often hear that the interlaced format (which I will describe shortly) was developed because a frame rate of 30 frames/s, used in a straightforward way, was not sufficient to give the human eye a suitable "temporal resolution" with regard to motion. That is not so. Motion pictures, for example, had for many years operated at a frame rate of 24 frames/s, which gave perfectly satisfactory motion resolution.

Rather, the issue was that, if we laid out the image on the screen using the obvious form of raster scan, painting a frame every 1/30 second, and considering that the decay time of the phosphors had to be short enough not to "blur" motion, the human eye would perceive flicker from the "pulsation" of the overall luminance at any region of the screen.

(The same was true of motion pictures; the eye would perceive flicker at a frame rate of 24 frames/s. So in fact the projected image was interrupted at a rate of 48 times/sec by a rotating shutter: each frame was projected once, the screen was briefly blanked, and then the frame was projected again. Thus the pulsation of the overall light on the screen occurred at a rate of 48 Hz, which the eye would tolerate.)

Now, although improvement of the perceived motion resolution was not the driver for the adoption of the interlaced format, it turns out that, in many cases, an improvement in motion resolution does occur. (In effect, during motion, we have a half-vertical-resolution image at double the "frame" rate.) But there can also be some unhappy artifacts of this. A discussion of this is beyond the scope of this note.

That shutter approach would not work in the broadcast TV setting. For one thing, it would require that, at the receiver, each frame be held, unchanging, on the screen for 1/30 second while the visible output of the screen was blanked twice for each frame.

The interlaced format concept

Rather, the concept of interlaced scan was adopted. Here, over a period of 1/60 second, the camera scanned, sequentially, all the odd-numbered lines of the raster. These were sent out in real time. At the receiver, these lines were laid down (in real time) at a pitch of twice the image line pitch (essentially creating all the odd-numbered lines of the image). Then, in the next 1/60 second, the camera scanned all the even-numbered lines of the image, which were transmitted in real time. At the receiver, these were laid down at a pitch of twice the image line pitch, displaced by one line from the first pass, thus completing the image.
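For the flavor of it, here is a minimal Python sketch of that gathering of the two fields into one frame (a field is represented simply as a list of lines; this is my own toy illustration, not anything from a standard):

    # Minimal sketch: weave two fields (odd lines, even lines) into one frame.
    # Lines are numbered from 1; the odd field carries lines 1, 3, 5, ...,
    # the even field carries lines 2, 4, 6, ...
    def weave(odd_field, even_field):
        frame = []
        for odd_line, even_line in zip(odd_field, even_field):
            frame.append(odd_line)   # line 1, 3, 5, ...
            frame.append(even_line)  # line 2, 4, 6, ...
        return frame

    # Example: a 4-line frame built from two 2-line fields.
    print(weave(["line1", "line3"], ["line2", "line4"]))
    # ['line1', 'line2', 'line3', 'line4']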

One subset of lines (odd or even) was called a field, while the entire set of lines was called a frame.

Thus the frame rate of the original NTSC format (before color) was 30 frames/sec; the field rate was 60 fields/sec.

Non-CRT television displays

In the modern era, in many non-CRT television display systems (even in the context of analog TV), an entire frame is built up in a frame buffer and then loaded into the display mechanism itself. That loading may be done in a "scan" fashion (rather than simultaneously), but the organization of that may not follow the field structure. (This is reminiscent of the various ways a digital camera sensor can be read out, or a digital instrument panel in a car driven.)

Progressive Scan

The term progressive scan implies that, in transmission, and/or in the working of a display mechanism (recall that these are separate issues), all the lines of a frame are laid down sequentially.

Digital High-Definition TV

The ATSC broadcast television standards (covering digital TV broadcast) make provision for numerous transmission formats. These have a range of pixel dimensions, include transmission organized under both the progressive-scan and interlaced-scan paradigms (as these appear in the context of digital video representations, such as the various MPEG forms), and offer different frame rates.

A popular transmission format, and the "highest" one commonly used today, is often referred to as just "1080i", implying 1080 lines, interlaced format. It is normally used at a frame rate of 30 frames/s, which, since this is an interlaced format, implies a field rate of 60 fields/s. Sometimes this is written "1080i60" (in American practice, it is the field rate that is cited; for a non-interlaced format, such as "720p", that is identical to the frame rate, but for an interlaced format, it is twice the frame rate). In European practice, it is always the frame rate that is cited, with a slight difference in presentation as a cue to that. Thus the format above would be called, in the European convention, "1080i/30". (We will use that convention here for most technical descriptions.)
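To make the two conventions concrete, here is a tiny Python sketch (my own simplification, not any formal parsing rule) that rewrites an American-style designation in the European form:

    import re

    # Sketch: rewrite an American-style designation such as "1080i60" in the
    # European form "1080i/30". For interlaced formats the American number is
    # the field rate (twice the frame rate); for progressive they coincide.
    def to_european(designation):
        lines, scan, rate = re.match(r"(\d+)([ip])(\d+)", designation).groups()
        frame_rate = int(rate) // 2 if scan == "i" else int(rate)
        return f"{lines}{scan}/{frame_rate}"

    print(to_european("1080i60"))  # -> 1080i/30
    print(to_european("720p60"))   # -> 720p/60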

"1080i" television receivers

We often see a certain TV receiver advertised as "1080i". What does that mean? Usually it means that the receiver has a native display mechanism resolution of 1080 lines, and that the receiver can receive ATSC broadcasts in the 1080i format. Assuming that we are not speaking of a CRT-based display, the "i" does not mean that the display operates on an interlaced basis; modern TV receiver display chains always have a full frame buffer, and load that full frame into the display mechanism itself in any of a variety of ways.

1080p transmission

The latest update of the ATSC standard provides for a new transmission format (using MPEG-4 (H.264) encoding) operating at 1080 lines, progressive format (that is, not interlaced), and a frame rate of 60 frames/sec.

The normal US technical description of this would be 1080p60 (somewhat ambiguous). The less-ambiguous European notation would be 1080p/60. A common "marketing" description of this format is "1080p" (totally ambiguous).

"1080p" TV receivers

We often read of TV receivers that are said to be "1080p". What does that mean?

Recall that this does not mean that their display mechanisms are "progressive scan". This is essentially meaningless for non-CRT TV receivers.

Normally, that designation means that they are prepared to deal with what can be called (using the unambiguous European notation) a "1080p/60" signal: 1080 lines of resolution, transmitted non-interlaced, 60 frames/sec. It is the 60 frame/sec display capability that is of importance in such a receiver; the receivers spoken of as "1080i" are usually only capable of displaying 30 frames (1080 lines each) per second.

The intent is that such receivers will be capable of exploiting a 1080p/60 signal, since it has been expected that such will be transmitted in the future as a "better motion" format, perhaps ideal for sporting events.

So, now that a 1080p/60 format has been included in the ATSC standard, and it is expected that transmission in this format will start to happen, will these sets be able to receive and display that?

For most such sets available at the present time, no. How can that be?

Because they have only MPEG-2 decoders, and MPEG-4 (H.264) encoding is prescribed by the ATSC standard for the new 1080p/60 format.

It is of course expected that new high-end TV receivers will provide for the ATSC 1080p/60 format.

Best regards,

Doug
 

Doug Kerr

Further thoughts on interlacing and such.

De-interlacing

In analog TV, which uses an interlaced format, two consecutive fields represent different time instants (in fact, each line represents a different time instant).

In interlaced digital formats, each field also represents a different time instant; however, all the lines within a field are creatures of the same instant.

In a modern TV receiver, the display itself is really neither interlaced nor progressive. But, if the received signal is in an interlaced format, for each frame time we need to create a full frame to feed to the display.

The trivial way is just to take the lines of two successive fields and gather them together, interleaved. But, since one field represents a different time epoch than the other, a more sophisticated approach is to use an interpolation-based "de-interlacing algorithm" (it reminds one a little of the CFA demosaicing issue). A guy named Yves Faroudja is the big wheel in that arena.
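For concreteness, here is a toy Python sketch contrasting the trivial gathering ("weave") with the simplest single-field alternative ("bob"); a real motion-adaptive de-interlacer of the Faroudja sort is far more elaborate, detecting motion region by region:

    # Toy sketch: a "field" is a list of scalar line values.

    def weave(odd_field, even_field):
        # Trivial gathering: interleave lines from two fields captured at
        # different instants. Moving edges show "combing" artifacts.
        frame = []
        for o, e in zip(odd_field, even_field):
            frame += [o, e]
        return frame

    def bob(field):
        # Build a full frame from ONE field by averaging adjacent lines to
        # synthesize the missing ones: no temporal mixing, but only half
        # the vertical resolution.
        frame = []
        for a, b in zip(field, field[1:] + field[-1:]):
            frame += [a, (a + b) / 2]
        return frame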

The 24PsF format

In the 24PsF format used in much digital motion picture work, the recording format is like that of 1080i/24, except that the two fields (called there segments to warn of the distinction) are creatures of the same epoch. That makes it easier to put them back together for ultimate use without Faroudja-work.

The motivation for using this format, rather than a straight "progressive" format, is that these signals can be very readily introduced into interlace-format chains, such as for broadcast work, since they provide two "fields" for each frame.

Film conversion

Initially, the NTSC television signal used a nominal frame rate of 30.00 frames/sec. Normal release motion pictures are at a frame rate of 24 frames/sec.

To allow the transmission of films over television ("telecine"), a clever ploy was used. One frame of the film was used as the source of three consecutive fields, the next frame for two consecutive fields, and the next frame for three consecutive fields, and so forth. Thus, in one second, we had 60 TV fields (30 frames) carrying 24 film frames.
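Here is a small Python sketch of that cadence (film frames shown as letters; my own illustration of the arithmetic):

    # Sketch of the 3:2 pulldown cadence: film frames A, B, C, D become TV
    # fields in groups of 3, 2, 3, 2, ... so that 4 film frames fill 10
    # fields (5 TV frames), i.e. 24 film frames/s -> 60 fields/s.
    def pulldown(film_frames):
        fields = []
        for i, frame in enumerate(film_frames):
            count = 3 if i % 2 == 0 else 2  # 3, 2, 3, 2, ...
            fields += [frame] * count
        return fields

    print(pulldown(["A", "B", "C", "D"]))
    # ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']  (10 fields)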

This scheme is referred to as "3:2 pulldown", pulldown referring to the movement of the film in the scanning apparatus from one film frame to the next. The term meant that the pulldowns occurred at intervals of 3, 2, 3, 2, etc. field times.

A disadvantage is that in de-interlacing, we develop frames half of whose lines come from one film frame and half from the next ("dirty frames").

The NTSC "compatible color" frame rate

The adaptation of the NTSC TV signal format to carry color images was done in a way called "compatible color", meaning that:

• Non-color receivers, receiving a color signal, would present a monochrome form of the image.
• Color receivers, receiving a monochrome signal, could present it with no substantial change in operating parameters.

This was done by representing the image in "luma-chroma" form, where luma was a quasi-luminance coordinate and chroma conveyed the chrominance of the image. The luma coordinate was transmitted in the same way that the monochrome video signal (a gamma-precompensated luminance) had always been transmitted. Chroma was transmitted by phase-amplitude modulation (or quadrature amplitude modulation, if you wish) of a chrominance subcarrier, which was added to the luma signal, the composite of the two then modulating the actual radio signal just as the monochrome video signal had always done.

(A monochrome receiver, not knowing anything about this, would just treat the luma signal as the video signal and present it. It would "ignore" the chroma signal.)
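As a deliberately bare-bones Python sketch of the composite signal just described (the I/Q chroma components and the 315/88 MHz subcarrier are standard NTSC values; everything else about the real signal, sync, blanking, filtering, is omitted here):

    import math

    F_SC = 315e6 / 88  # NTSC chrominance subcarrier, ~3.579545 MHz

    def composite(luma, i, q, t):
        # Quadrature amplitude modulation of the subcarrier by the two
        # chroma components (I and Q), summed with luma. A bare sketch of
        # the idea, not a faithful model of the broadcast signal.
        return (luma
                + i * math.cos(2 * math.pi * F_SC * t)
                + q * math.sin(2 * math.pi * F_SC * t))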

For various reasons (beyond the scope of this note), to make everything work out properly (minimizing the perception of certain artifacts of the process), the frame rate had to be changed slightly. The new NTSC frame rate was (expressed to 6 significant figures) 29.9700 frames/s (usually stated as 29.97). (The precise nominal value is 30/1.001, which is 29.9700299700299700... .) The corresponding field rate is of course 59.9400 fields/s (usually stated as 59.94).
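For the record, the exact values fall out of a two-line computation (Python sketch):

    # The exact NTSC color rates as rational numbers.
    from fractions import Fraction

    frame_rate = Fraction(30) / Fraction(1001, 1000)  # 30/1.001 = 30000/1001
    field_rate = 2 * frame_rate                       # 60000/1001

    print(frame_rate, float(frame_rate))  # 30000/1001, ~29.97003
    print(field_rate, float(field_rate))  # 60000/1001, ~59.94006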

Telecine in color TV

The "3:2 pulldown" scheme for the transmission of motion pictures was retained, but given the new field rate, the average rate at which the film frames were processed became (to six significant figures) 23.9760 frames/s, usually cited as 23.976. Thus produced a 0.01% lengthening of the run time of the movie, and until pitch control systems were developed) a slight decrease in the pitch of the motion picture sound.

"De-telecine"

Modern digital TV receivers sometimes employ a clever scheme for dealing with motion picture material that has been introduced on a "3:2 pulldown" basis. As part of their de-interlacing process, they recognize when the signal has a "3:2 pulldown" origin, deduce the "cadence" of the ploy, and extract 24 clean image frames per second (or 23.976 if the digital transmission is at a frame rate of 29.97 frames/s). These are then used in an interpolation process to give a "smooth" stream of 30 frames per second (none of which are just a crude mix of two film frames). Again, Yves Faroudja is the guru of this.
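A toy Python sketch of the recovery step follows, with a crucial caveat: a real receiver must deduce the cadence from the picture content itself, which is precisely the hard part; here the fields are assumed to arrive already labeled with their source film frame:

    def inverse_telecine(fields):
        # Collapse runs of identical fields back into single film frames,
        # undoing the 3, 2, 3, 2, ... cadence of the earlier sketch.
        frames = []
        for f in fields:
            if not frames or frames[-1] != f:
                frames.append(f)
        return frames

    print(inverse_telecine(["A", "A", "A", "B", "B", "C", "C", "C", "D", "D"]))
    # ['A', 'B', 'C', 'D'] -- clean film frames, no "dirty" mixtures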

29.976 frames/s

We sometimes read of a frame rate of 29.976 frames/s being used in certain "NTSC-related" video formats.

To the best of my knowledge this is a bum steer, perhaps started by someone who knew that 29.97 was a "rounded" version of the precise rate, wanted to state the value more precisely, and somehow inappropriately remembered the ".976" from the 23.976 color telecine film frame rate.

I fear that certain video encoder systems actually offer the possibility of a frame rate of 29.976 frames/s as a result of this misunderstanding.

(If any reader knows a legitimate premise for a frame rate of 29.976 frames/s, I'd be pleased to hear of it.)


Well, time for breakfast.

Best regards,

Doug
 

Doug Kerr

This really has nothing to do with interlace, but the premise for it was introduced in the last section of this series, so . . .

The SMPTE "drop-frame" time code

In video editing and the like, frames along the "project" are identified using an hour:minute:second:frame scheme. In some cases, this time marking is digitally embedded in the recorded project media. The system is called the SMPTE Time Code (the acronym referring to the Society of Motion Picture and Television Engineers, which promulgated the scheme).

Of course, the NTSC color TV format, with a frame rate of slightly over 29.97 frames/s, throws a real clinker into this system. If we proceeded in the obvious way, and let the frames subfield run from 0 through 29 (as if there were 30 frames per second), there would be a growing discrepancy between the time indicated in our editing systems (and on the recorded media) and real time (about 1.5 minutes per day). This could cause numerous problems in the actual delivery of TV programming (even perhaps, God forbid, the misreckoning of about 0.1% of billable advertising time).

An ingenious ploy was developed to finesse this problem. The basic plan is that the frames subfield runs from 0 to 29 for each second. But whenever the seconds subfield has the value zero (the "first second" of each minute), the frame subfield starts at 2 (not 0), running through 29 as usual. However, this is not done if the minutes subfield is evenly divisible by 10.

The result is that the frame subfield spans an average of exactly 29.97 frames per second, very nearly rationalizing "SMPTE time" and "wall time". (In exactly ten minutes of SMPTE time, there are 17982 frames, while in ten minutes of real time at the nominal NTSC frame rate there would be 17982.018+ frames, a residual discrepancy of about one part per million.)

The worst instantaneous discrepancy is about ±0.03 seconds.

The scheme is called "SMPTE 30-frame drop-frame time coding". It is of course not frames that are dropped, but rather frame numbers.
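For the algorithmically inclined, here is a Python sketch of that numbering rule (my own rendering of the description above, converting a running frame count to drop-frame notation):

    # Sketch: convert a running frame count (at the nominal 30000/1001
    # frames/s) to SMPTE 30-frame drop-frame time code. Frame numbers 0 and
    # 1 are skipped at the start of every minute except minutes evenly
    # divisible by 10, exactly as described above.
    def drop_frame_timecode(frame_count):
        # 2 numbers dropped in 9 of every 10 minutes, so a 10-minute block
        # holds 10*60*30 - 18 = 17982 frames, and a "dropping" minute
        # holds 60*30 - 2 = 1798 frames.
        frames_per_10min = 10 * 60 * 30 - 18   # 17982
        frames_per_min = 60 * 30 - 2           # 1798

        blocks, rem = divmod(frame_count, frames_per_10min)
        if rem < 60 * 30:
            dropping_minutes = 0  # first minute of the block: no drop
        else:
            dropping_minutes = 1 + (rem - 60 * 30) // frames_per_min
        # Re-insert the dropped numbers so the count maps onto 0..29/second.
        adjusted = frame_count + 2 * (9 * blocks + dropping_minutes)

        ff = adjusted % 30
        ss = (adjusted // 30) % 60
        mm = (adjusted // 1800) % 60
        hh = adjusted // 108000
        return f"{hh:02d};{mm:02d};{ss:02d};{ff:02d}"

    print(drop_frame_timecode(17982))  # -> 00;10;00;00 (exactly ten minutes)
    print(drop_frame_timecode(1800))   # -> 00;01;00;02 (numbers 0, 1 dropped)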

When SMPTE time notation is used on an actual 30 frame/s basis (or any other integral frame rate basis), the written or displayed notation normally looks like this:

hh:mm:ss:ff

As a hint that the drop-frame notation is in effect, it is customary to write or display the time this way:

hh;mm;ss;ff

For systems using displays that cannot present a semicolon, this notation is sometimes used:

hh.mm.ss.ff

Although I introduced the topic by mentioning video editing, it also applies to audio or music editing (when the result is to be synchronized with video). Many MIDI editors, for example, offer the option of giving the time index on an SMPTE code basis, with one option being the "30-frame drop-frame" form.


Best regards,

Doug
 