
Resolution, the MTF, and all that

Doug Kerr

Well-known member
Part 1

Introduction

Rigorously characterizing the "behavior" of any system can be very complex, and in detailed scientific and engineering work we have to deal with that complexity.

But in practical work, we have to adopt and use "metrics" of performance that can be readily grasped (perhaps naïvely), practically determined, and conveniently usable to help guide our use of the system. Thus we speak of the brake horsepower of an auto engine, the recovery rate of a water heater, the bed load capacity of a pickup truck, or the high-frequency response limit of a pair of headphones.

Thus it is with the matter of the resolution of a camera system.

In this series of essays I will try to relate my take on the different ways this "property" can be expressed.

Background - the modulation transfer function

In an audio system, we can best intuitively grasp the nature of the "signal" as a waveform - the plot of instantaneous voltage vs. time, as we might see on an oscilloscope.

But we recognize that this signal can also be described by the spectral power density plot - the "spectrum" of the signal, which is a plot that reflects to us in a rigorous way that the signal comprises components of many different frequencies (an infinite number in the case of a non-recurrent signal).

Now if we wish to examine the response of an audio amplifier to signals in general, one way is to plot the ratio of output voltage to input voltage vs. the frequency of the "component", probably normalizing it to the ratio for some nice mid-band frequency, such as 1000 Hz.

Now, with that plot in hand, how might we describe the "high frequency limit" of the amplifier's response? Perhaps the frequency at which the ratio of output to input reaches zero. Well, perhaps it never reaches zero (at least not at any frequency within the range of human hearing).

Well, OK then, suppose we use the frequency where the ratio drops to 0.001? Well, that is not a really useful finding.

In fact, for reasons of technical convenience, it is common to state as the high frequency limit the point at which the voltage ratio drops to 0.707 times its value at "mid-band" (the so-called "3-dB rolloff point").
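To put numbers on that convention (standard amplifier arithmetic, nothing specific to this discussion): a voltage ratio of 0.707 is 1/√2, and since power goes as the square of voltage, this is also the "half-power" point:

    20 · log10(1/√2) ≈ −3.01 dB  ("3 dB down")
    (1/√2)² = 1/2  (half the mid-band power)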

Does this completely describe the high-frequency behavior of the amplifier as perceived by the user? No. But it can be useful in making comparisons between different amplifier designs on a "practical" basis.

Now, in a photographic imaging situation, what we are really trying to capture (let me assume a "monochrome" system for convenience) is the varying luminance (point-by-point) of a two-dimensional projection of the "scene" regarded by the camera.

Let's for convenience consider the luminance along a "track" across the whole image. We could plot the variation in luminance along that line, a curve that is quite analogous to the "waveform" of an audio signal. The variation in the luminance (let's say for a repetitive test pattern) is analogous to the voltage of the audio signal.

As with the audio waveform, we realize that we can consider this "signal" to comprise a spectrum of different frequencies. In the case of the audio signal, these are "temporal" (time-based) frequencies, denominated in cycles per unit time (perhaps cycles per second, for which we have a named unit, the hertz).
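As a concrete way of seeing such a spectrum (a minimal sketch of my own; the bar-pattern profile and the sample counts are just illustrative), we can take the discrete Fourier transform of the luminance samples along such a track:

import numpy as np

# Hypothetical luminance along a track: a bar pattern at 40 cycles per
# unit of track length, sampled at 1000 points over one unit.
n = 1000
x = np.arange(n) / n
luminance = 0.5 + 0.5 * np.sign(np.sin(2 * np.pi * 40 * x))  # square-wave bars

spectrum = np.abs(np.fft.rfft(luminance)) / n        # component magnitudes
freqs = np.fft.rfftfreq(n, d=1.0 / n)                # frequencies, cycles/unit

# The fundamental dominates at 40 cycles/unit, with the odd harmonics
# (120, 200, ...) that a square (bar) profile implies.
print(freqs[np.argmax(spectrum[1:]) + 1])            # -> 40.0

The same decomposition applies verbatim to an audio waveform; only the meaning of the frequency axis changes.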

In the optical case, these are "spatial" (space-based) frequencies. If we are considering the "scene" to be our "input" signal (into the camera), they are denominated in cycles per unit of angle (perhaps cycles per radian). But we generally think of this scene as projected (in the theoretical, not actual sense) onto the focal plane, so we can then think in terms of the unit cycles per meter (or cycles per mm).
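A concrete link between those two units (my illustration, using the small-angle approximation): an angle θ projects to a distance x = f·θ on the focal plane of a lens of focal length f, so a pattern at ν cycles/mm on the focal plane corresponds to f·ν cycles per radian in the scene. For example, 50 cycles/mm behind a 50 mm lens corresponds to 2500 cycles per radian.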

One way to describe the "response" of the system (in the sense of interest to us here) is to consider the ratio of the variation in luminance reported by the system (at whatever output point we are concerned with - perhaps in the delivered digital image) to the variation of luminance in the scene itself.

That variation of luminance, quantitatively, is spoken of as the "modulation" of the luminance (and again, it is the property most analogous to voltage in the audio signal case).

The ratio of the modulation at the output point to the modulation at the input "scene" can be called for now the modulation transfer ratio. I will use the symbol M.
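For concreteness (this is the standard quantitative definition, not anything peculiar to this discussion): for a pattern whose luminance swings between Lmax and Lmin, the modulation is

    m = (Lmax − Lmin) / (Lmax + Lmin)

a dimensionless number between 0 and 1. The modulation transfer ratio M is then m_out / m_in, the modulation at the output point divided by the modulation in the scene.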

Now, if we consider an interchangeable lens camera, then we realize that the modulation transfer ratio is a function of (that is, depends on):

• The spatial frequency of the input "signal" of interest.

• The wavelength of the light of the input "signal".

• The model, and particular "copy", of the lens on board.

• The "zoom" setting (if applicable).

• The aperture setting.

• The location of the scene area in the overall field of view. (We often assume that the lens is rotationally symmetrical in this regard, so we only think in terms of the distance from the axis.)

• Whether the modulation of interest is in the radial or circumferential direction.

Thus, we say that M "is a function of" those seven independent variables. We can refer to the relationship as the modulation transfer ratio function.

But now we run into one of those little things in mathematical "practice" which causes us no trouble (except where it causes us some trouble).

If we have a situation in which the temperature of an oven (T) is dependent on the rate of gas flow into the burner (R) and the setting of the air shutter on the burner (A), in a consistent way, we can say that T is a function of R and A.

We might choose to describe the relationship as the oven temperature function.

Now how do we describe the property T? Well, it is of course the temperature. But we can also say that it is the value of the oven temperature function. Or, carelessly, we can say it is the oven temperature function ("if the oven temperature function would be 400° F or greater for the user settings, the controller should shut off the burner").
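In programming terms (my analogy; the numbers in this little model are of course made up), the distinction is exactly that between a function and a value it returns:

def oven_temperature(gas_flow_rate, air_shutter_setting):
    # The oven temperature *function*: a purely hypothetical model.
    return 250.0 + 30.0 * gas_flow_rate - 5.0 * air_shutter_setting

# T is a *value* of that function, not the function itself - though in
# casual speech we may call either one "the oven temperature function".
T = oven_temperature(4.0, 2.0)
print(T)  # 360.0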

Now back to the modulation transfer ratio function. Firstly, we have chosen to just call it the modulation transfer function, for short. Fair enough.

The "output" of the function is the quantity I designated M. What do we call it?

Well, it is the modulation transfer ratio (a function of perhaps seven variables).

But in fact we call it the modulation transfer function.

But I thought that name applied to the relationship that determined M. It does. But because M is the value of that function, we call it by the name of the function.

Not really a good idea. But that's the way it is done.

What is the usual "symbol" for the modulation transfer ratio? "MTF".

[To be continued]
 

Doug Kerr

Well-known member
[Continued]

Part 2

Presenting the modulation transfer function

We saw in Part 1 that the modulation transfer function of an optical system refers to the way in which the modulation transfer ratio (which is analogous to the "gain" of an audio amplifier) varies with perhaps seven variables, notably:

• The spatial frequency of the input "signal" of interest.

• The wavelength of the light of the input "signal".

• The model, and particular "copy", of the lens on board.

• The "zoom" setting of the lens (if applicable).

• The aperture setting.

• The location of the scene area in the overall field of view. (We often assume that the lens is rotationally symmetrical in this regard, so we only think in terms of the distance from the axis.)

• Whether the modulation of interest is in the radial or circumferential direction.

We cannot of course display the function (as it applies in a particular situation) on a two-dimensional graph.

So what we do is to "freeze" all but one of the independent variables (for example, if we are reporting on a particular lens model, based on the copy we have, that is inherent for two of the variables) and then plot the modulation transfer ratio vs. the "unfrozen" variable.

We might choose to plot on the same sheet the curve for several values of one or two of the frozen variables, by using different curves (the "family of curves" approach).

For our work here, the "unfrozen" variable is, of course, the spatial frequency of the modulation.

But we must note that the most common "MTF" plots we see, in connection with lenses, are done in a different way. There, the "unfrozen" variable is the distance of the area of interest from the axis (so we can easily see what is normally a decline in modulation transfer ratio, for any given spatial frequency, as we get further from the center of the image), with the spatial frequency "frozen" at two arbitrarily-chosen values (each reflected on a separate curve on the graph).
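As a concrete illustration of that "freeze all but one variable" approach, here is a minimal sketch (my own; not drawn from any measurement) that plots modulation transfer ratio vs. spatial frequency for the textbook ideal case, a diffraction-limited lens with a circular aperture, with wavelength and aperture setting as the frozen variables:

import numpy as np
import matplotlib.pyplot as plt

wavelength_mm = 550e-6   # frozen: 550 nm (green light), expressed in mm
f_number = 8.0           # frozen: aperture setting f/8

nu_cutoff = 1.0 / (wavelength_mm * f_number)   # diffraction cutoff, cycles/mm
nu = np.linspace(0.0, nu_cutoff, 500)          # the "unfrozen" variable
x = nu / nu_cutoff
mtf = (2.0 / np.pi) * (np.arccos(x) - x * np.sqrt(1.0 - x * x))

plt.plot(nu, mtf)
plt.xlabel("spatial frequency (cycles/mm)")
plt.ylabel("modulation transfer ratio")
plt.title("Diffraction-limited MTF at f/8, 550 nm")
plt.show()

A real lens is of course measured rather than computed from a formula, but the resulting curve is the same kind of thing we see on an MTF-vs-frequency plot.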

[To be continued]
 

Doug Kerr

Well-known member
[continued]

Part 3

Resolution - introduction

The concept of resolution (in the sense of importance to us here) originally referred to the ability of the human eye (by itself, or through an optical instrument) to "resolve" (that is, recognize as distinct) two objects (by implication each of quite small angular size) at a certain (typically small) angular spacing.

The concept has now been extended to describe the ability of an optical system, or a camera, to "preserve" (or "capture") "fine detail" in a scene.

Harking back to our discussion of the frequency content of a scene, and the "spatial frequency response" of an optical system (as perhaps reflected in its modulation transfer function), we can look forward and imagine that the "high spatial frequency response" of the system is somehow involved in the matter of "preserving or capturing fine detail". And that is in fact a good insight.

But the concept of the resolution of a camera largely unfolded before the concepts of spatial frequency and spatial frequency response were widely in hand. It instead unfolded along more pragmatic lines.

What finally became common (and is still of importance today) is the idea of exposing a camera to several sets of parallel black lines on a white background, with different spacings. We would today say that they had different fundamental spatial frequencies.

We would then examine the output of the camera (perhaps putting the negative under a microscope) and conclude which was the most closely-spaced set of test target lines that were - well, were what?

A. Maybe the most closely-spaced set of lines for which we could in any way see that we were looking at separate lines, and not just a uniform gray field.

B. Maybe the most closely-spaced set of lines that looked really nice.

C. Maybe (in a later era) the most closely-spaced set of lines for which (if we imagined a positive image, as on a test print) the ratio of the luminance at the center of the white spaces between the lines to the luminance at the center of the lines was at least - well, you pick a number. 10:1? 50:1?

In fact, approaches A and B are alive (I would hardly say "and well") today.

If we look at reports of various testing establishments, we often find the overall resolution of a certain camera reported in terms of "extinction resolution" (which turns out to be essentially the criterion of A, above) and "<some other kind of> resolution", which turns out to be essentially the criterion of B, above. There is usually some elaborate accompanying palaver that replaces "looks really nice", just to make it seem more scientific.

More rarely, we will find an assessment based on the MTF (modulation transfer ratio vs. spatial frequency), in which the spatial frequency at which the modulation transfer ratio drops to some fraction of its value at "low" frequencies is cited as indicative of the resolution of the camera. What fraction? Oh, that depends on the reviewer.
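As a sketch of how such a figure might be extracted (my own illustration, with hypothetical measured data; the 50% fraction shown is just one popular choice, often called "MTF50"):

import numpy as np

def resolution_at_fraction(freqs, mtf, fraction=0.5):
    # Return the frequency where mtf first falls to fraction * mtf[0],
    # using linear interpolation between the two straddling samples.
    target = fraction * mtf[0]
    for i in range(1, len(mtf)):
        if mtf[i] <= target:
            f0, f1 = freqs[i - 1], freqs[i]
            m0, m1 = mtf[i - 1], mtf[i]
            return f0 + (target - m0) * (f1 - f0) / (m1 - m0)
    return None  # never drops that far within the measured range

# Hypothetical measurements: spatial frequency (cycles/mm) vs. ratio
freqs = np.array([0, 20, 40, 60, 80, 100, 120])
mtf   = np.array([1.00, 0.92, 0.78, 0.60, 0.43, 0.28, 0.16])
print(resolution_at_fraction(freqs, mtf, 0.5))  # about 72 cycles/mm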

The unit of resolution

In the earliest practice, if we ruled black lines at a spacing of 1.0 mm on our test target, and perhaps the test setup was such that they had a spacing of 0.01 mm on the focal plane, and that was the "best resolvable" set of test lines, we would report the resolution of that camera as 100 lines per mm (often written 100 lpmm, not 100 l/mm, since in this country, we have always shied away from clear mathematical and scientific notation in anything seen by the general public).

We note on speedometers "100 km/h" but "60 MPH".

Work on facsimile systems and later on television led to a different outlook. There, we realized that, in the vertical direction, if a TV display had 100 scan lines per mm of screen height, then the best theoretical resolution we could hope for would be to resolve a pattern in which the black lines were 0.02 mm apart. This was described as 100 lines per mm (of course written as 100 lpmm), where the black "line" was a "line" and the intervening white "line" was also a "line", just as they were handled by the raster scan process.

To render compatible these different conventions, it was decided that the "black lines 0.01 mm apart" should be described as "50 line pairs per mm" (where a "line pair" was now a black "line" and the adjoining white "line"). But of course that was written not as "50 lp/mm" but rather as 50 lpmm (where the "p" means "pairs", or maybe "per"; well actually, both: pairsper).

The only unambiguous expression here relates to the concept of the modulation transfer function, where we can describe the "fineness" of a certain test pattern (or even a spatial frequency component in a more complicated image) as "75 cycles per mm" ("75 cy/mm"). Yes, that is the same as 75 lp/mm.
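So, lining the conventions up for one concrete pattern (the numbers from above):

    100 "lines" per mm (counting black and white lines alike, TV style)
      = 50 line pairs per mm (lp/mm)
      = 50 cycles per mm (cy/mm)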

[continued]
 

Doug Kerr

Well-known member
[continued]

Part 4

The Kell factor

If we have a digital camera with a sensel layout (and thus a pixel layout, if we take the "highest resolution" output) of 3000 pixels for the entire height of the frame, we may find that the observed resolution (perhaps the "looks pretty nice" one) is reported as perhaps 1200 line pairs per picture height.

Why not 1500?

The usual suspects today are:

• The antialiasing filter

• The demosaicing process

• President Obama

In fact the major reason is this:

To avert discussion of the demosaicing issue, I will assume a monochrome sensor.
Suppose we used a test target whose image on the sensor had 3000 lines over the picture height (counting both black and white lines). Suppose the image was aligned so that the black and white lines each fell exactly on a row of pixel detectors.

Then this image would be resolved very nicely. We have attained the "geometrical" resolution, that implied by the pixel detector pitch.

Now suppose before the next test we bump the camera, and now the image of the test pattern is aligned so that the black "stripes" and white "stripes" each lie on the boundary between two adjacent rows of pixel detectors.

This image would be rendered as a uniform gray region.

Thus, even in the absence of other degrading factors, it might be overoptimistic to predict the vertical resolution of the camera as 1500 cycles per picture height. Sometimes it is, sometimes not.
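Here is a minimal simulation of that alignment effect (my own sketch, assuming an idealized monochrome sensor in which each pixel detector simply averages the luminance over its own width):

import numpy as np

def sampled_modulation(phase_px, n_pixels=32, oversample=1000):
    # Bar pattern whose period is exactly two pixel pitches: luminance
    # alternates 0.0 / 1.0 in stripes each one pixel pitch wide.
    x = (np.arange(n_pixels * oversample) / oversample) + phase_px
    pattern = (np.floor(x) % 2).astype(float)
    # Each pixel averages the pattern over its own width.
    pixels = pattern.reshape(n_pixels, oversample).mean(axis=1)
    return pixels.max() - pixels.min()   # surviving modulation

print(sampled_modulation(0.0))   # stripes aligned with pixel rows -> 1.0
print(sampled_modulation(0.5))   # stripes straddling the boundaries -> 0.0

Aligned with the pixel rows, the full modulation survives; shifted by half a pixel pitch, it collapses to zero - the uniform gray field just described.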

This phenomenon came to the attention of Raymond D. Kell, a television researcher at RCA, around 1934. He conducted tests with various test patterns, and concluded that overall, for typical scenes at random alignment, we should probably rate the vertical "resolution" of a TV system at perhaps about 65% of the value that would be implied by the line pitch itself. In fact, larger values seem appropriate in the face of current technology. (This "discount" ratio is called the Kell factor.)
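Tying that back to the example at the start of this part (my arithmetic): 3000 pixel rows give a geometrical limit of 1500 cycles per picture height; Kell's classic 65% would predict about 975, while the 1200 figure cited above implies a factor of 1200/1500 = 0.8, consistent with the observation that larger values seem appropriate today.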

This finding is sometimes inappropriately confused with, or entangled with, the issue of the needed spatial bandlimiting of an image before discrete "sampling" to avert "aliasing", although the implications of the two are often in harmony.

Best regards,

Doug
 