The "aperture effect" and alsiing in sampling

Doug Kerr

Well-known member
Representation by sampling

In the classical model of the representation of a continuous variable (such as the voltage of an audio waveform) by sampling, we capture, at regular intervals, the instantaneous value of that variable. From this series of values, we can precisely reconstruct the entire original "waveform" if that waveform contains no frequency components whose frequencies are at or above half the frequency at which we take the samples.
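
(As a numeric aside, here is a minimal sketch of that reconstruction - the classical sinc-interpolation formula - in Python with NumPy; the sampling rate and test frequency are made-up example values.)

```python
import numpy as np

fs = 8000.0                    # sampling rate, Hz (Nyquist = 4000 Hz)
f0 = 1234.0                    # test tone, safely below Nyquist
n = np.arange(-64, 64)         # sample indices around t = 0
samples = np.cos(2 * np.pi * f0 * n / fs)   # the captured values

def reconstruct(t):
    # One sinc "interpolation pulse" per sample; np.sinc(x) is
    # sin(pi*x)/(pi*x), the impulse response of an ideal lowpass filter.
    return np.sum(samples * np.sinc(fs * t - n))

t = 0.00033                    # an instant between sampling points
print(reconstruct(t))               # ~ cos(2*pi*f0*t)
print(np.cos(2 * np.pi * f0 * t))   # truncating the sum leaves a tiny error
```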

When we do this electrically, we in effect open a gate for an infinitesimal time at every sampling instant to allow the instantaneous voltage right then to pass through and be captured. Of course, the voltage is only available while the gate is open. We can still capture it with, for example, an amplifier followed by a storage capacitor. We can then "measure" this voltage at our leisure. (In a digital system, we will in fact develop a digital description of the value and "send that on" to the distant end for use in reconstructing the original waveform.)

Suppose that for whatever reason, we open the gate each time for a finite duration, and that the circuit is such that the value that is "captured" is the average of the voltage over that duration. We can show that this is precisely equivalent to the original case (with an infinitesimal sampling time) but where the waveform is first passed through a certain type of "low-pass" filter. The frequency response of that filter is exactly that of a filter whose time response is a "spread function" that is constant for the sampling duration (and we can determine that response by taking the Fourier transform of a "rectangular pulse" of that duration).

That frequency response turns out to have the classical "sine x over x" form. The parameter x here is frequency, scaled by a factor based on the sampling duration.

One implication of doing this is that, after we "reconstruct" the "original" waveform from the samples (perhaps conveyed in digital form), the signal has been afflicted by a "drooping" frequency response. Thus, we must use a "post-reconstruction equalizer" that restores all the frequency components to their original relative magnitudes (else the restored audio waveform will have a "high-frequency rolloff").
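
(A minimal numeric sketch of that droop, and of the equalizer gain that would undo it, in Python with NumPy; the sampling rate and averaging duration are made-up example values.)

```python
import numpy as np

fs = 8000.0                  # sampling rate, Hz
tau = 0.6 / fs               # averaging window: 0.6 of a sampling period

f = np.array([1000.0, 2000.0, 3900.0])   # audio frequencies, Hz
droop = np.sinc(f * tau)                 # sin(pi*f*tau)/(pi*f*tau)
eq_gain = 1.0 / droop                    # post-reconstruction equalizer

for fi, d, g in zip(f, droop, eq_gain):
    print(f"{fi:6.0f} Hz: response {d:.3f}, equalizer gain {g:.3f}")
```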

This phenomenon is often called "aperture distortion", where "aperture" refers to the sampling process as a window (aperture) of finite time width through which the signal voltage is observed.

Aliasing

We were reminded earlier that for the sampling process to work properly it is necessary that the waveform being sampled contain no frequency components at or above half the sampling rate (a frequency limit called the "Nyquist frequency"). If this is not true, the "overfrequency" components are not just lost. Rather, they appear in the reconstructed waveform as components whose frequency is as far below the Nyquist frequency as their original frequency was above it. These spurious components are a source of "corruption" in the reconstructed waveform.

This phenomenon can be called "foldover distortion" (because of what happens to the frequency of the component, which is "folded" about the Nyquist frequency) or, commonly, "aliasing".

The premise of that name is this. If the sampling rate is 8000 Hz, and thus the Nyquist frequency is 4000 Hz, then a component of the original signal at 4100 Hz will have exactly the same representation in the train of samples as a component with a frequency of 3900 Hz. That is, it travels "with the papers of" a 3900 Hz component - it travels under an "alias".
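
(To see the "identical papers" concretely, here is a minimal sketch in Python with NumPy, using the same numbers.)

```python
import numpy as np

fs = 8000.0              # sampling rate, Hz (Nyquist = 4000 Hz)
t = np.arange(16) / fs   # the sampling instants

above = np.cos(2 * np.pi * 4100 * t)   # component above Nyquist
below = np.cos(2 * np.pi * 3900 * t)   # its "alias" below Nyquist

print(np.allclose(above, below))       # True: the samples are identical
```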

To avert this, we in general must filter out of the signal, before it is sampled, all frequency components at or above the Nyquist frequency. The lowpass filter used for this purpose is often called an "antialiasing filter".

If we have a finite "sampling window", then the aperture distortion effect in effect provides us with the equivalent of a pre-sampling lowpass filter. Can that be our antialiasing filter? Not really. The response of this virtual filter does not decline rapidly enough (and after it drops to zero at a certain frequency, it rises again).

Still, in many cases of digital audio we use a less-than-ideal antialiasing filter for practical reasons, and it is not out of the question that, to simplify system design, we might use a very wide sampling duration and accept the resulting virtual lowpass filter as our "antialiasing filter".

Now, to digital photography

Of course, digital photography also involves the representation of a continuous phenomenon (in this case, the color of the image) with discrete samples. The principles I discussed above all basically apply, but there are many matters that change the details of their implications.

Let us for a moment assume a "monochrome" sensor.

The matter of aliasing is still with us. If in fact the variation of luminance (the only aspect of the color of the image that is pertinent, given that I have assumed a monochrome sensor) contains spatial frequency components at spatial frequencies higher than the spatial Nyquist frequency (determined by the pitch of the sensor grid), then the delivered image will have luminance components not in the original image. If the image also has components not too far below the Nyquist frequency, then the improperly-reconstructed components will interact with these other components to produce, visually, "beats" (which we describe as moiré patterns).
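
(A minimal sketch of the "fold" arithmetic in Python; the spatial frequencies are made-up example values.)

```python
def folded_frequency(f, f_nyq):
    """Frequency at which a component above Nyquist reappears
    after sampling (assumes a single fold, f < 2 * f_nyq)."""
    return 2 * f_nyq - f if f > f_nyq else f

f_nyq = 50.0                             # cycles/mm for a 10 um pitch
alias = folded_frequency(58.0, f_nyq)    # 58 cy/mm folds to 42 cy/mm
real = 44.0                              # a genuine nearby component
print(alias, abs(real - alias))          # beat at 2 cy/mm -> moire
```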

To avert this, we would apply some type of optical lowpass filter (working in the spatial frequency domain) before the image is sampled by the sensor array.

Our sampling aperture

Our sampling organ is the array of photodetectors. To attain theoretically ideal sampling, each would have to respond only to the illuminance over an infinitesimal region of the image. But then the luminous energy captured would be infinitesimal. We cannot amplify the light before it reaches the photodetector, so this will not work.

So we in fact strive (with such tools as microlenses) to equip each detector to accept luminous energy from as large a region of the image as possible, usually a region that is nearly as large as suggested by the spacing between photodetectors. This is a very large sampling aperture.

This of course gives us a serious "aperture distortion" problem; in fact, it is the major cause of the decline in the MTF of the sensor itself with increasing spatial frequency.
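
(A minimal sketch of that decline, modeling the sampling window as a uniform square aperture, in Python with NumPy; the 10 µm pitch and 0.9 aperture fraction are made-up example values.)

```python
import numpy as np

pitch = 0.010                 # sensel pitch, mm (10 um)
aperture = 0.9 * pitch        # effective window width (microlenses)
f_nyq = 1 / (2 * pitch)       # Nyquist: 50 cycles/mm

for frac in (0.25, 0.5, 1.0):
    f = frac * f_nyq
    mtf = abs(np.sinc(aperture * f))          # rectangular-aperture response
    print(f"{f:5.1f} cy/mm: MTF {mtf:.3f}")   # 0.979, 0.919, 0.699
```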

But, so it shouldn't be a total loss, can this declining spatial frequency response in fact be used as our antialiasing filter? Its response is hardly ideal, but then we get it free.

Well, just as in simple digital audio systems, an ill-suited lowpass filter may be better than none. And thus in fact, in many monochrome cameras, there is no overt antialiasing filter. We just use the frequency response caused by our "large sampling aperture" to do that for us, such as it is.

The same is true, for the same reason, of many "true tricolor sensor" cameras, such as video cameras with three sensors, or a still camera with a sensor of the Foveon type.

Next: with a CFA sensor.

Best regards,

Doug
 

Doug Kerr

Well-known member
We can think of a color filter array (CFA) sensor as being devised this way (I will assume the "Bayer" pattern).

Imagine a final sensor with a sensel pitch of 10 µm.

We begin by making three photodetector arrays, each one equipped with a spectral filter of the kinds we describe (rather imprecisely) as "R", "G", and "B".

They each have photodetectors on a grid of pitch 20 µm in both directions. However, for the "G" sensor, the detectors are staggered, so that their density per unit area is twice that for the detectors in the "R" and "B" sensors.

We could use these, with a beam-splitting arrangement, in the manner of the "three chip" color video camera. Its geometric resolution would be 50 pixels per mm in both vertical and horizontal directions for the "R" and "B" chips and 100 px/mm for the "G" chip (this curiosity is complicated to fully describe).

Fujifilm camera fans of a few years ago may have already had to go through that mental exercise with regard to their famous "diagonal array" sensors.

But we don't do that. Rather, we carefully align the three arrays so that no photodetector of one "sensor" falls in the same spot as a photodetector on another "sensor" (and the Bayer array allows that). Then we collapse all three sensors onto one (each detector carrying its own little piece of its sensor's "color filter"). And we now have our CFA sensor.

How might we use it

We might use the three subarrays to each generate a "color layer" of our image, the "R" and "B" ones each with a geometric resolution (in the vertical and horizontal directions) of 50 px/mm, the "G" one with a geometric resolution, in the vertical and horizontal directions, of 100 px/mm.

We thus could develop an image whose geometric resolution was perhaps 50 px/mm (but, since the "G" layer is so prominent in the reckoning of luminance, perhaps its "luminance" resolution would be 100 px/mm, with a chromaticity resolution of only 50 px/mm).

How we really use it

But we would like the whole image to have a pixel resolution of 100 px/mm.

So we take these "layer images" and for each, through some scheme of "interpolation", determine a best estimate of the values in that layer for all the pixel positions not actually equipped with photodetectors - the "demosaicing" process.
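
(Real demosaicing algorithms are far more elaborate, but a minimal sketch of the "uprez by interpolation" idea, in Python with NumPy, might look like this; the layer values are made-up.)

```python
import numpy as np

def upsample_2x(layer):
    """Double a layer's resolution by linear interpolation - a crude
    stand-in for the interpolation of demosaicing (edges wrap, for brevity)."""
    h, w = layer.shape
    out = np.zeros((2 * h, 2 * w))
    out[::2, ::2] = layer                                      # known sites
    out[::2, 1::2] = (layer + np.roll(layer, -1, axis=1)) / 2  # new columns
    out[1::2, :] = (out[::2, :] + np.roll(out[::2, :], -1, axis=0)) / 2  # new rows
    return out

layer = np.array([[10.0, 20.0],
                  [30.0, 40.0]])    # a tiny "R" layer at 50 px/mm
print(upsample_2x(layer))           # 4x4 estimate at 100 px/mm
```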

We now have three "color layers", with geometric resolutions of 100 px/mm, which we encode in some fashion (perhaps JPEG).

The interpolation process assumes that the representation of each "color aspect" of the image (R, G, B) is properly represented by the collection of samples delivered by the photodetectors of that subset of the CFA sensor.

Well, that will be so theoretically, if, for that "color aspect" of the image, there are no spatial frequency components at frequencies at or above the Nyquist frequency - which in this case (in the vertical and horizontal directions) is 25 cycles/mm. (The sampling frequency is 50 cy/mm, since the photodetector pitch is 20 µm.)
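
(The arithmetic, as a minimal Python sketch:)

```python
def nyquist_cy_per_mm(pitch_um):
    """Half the sampling frequency implied by a detector pitch in um."""
    sampling_freq = 1000.0 / pitch_um   # samples (cycles) per mm
    return sampling_freq / 2.0

print(nyquist_cy_per_mm(20.0))   # 25.0 cy/mm for an "R" or "B" subarray
print(nyquist_cy_per_mm(10.0))   # 50.0 cy/mm for the full pixel grid
```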

But if our lens is any good, there will be such higher-frequency components in the image. Thus the set of outputs of any given subset of the sensor does not properly represent the structure of that color aspect of the image. It will in fact imply (through "aliasing") a layer with spurious components, not present in the layer proper.

If we take this erroneous representation of a color layer of the image (with a geometric resolution of 50 px/mm) and, by interpolation, "uprez" it to 100 px/mm, of course the result will be erroneous as well.

And so when we use that as one layer of our "delivered" image, that image will contain possibly serious color errors (erroneous in perhaps luminance and/or chromaticity).

It is important to note here that these errors do not emerge when "uprez-ing", during demosaicing, the low-geometric-resolution image delivered by a subset of the photodetectors. The errors were there in the low-geometric-resolution image. Thus to say that "aliasing in the case of a CFA sensor leads to errors in the interpolation (demosaicing)" is misleading. The errors come from the fact that each layer is "undersampled" by the "sparse" collection of photodetectors devoted to that layer.

A bush-league antialiasing filter?

We noted before that the "virtual lowpass filter" effect caused by a non-infinitesimal sampling window (sampling aperture), while hardly ideal as an antialiasing filter, could nevertheless do a modest job of that duty, often "sufficient" in a monochrome or true-color sensor. Will that be true in the case of a CFA sensor?

Well, in a typical monochrome sensor, the sampling aperture might be, say, 0.8 times the photodetector pitch, which is 0.8 times the image pixel pitch (and the decline in the frequency response of the virtual lowpass filter, with respect to the Nyquist frequency, is determined by that factor, 0.8).

In a Bayer CFA sensor, we might assume a sampling window of 0.8 times the image pixel pitch, which is only 0.4 times the photodetector pitch within any one layer. As a result, the decline of the frequency response of the virtual lowpass filter, with respect to the layer's Nyquist frequency (determined by the photodetector pitch), is much slower. And thus the chance that we can get away with just using this response as our antialiasing filter is much less.
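
(A minimal numeric comparison, in Python with NumPy: the response of the rectangular-aperture "virtual filter" at the Nyquist frequency, for the two aperture fractions just assumed.)

```python
import numpy as np

def response_at_nyquist(k):
    """Rectangular-aperture response at Nyquist, for a window k times
    the sampling pitch; np.sinc(x) is sin(pi*x)/(pi*x)."""
    return np.sinc(k * 0.5)

for name, k in [("monochrome (0.8)", 0.8), ("Bayer layer (0.4)", 0.4)]:
    h = response_at_nyquist(k)
    print(f"{name}: {h:.3f} ({20 * np.log10(h):+.1f} dB)")
    # monochrome: 0.757 (-2.4 dB); Bayer layer: 0.935 (-0.6 dB)
```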

Thus the common use of an actual, overt antialiasing filter in cameras with a CFA sensor.

Best regards,

Doug
 

Doug Kerr

Well-known member
Much of the published literature on the "aperture effect" in sampling is confusing because the author does not take sufficient care to distinguish between two different "contexts".

In what follows, I will assume that the waveform being sampled does not contain any frequency components at or above the Nyquist frequency.

Generally, when we begin to teach the concepts of reconstruction of a waveform by sampling, we start with a model that is in no way digital. We "sample" the signal with some kind of gate that "opens" periodically (when we begin the discussion, usually for an infinitesimal duration), thus creating a train of (electrical) pulses. We then feed those into a reconstruction filter, and voilà, out comes a precise copy of the original waveform.

There is always a scaling complication, as only a small fraction of the power of the original signal is preserved (the rest being discarded while the sampling gate is closed). In fact, in the limit, as the sampling duration approaches zero (the real premise of our early presentations), the power retained approaches zero, and thus at best the reconstructed waveform has zero power, and thus zero amplitude. But we are able to overcome this intellectually.

If, getting a bit practical, we consider sampled pulses of finite duration, there are several forms they can take. Two common ones are:

a. They can have exactly the waveform of the original waveform during the entire "sampled pulse duration" - and have a zero value otherwise. (This is often called "natural sampling".)

b. They can have a fixed value during the entire "sampled pulse duration" - that value being the instantaneous value of the original waveform at some instant, perhaps at the start of the sampled pulse duration - and have a zero value otherwise. (This is often called "flat-top sampling".)

Again, in our situation where we immediately feed the train of sampled pulses to the reconstruction filter, we have two different results.

• With sampling type a, we find that the reconstructed waveform is (other than for the scaling matter I mentioned) identical to the original waveform. In particular, that means that the relative amplitudes of its different frequency components are the same as for those components in the original waveform. This does not depend in any way on the actual duration of the sampling window and sampled pulses!

• With sampling type b, we find that in the reconstructed waveform, the higher-frequency components are relatively attenuated compared to lower-frequency components (one manifestation of "aperture distortion"). The nature of this "rolloff" depends on the duration of the sampled pulses, as the sketch after this list illustrates.
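
(Here is that sketch: a small simulation of the two sampling types, in Python with NumPy, with made-up rates and a 50% duty cycle. Type a scales the tone by the duty cycle only; type b adds the frequency-dependent sin(x)/x droop.)

```python
import numpy as np

fs_analog = 1_000_000.0      # dense "analog" simulation rate, Hz
fs, duty = 8000.0, 0.5       # sampler rate and gate duty cycle
f0 = 3000.0                  # test tone, Hz (below Nyquist)

t = np.arange(0, 0.1, 1 / fs_analog)
gate = (t * fs) % 1.0 < duty                 # periodic sampling window

natural = np.cos(2 * np.pi * f0 * t) * gate  # type a: follows the waveform
flat_top = np.cos(2 * np.pi * f0 * np.floor(t * fs) / fs) * gate  # type b

probe = np.exp(-2j * np.pi * f0 * t)         # single-frequency DFT probe
amp = lambda s: 2 * abs(np.mean(s * probe))

print(amp(natural))    # ~0.500: duty-cycle scaling only, no droop
print(amp(flat_top))   # ~0.472: extra 0.5 * sinc(duty * f0 / fs) droop
```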

Next, in our lecture, we move beyond this model to the one that is almost always of actual interest. In this:

v. The value of the original waveform is "captured" at periodic intervals.

w. Those values are given digital numerical representations.

The finite precision of that representation is the cause of an imperfection in reconstructing the waveform, "quantizing distortion", but I will ignore that here.
x. The set of those digital representations is stored and/or transported to a distant place.

y. Then and/or there, for each original sample, the digital description of its value is used to generate a pulse of that amplitude.

z. The train of those reconstructed pulses is fed into a reconstruction filter, out of which comes a waveform that is something like the original waveform.

Now, let's look at some possible variations in the details of two of these steps.

v. There are two important ways we can "capture" the value of the waveform:

v1. We can (essentially) capture its instantaneous value at precisely the defined sampling instant.

v2. We can capture its average value over a small interval, perhaps beginning at the defined sampling instant.

Notice I do not mention "pulses" in this, even though they may exist in the actual circuitry, since this is only a conceptual stage. Thus there are no such notions as "natural sampling" or "flat-top sampling". Those terms only apply in the model where the pulses generated by the sampler are immediately fed into a reconstruction filter. It is here that many textbook presentations cause us to go off the rails.

y. There are two important ways we can generate a pulse for each sample value:

y1. We can (essentially) generate a pulse of infinitesimal duration. A problem with this is that, in the limit, the pulse train itself has infinite bandwidth, and is thus hard to handle with real circuitry; also, the train contains no energy, so it is hard to do anything with it.

y2. We can generate a pulse of significant duration.

Now, if we do v2, the effect on end-to-end reconstruction of the waveform is as if we had a lowpass filter with a certain response ahead of the sampling process. This is one manifestation of the concept of "aperture distortion".

Then if we do y2, the effect on end-to-end reconstruction of the waveform is as if we had a lowpass filter with a certain response after the reconstruction filter. This is another manifestation of the concept of "aperture distortion".

The two together affect the "frequency response" of the whole end-to-end process of conveying the waveform, and we may well wish to introduce, at the end, an equalizer that will restore a "uniform" frequency response to the whole process.
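
(A minimal sketch of the combined effect and the equalizer gains, in Python with NumPy; the rate and the two aperture widths are made-up example values.)

```python
import numpy as np

def end_to_end_droop(f, tau_in, tau_out):
    """Aperture response from an averaging window tau_in at the sampler
    (v2) times a hold of width tau_out at reconstruction (y2)."""
    return np.sinc(f * tau_in) * np.sinc(f * tau_out)

fs = 8000.0
f = np.array([1000.0, 2000.0, 3000.0])       # audio frequencies, Hz
h = end_to_end_droop(f, 0.5 / fs, 1.0 / fs)  # half- and full-width apertures
for fi, hi in zip(f, h):
    print(f"{fi:4.0f} Hz: response {hi:.3f}, equalizer gain {1/hi:.3f}")
```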

How does this apply to digital photography?

In digital photography, where we represent (here a two-dimensional) "waveform" through sampling and then digital representation, there are some key differences from the models I described above. But the basic principles indeed apply.

The sampling is essentially done in mode v2, since the sampling "window" is that of the acceptance area of the photodetector, which we intentionally make as large as possible.

In reconstruction, the process is essentially y2, since for each pixel value, we generate a spot of almost the pixel width in size in our display.

Now where is the reconstruction filter? There is none. Instead, we rely on the virtual low-pass filter created by the finite "reconstructed pulse duration" (here the spot size) to serve as a bush-league reconstruction filter. (This is the same trick as using the detector window size as a bush-league antialiasing filter.) The shortcomings of this ploy are the cause of another ailment in image reconstruction called "display aliasing". But that is for another time.

Best regards,

Doug
 