Doug Kerr
Well-known member
We often hear discussions of the antialiasing filter (AA filter) found in most digital camera sensor systems. Some say they wish there were none, or that they could have it removed. Some discuss that the filter in a certain camera is more or less "strong" than in some other camera. Some wonder whether we can "back out" the effect of the AA filter as part of deconvolution processing (and the answer is "hopefully not entirely").
We know that the purpose of the AA filter is to prevent or minimize the appearance of unwanted moiré patterns in our images. But how does it actually do that?
Here, I will give the theoretical background to this matter. As you might expect, I will lay the basis in a wholly different context, the digital representations of speech waveforms, as first commercially practiced in telephone transmission.
A speech waveform is continuous, which means that in any time period there are an infinite number of instants at which it has a value. That's pretty worrisome if we aspire to put each of those values into the form of a digital number. Sounds hopeless, in fact.
But Messrs. Shannon and Nyquist demonstrated mathematically that:

If we have a signal all of whose frequency components have frequencies less than some frequency f(N), then, if we capture the instantaneous value of the waveform at regular intervals at a rate of 2*f(N), the resulting suite of values completely describes the waveform.

This means that from that suite of values, we can reconstruct the waveform. Not a close approximation of it, but exactly the original waveform.

Note that to actually attain that ideal goal, the values we capture must be infinitely precise. In reality we cannot do that (it would take numbers of infinite bit length), so we cannot actually reconstruct the waveform "exactly".
The scheme is called "representation by sampling".
The frequency f(N) is called the "Nyquist frequency". It is half the sampling frequency.
The traditional upper limit of the transmission capability of analog transmission circuits in the interior of the telephone network was 3450 Hz (a number that came from the specific details adopted for a certain transmission method). We might think that in devising a digital transmission system that honored this same upper frequency limit, we might choose f(N) to be, say, 3500 Hz, resulting in a sampling rate of 7000 Hz. But for reasons that are at the very heart of this note, the decision was to "have a little more margin" and use a sampling rate of 8000 Hz. Then f(N) would be 4000 Hz, and in theory our scheme could accommodate any signal components whose frequency was less than (not equal to or less than) 4000 Hz.
Now suppose we presented to a model system a signal comprising a single frequency at 4100 Hz. Would that signal just not be accommodated - would it just not appear in the "reconstructed waveform" output? No; worse than that. It would be reconstructed as a frequency of 3900 Hz.
Why? Well, if we sample a 4100 Hz signal at a rate of 8000 Hz, we get exactly the same suite of values as if we sampled a 3900 Hz signal at the rate of 8000 Hz.
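This identity is easy to verify numerically. Here is a sketch in Python (cosines are used so that the phases match exactly; a sine component would come back with its sign flipped, but at the same aliased frequency):

```python
import numpy as np

fs = 8000          # sampling rate, Hz
n = np.arange(32)  # sample indices

# Sample a 4100 Hz cosine and a 3900 Hz cosine at 8000 Hz.
s_high = np.cos(2 * np.pi * 4100 * n / fs)
s_low = np.cos(2 * np.pi * 3900 * n / fs)

# The two suites of values are identical: 4100 = 8000 - 3900,
# so the 4100 Hz component "wears the suit of values" of 3900 Hz.
print(np.allclose(s_high, s_low))  # True
```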
The "decoder", presented with this suite of values, might say, "Wow, this suite of values I am receiving could represent either a 3900 Hz signal or a 4010 Hz signal (actually, an infinity of others as well, at higher frequencies yet). Which should I deliver?"
Well, the decoder has been "promised" that all frequencies in the transmission will be less than 4000 Hz, so its decision is easy: deliver a signal at 3900 Hz.
But that is an error in reconstruction. If the 4100 Hz signal were indeed just one component in the actual waveform, its "replacement" by a component at 3900 Hz will give us a different waveform. From a perceptual standpoint, the delivered waveform is "distorted".
In fact this phenomenon is sometimes called foldover distortion, "foldover" meaning that signals whose frequency is above f(N) by a certain amount are "folded over f(N)" - that is, come out that same distance below f(N).
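The fold is simple arithmetic. A sketch (the helper name is my own, for illustration only):

```python
def folded_frequency(f, fs):
    """Frequency at which a component between f(N) and fs
    (where f(N) = fs / 2) comes out after sampling at rate fs."""
    f_nyquist = fs / 2
    assert f_nyquist < f < fs, "formula applies only to the first fold"
    # A component above f(N) by some amount comes out that same
    # distance below f(N): (fs - f) = f(N) - (f - f(N)).
    return fs - f

print(folded_frequency(4100, 8000))  # 3900.0
```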
The phenomenon is also referred to as aliasing. The premise is that these "out of band" components travel as the series of values legitimately "worn" by an in-band component - they travel under an alias.
How can we prevent this? Basically, we need to be certain that the signal presented to the encoder (where it is first sampled) does not contain any components at or above f(N). We do that with a low-pass filter. This is often called the antialiasing filter.
Should we make it so its response is essentially uniform up to, for example, 3990 Hz and then "drops like a rock", becoming zero by the time we reach 4000 Hz? Such a filter is hard to implement, and it unavoidably brings some undesirable side effects of its own.
And we don't have to do anything that drastic. We do not intend our digital system to transport any components higher in frequency than 3450 Hz. Thus we can have a filter whose response starts to "roll off" just above 3450 Hz and has fallen to nearly zero by 4000 Hz.
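Purely as an illustration of that "gentle roll-off" idea, here is a digital windowed-sinc sketch in Python (the telephone plant of course used analog filters; the tap count and cutoff here are my own choices, placing the cutoff roughly midway between 3450 and 4000 Hz):

```python
import numpy as np

fs = 8000    # sampling rate, Hz; f(N) = 4000 Hz
fc = 3700    # cutoff roughly midway through the 3450-4000 Hz transition band
taps = 101   # enough taps for a transition band a few hundred Hz wide

# Windowed-sinc low-pass filter (Hamming window), DC gain normalized to 1.
k = np.arange(taps) - (taps - 1) / 2
h = (2 * fc / fs) * np.sinc(2 * fc * k / fs) * np.hamming(taps)
h /= h.sum()

def gain(f):
    """Magnitude of the filter's frequency response at f Hz."""
    return abs(np.sum(h * np.exp(-2j * np.pi * f * np.arange(taps) / fs)))

# Speech components pass essentially untouched; by f(N) almost nothing is left.
print(gain(1000) > 0.99, gain(3450) > 0.9, gain(4000) < 0.01)  # True True True
```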
Decoding
How does the receiving end actually "decode" this suite of values into the reconstructed analog waveform? Sounds like a lot of clever decision making is required.
Nope. For each arriving digital word (describing the value at one instant), a simple D/A converter generates a voltage pulse of corresponding height. The train of pulses is fed into a low-pass filter with a cutoff frequency of - guess what - f(N), half the sample rate. And what comes out is the reconstructed waveform. Mirabile dictu!
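In discrete terms, that decoding step is Whittaker-Shannon (sinc) interpolation: each sample value drives the ideal low-pass filter, whose impulse response is a sinc, and the outputs add up. A sketch, assuming an ideal filter and a finite run of samples:

```python
import numpy as np

fs = 8000                 # sampling rate, Hz; f(N) = 4000 Hz
idx = np.arange(400)      # sample indices
# A 1000 Hz component, well below f(N), sampled at 8000 Hz.
samples = np.cos(2 * np.pi * 1000 * idx / fs)

def reconstruct(t):
    """Reconstructed waveform value at time t: the sum of all the
    sample pulses after the ideal low-pass (sinc) filter."""
    return np.sum(samples * np.sinc(fs * t - idx))

# Evaluate between sampling instants: the original waveform comes back.
t = 200.3 / fs            # an instant that is not a sample point
print(abs(reconstruct(t) - np.cos(2 * np.pi * 1000 * t)) < 1e-2)  # True
```

With infinitely many infinitely-precise samples the agreement would be exact; the small residual here comes from truncating the run of samples.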
This filter is often called the reconstruction filter. Its design is very carefully planned so it will have the optimum effect. (It actually also serves the purpose of an "equalizer" to overcome the nonuniformity of overall frequency response that results from phenomena we need not discuss here. No sense having two separate filters in cascade to do all this.)
[continued]