Doug Kerr
Well-known member
We often hear discussions of the antialiasing filter (AA filter) found in most digital camera sensor systems. Some say they wish there were none, or that they could have it removed. Some discuss that the filter in a certain camera is more or less "strong" than in some other camera. Some wonder whether we can "back out" the effect of the AA filter as part of deconvolution processing (and the answer is "hopefully not entirely").
We know that the purpose of the AA filter is to prevent or minimize the appearance of unwanted moiré patterns in our images. But how does it actually do that?
Here, I will give the theoretical background to this matter. As you might expect, I will lay the basis in a wholly different context, the digital representations of speech waveforms, as first commercially practiced in telephone transmission.
A speech waveform is continuous, which means that in any time period there are an infinite number of instants at which it has a value. That's pretty worrisome if we aspire to put each of those values into the form of a digital number. Sounds hopeless, in fact.
But Messrs. Shannon and Nyquist demonstrated mathematically that:

If we have a signal all of whose frequency components have frequencies less than some frequency f(N), then, if we capture the instantaneous value of the waveform at regular intervals at a rate of 2*f(N), the resulting suite of values completely describes the waveform.

This means that from that suite of values, we can reconstruct the waveform. Not a close approximation of it, but exactly the original waveform.

Note that to actually attain that ideal goal, the values we capture must be infinitely precise. In reality we cannot do that (it would take numbers of infinite bit length), so we cannot actually reconstruct the waveform "exactly".
The scheme is called "representation by sampling".
The frequency f(N) is called the "Nyquist frequency". It is half the sampling frequency.
The traditional upper limit of the transmission capability of analog transmission circuits in the interior of the telephone network was 3450 Hz (a number that came from the specific details adopted for a certain transmission method). We might think that in devising a digital transmission system that honored this same upper frequency limit, we might choose f(N) to be, say, 3500 Hz, resulting in a sampling rate of 7000 Hz. But for reasons that are at the very heart of this note, the decision was to "have a little more margin" and use a sampling rate of 8000 Hz. Then f(N) would be 4000 Hz, and in theory our scheme could accommodate any signal components whose frequency was less than (not equal to or less than) 4000 Hz.
Now suppose we presented to a model system a signal comprising a single frequency at 4100 Hz. Would that signal just not be accommodated - would it just not appear in the "reconstructed waveform" output? No; worse than that. It would be reconstructed as a frequency of 3900 Hz.
Why? Well, if we sample a 4100 Hz signal at a rate of 8000 Hz, we get exactly the same suite of values as if we sampled a 3900 Hz signal at the rate of 8000 Hz.
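This identity is easy to verify numerically. Here is a sketch in Python (cosines are used so that the phases match exactly; a sine component would come back with its sign flipped, but at the same aliased frequency):

```python
import numpy as np

fs = 8000          # sampling rate, Hz
n = np.arange(32)  # sample indices

# Sample a 4100 Hz cosine and a 3900 Hz cosine at 8000 Hz.
s_high = np.cos(2 * np.pi * 4100 * n / fs)
s_low = np.cos(2 * np.pi * 3900 * n / fs)

# The two suites of values are identical: 4100 = 8000 - 3900,
# so the 4100 Hz component "wears the suit of values" of 3900 Hz.
print(np.allclose(s_high, s_low))  # True
```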
The "decoder", presented with this suite of values, might say, "Wow, this suite of values I am receiving could represent either a 3900 Hz signal or a 4010 Hz signal (actually, an infinity of others as well, at higher frequencies yet). Which should I deliver?"
Well, the decoder has been "promised" that all frequencies in the transmission will be less than 4000 Hz, so its decision is easy: deliver a signal at 3900 Hz.
But that is an error in reconstruction. If the 4100 Hz signal were indeed just one component in the actual waveform, its "replacement" by a component at 3900 Hz will give us a different waveform. From a perceptual standpoint, the delivered waveform is "distorted".
In fact this phenomenon is sometimes called foldover distortion, "foldover" meaning that signals whose frequency is above f(N) by a certain amount are "folded over f(N)" - that is, come out that same distance below f(N).
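The fold is simple arithmetic. A sketch (the helper name is my own, for illustration only):

```python
def folded_frequency(f, fs):
    """Frequency at which a component between f(N) and fs
    (where f(N) = fs / 2) comes out after sampling at rate fs."""
    f_nyquist = fs / 2
    assert f_nyquist < f < fs, "formula applies only to the first fold"
    # A component above f(N) by some amount comes out that same
    # distance below f(N): (fs - f) = f(N) - (f - f(N)).
    return fs - f

print(folded_frequency(4100, 8000))  # 3900.0
```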
The phenomenon is also referred to as aliasing. The premise is that these "out of band" components travel as the series of values legitimately "worn" by an in-band component - they travel under an alias.
How can we prevent this? Basically, we need to be certain that the signal presented to the encoder (where it is first sampled) does not contain any components at or above f(N). We do that with a low-pass filter. This is often called the antialiasing filter.
Should we make it so its response is essentially uniform up to, for example, 3990 Hz and then "drops like a rock", becoming zero by the time we reach 4000 Hz? Such a filter is hard to implement, and it unavoidably brings some undesirable side effects of its own.
And we don't have to do anything that drastic. We do not intend our digital system to transport any components higher in frequency than 3450 Hz. Thus we can have a filter whose response starts to "roll off" just above 3450 Hz and has fallen to nearly zero by 4000 Hz.
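Purely as an illustration of that "gentle roll-off" idea, here is a digital windowed-sinc sketch in Python (the telephone plant of course used analog filters; the tap count and cutoff here are my own choices, placing the cutoff roughly midway between 3450 and 4000 Hz):

```python
import numpy as np

fs = 8000    # sampling rate, Hz; f(N) = 4000 Hz
fc = 3700    # cutoff roughly midway through the 3450-4000 Hz transition band
taps = 101   # enough taps for a transition band a few hundred Hz wide

# Windowed-sinc low-pass filter (Hamming window), DC gain normalized to 1.
k = np.arange(taps) - (taps - 1) / 2
h = (2 * fc / fs) * np.sinc(2 * fc * k / fs) * np.hamming(taps)
h /= h.sum()

def gain(f):
    """Magnitude of the filter's frequency response at f Hz."""
    return abs(np.sum(h * np.exp(-2j * np.pi * f * np.arange(taps) / fs)))

# Speech components pass essentially untouched; by f(N) almost nothing is left.
print(gain(1000) > 0.99, gain(3450) > 0.9, gain(4000) < 0.01)  # True True True
```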
Decoding
How does the receiving end actually "decode" this suite of values into the reconstructed analog waveform? Sounds like a lot of clever decision making is required.
Nope. For each arriving digital word (describing the value at one instant), a simple D/A converter generates a voltage pulse of corresponding height. The train of pulses is fed into a low-pass filter with a cutoff frequency of - guess what - f(N), half the sample rate. And what comes out is the reconstructed waveform. Mirabile dictu!
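In discrete terms, that decoding step is Whittaker-Shannon (sinc) interpolation: each sample value drives the ideal low-pass filter, whose impulse response is a sinc, and the outputs add up. A sketch, assuming an ideal filter and a finite run of samples:

```python
import numpy as np

fs = 8000                 # sampling rate, Hz; f(N) = 4000 Hz
idx = np.arange(400)      # sample indices
# A 1000 Hz component, well below f(N), sampled at 8000 Hz.
samples = np.cos(2 * np.pi * 1000 * idx / fs)

def reconstruct(t):
    """Reconstructed waveform value at time t: the sum of all the
    sample pulses after the ideal low-pass (sinc) filter."""
    return np.sum(samples * np.sinc(fs * t - idx))

# Evaluate between sampling instants: the original waveform comes back.
t = 200.3 / fs            # an instant that is not a sample point
print(abs(reconstruct(t) - np.cos(2 * np.pi * 1000 * t)) < 1e-2)  # True
```

With infinitely many infinitely-precise samples the agreement would be exact; the small residual here comes from truncating the run of samples.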
This filter is often called the reconstruction filter. Its design is very carefully planned so it will have the optimum effect. (It actually also serves the purpose of an "equalizer" to overcome the nonuniformity of overall frequency response that results from phenomena we need not discuss here. No sense having two separate filters in cascade to do all this.)
[continued]