
The "primaries" of a digital camera sensor

Doug Kerr

Well-known member
I have spent much of this past week looking into many layers of the colorimetric theory behind the behavior of a digital camera sensor and how we process its outputs. Many fascinating facts have come to light. Here's one that I find especially fascinating.

********

We generally label the three "channels" of a digital camera sensor R, G, and B. What does that mean?

We often speak of it in terms of the three channels "being sensitive to red, green, and blue light, respectively", understanding that those terms don't precisely describe kinds of light; after all, the notation is just for "labeling", and is not intended to imply a "specification".

But, even subject to that understanding, the description falls short. The "response" of a sensor channel is described by a curve extending over a substantial range of wavelengths. We can't really say that the channel responds to any one particular kind of light.

Perhaps we mean that, generally speaking, the "peaks" of the three curves fall at wavelengths we can reasonably associate (in the usual broad way) with the colors red, green and blue. And that is "sort of so" in most cases.

Since the sensor delivers three outputs that describe the color of the light on the sensor, it must operate in some sort of "color space", albeit perhaps not some standard one but rather a "private" one for the sensor. It would be what we call an "additive" color space. These describe a color in terms of the amounts of three "primaries" we would need to add together to create light of that color. The "RGB" family of color spaces are of course the most familiar examples of such, and in fact we might think that the use of the designations R, G, and B for our sensor channels implies that they operate in some type of RGB color space.

If that were actually true, then there would be a set of three primaries associated with it (whose chromaticities could be reasonably called, with the customary lack of precision, red, green, and blue).

Suppose we wanted to know, "Well, just what are the three primaries of a certain camera sensor?" If we know the response curves of the three channels of the sensor (DxO Labs is kind enough to publish those for many cameras, based on their laboratory tests), can we determine the three primaries of the sensor?

We can. But the results may be shocking. It turns out that in every case, the three primaries are imaginary. That means that they cannot be physically generated, and if they could be, could not be seen. They are mathematical fictions (although that does not prevent them from being the premises of a color space). (The same is true of the primaries of the CIE XYZ color space, widely used as the description of color for much scientific work.)

If we plot the chromaticities of these primaries on a CIE x-y chromaticity chart (the one on which we most often see plotted such things as the chromaticity gamut of a color space), we would find their dots all outside the "region of visible chromaticities".
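
For anyone who wants to try this at home, here is a small Python sketch of how one might derive those implied primaries from a set of published response curves. The file names are placeholders, and the least-squares fit is just one common way of getting a "best" camera-to-XYZ matrix when the sensor is not an exact match to the ideal:

```python
import numpy as np

# Assumed inputs (placeholder files): the CIE 1931 2-degree color-matching functions
# and the sensor's three spectral response curves, sampled on the same wavelength grid.
# Each file: wavelength in the first column, then three value columns.
cmf = np.loadtxt("cie_1931_2deg.csv", delimiter=",")[:, 1:]      # (N, 3): xbar, ybar, zbar
resp = np.loadtxt("sensor_responses.csv", delimiter=",")[:, 1:]  # (N, 3): "R", "G", "B"

# Least-squares fit of a 3x3 matrix X such that resp @ X ~= cmf, i.e. the matrix that
# best maps camera signals to XYZ for arbitrary stimuli (exact only if the responses
# are linear combinations of the color-matching functions).
X, *_ = np.linalg.lstsq(resp, cmf, rcond=None)
M = X.T   # now XYZ = M @ cam; the columns of M are the XYZ values of the implied primaries

for name, XYZ in zip("RGB", M.T):
    x, y = XYZ[0] / XYZ.sum(), XYZ[1] / XYZ.sum()
    print(f"implied primary {name}: x = {x:.3f}, y = {y:.3f}")
# For actual sensors these (x, y) points land outside the spectral locus on the
# CIE x-y diagram - that is, the primaries are imaginary.
```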

How can I say with assurance, "in all cases" (meaning, for any sensor design)?

In an actual sensor design, the spectral sensitivity curves cannot have any negative values, not because of any mathematical rule, but because such would be impossible as the behavior of an actual sensor channel.

It then turns out that the primaries implied by a set of sensor spectral response curves that were "everywhere non-negative" will always be all imaginary. (I will spare you the proof.)

Now, how can we deal with this bizarre situation in working with the sensor outputs? Easily. It turns out that (assuming certain conditions are met by the sensor response curves), a description of a color in terms of any valid set of three primaries (even imaginary ones) can be mathematically converted ("transformed") into a description of the color in terms of any other valid set of three primaries, including (for example) the primaries of the sRGB color space or the primaries of the Adobe RGB color space. And doing so is part of the "development" of the raw sensor data.
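
As a concrete (if simplified) illustration, that transformation is nothing more than a 3x3 matrix multiplication: compose a camera-to-XYZ matrix (obtained from a fit like the one sketched above, or from profiling) with the standard XYZ-to-linear-sRGB matrix. The camera matrix below is made up purely for illustration:

```python
import numpy as np

# Standard XYZ -> linear sRGB matrix (IEC 61966-2-1, D65 white point).
XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                        [-0.9689,  1.8758,  0.0415],
                        [ 0.0557, -0.2040,  1.0570]])

def camera_to_linear_srgb(cam_rgb, cam_to_xyz):
    """Take a color from the sensor's "private" space (imaginary primaries and all)
    to linear sRGB by composing two 3x3 transformations."""
    return XYZ_TO_SRGB @ (cam_to_xyz @ cam_rgb)

# Illustrative use, with a made-up camera-to-XYZ matrix (a real one would come from
# fitting or profiling the sensor):
cam_to_xyz = np.array([[0.65, 0.28, 0.15],
                       [0.27, 0.70, 0.03],
                       [0.00, 0.09, 0.84]])
print(camera_to_linear_srgb(np.array([0.4, 0.5, 0.2]), cam_to_xyz))
```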

Now here's the opposite face of this surprising situation.

Assume that we are designing a digital camera, and we only contemplate its output being in the sRGB color space. Why don't we just equip the three kinds of sensels with filters that will result in the three channel outputs being the description of the color in terms of the primaries of the sRGB color space (that is, the outputs would be ready to be turned into sRGB coordinates by merely applying gamma precompensation)?

We can't, because the sensor response curves needed to bring this about would have negative values for some ranges of wavelength, which as we mentioned before, could not actually happen.
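
This is easy to check numerically: the response curves a sensor would need in order to deliver linear sRGB directly are just the CIE color-matching functions pushed through the XYZ-to-sRGB matrix, and they dip below zero. (Same placeholder CMF file as in the earlier sketch.)

```python
import numpy as np

cmf = np.loadtxt("cie_1931_2deg.csv", delimiter=",")[:, 1:]   # (N, 3): xbar, ybar, zbar

XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                        [-0.9689,  1.8758,  0.0415],
                        [ 0.0557, -0.2040,  1.0570]])

# The spectral sensitivities a sensor would need in order to read out linear sRGB
# directly are the sRGB color-matching functions.
srgb_cmf = cmf @ XYZ_TO_SRGB.T
print("minimum value in each channel:", srgb_cmf.min(axis=0))
# Every monochromatic stimulus lies outside the sRGB gamut triangle, so at each
# wavelength at least one of these curves is negative - behavior no physical
# filter-plus-photodiode combination can have.
```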

Thus we could have, in a sensor design, one of these two but not both:

• The three response curves would be everywhere non-negative (a physically-realizable sensor).
or
• The implied primaries would be physically realizable.

Accordingly, we have no choice but to use a set of response curves that would lead to an implied set of "imaginary" primaries, and then transform the representation in those terms (given by the sensor outputs) into a representation in terms of a set of physically-realizable primaries, as in one of the color spaces in which we want our image output to be.

Fun stuff, wot?
 

StuartRae

New member
Hi Doug,

I wonder if you'd be kind enough to amplify the following statement?

Since the sensor delivers three outputs that describe the color of the light on the sensor.......
It makes sense to me only if we're talking about a Foveon sensor, not for the more common CFA-type sensor. Surely for the latter there's no colour involved until after de-mosaicing?

Please be gentle with me.

Regards,

Stuart
 

Doug Kerr

Well-known member
Hi, Stuart,

It makes sense to me only if we're talking about a Foveon sensor, not for the more common CFA-type sensor. Surely for the latter there's no colour involved until after de-mosaicing?
This is an interesting issue, about which I probably should have said more in my "note". In the paper I am writing, I specifically deal with the dilemma in a preamble.

In a CFA sensor, although we do not deploy the three types of sensel to a single pixel site (as would happen in a Foveon sensor), this does not change the story about the implications of the meaning of the three kinds of outputs - and the implications of the three spectral responses - insofar as color representation is concerned.

We can for example look at a CFA situation as being one in which three "aspects" of the image color are sampled at different rates and with different phases, whereas in a Foveon sensor the three aspects are sampled at the same rate and the same phase. But in each case, we can ask the same questions about the three "organs".

One way to dispose of uneasiness about that is this:

1. Assume that the test scene image has a substantial region of "the same light" (a consistent color).

2. Assume that demosaicing in this case follows a trivial model: all the so-called "R" sensels over this region have the same outputs, and so we will use that as the so-called "R" output of the sensor system for all the pixels spanned by the region. The same for all the so-called "G" sensels, and the same for all the so-called "B" sensels. (I say "so called" since, as is the point of my message, they are not related to primaries we can think of as "some sort of red", "some sort of green", and "some sort of blue".)

Now, having adopted that conceit, we can proceed to talk about the real topic as if the sensor were like a Foveon sensor - in fact, we must.

If we're not willing to take that outlook, we are completely out of luck insofar as thinking about sensor colorimetric response. There is probably no such thing as a demosaiced image still in sensor colorimetric coordinates, and if there is, we can't see it.

By the time we can put our hands on the demosaiced image (in an Exif JPEG file), it has been transformed into the sYCC color space (which, if we decode the Exif JPEG file, gets turned into the sRGB color space for "delivery" to our editor or a display chain). I suspect that the transformation actually occurs as part of the demosaicing (which I think you suspect as well, from your comments).

I think you'll be interested in my actual article on this (which is big!) It should be out within 24 hours. But it takes the outlook I mentioned above as to how to deal with the CFA issue (pretend it doesn't exist).

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Stuart,

To add to my previous note:

In the area of concern I discuss, we are interested in the response of the three different types of sensel, individually, and then in the joint implications of those three responses - as we find them.

That specific interest is not altered by the fact that, in a CFA sensor, we assign these three kinds of sensels to do their work on different street corners. ("Brophy, you count the tall people at 5th and Maple. Callahan, you count the old people at 6th and Maple. Berkowitz, women at 6th and Elm. Meet me at 5:00 and we'll figure out how many tall old women there probably are at each of those corners.")

But certainly (as I think you sensed) the demosaicing process is part of what happens "after the sensor" in a colorimetric sense, just like the more straightforward coordinate transformation that happens "after the sensor" with a Foveon-style sensor. It's just that in a CFA sensor, two processes - demosaicing and coordinate transformation - are integrated.

That's one reason that the result of the so-called "demosaicing" process in a Canon camera is in the sYCC color space - the demosaicing process inherently yields luminance and chromaticity estimates for each pixel, not "tristimulus" value estimates (like R-G-B), either in the implicit "private" color space of the sensor or in sRGB.

Best regards,

Doug
 

StuartRae

New member
Hi Doug,

Thanks for taking the time to explain. It's beginning to make more sense.
I'm still not entirely comfortable with the idea of a CFA sensor outputting three channels, but I'll wait until I've read the complete article and see how I feel then.

Regards,

Stuart
 

Doug Kerr

Well-known member
Hi, Stuart,

I'm still not entirely comfortable with the idea of a CFA sensor outputting three channels, . . .
Well, what we must live with is that it does give outputs that are of three "kinds": the outputs of the so-called "R" sensels, those of the "G1" and "G2" sensels, and those of the "B" sensels. (Speaking of them as "channels" is a popular colloquialism, which I followed, but it might do more harm than good.) And of course we can observe those outputs in the raw data (maybe not exactly verbatim, as there may be some "pre-processing" done there - but that reality doesn't spoil the concept).

And the behavior of those three kinds of sensels, insofar as the output value of each kind when presented with light of a particular spectrum is concerned, has exactly the same colorimetric significance as if there were one of each kind at every "pixel location" in the array.

The fact that we cheat by only "sampling" the "R-ness", "G-ness", and "B-ness" of the light on the sensor at a lower frequency than "once per pixel", with different "sampling phases" for each, is of great importance, and influences what we must do with the entire suite of outputs (that is, including the aspect of "demosaicing").

But that does not wipe away the fact that:

• The four sets of outputs of the sensor proper (of three "kinds") do exist.

• The colorimetric implications of those outputs are exactly the same as if we had one of each type at every pixel location.

Let me try this homily. We have a bar of steel, intended to be round, but whose cross section may not be, and in fact may not be the same at each point along its length (which is 150 mm).

We would like to characterize its cross-section, at intervals of 5 mm along its length, in terms of its "diameter" measured along three axes, at 120° intervals (diameters d1, d2, and d3). And we want to do that automatically, for each specimen, by putting it in a "nest" of diameter transducers (like mechanized micrometers).

But we can't put three of these micrometer sensors, oriented in three different directions, at any given location (such as at "stations" located every 5 mm along the length). So we put a "d1" oriented micrometer at location 2.5, a "d2" oriented micrometer at location 7.5 (in mm), a "d3" oriented micrometer at station 12.5, and so forth.

We now take the suite of outputs from these 30 micrometers (ten each of types "1", "2", and "3"), and with a sophisticated mathematical algorithm, derive from that a best estimate of the "d1", "d2", and "d3" "diameters" of the bar at each of the 30 stations along its length (at 5-mm intervals), and we deliver that as our "QC report" on the bar.

Now this fact, while greatly important as to where our final "report" comes from, does not alter in any way the fact that we have three kinds of micrometers, which measure different properties of the bar at their station, and whose behavior in that regard can be objectively defined.

I'll wait until I've read the complete article and see how I feel then.

You may be disappointed with regard to this issue. I do not give in the article (as it now stands) any extended justification for the fact that it is meaningful to speak of the colorimetric properties of three classes of sensels in a CFA sensor. I just dispose of the issue with a short paragraph at the beginning.

Finally, to close the loop, let me imagine that we did not choose to characterize the sensor array by characterizing the behavior of each of the three kinds of sensels in the way we do, but rather by characterizing the "colorimetric" behavior of the demosaiced data (which we could do).

But the demosaicing algorithm is an arbitrary one, and there isn't just one for each sensor (nor an inherent "correct" one for any set of sensor properties). There's the one for that sensor used in its camera, there's the one for that sensor used in each raw development software package, and so forth.

Now suppose we were to undertake the design of a demosaicing algorithm to be used with a particular camera's sensor. One of the things we would need to know is the colorimetric behavior of the sensels, since it is their outputs (more-or-less) that we start with. So I ask a lab to determine that for me. How could they report it except by describing the colorimetric behavior of each kind of sensel? I warn them, "remember, there isn't a sensel of each kind at every pixel site". They say, "yes, we know." It would not be reasonable to ask them to report the sensor behavior in terms of the colorimetric properties of a demosaiced image produced by some arbitrarily chosen algorithm.

Of course, we might postulate, in pursuit of the outlook I think you are wrestling with, some "canonical" demosaicing algorithm that preserved the "pseudo color space" of the sensel outputs. For example, suppose that it just, for each pixel, developed "R", "G", and "B" values that were the average of the four "R", "G", or "B" sensels (respectively) nearest to the pixel site.

Then, that suite of data, an "R", "G", and "B" value for each pixel site, works in the same "pseudo color space" as the sensels themselves (considered as independent agents). By that I mean, if we had a many-pixel region of consistent color, then the "R", "G", and "B" values we constructed for each pixel (with our primitive demosaicing algorithm) would be identical to the "R", "G", and "B" outputs of the respective classes of sensel across that region. And so a characterization of the sensels would be a characterization of the behavior of the sensor plus demosaicing algorithm.
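
For concreteness, here is a rough Python sketch of that sort of "canonical" demosaic. It averages the same-color sensels in a small neighborhood of each pixel (a 3x3 window rather than literally the four nearest, which is close enough for the argument), and it assumes an RGGB Bayer layout. Over a region of consistent color it simply reproduces the per-class sensel values, which is the point:

```python
import numpy as np
from scipy.ndimage import convolve

def naive_demosaic(raw):
    """Average the same-color sensels in a 3x3 window around each pixel,
    assuming an RGGB Bayer layout. Returns an (h, w, 3) image that stays in
    the sensor's own "pseudo color space"."""
    h, w = raw.shape
    masks = {c: np.zeros((h, w), dtype=bool) for c in "RGB"}
    masks["R"][0::2, 0::2] = True
    masks["G"][0::2, 1::2] = True
    masks["G"][1::2, 0::2] = True
    masks["B"][1::2, 1::2] = True

    kernel = np.ones((3, 3))
    out = np.zeros((h, w, 3))
    for i, c in enumerate("RGB"):
        vals = np.where(masks[c], raw, 0.0)
        total = convolve(vals, kernel, mode="mirror")
        count = convolve(masks[c].astype(float), kernel, mode="mirror")
        out[..., i] = total / count   # every 3x3 window contains at least one sensel of each kind
    return out
```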

I sense that you are struggling to find a way to characterize the colorimetric behavior of the sensor array other than in terms of the colorimetric behavior of the three classes of sensels - a way that somehow recognizes the matter of "demosaicing" that is to come later in the image processing chain. I don't yet visualize what that might be - other than to characterize the behavior of the sensor array plus demosaicing algorithm. We can do that. But then it is not a description of the behavior of the sensor, even indirectly - since there are so many different demosaicing algorithms.

Best regards,

Doug
 

StuartRae

New member
Hi Doug,

Thanks again.

What I was struggling with was reconciling my previous simplistic view with your three channels. I had imagined the output from the sensor to be one 'channel' of greyscale values which I liked to think of as luminance. (Am I technically allowed to call it that?)
But then of course we're measuring three kinds of luminance, just as your micrometers are measuring three kinds of diameter.
If we can get rid of the term 'channel' then I'm now comfortable with what you're saying.
I'm also comfortable with there being three types of sensel as long as we accept that the filter is part of the sensel.

Agreed that de-mosaicing has no part in this if we're talking purely about output from the sensor. I don't think I ever suggested that it should, or at least didn't mean to. What I said (meant) was that I couldn't accept a three channel output (thinking in terms of colour) from the sensor unless de-mosaicing were part of the process (and therefore implying that because it's not, I couldn't).

Peace and contentment at last! I'll let my brain cool down and then celebrate with a small glass of malt.

Slainte mhath!

Stuart
 

StuartRae

New member
Hi Doug,

As an aside, I was interested by your statement that
...the result of the so-called "demosaicing" process in a Canon camera is in the sYCC color space
I had always believed that the initial output from demosaicing is linear RGB, and that the conversion to the sYCC colour model only takes place in the JPEG compression process prior to application of the DCT. But then, of course, the only reason for in-camera demosaicing is to produce a JPEG (JFIF?) image (if only for the thumbnails embedded in the raw file), so why bother with an RGB stage?
However in order to produce the JPEG a gamma curve must be applied. Is it possible to do this to an image described by the sYCC model?
My brain's heating up again.

Does an external raw converter also demosaic into sYCC? If so it must convert to RGB for screen display and then back to sYCC if the image is saved as a JPEG. Is there any loss of quality incurred from converting from one colour model to another?

Regards,

Stuart
 

Doug Kerr

Well-known member
Hi, Stuart,
Hi Doug,

However in order to produce the JPEG a gamma curve must be applied. Is it possible to do this to an image described by the sYCC model?

In fact, in the sYCC color space the three coordinates (Y, Cb, Cr) are based on nonlinear R, G, and B coordinates. (Y is thus a pseudo-luminance, and Cb and Cr are pseudo-chrominances - they are not even nonlinearized forms of true luminance and chrominance.)

So it almost seems as if the results of the demosaicing are in fact in linear RGB form.

I agree, almost.

Indeed, I think my concept that the output of the demosaicing is in sYCC is erroneous.

But I suspect that it is in some (linear) luminance-chromaticity (or luminance-chrominance) space (not linear RGB). (I need to look at some papers I have here on demosaicing algorithms to see whether that is typical.)

But we can't get true sYCC from that in any simple way - doing so would require us to first reconstruct linear RGB from the luminance-chrominance model, then apply the nonlinear function to get nonlinear RGB, and then reformulate that into sYCC.

But that may well be what is done in a Canon camera.
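
For what it's worth, here is a minimal sketch of the last two steps of that chain - applying the sRGB nonlinear function to linear RGB and then reformulating the result as Y'CbCr with the Rec. 601-style weights that sYCC uses. (The first step, recovering linear RGB from whatever internal luminance-chrominance form the camera uses, is omitted, since I don't know its exact form.)

```python
import numpy as np

def srgb_encode(v):
    """The sRGB transfer function ("gamma precompensation"), applied per component."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.0031308, 12.92 * v, 1.055 * np.power(v, 1 / 2.4) - 0.055)

def linear_rgb_to_sycc(rgb_lin):
    """Linear RGB -> nonlinear R'G'B' -> Y'CbCr with Rec. 601 weights (as in sYCC/JFIF).
    Input: array of shape (..., 3)."""
    rp, gp, bp = np.moveaxis(srgb_encode(rgb_lin), -1, 0)
    y  = 0.299 * rp + 0.587 * gp + 0.114 * bp
    cb = (bp - y) / 1.772   # = 0.5 * (B' - Y') / (1 - 0.114)
    cr = (rp - y) / 1.402   # = 0.5 * (R' - Y') / (1 - 0.299)
    return np.stack([y, cb, cr], axis=-1)

# A mid-gray and a reddish patch, in linear RGB:
print(linear_rgb_to_sycc(np.array([[0.18, 0.18, 0.18],
                                   [0.50, 0.05, 0.05]])))
```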

It may also be that when I was told (officially) that the Canon cameras use sYCC as their internal color space, that was not quite accurate. It may be that what is actually referred to is the linear luminance-chrominance form I suggested above.

Thanks for stimulating me to revisit my thoughts on this.

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Stuart,

Xin Li, Bahadir Gunturk and Lei Zhang, in their paper "Image Demosaicing: A Systematic Survey", say (my annotations in square brackets):

Our findings suggest most existing works [demosaicing algorithms] belong to the class of sequential demosaicing - i.e., luminance channel is interpolated first and then chrominance channels are reconstructed based on recovered luminance information.

Their work is largely based on the concept that the "G" sensor output channel, as is, is very nearly an indicator of luminance. (Bayer's original patent was also predicated on this outlook.) That's the main reason we have one for every two pixels.

In fact, the response curve that reflects the eye's "luminance" perception is amazingly close to the G-channel response curve for typical Canon cameras. (I have all the curves, but I need to do a little housekeeping before I can put them up here.)

Thus, the demosaicing may start with essentially interpolating the G outputs (treated as "essentially luminance") to get luminance estimates for all the pixels.
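
A crude sketch of that first stage, just to make the idea concrete (the plus-shaped averaging is my own simplification - real algorithms are considerably more sophisticated):

```python
import numpy as np
from scipy.ndimage import convolve

def interpolate_green(raw, green_mask):
    """Estimate G (as a rough luminance proxy) at every pixel by averaging the
    available G neighbors in a plus-shaped window; keep the measured G where it exists.
    `raw` is the mosaic; `green_mask` is a boolean array marking the G sensel sites."""
    cross = np.array([[0., 1., 0.],
                      [1., 0., 1.],
                      [0., 1., 0.]])
    g_vals = np.where(green_mask, raw, 0.0)
    total = convolve(g_vals, cross, mode="mirror")
    count = convolve(green_mask.astype(float), cross, mode="mirror")
    estimate = total / np.where(count > 0, count, 1.0)
    return np.where(green_mask, raw, estimate)
```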

The determination of the chrominance estimate is not so simply described (at least not by me at this moment!).

Best regards,

Doug
 

StuartRae

New member
Hi Doug,

Thank you once again for your time and patience. It's all very interesting.

I wish I dare say illuminating, but I won't.

Regards,

Stuart
 

cecil kirksey

New member
Hi Doug:
Read your article at "The Pumpkin" and have some questions: Is there an optimal set of spectral responses for the three photodetector channels in a standard CFA configuration that would minimize the color errors?

Camera profiling is usually used to estimate the in-camera transformation matrix either by the camera manufacturer or a raw converter developer. Why would one want to use anything other than the supplied matrix by the camera manufacturer? Clearly doing your own camera profiling would not be easy, and why use a third party raw developer?
 

Doug Kerr

Well-known member
Hi, Cecil,

Hi Doug:
Read your article at "The Pumpkin" and have some questions: Is there an optimal set of spectral responses for the three photodetector channels in a standard CFA configuration that would minimize the color errors?
Yes, there is - a set of response curves that are all linear combinations of the eye cone responses (as prescribed by von Luther and Ives).

One complication, though, is that there is not full agreement on exactly what those eye cone response curves are. (I was just reading a paper on that specific issue, in fact.)

But evidently it is not attractive to do that in a camera design. Firstly, it may be tough to actually make such filters. Secondly, as I understand it, two of those response curves would probably be very similar, and a consequence is that some of the "non-diagonal" coefficients of the transform matrix will be fairly large. A consequence of that is that when the matrix multiplication is done, the impact of noise at the sensor level can be exacerbated.

Thus (as I understand it), some compromise with overall metameric accuracy is normally made in the interest of better noise performance.
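
A toy numerical example of that noise effect (both matrices are made up purely for illustration): for independent, equal noise in the three raw channels, the noise in each output channel grows with the square root of the sum of the squared coefficients in the corresponding row of the matrix, so large off-diagonal terms are paid for in noise.

```python
import numpy as np

# Two made-up camera-to-output matrices: one with small off-diagonal terms, one with
# the large off-diagonal terms that arise when two response curves are very similar.
M_MILD  = np.array([[ 1.3, -0.2, -0.1],
                    [-0.2,  1.4, -0.2],
                    [ 0.0, -0.3,  1.3]])
M_HARSH = np.array([[ 2.2, -1.4,  0.2],
                    [-1.1,  2.6, -0.5],
                    [ 0.3, -1.7,  2.4]])

sigma = 1.0   # standard deviation of the (independent) noise in each raw channel
for name, M in (("small off-diagonals", M_MILD), ("large off-diagonals", M_HARSH)):
    out_sigma = np.sqrt((M ** 2).sum(axis=1)) * sigma
    print(f"{name}: output noise per channel = {np.round(out_sigma, 2)}")
```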

But my understanding of all this is very recent and still unfolding! And I have had no experience whatsoever in camera design.

Camera profiling is usually used to estimate the in camera transformation matrix either by the camera manufacturer or a raw converter developer. Why would one want to use anything other than the supplied matrix by the camera manufacturer? Clearly doing your own camera profiling would not be easy and why use a third party raw developer?

One reason is likely that the facilities in a third party raw developer for controlling various aspects of the image processing might be "better" (more flexible, perhaps, or incorporating further functionality) than those provided in the manufacturer's external raw development software, and would certainly be "more flexible" than doing the development in camera.

That does not suggest that the demosaicing algorithm used is "more crafty" than that of the manufacturer (but it might be - there is no inherent "best" strategy for that), or that the color space transformation matrix used does a "better job" of minimizing average metameric error (but it might do a "better job" based on some different metric for grading the overall performance - perhaps it does a better job, for example, over some set of typical Asian skin reflective spectrums).

There is no inherent "best" transformation matrix in this situation. This differs, say, from the matter of a matrix to transform images from sRGB to Adobe RGB, which is exactly defined. That is largely because the sensor does not actually operate in a true color space - if it did, there would be no metameric error.

But I hardly ever use external raw development, and I'm not at all in a position to know all the practical reasons for choosing one over another.

I'm glad to hear of your interest.

Best regards,

Doug
 