JPEG and the darker areas

Doug Kerr

Well-known member
It is well recognized that if we seek to manipulate the tonal scale mapping of an image (perhaps to overcome the results of less-than-optimum exposure), we will generally have more success working with the raw output of the camera than with the JPEG output.

In a recent very insightful discussion of the notions of the "richness" and "robustness" of images, one of our members commented (color emphasis added for ease of later reference):

A good example of "less rich" pictures [based on his earlier discussion, I suspect the writer may have meant here "less robust"] are compressed jpeg pictures. Jpeg is dependent on the limitations of human vision to achieve good compression. For example, it [JPEG] spends less data on dark parts, because human vision is less sensitive to details in the darker parts. When one raises the shadows, this becomes obvious and the image breaks (which is to be expected).

I did not remember having heard the notion in blue before, and I decided to look into just what that might mean.

The JPEG data compression

I first looked into whether the actual JPEG "data compression" process somehow allocates fewer bits of the delivered data to representing the color of points with lower luminance. I can't find anywhere that this seems to happen. Of course, it could happen in some very obscure way that I missed - the whole process is very complex.
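
To make that concrete, here is a toy sketch in Python (not a real codec) of the baseline quantization step. The table is the example luminance quantization table from Annex K of the JPEG standard (ITU-T T.81). The same table is applied to every 8x8 block, so a dark block and a light block carrying the same fine detail are quantized identically; only the DC term differs.

    import numpy as np

    def dct2(block):
        """Orthonormal 2-D DCT-II of an 8x8 block."""
        n = 8
        k = np.arange(n)
        c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        c *= np.sqrt(2.0 / n)
        c[0, :] = np.sqrt(1.0 / n)
        return c @ block @ c.T

    # Example luminance quantization table from Annex K of ITU-T T.81;
    # larger entries mean coarser quantization of that DCT coefficient.
    Q = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99],
    ])

    rng = np.random.default_rng(0)
    texture = rng.normal(0, 4, (8, 8))       # the same fine detail...
    dark = np.clip(16 + texture, 0, 255)     # ...on a dark background
    light = np.clip(200 + texture, 0, 255)   # ...on a light background

    # Both blocks spend the same number of nonzero coefficients; only
    # the DC value differs. Baseline quantization is luminance-blind.
    for name, block in (("dark", dark), ("light", light)):
        coeffs = np.round(dct2(block - 128) / Q)   # level-shift, DCT, quantize
        print(name, "nonzero quantized coefficients:", int(np.count_nonzero(coeffs)))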

sRGB gamma precompensation

Next I looked into whether this might be an implication of the "gamma precompensation" that is part of the sRGB color space upon which the JPEG representation we use is (indirectly) based. This refers to the fact that the 8-bit R, G, and B values are not linear representations of the corresponding coordinates of the color space.

We find that near the very bottom of the tonal scale (let's say at about 8 stops down from the saturation luminance), a one-unit difference in R,G,B represents a difference in luminance of about 0.0003 (where 1.000 represents saturation luminance).

In contrast, at 1/2 stop down from the saturation luminance, a one-unit difference in R,G,B represents a difference in luminance of about 0.007.

Thus, if we think in terms of the representation of absolute luminance, the representation in the "dark" region is about 23 times as precise as in the "light" region.

But of course the human response to luminance is itself logarithmic, so perhaps we should consider the precision of the representation not of absolute differences between luminances but rather of luminance difference ratios.

In our "light" region, we find that one unit of R,G,B corresponds to a luminance change of about 1%.
In our "dark" region, we find that one unit of R,G,B corresponds to a luminance change of about 8%, which we can in a certain sense consider to be "8 times worse precision".

So perhaps this is the sense in which lower luminances are recorded "less precisely".
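
For anyone who wishes to check the arithmetic, here is the computation in Python, using the standard sRGB decoding function (IEC 61966-2-1). The code values 13 and 219 are simply the 8-bit codes nearest 8 stops and 0.5 stops below saturation; the exact figures wobble a bit depending on which code value one picks.

    def srgb_to_linear(code):
        """8-bit sRGB code value -> linear luminance (1.0 = saturation)."""
        c = code / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    for label, code in (("dark, ~8 stops down", 13), ("light, ~0.5 stops down", 219)):
        lo, hi = srgb_to_linear(code), srgb_to_linear(code + 1)
        print(f"{label}: L = {lo:.5f}, one-code step = {hi - lo:.5f}"
              f" ({100 * (hi - lo) / lo:.1f}% of L)")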

In fact, the gamma precompensation mitigates this effect (that is one of its purposes). If we instead imagined a linear 8-bit RGB representation, then:

• In our "light" region, at a base luminance 0.5 stops below saturation, one RGB unit (up) would correspond to a change in luminance of 0.5%.

• In our "dark" region, at a base luminance 8 stops below saturation, one RGB unit (up) would correspond to a change in luminance of 100%, which we can in a certain sense consider to be "200 times worse precision".
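
The corresponding check for the imagined linear encoding, where one code unit is always 1/255 of saturation luminance (the figures land close to those quoted above):

    for label, stops_down in (("light, 0.5 stops down", 0.5), ("dark, 8 stops down", 8)):
        L = 2.0 ** -stops_down        # base luminance (1.0 = saturation)
        step = 1.0 / 255.0            # one unit of a linear 8-bit code
        print(f"{label}: one-unit step = {100 * step / L:.1f}% of L")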

Loss of sensor precision

In many digital cameras, the luminance detected by the photodetectors is registered on a 12-bit basis. In the sRGB color space used in our JPEG files, luminance is in effect recorded on an 8-bit basis. We can easily imagine this "reduction in precision" to be a cause of worse results when making changes in tonal mapping using the JPEG output of the camera compared to using the raw output. And this wisdom is broadly true.

But in this article we are focusing on the "darker regions" of the image.

We noted above that at 8 stops down from saturation, a one-unit change in the RGB encoding corresponds to a luminance difference of 0.0003 (where 1.000 represents saturation luminance).

If we were working with 12-bit raw values, a one-unit change would correspond to a luminance difference of 0.00024.

Thus, in this region, the precision of luminance representation via the sRGB color space is only about 1.2 times worse than via the raw data.

Of course, if we consider a camera with 14-bit encoding, the precision in luminance afforded by the raw data is, even in this region, substantially greater than with the JPEG representation.
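
Again the arithmetic, continuing the sketch above (the exact ratio depends somewhat on which code value one takes as "8 stops down"; the figures quoted in the text used slightly rounder numbers):

    def srgb_to_linear(code):
        c = code / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    srgb_step = srgb_to_linear(14) - srgb_to_linear(13)   # sRGB step ~8 stops down
    for bits in (12, 14):
        raw_step = 1.0 / (2 ** bits - 1)                  # linear raw step
        print(f"{bits}-bit raw: step = {raw_step:.6f}, "
              f"sRGB step here is {srgb_step / raw_step:.1f}x as coarse")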

Observation

Thus I am so far unable to find an actual situation that seems to fit the concept of the JPEG representation devoting less data to the dark areas.

So what does happen?

So then how is it that, when trying to do a postprocessing "exposure adjustment", we have less success working with a JPEG representation than with a raw representation (and I concentrate here on the "darker areas" of the image)?

Well, I don't really know. I would be eager to hear the views of my colleagues here on this conundrum.

One author has suggested that in the typical digital camera, when processing the sensor data into JPEG form, a transfer curve is applied to give the "nicest looking" images in typical cases. These curves may well "compress" both the very dark and very light regions.
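
As a purely illustrative sketch (a generic sigmoid, not any actual camera's curve), such a curve has a slope well below 1 at both ends of the scale, compressing tonal differences there:

    import numpy as np

    def s_curve(x, strength=6.0):
        """A generic S-shaped tone curve on [0, 1], centered on mid-gray."""
        s = 1.0 / (1.0 + np.exp(-strength * (x - 0.5)))
        lo = 1.0 / (1.0 + np.exp(strength * 0.5))    # raw value at x = 0
        hi = 1.0 / (1.0 + np.exp(-strength * 0.5))   # raw value at x = 1
        return (s - lo) / (hi - lo)                  # normalize to [0, 1]

    # The local slope shows the compression: small in the very dark and
    # very light regions, above 1 in the midtones.
    for x in (0.02, 0.5, 0.98):
        eps = 1e-4
        slope = (s_curve(x + eps) - s_curve(x - eps)) / (2 * eps)
        print(f"input {x:.2f}: local slope {slope:.2f}")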

That's what we know here for now.

Best regards,

Doug
 

Asher Kelman

OPF Owner/Editor-in-Chief
Doug,

A fellow I know works on image compression and in his work, he throws away most of the data in the dark regions, as no one can notice that the detailed descriptions are missing!

Where there is detail discernible by the human eye at certain distances, that data tends to be conserved. It is just, I believe, that more data points can be discarded in image compression in the darker areas. But throwing away data (not needed for discernment of form and detail) occurs all over the image.

I'll see him tomorrow night at our Passover Seder and ask him…but knowing him, it will itself be an entire narrative, a "megilla", so to speak, LOL!

Asher
 

Jerome Marot

Well-known member
In a recent very insightful discussion of the notions of the "richness" and "robustness" of images, one of our members commented (color emphasis added for ease of later reference):

A good example of "less rich" pictures [based on his earlier discussion, I suspect the writer may have meant here "less robust"] are compressed jpeg pictures. Jpeg is dependent on the limitations of human vision to achieve good compression. For example, it [JPEG] spends less data on dark parts, because human vision is less sensitive to details in the darker parts. When one raises the shadows, this becomes obvious and the image breaks (which is to be expected).

Indeed I meant "less robust".

I did not remember having heard the notion in blue before, and I decided to look into just what that might mean.

The JPEG data compression

I first looked into whether the actual JPEG "data compression" process somehow allocates fewer bits of the delivered data to representing the color of points with lower luminance. I can't find anywhere that this seems to happen.

I may indeed have been in error; jpeg does not necessarily spend fewer bits on the darker parts of the picture or, more accurately, it is not a simple linear function. Jpeg compression, in its more advanced implementations, uses a perceptual model of human vision. This model may include what is called "luminance masking", which is what I was thinking about. What this means is that, depending on the luminance and contrast of a jpeg block, more or fewer bits may be used. There are quite a few papers on the subject; this one and that one can be downloaded for free. The second one gives a curve of visibility threshold against luminance on page 48 that would imply that fewer bits are needed in the darkest parts but also in the lightest parts.
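
As a toy illustration of the idea (the shape and numbers of the threshold curve here are invented for the example, not taken from those papers): the encoder scales its quantization step per block according to the block's mean luminance, quantizing more coarsely where the eye is less sensitive.

    import numpy as np

    def masking_factor(mean_luma):
        """Coarser quantization (factor > 1) for blocks far from mid-gray.
        The curve is purely illustrative."""
        d = abs(mean_luma - 128.0) / 128.0   # distance from mid-gray, 0..1
        return 1.0 + 2.0 * d ** 2

    def quantize_block(dct_coeffs, base_table, mean_luma):
        """Scale the base quantization table by the block's masking factor."""
        return np.round(dct_coeffs / (base_table * masking_factor(mean_luma)))

    # A very dark block (mean luminance 20) is quantized about 2.4x more
    # coarsely than a mid-gray block (mean 128):
    print(masking_factor(20.0))    # -> 2.42...
    print(masking_factor(128.0))   # -> 1.0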

What stays true in practice, and was the reason behind my post about the "dark parts of the picture", is that when one tries to manipulate the contrast of a jpeg picture, the darkest parts are the ones most likely to show artefacts. Probably the reason is the curve cited above and the fact that, when manipulating an image, the parts which were originally very dark are the ones most likely to be moved to the zone of maximum sensitivity (around 50 on the curve).
 

Doug Kerr

Well-known member
Hi, Jerome,

Indeed I meant "less robust".

I may indeed have been in error; jpeg does not necessarily spend fewer bits on the darker parts of the picture or, more accurately, it is not a simple linear function. Jpeg compression, in its more advanced implementations, uses a perceptual model of human vision.
Yes, and my discussion only applied to the "original" form of JPEG.

This model may include what is called "luminance masking", which is what I was thinking about. What this means is that, depending on the luminance and contrast of a jpeg block, more or fewer bits may be used. There are quite a few papers on the subject; this one and that one can be downloaded for free. The second one gives a curve of visibility threshold against luminance on page 48 that would imply that fewer bits are needed in the darkest parts but also in the lightest parts.
Thanks so much for those references.

What stays true in practice, and was the reason behind my post about the "dark parts of the picture", is that when one tries to manipulate the contrast of a jpeg picture, the darkest parts are the ones most likely to show artefacts. Probably the reason is the curve cited above and the fact that, when manipulating an image, the parts which were originally very dark are the ones most likely to be moved to the zone of maximum sensitivity (around 50 on the curve).
Makes sense to me.

Thanks for your insights.

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Jerome,

The two papers you cited look very interesting. I plan to peruse them today. Thanks again.

Do we know to what extent the concepts they teach (in particular relating to luminance adaptation) are actually practiced in the "advanced" forms of JPEG (such as JPEG 2000 and JPEG XR)? A very superficial review of descriptions of those two standards, comparing them to the "original" JPEG standard, does not obviously reveal this.

Best regards,

Doug
 

Jerome Marot

Well-known member
The two papers you cited look very interesting. I plan to peruse them today. Thanks again.

Do we know to what extent the concepts they teach (in particular relating to luminance adaptation) are actually practiced in the "advanced" forms of JPEG (such as JPEG 2000 and JPEG XR)? A very superficial review of descriptions of those two standards, comparing them to the "original" JPEG standard, does not obviously reveal this.

There may be a misunderstanding here. The techniques described in the papers are used in standard jpeg and can probably be used in jpeg2000 and jpeg XR. Keep in mind that the jpeg format only describes a technique to encode and decode images which allows some bits to be discarded ("lossy compression"). It does not necessarily specify which bits should be discarded. The papers describe techniques useful for choosing the bits to be discarded while still making sure that the image will appear unmodified to the average human viewer. But whether you use the perceptual models described in these papers or not, the images will be decoded by a standard jpeg viewer, as the encoding syntax will still be "standard jpeg".
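
One can see this with Pillow (PIL), whose JPEG encoder accepts custom quantization tables through its qtables argument (each table being a list of 64 integers): hand it an arbitrarily coarse table and the file still decodes in any standard viewer, because the tables travel inside the file.

    from PIL import Image

    img = Image.new("L", (64, 64), 128)        # any grayscale image would do
    coarse = [50] * 64                         # one deliberately coarse table
    img.save("coarse.jpg", qtables=[coarse])   # still a standard jpeg file
    print(Image.open("coarse.jpg").size)       # decodes like any other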
 

Doug Kerr

Well-known member
Hi, Jerome,

There may be a misunderstanding here. The techniques described in the papers are used in standard jpeg and can probably be used in jpeg2000 and jpeg XR.
Ah, yes, I now see that the provision to use different quantizing tables by block is in fact part of the "basic" JPEG specification (via an "extension").

Keep in mind that the jpeg format only describes a technique to encode and decode images which allows some bits to be discarded ("lossy compression"). It does not necessarily specify which bits should be discarded. The papers describe techniques useful for choosing the bits to be discarded while still making sure that the image will appear unmodified to the average human viewer. But whether you use the perceptual models described in these papers or not, the images will be decoded by a standard jpeg viewer, as the encoding syntax will still be "standard jpeg".
Well said. Thanks for that clarification.

So I guess the actual question is, in the uses of JPEG by, for example, modern digital cameras, is the ability to use different quantization tables by block actually exploited to optimize the quantization based on the mean luminance of the block?

I notice that in the paper by Rosenholtz and Watson they use the term luminance masking (which you mentioned) to mean:
Due to light adaptation, the greater the mean luminance of an image region, the greater the amplitude required to see a pattern within that region.

This is in fact consistent with the well-known logarithmic sensitivity of the eye.

But this is not consistent with the recognized situation that the eye seems less sensitive to luminance variations in a region that is generally "dark" (with respect to the overall range of luminance of the image). In fact, it seems in opposition to that phenomenon.

One author has suggested that the phenomenon is (at least in part) due to the fact that there is a "veiling" of contrast in the darker regions by the presence of the brighter regions (something like the phenomenon of veiling flare in the camera itself).

It also may be that, while we speak of the logarithmic response of the eye (essentially, it responds to the ratio of luminance in adjacent objects, not the difference in absolute luminance), this is in great part due to the (slow) adaptation of the eye. It is more "automatic exposure control" than "logarithmic response of the retina".

Thus, in viewing an image of a certain mean luminance, the eye's adaptation is to that mean luminance, and accordingly the eye does not respond to differences in luminance in a generally-dark area in terms of their difference compared to the mean luminance of the local region, but rather in terms of their difference compared to the mean luminance of the entire image.

This then gives detail (whether well or badly represented) in a dark region a "discount" in perception (a good thing if that detail is "badly" represented).

If we "lift" the luminance of the region, then if the detail is "badly represented", we can more readily see its "badness". It is much like, "Gee, you didn't look so ugly when you were standing in the dark doorway."

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Jerome,

It is important to note that, even without adaptive modification of the quantizing matrices based (in a way) on the mean luminance of the block, we already do the same thing in a less-sophisticated way.

By this I mean that the quantization does not work on luminance (or on the coefficients of cosine functions that describe the variations in luminance).

Rather, it works on the coefficients of cosine functions that describe the variations in Y (which here is not luminance). Y is non-linearly related to luminance (through the gamma-precompensation function).

Thus, in a region of greater luminance, one step in Y represents a greater difference in luminance, and the same quantization table regimen leads to a greater "coarseness" in the representation of luminance (as is said to be desirable).
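
A quick numeric check of this point (approximating the decoding function as a pure 2.2 gamma for simplicity):

    # One 8-bit step in gamma-encoded Y costs more luminance at the
    # bright end of the scale than at the dark end.
    for y in (0.1, 0.5, 0.9):                  # encoded Y, on a 0..1 scale
        step = 1.0 / 255.0                     # one 8-bit step in Y
        dL = (y + step) ** 2.2 - y ** 2.2      # resulting luminance step
        print(f"Y = {y:.1f}: one Y step changes luminance by {dL:.5f}")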

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Jerome,

In the paper by Rosenholtz and Watson ("Perceptual adaptive JPEG coding") that you referenced, appears this passage (in connection with the concept of luminance masking):

Due to light adaptation, the greater the mean luminance of an image region, the greater the amplitude required to see a pattern within that region.

This is of course no surprise, as it comports with the concept of Weber's law (essentially suggesting a logarithmic response of the eye to luminance).

But we encountered that passage in connection with our interest in the recognized fact that the human eye seems less sensitive to luminance changes (including "unwanted" ones created by artifacts of the whole image processing chain) in darker areas of the image. And that statement does not seem to comport with that.

However, in the paper by Tong ("A perceptually adaptive JPEG coder") that you referenced, we find this interesting passage, again about luminance masking:

The implication from Weber's law is that the human eye is less sensitive to errors in the bright areas (areas with high luminance values) of a picture because ΔL is relatively high in those areas. Weber's law is generally accurate over the normal range of middle-low to high luminance values. However, in very dark area, it has been reported that the Weber fraction tends to increase with decreasing background luminance values. In other words, the human eye's sensitivity to distortion also decreases in very dark area.

In other words, the relationship mentioned by Rosenholtz and Watson is perhaps, as it seemed to me, just a reflection of Weber's law (as discussed in the first sentence in the Tong quotation), whereas luminance masking in fact seems to refer to a departure from Weber's law for low luminance.

Tricky stuff,

Best regards,

Doug
 