
Photoshop - encoding of copyright symbol in metadata

Doug Kerr

Well-known member
Several members have noticed that Photoshop encodes the copyright symbol ("©") in all metadata (Exif, IPTC IIM, and IPTC XMP) as the byte sequence C2h A9h. [In Windows Code Page 1252, the usual "extended ASCII" encoding used inside Windows, that sequence would be interpreted as the two characters "Â©".]

Here's the story.

IPTC XMP metadata

IPTC XMP metadata is encoded in UTF-8. In UTF-8, the character "©" is not encoded as the single byte A9h (as it is in Windows Code Page 1252). Rather, it is encoded as the two-byte sequence C2h A9h. [Only ASCII characters get single-byte representations in UTF-8.]

Thus, the encoding used by Photoshop for "©" in IPTC XMP metadata (C2h A9h) is appropriate.

Any XMP interpreting program should render this on screen as "©".
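
The byte values described above can be checked with a few lines of Python. This is only an illustrative sketch using Python's standard codecs; it says nothing about how Photoshop itself is implemented.

```python
# The copyright sign is Unicode code point U+00A9.
symbol = "\u00a9"

utf8_bytes = symbol.encode("utf-8")      # two bytes: C2h A9h
cp1252_bytes = symbol.encode("cp1252")   # one byte:  A9h

print(utf8_bytes.hex(" ").upper())    # C2 A9
print(cp1252_bytes.hex(" ").upper())  # A9

# A reader that wrongly treats the UTF-8 bytes as Code Page 1252
# ends up with two characters (U+00C2 followed by U+00A9).
misread = utf8_bytes.decode("cp1252")
print(len(misread), misread)
```

The last line is the "extra character" symptom: the byte C2h is a valid character in Code Page 1252 in its own right, so it gets displayed in front of the "©".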

IPTC IIM metadata

IPTC IIM metadata ("legacy" IPTC metadata) can use several encodings. The encoding used should be indicated by a data item, CodedCharacterSet.

IPTC IIM metadata generated by Photoshop indicates the encoding as UTF-8. Thus, the encoding used by Photoshop for "©" in IPTC metadata (C2h A9h) is appropriate.

Fully-observant IPTC IIM metadata interpreting programs should render this on screen as "©".
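
As a rough sketch of what "fully observant" could mean in practice, the Python below walks a raw IIM block, looks for CodedCharacterSet (dataset 1:90), and uses it to decide how to decode CopyrightNotice (dataset 2:116). The helper names are hypothetical, only standard-length datasets are handled, and the Latin-1 fallback for an undeclared character set is an assumption of this sketch, not something the thread or the IIM specification prescribes.

```python
import struct
from typing import Iterator, Optional, Tuple

UTF8_ESCAPE = b"\x1b%G"  # ESC % G: the escape sequence that declares UTF-8


def iter_datasets(block: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (record, dataset, value) for each standard-length IIM dataset."""
    pos = 0
    while pos + 5 <= len(block) and block[pos] == 0x1C:  # 1Ch is the tag marker
        record, dataset, length = struct.unpack_from(">BBH", block, pos + 1)
        if length & 0x8000:
            break  # extended-length datasets are not handled in this sketch
        start = pos + 5
        yield record, dataset, block[start:start + length]
        pos = start + length


def read_copyright_notice(block: bytes) -> Optional[str]:
    encoding = "latin-1"  # ASSUMED fallback when no character set is declared
    for record, dataset, value in iter_datasets(block):
        if (record, dataset) == (1, 90) and value == UTF8_ESCAPE:
            encoding = "utf-8"
        elif (record, dataset) == (2, 116):
            return value.decode(encoding, errors="replace")
    return None


def make_dataset(record: int, dataset: int, value: bytes) -> bytes:
    """Build one dataset, only to create test data for this example."""
    return b"\x1c" + struct.pack(">BBH", record, dataset, len(value)) + value


sample = (make_dataset(1, 90, UTF8_ESCAPE)
          + make_dataset(2, 116, "\u00a9 2010 Example".encode("utf-8")))
print(read_copyright_notice(sample))
```

A reader that skips the 1:90 check and always assumes a single-byte encoding would show the two-character form for the same data.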

Exif metadata

According to the Exif specification, the Exif metadata item Copyright should be encoded in ASCII. The character "©" does not exist in the ASCII character set.

Sometimes, to deal with this, characters beyond ASCII but which are included in Windows Code Page 1252 (such as "©") are encoded in Exif metadata in Windows Code Page 1252 form (A9h). Many Exif metadata-reading applications assume that text strings are in Windows Code Page 1252. Others apparently are prepared to recognize whether characters beyond the ASCII character set are encoded in UTF-8 form or Windows Code Page 1252 form.

Photoshop encodes the character "©" in UTF-8 form in Exif metadata. This cannot be said to be either correct or incorrect under the Exif specification, given that the character "©" is not really allowed in Exif metadata. [But see my recommendation below.]

• Receiving Exif metadata applications that strictly follow the Exif specification will not display the sequence C2h A9h at all (those code values do not represent ASCII characters). [A substitute character - perhaps "?" - may be displayed for each byte.]

• Receiving Exif metadata applications that assume Windows Code Page 1252 encoding of characters beyond the ASCII character set will display the sequence C2h A9h as "Â©". [This is what has been reported as an anomaly.]

• Receiving Exif metadata applications that are prepared to recognize whether Windows Code Page 1252 or UTF-8 encoding is being used for text strings will display the sequence C2h A9h as "©".
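
The three reader behaviors listed above can be mimicked in Python. This is a sketch only: the function names are made up for illustration, and the "try UTF-8 first, then fall back to Code Page 1252" heuristic is one plausible way to recognize the encoding, not a description of how any particular application does it.

```python
RAW = b"\xc2\xa9 2010 Example"  # a Copyright value with "©" stored as UTF-8


def strict_ascii(data: bytes) -> str:
    """Show ASCII only; substitute "?" for every byte outside ASCII."""
    return "".join(chr(b) if b < 0x80 else "?" for b in data)


def assume_cp1252(data: bytes) -> str:
    """Treat every byte as a Windows Code Page 1252 character."""
    return data.decode("cp1252", errors="replace")


def detect_encoding(data: bytes) -> str:
    """Assumed heuristic: accept valid UTF-8, otherwise fall back to CP 1252."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


print(strict_ascii(RAW))     # ?? 2010 Example
print(assume_cp1252(RAW))    # Â© 2010 Example
print(detect_encoding(RAW))  # © 2010 Example
```

The heuristic works here because a lone A9h byte is not valid UTF-8, so a Code Page 1252 encoded "©" fails the first attempt and is picked up by the fallback.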

Conclusion

It is my opinion that it is inappropriate for Photoshop to encode the character "©" into Exif metadata in UTF-8 encoding. It would be more prudent for it to encode the character "©" into Exif metadata in Windows Code Page 1252 form (as the byte A9h).

It is my opinion that it is perfectly appropriate for Photoshop to encode the character "©" into IPTC IIM and IPTC XMP metadata in UTF-8 form (as it does now).
 

Mike Bailey

pro member
Hi Doug,

Thank you for posting your research into this. That would mean that all the applications that have until now displayed only the single character will be displaying two characters when the copyright character is placed by CS5 - and other applications following the strict, recent interpretation of the specification. However, it sure feels and looks like a bug when one sees this extraneous character in so many earlier versions of the software. It almost feels like a child telling a parent they aren't right because they don't know the facts!

What might be questionable I'd think is that CS5 rewrites (encodes) all the fields of the EXIF header whenever it does any editing whatsoever, even if those fields, such as the copyright notice, are not touched specifically at all in the edit.

BreezeBrowser, which has about a billion point releases a year, is up to the task and displays only a single character for the copyright, even when it is encoded by CS5 with the two characters.

Mike
______________
http://BlueRockPhotography.com
http://www.facebook.com/pages/Blue-Rock-Photography
 

Doug Kerr

Well-known member
Hi, Mike,

Thank you for posting your research into this. That would mean that all the applications that have until now displayed only the single character will be displaying two characters when the copyright character is placed by CS5 - and other applications following the strict, recent interpretation of the specification.
I'm not sure which specification you mean.

So far as I know (and I may not be fully up-to-date), the Exif specification itself does not make provisions for UTF-8 encoding. (So far as I know, it does not make any provisions for characters beyond ASCII.) Neither is there any such provision in the DCF specification (which is an enlargement of the Exif specification that actually governs our camera output files). (But I haven't checked the latest version of DCF.)

(If you have information beyond this I'd be glad to know of it.)

Based on that assumption, I think it is inappropriate for Photoshop to encode characters beyond ASCII but within the character set of Windows CP 1252 (e.g., "©") in UTF-8 in the Exif metadata. (That is fine in IPTC XMP metadata, and it is fine in IPTC IIM metadata if that encoding is declared, as is done by Photoshop.)

BreezeBrowser, which has about a billion point releases a year, is up to the task and displays only a single character for the copyright, even when it is encoded by CS5 with the two characters.
Yes, I discovered that last night after I posted my report. For the moment, I have to think that it is clever of Chris Breeze to have his product overcome a questionable practice by Adobe!

I'll be poking around on this some more today.

Thanks for your inputs.

Best regards,

Doug
 

Doug Kerr

Well-known member
Mike,

I have made a quick ("on the way to breakfast") examination of the latest Exif and DCF specifications (both current as of 2010.04.26).

By the way, both are available here (free):

http://www.cipa.jp/english/hyoujunka/kikaku/cipa_e_kikaku_list.html

Both standards prescribe that the Copyright field is to be in ASCII (by which they really mean that - characters whose codes are in the range 20h-7Fh).

Neither makes any provision for UTF-8 encoding.

The Exif spec has for some while provided that the UserComment item may be encoded in several ways, including "Unicode", but gives no insight into what encoding of Unicode characters is implied. Likely the implication is UTF-16, with the byte order based on that of the platform involved. That provision is an incompletely-cooked egg.
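
To illustrate the UserComment situation: the item starts with an 8-byte character code that names the encoding of what follows. The Python sketch below dispatches on that prefix. The function name and the `big_endian` parameter are hypothetical, and treating "UNICODE" as UTF-16 with a caller-supplied byte order is exactly the guess discussed above, not something the specification spells out. The JIS code is not handled, and the Latin-1 fallback is likewise an assumption.

```python
def decode_user_comment(value: bytes, big_endian: bool = False) -> str:
    """Decode an Exif UserComment value (8-byte character code + text)."""
    code, body = value[:8], value[8:]
    if code == b"ASCII\x00\x00\x00":
        return body.decode("ascii", errors="replace")
    if code == b"UNICODE\x00":
        # ASSUMPTION: UTF-16, with the byte order chosen by the caller.
        codec = "utf-16-be" if big_endian else "utf-16-le"
        return body.decode(codec, errors="replace")
    # Undefined or unhandled character code: ASSUMED Latin-1 fallback.
    return body.decode("latin-1")


comment = b"UNICODE\x00" + "\u00a9 2010 Example".encode("utf-16-le")
print(decode_user_comment(comment))
```

Decoding the same bytes with the wrong byte order gives garbage, which is why the missing statement about the Unicode encoding form matters.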

I uphold my view that it is inappropriate for Photoshop to encode the symbol "©" in UTF-8 in Exif metadata.

I think that encoding it in Windows Code Page 1252 (called by IANA "windows-1252") (or ISO-8859-1, which is the same for that character) would be much more prudent.

This despite the fact that inclusion of the symbol "©" in Exif metadata in any encoding is actually still "illegal" as of four weeks ago.

More after breakfast.

Best regards,

Doug
 

Doug Kerr

Well-known member
Here are the results of a quick survey here of the response of various Exif-reporting software to characters beyond ASCII encoded in UTF-8 form.

In each case, the test was done with the CopyrightNotice (or equivalent) data item. Separate results are given for that item in the Exif metadata and in the IPTC metadata. In the case of IPTC metadata, I will not distinguish between the IIM and XMP forms (not all reporting applications report both, and I have not taken the time to sort that out here).

The issue is whether such an encoded character is decoded by the application and presented "as intended".

The symbol "n/a" means that the application does not report metadata of that class.

Results in red are "noncompliant" with the applicable industry specification(s).

Results in blue are "inconvenient" when dealing with files generated by Photoshop, but cannot be considered noncompliant. No industry specification I am aware of legitimatizes the encoding of characters in UTF-8 form in Exif metadata. Of course, no industry standard I know of allows characters beyond the ASCII character set in Exif metadata at all (except for the UserComment item).


UTF-8 encoded characters beyond the ASCII character set
"properly" decoded and displayed

                         Metadata class
Application              Exif*        IPTC

Photoshop CS2            Yes          Yes
Photoshop CS5            Yes          Yes
BreezeBrowser            Yes          Yes
Irfanview                No (blue)    No (red)
Vueprint                 No (blue)    n/a
Qimage                   No (blue)    No (red)
ExiftoolGUI              No (blue)    Yes
Gexifview                No (blue)    n/a
ExifReader               No (blue)    n/a
Opanda iExif **          No (blue)    No (red)

* Note that characters beyond the ASCII character set are not actually permitted, in any encoding, in Exif metadata (other than in the UserComment) under the applicable industry standards.

** A plugin for browsers allowing metadata in images in Web pages to be read.


Best regards,

Doug
 

Mike Bailey

pro member
Hi Doug,

I didn't have a specific specification (if I can say that without getting too tongue-tied) in mind, but was just being generalistic.

You are tenacious and thorough! Interesting to see how the different applications react to the UTF-8 encoding. Also appreciate your opinions on what seems appropriate or not for EXIF information. Like you, I'm considering dumping the copyright symbol © in favor of just "Copyright" or maybe "Copyright (c)" as the preface to the string so that extra CS5-generated character does not appear. Might be a moot point for me as I use CS2 most of the time so far, rather than CS5, since it seems to be slightly faster on my computers.

Mike
_____________
http://BlueRockPhotography.com
http://www.facebook.com/pages/Blue-Rock-Photography
 

Doug Kerr

Well-known member
Hi, Mike,

Like you, I'm considering dumping the copyright symbol © in favor of just "Copyright" or maybe "Copyright (c)" as the preface to the string so that extra CS5-generated character does not appear.

Note that:

• With regard to United States copyright law, there is no requirement (nor advantage) to use "©". Just "Copyright" or "Copr." is sufficient (of course with the rest of the obligatory elements: the year of publication and the name of the creator).

• The string "(c)" has absolutely no standing in this regard anyplace.
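
For anyone who wants to go the "Copyright" route in bulk, a small Python helper along these lines could rewrite a notice so that it stays within ASCII. The function name and the sample notice are hypothetical, and how the cleaned string gets written back into the file is left to whatever metadata tool is in use.

```python
def ascii_safe_notice(notice: str) -> str:
    """Replace "©" with the word "Copyright" and confirm the result is ASCII."""
    cleaned = notice.replace("\u00a9", "Copyright")
    # Raises UnicodeEncodeError if any other non-ASCII character remains.
    cleaned.encode("ascii")
    return cleaned


print(ascii_safe_notice("\u00a9 2010 Example Photographer"))
# Copyright 2010 Example Photographer
```

A notice that is pure ASCII is stored as the same bytes in ASCII, Code Page 1252, and UTF-8, so it displays identically in every reader in the survey above.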

Might be a moot point for me as I use CS2 most of the time so far, rather than CS5, since it seems to be slightly faster on my computers.
CS2 works just like CS5 in this matter.

CS2 loads much more slowly than CS5 on my machine. I'm not sure about operating speed.

Best regards,

Doug
 