Open Photography Forums  
  #1  
May 15th, 2010, 10:50 PM
Doug Kerr
Senior Member

Join Date: May 2006
Location: Alamogordo, New Mexico, USA
Posts: 8,598
Photoshop - encoding of copyright symbol in metadata

Several members have noticed that Photoshop encodes the copyright symbol ("©") in all metadata (Exif, IPTC IIM, and IPTC XMP) as the sequence of bytes C2h A9h. [In Windows Code Page 1252, the usual "extended ASCII" encoding used inside Windows, that would be interpreted as "Â©".]

Here's the story.

IPTC XMP metadata

IPTC XMP metadata is encoded in UTF-8. In UTF-8, the character "©" is not encoded as the single byte A9h (as it is in Windows Code Page 1252). Rather, it is encoded as the two-byte sequence C2h A9h. [Only ASCII characters get single-byte representations in UTF-8.]

Thus, the encoding used by Photoshop for "©" in IPTC XMP metadata (C2h A9h) is appropriate.

Any XMP interpreting program should render this on screen as "©".
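
The byte values above can be checked with a few lines of Python. This is only an illustrative sketch using Python's built-in codecs (where Windows Code Page 1252 is named "cp1252"); the helper name hex_bytes is made up for the example.

```python
# The copyright sign is Unicode code point U+00A9.
COPYRIGHT = "\u00a9"

def hex_bytes(data: bytes) -> str:
    """Format bytes in the 'C2h A9h' notation used in this thread."""
    return " ".join(f"{b:02X}h" for b in data)

utf8_bytes = COPYRIGHT.encode("utf-8")      # two bytes in UTF-8
cp1252_bytes = COPYRIGHT.encode("cp1252")   # one byte in Code Page 1252

print("UTF-8:  ", hex_bytes(utf8_bytes))
print("CP 1252:", hex_bytes(cp1252_bytes))

# A reader that wrongly treats the UTF-8 bytes as Code Page 1252
# sees two characters: U+00C2 (A with circumflex) followed by U+00A9.
misread = utf8_bytes.decode("cp1252")
print("Misread:", ascii(misread))
```

The last line shows where the extra character in front of the copyright symbol comes from: the byte C2h is itself a printable character in Code Page 1252.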

IPTC IIM metadata

IPTC IIM metadata ("legacy" IPTC metadata) can use several encodings. The encoding used should be indicated by a data item, CodedCharacterSet.

IPTC IIM metadata generated by Photoshop indicates the encoding as UTF-8. Thus, the encoding used by Photoshop for "©" in IPTC IIM metadata (C2h A9h) is appropriate.

Fully observant IPTC IIM metadata interpreting programs should render this on screen as "©".
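
A reader that honors CodedCharacterSet might work roughly as sketched below. To my understanding, UTF-8 is declared in that data item by the three-byte ISO 2022 escape sequence ESC % G (1Bh 25h 47h). The function name and the choice of Code Page 1252 as the fallback when nothing is declared are assumptions made for the example, not something the IIM specification prescribes.

```python
from typing import Optional

# ISO 2022 escape sequence ESC % G, understood to announce UTF-8.
UTF8_ESCAPE = b"\x1b%G"

def decode_iim_text(raw: bytes,
                    coded_character_set: Optional[bytes],
                    fallback: str = "cp1252") -> str:
    """Decode an IIM text field according to the declared character set.

    `fallback` is an assumption for files that declare nothing (or
    something this sketch does not recognize).
    """
    if coded_character_set == UTF8_ESCAPE:
        return raw.decode("utf-8", errors="replace")
    return raw.decode(fallback, errors="replace")

# Hypothetical field contents; "Jane Doe" is a placeholder name.
field = b"\xc2\xa9 2010 Jane Doe"
print(ascii(decode_iim_text(field, UTF8_ESCAPE)))  # declaration honored
print(ascii(decode_iim_text(field, None)))         # declaration missing
```

With the declaration honored the field decodes to a single copyright symbol; without it, the fallback produces the two-character form.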

Exif metadata

According to the Exif specification, the Exif metadata item Copyright should be encoded in ASCII. The character "©" does not exist in the ASCII character set.

Sometimes, to deal with this, characters beyond ASCII but which are included in Windows Code Page 1252 (such as "©") are encoded in Exif metadata in Windows Code Page 1252 form (A9h). Many Exif metadata-reading applications assume that text strings are in Windows Code Page 1252. Others apparently are prepared to recognize whether characters beyond the ASCII character set are encoded in UTF-8 form or Windows Code Page 1252 form.

Photoshop encodes the character "©" in UTF-8 form in Exif metadata. This cannot be said to be either correct or incorrect under the Exif specification, given that the character "©" is not really allowed in Exif metadata. [But see my recommendation below.]

Receiving Exif metadata applications that strictly follow the Exif specification will not display the sequence C2h A9h at all (those code values do not represent ASCII characters). [A substitute character - perhaps "?" - may be displayed for each byte.]

Receiving Exif metadata applications that assume Windows Code Page 1252 encoding of characters beyond the ASCII character set will display the sequence C2h A9h as "Â©". [This is what has been reported as an anomaly.]

Receiving Exif metadata applications that are prepared to recognize whether Windows Code Page 1252 or UTF-8 encoding is being used for text strings will display the sequence C2h A9h as "©".
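
The three kinds of reader behavior can be mimicked in Python as follows. This is a sketch of the general idea only, not the logic of any particular application; the function names are made up, and the "try UTF-8 first, then fall back to Code Page 1252" rule in the third function is one plausible way a reader could recognize the encoding.

```python
def strict_ascii(raw: bytes) -> str:
    """Strict reader: show '?' for every byte that is not ASCII."""
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)

def assume_cp1252(raw: bytes) -> str:
    """Reader that assumes every text string is Code Page 1252."""
    return raw.decode("cp1252", errors="replace")

def detect_utf8_else_cp1252(raw: bytes) -> str:
    """Reader that accepts the bytes as UTF-8 when they are valid UTF-8,
    and otherwise treats them as Code Page 1252 (an assumed heuristic)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")

photoshop_style = b"\xc2\xa9 2010"   # UTF-8 form, as written by Photoshop
cp1252_style = b"\xa9 2010"          # single-byte Code Page 1252 form

for reader in (strict_ascii, assume_cp1252, detect_utf8_else_cp1252):
    print(reader.__name__, ascii(reader(photoshop_style)))
```

The fallback works for the single-byte form because a lone A9h byte is not valid UTF-8, so the third reader shows one copyright symbol for either encoding.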

Conclusion

It is my opinion that it is inappropriate for Photoshop to encode the character "©" into Exif metadata in UTF-8 encoding. It would be more prudent for it to encode the character "©" into Exif metadata in Windows Code Page 1252 form (as the byte A9h).

It is my opinion that it is perfectly appropriate for Photoshop to encode the character "©" into IPTC IIM and IPTC XMP metadata in UTF-8 form (as it does now).
  #2  
May 16th, 2010, 03:03 AM
Mike Bailey
Member

Join Date: May 2006
Location: Wisconsin, United States
Posts: 114

Hi Doug,

Thank you for posting your research into this. That would mean that all the applications that have until now displayed only the single character will be displaying two characters when the copyright character is placed by CS5 - and other applications following the strict, recent interpretation of the specification. However, it sure feels and looks like a bug when one sees this extraneous character in so many earlier versions of the software. It almost feels like a child telling a parent they aren't right because they don't know the facts!

What might be questionable I'd think is that CS5 rewrites (encodes) all the fields of the EXIF header whenever it does any editing whatsoever, even if those fields, such as the copyright notice, are not touched specifically at all in the edit.

BreezeBrowser, which has about a billion point releases a year, is up to the task and displays only a single character for the copyright, even when it is encoded by CS5 with the two characters.

Mike
______________
http://BlueRockPhotography.com
http://www.facebook.com/pages/Blue-Rock-Photography
  #3  
May 16th, 2010, 06:07 AM
Doug Kerr
Senior Member

Join Date: May 2006
Location: Alamogordo, New Mexico, USA
Posts: 8,598

Hi, Mike,

Quote:
Originally Posted by Mike Bailey View Post
Thank you for posting your research into this. That would mean that all the applications that have until now displayed only the single character will be displaying two characters when the copyright character is placed by CS5 - and other applications following the strict, recent interpretation of the specification.
I'm not sure which specification you mean.

So far as I know (and I may not be fully up-to-date), the Exif specification itself does not make provisions for UTF-8 encoding. (So far as I know, it does not make any provisions for characters beyond ASCII.) Neither is there any such in the DCF specification (which is an enlargement of the Exif specification that actually governs our camera output files). (But I haven't checked the latest version of DCF.)

(If you have information beyond this I'd be glad to know of it.)

Based on that assumption, I think it is inappropriate for Photoshop to encode characters beyond ASCII but within the character set of Windows CP 1252 (e.g., "©") in UTF-8 in the Exif metadata. (That is fine in IPTC XMP metadata, and it is fine in IPTC IIM metadata if that encoding is declared, as is done by Photoshop.)

Quote:
BreezeBrowser, which has about a billion point releases a year, is up to the task and displays only a single character for the copyright, even when it is encoded by CS5 with the two characters.
Yes, I discovered that last night after I posted my report. For the moment, I have to think that it is clever of Chris Breeze to have his product overcome a questionable practice by Adobe!

I'll be poking around on this some more today.

Thanks for your inputs.

Best regards,

Doug
  #4  
May 16th, 2010, 06:47 AM
Doug Kerr
Senior Member

Join Date: May 2006
Location: Alamogordo, New Mexico, USA
Posts: 8,598

Mike,

I have made a quick ("on the way to breakfast") examination of the latest Exif and DCF specifications (both current as of 2010.04.26).

By the way, both are available here (free):

http://www.cipa.jp/english/hyoujunka...kaku_list.html

Both standards prescribe that the Copyright field is to be in ASCII (by which they really mean that - characters whose codes are in the range 20h-7Fh).

Neither makes any provision for UTF-8 encoding.

The Exif spec has for some while provided that the UserComment item may be encoded in several ways, including "Unicode", but gives no insight into what encoding of Unicode characters is implied. Likely the implication is UTF-16, with the byte order based on that of the platform involved. That provision is an incompletely-cooked egg.
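
For what it is worth, a reader could handle UserComment along the following lines. As I understand the Exif specification, the item begins with an 8-byte character code ("ASCII" or "UNICODE", padded with NUL bytes), followed by the text. Treating "UNICODE" as UTF-16 in a caller-supplied byte order is exactly the guess described above, not something the specification states; the function name is made up for the example.

```python
import sys

ASCII_CODE = b"ASCII\x00\x00\x00"
UNICODE_CODE = b"UNICODE\x00"

def decode_user_comment(raw: bytes, byte_order: str = sys.byteorder) -> str:
    """Decode an Exif UserComment value (sketch).

    byte_order is "little" or "big"; defaulting to the platform's byte
    order is an assumption, since the specification does not say which
    Unicode encoding is meant.
    """
    code, body = raw[:8], raw[8:]
    if code == ASCII_CODE:
        return body.decode("ascii", errors="replace")
    if code == UNICODE_CODE:
        codec = "utf-16-le" if byte_order == "little" else "utf-16-be"
        return body.decode(codec, errors="replace")
    raise ValueError("character code not handled in this sketch")

# Hypothetical comment written on a little-endian platform.
sample = UNICODE_CODE + "\u00a9 2010".encode("utf-16-le")
print(ascii(decode_user_comment(sample, "little")))
```

A file written with the other byte order would decode to nonsense under this guess, which is one reason the provision is hard to rely on.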

I uphold my view that it is inappropriate for Photoshop to encode the symbol "©" in UTF-8 in Exif metadata.

I think that encoding it in Windows Code Page 1252 (called by IANA "windows-1252") (or ISO-8859-1, which is the same for that character) would be much more prudent.

This despite the fact that inclusion of the symbol "©" in Exif metadata in any encoding is actually still "illegal" as of four weeks ago.

More after breakfast.

Best regards,

Doug

Last edited by Doug Kerr; May 16th, 2010 at 09:10 AM.
  #5  
May 16th, 2010, 09:07 AM
Doug Kerr
Senior Member

Join Date: May 2006
Location: Alamogordo, New Mexico, USA
Posts: 8,598

Here are the results of a quick survey here of the response of various Exif-reporting software to characters beyond ASCII encoded in UTF-8 form.

In each case, the test was done with the CopyrightNotice (or equivalent) data item. Separate results are given for that item in the Exif metadata and in the IPTC metadata. In the case of IPTC metadata, I will not distinguish between the IIM and XMP forms (not all reporting applications report both, and I have not taken the time to sort that out here).

The issue is whether such an encoded character is decoded by the application and presented "as intended".

The symbol "n/a" means that the application does not report metadata of that class.

Results in red are "noncompliant" with the applicable industry specification(s).

Results in blue are "inconvenient" when dealing with files generated by Photoshop, but cannot be considered noncompliant. No industry specification I am aware of legitimatizes the encoding of characters in UTF-8 form in Exif metadata. Of course, no industry standard I know of allows characters beyond the ASCII character set in Exif metadata at all (except for the UserComment item).


Code:
UTF-8 encoded characters beyond the ASCII character set
   "properly" decoded and displayed

                         Metadata class
Application               Exif*   IPTC

Photoshop CS2              Yes     Yes
Photoshop CS5              Yes     Yes
BreezeBrowser              Yes     Yes
Irfanview                   No      No
Vueprint                    No     n/a
Qimage                      No      No
ExiftoolGUI                 No     Yes
Gexifview                   No     n/a
ExifReader                  No     n/a
Opanda iExif **             No      No
* Note that characters beyond the ASCII character set are not actually permitted, in any encoding, in Exif metadata (other than in the UserComment) under the applicable industry standards.

** A plugin for browsers allowing metadata in images in Web pages to be read.


Best regards,

Doug
  #6  
May 17th, 2010, 08:01 AM
Mike Bailey
Member

Join Date: May 2006
Location: Wisconsin, United States
Posts: 114

Hi Doug,

I didn't have a specific specification (if I can say that without getting too tongue-tied) in mind, but was just being generalistic.

You are tenacious and thorough! Interesting to see how the different applications react to the UTF-8 encoding. Also appreciate your opinions on what seems appropriate or not for EXIF information. Like you, I'm considering dumping the copyright symbol in favor of just "Copyright" or maybe "Copyright (c)" as the preface to the string so that extra CS5-generated character does not appear. Might be a moot point for me as I use CS2 most of the time so far, rather than CS5, since it seems to be slightly faster on my computers.

Mike
_____________
http://BlueRockPhotography.com
http://www.facebook.com/pages/Blue-Rock-Photography
  #7  
May 17th, 2010, 03:40 PM
Doug Kerr
Senior Member

Join Date: May 2006
Location: Alamogordo, New Mexico, USA
Posts: 8,598

Hi, Mike,

Quote:
Originally Posted by Mike Bailey View Post
Like you, I'm considering dumping the copyright symbol in favor of just "Copyright" or maybe "Copyright (c)" as the preface to the string so that extra CS5-generated character does not appear.
Note that:

With regard to United States copyright law, there is no requirement (nor advantage) to use "©". Just "Copyright" or "Copr." is sufficient (of course with the rest of the obligatory elements: the year of publication and the name of the creator).

The string "(c)" has absolutely no standing in this regard anyplace.
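
Replacing the symbol with the word "Copyright" before the notice is written to the Exif field can be sketched in Python as below. The function name is made up, and "Jane Doe" is a placeholder; the sketch simply refuses any notice that still contains characters beyond ASCII after the substitution.

```python
def plain_ascii_notice(notice: str) -> str:
    """Replace the copyright sign (U+00A9) with the word 'Copyright'
    and confirm that the result is pure ASCII."""
    replaced = notice.replace("\u00a9", "Copyright")
    # Raises UnicodeEncodeError if any non-ASCII character remains.
    replaced.encode("ascii")
    return replaced

print(plain_ascii_notice("\u00a9 2010 Jane Doe"))
```

A notice prepared this way is stored as the same bytes whether the writing application uses ASCII, Code Page 1252, or UTF-8, so the extra character cannot appear.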

Quote:
Might be a moot point for me as I use CS2 most of the time so far, rather than CS5, since it seems to be slightly faster on my computers.
CS2 works just like CS5 in this matter.

CS2 loads much more slowly than CS5 on my machine. I'm not sure about operating speed.

Best regards,

Doug
All times are GMT -7.


Posting images or text grants license to OPF, yet © of such remain with its creator. Still, all assembled discussion © 2006-2017 Asher Kelman (all rights reserved) Posts with new theme or unusual image might be moved/copied to a new thread!