Doug Kerr
This is to summarize my findings regarding a peculiarity when using Photoshop to embed in image file metadata a copyright notice including the character '©'.
Three kinds of metadata
Let me first review the three major "kinds" of metadata provided for in modern image files:
• Exif metadata. This includes the familiar information about the technical circumstances of the shot - camera model, shutter speed, and so forth. It also provides for information such as what we may call a copyright notice. Its technical structure draws upon the "tag" structure found in the TIFF file format.
• IPTC IIM metadata. This is the original form of metadata standardized by the International Press Telecommunications Council. Its technical structure is similar to that of the Exif metadata. It also provides for what we may call a copyright notice.
• IPTC XML metadata. This is an advanced form of metadata standardized by IPTC. Its structure is based on the XML information formatting concept. It also provides for what we may call a copyright notice. (The actual data item has different formal names in all three places.)
An Exif file (that is the type we most commonly use, including for JPEG image data) can support all three types (simultaneously).
Character Sets
The applicable specification provides that, in Exif metadata, what we will call here the 'copyright notice' item shall be encoded in ASCII. To get a little ahead of the story, that means that the character '©', not being an ASCII character, cannot legitimately appear in the Exif Metadata.
As we so often find, many workers have, without any formal leave from the standards, "stretched" the prescription for text items in Exif metadata to be in ASCII to mean "ASCII or ISO-8859-1", the latter being essentially the "extended ASCII" character set widely used in Windows systems.
IPTC IIM metadata text may be encoded in any of several character sets, including ASCII or Unicode UTF-8. An indicator tells which character set is in use.
IPTC XML metadata is encoded in Unicode UTF-8.
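To make the IIM "indicator" concrete: it is the Coded Character Set dataset (record 1, dataset 90), and UTF-8 is announced there with the three-byte escape sequence ESC % G. The Python sketch below lays out that dataset and a copyright notice dataset (record 2, dataset 116) byte by byte. The helper name iim_dataset and the sample notice are made up for illustration, and only the ordinary short-length form of a dataset is handled.

```python
import struct

def iim_dataset(record: int, dataset: int, data: bytes) -> bytes:
    """Build one IIM dataset: 0x1C marker, record number, dataset number,
    2-byte big-endian length, then the data itself."""
    if len(data) > 0x7FFF:
        raise ValueError("extended-length datasets are not handled in this sketch")
    return struct.pack(">BBBH", 0x1C, record, dataset, len(data)) + data

# 1:90 Coded Character Set; ESC % G announces UTF-8
charset_marker = iim_dataset(1, 90, b"\x1b%G")

# 2:116 Copyright Notice, encoded in UTF-8 to agree with the marker
notice = iim_dataset(2, 116, "© 2010 Yert".encode("utf-8"))

print(charset_marker.hex(" "))  # 1c 01 5a 00 03 1b 25 47
print(notice.hex(" "))
```

A reader that honors the 1:90 dataset knows to decode the notice as UTF-8. Exif has no comparable indicator for this item, which is the root of the trouble described below.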
Photoshop
Photoshop, in its File Info panel, permits the user to set various data items that will be embedded as metadata in the resulting image file. There is a single field for the "copyright notice". Photoshop will then embed it as the corresponding item in all three types of metadata mentioned above.
In all three places, this text string is encoded in Unicode UTF-8 form. In the two IPTC areas this practice is perfectly in keeping with the IPTC standards (in the IPTC IIM area, the proper character set indicator is provided).
But in the Exif metadata area, the copyright notice string is also encoded in Unicode UTF-8 form. This is not accommodated by the Exif specification.
So long as only ASCII characters appear in the string, this is only of academic interest: the Unicode UTF-8 representation of an ASCII character (such as 'a') is just the same as the ASCII (or ISO-8859-1) representation - a single byte, with the obvious value.
But the situation gets more complicated when the string includes a non-ASCII character, such as '©'. For that in particular, the Unicode UTF-8 representation is a sequence of two bytes, with hexadecimal values 0xC2 and 0xA9. If that sequence is examined by an application that presumes text strings in Exif metadata to have been encoded in ISO-8859-1, it will display it as 'Â©' (0xC2 being 'Â' and 0xA9 being '©' in that character set) - not cool.
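The byte-level behavior just described is easy to check. The short Python sketch below uses only the standard codecs ("latin-1" is Python's name for ISO-8859-1); the variable names are just for illustration.

```python
ascii_text = "a"
# An ASCII character is the same single byte in all three encodings
assert (ascii_text.encode("ascii")
        == ascii_text.encode("latin-1")
        == ascii_text.encode("utf-8")
        == b"a")

utf8_bytes = "©".encode("utf-8")      # two bytes
latin1_bytes = "©".encode("latin-1")  # one byte

print(utf8_bytes.hex(" "))    # c2 a9
print(latin1_bytes.hex(" "))  # a9

# A reader that assumes ISO-8859-1 sees two characters instead of one
misread = utf8_bytes.decode("latin-1")
print(misread)                # Â©
```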
So, what should Photoshop do? The dilemma is that there is no legitimate way (that is, in conformity with the Exif specification) to embed a '©' in the Exif copyright notice item at all. But of course it would not have been practical for the Photoshop designers just to say "we cannot handle that - it is contrary to the specification".
But, given general practice in this area, it would probably have been more useful for Photoshop, when seeking to embed non-ASCII characters in Exif metadata, to use the ISO-8859-1 encoding.
Metadata reading applications
A "strict" Exif metadata-reading application, encountering the character '©' in ISO-8859-1 encoding in the Exif metadata, would reject it as a non-character (perhaps displaying instead a substitute character).
In fact, most Exif metadata-reading applications take a practical view, and will display that character as '©'. Faced with a Photoshop-generated file, however, they will display in this spot what they think they have seen in the file: 'Â©'.
Some applications, developed by those aware of the Photoshop practice (BreezeBrowser is a good example), will render the byte sequence 0xC2 0xA9 as '©', essentially treating the incoming data as being in Unicode UTF-8.
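One way a reader might accommodate both practices is to try a strict UTF-8 decode first and fall back to ISO-8859-1 only if that fails. The Python sketch below shows the idea. It is not a description of how BreezeBrowser or any other particular program works, and the function name decode_exif_text is hypothetical.

```python
def decode_exif_text(raw: bytes) -> str:
    """Decode an Exif text value, preferring UTF-8 and falling back to ISO-8859-1."""
    raw = raw.rstrip(b"\x00")  # Exif ASCII values carry a terminating NUL
    try:
        return raw.decode("utf-8")    # strict: malformed sequences raise an error
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # cannot fail: every byte maps to a character

print(decode_exif_text(b"\xc2\xa9 2010 Yert\x00"))  # UTF-8 input      -> © 2010 Yert
print(decode_exif_text(b"\xa9 2010 Yert\x00"))      # ISO-8859-1 input -> © 2010 Yert
```

The heuristic is not free of ambiguity: a string that really was meant as 'Â©' in ISO-8859-1 is also valid UTF-8, and would come out as '©'.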
What if the file was not generated by Photoshop, and the user has placed in the copyright notice string the character 'Â' ('Copyright 2010 Â. Yert') (encoded as ISO-8859-1)?
Typical Exif readers render that byte sequence as intended.
BreezeBrowser renders the resulting byte sequence as 'Copyright 2010 ® Yert'. (I'll spare the reader the analysis of how that happens.)
Photoshop renders it as intended (but then, if we let it rewrite the file, puts it into UTF-8 form).
(By the way, 'Â' in Unicode UTF-8 has a two-byte representation.)
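For readers who do want an analysis of the '®' result, one plausible explanation is a UTF-8 decoder that treats 0xC2 as the lead byte of a two-byte sequence and then takes the low six bits of the next byte without checking that it is a valid continuation byte. This is an assumption for illustration only, not confirmed behavior of BreezeBrowser. The Python sketch below implements such a hypothetical lenient decoder.

```python
def lenient_utf8_decode(raw: bytes) -> str:
    """Hypothetical sloppy decoder: handles 1- and 2-byte forms only and
    never validates the second byte of a 2-byte sequence."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(raw):
            # 5 bits from the lead byte, low 6 bits from the next byte,
            # with no check that the next byte has the form 10xxxxxx
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)

raw = "Copyright 2010 Â. Yert".encode("latin-1")  # 'Â' is 0xC2, '.' is 0x2E
print(lenient_utf8_decode(raw))  # Copyright 2010 ® Yert
```

Here 0xC2 contributes the bits 00010 and 0x2E contributes 101110, giving 0xAE, which is '®'; the period is swallowed in the process. A strict UTF-8 decoder would instead reject 0xC2 0x2E as malformed.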
What should happen?
• CIPA (the issuer of the DCF file specification, essentially the form of the Exif file format used by most digital cameras today) should amend the specification to provide for general text items in Exif metadata to be encoded in ISO-8859-1 form.
• Adobe should arrange Photoshop to encode text data in the Exif metadata area in ISO-8859-1 form.
• There should be peace between Israel and Palestine.
So, what should we do?
There is no foolproof solution at present to this problem. Perhaps best would be for those who embed a copyright notice in their files via Photoshop to not include the character '©'. It is not a mandatory part of the copyright notice as prescribed under US copyright law. (However, it does play a role in the international scheme of copyright protection.)
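Anyone who wants to check a notice before typing it into the File Info panel can test whether it stays within ASCII, in which case the bytes are the same whichever encoding a reader assumes. A minimal Python sketch, with a hypothetical helper name and sample notices:

```python
def non_ascii_chars(notice: str) -> list[str]:
    """Return the characters in a notice that fall outside ASCII."""
    return [ch for ch in notice if ord(ch) > 0x7F]

for notice in ("Copyright 2010 Yert", "© 2010 Yert"):
    offenders = non_ascii_chars(notice)
    if offenders:
        print(f"{notice!r}: contains non-ASCII characters {offenders}")
    else:
        print(f"{notice!r}: plain ASCII, same bytes in every encoding discussed here")
```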
Best regards,
Doug