Photoshop - character '©' in Exif metadata

Doug Kerr

Well-known member
This is to summarize my findings regarding a peculiarity when using Photoshop to embed in image file metadata a copyright notice including the character '©'.


Three kinds of metadata

Let me first review the three major "kinds" of metadata provided for in modern image files:

• Exif metadata. This includes the familiar information about the technical circumstances of the shot - camera model, shutter speed, and so forth. It also provides for information such as what we may call a copyright notice. Its technical structure draws upon the "tag" structure found in the TIFF file format.

• IPTC IIM metadata. This is the original form of metadata standardized by the International Press Telecommunications Council. Its technical structure is similar to that of the Exif metadata. It also provides for what we may call a copyright notice.

• IPTC XML metadata. This is an advanced form of metadata standardized by IPTC. Its structure is based on the XML information formatting concept. It also provides for what we may call a copyright notice. (The actual data item has different formal names in all three places.)

An Exif file (that is the type we most commonly use, including for JPEG image data) can support all three types (simultaneously).

Character Sets

The applicable specification provides that, in Exif metadata, what we will call here the 'copyright notice' item shall be encoded in ASCII. To get a little ahead of the story, that means that the character '©', not being an ASCII character, cannot legitimately appear in the Exif metadata.

As we so often find, many workers have, without any formal leave from the standards, "stretched" the prescription for text items in Exif metadata to be in ASCII to mean "ASCII or ISO-8859-1", the latter being essentially the "extended ASCII" character set widely used in Windows systems.

IPTC IIM metadata text may be encoded in any of several character sets, including ASCII or Unicode UTF-8. An indicator tells which character set is in use.

IPTC XML metadata is encoded in Unicode UTF-8.

Photoshop

Photoshop, in its File Info panel, permits the user to set various data items that will be embedded as metadata in the resulting image file. There is a single field for the "copyright notice". Photoshop will then embed it as the corresponding item in all three types of metadata mentioned above.

In all three places, this text string is encoded in Unicode UTF-8 form. In the two IPTC areas this practice is perfectly in keeping with the IPTC standards (in the IPTC IIM area, the proper character set indicator is provided).

But in the Exif metadata area, the copyright notice string is also encoded in Unicode UTF-8 form. This is not accommodated by the Exif specification.

So long as only ASCII characters appear in the string, this is only of academic interest: the Unicode UTF-8 representation of an ASCII character (such as 'a') is just the same as the ASCII (or ISO-8859-1) representation - a single byte, with the obvious value.

But the situation gets more complicated when the string includes a non-ASCII character, such as '©'. For that in particular, the Unicode UTF-8 representation is a sequence of two bytes, with hexadecimal values 0xC2 and 0xA9. If that sequence is examined by an application that presumes text strings in Exif metadata to have been encoded in ISO-8859-1, it will display it as 'Â©' (0xC2 being 'Â' and 0xA9 being '©' in ISO-8859-1) - not cool.
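The byte-level behavior described above can be checked with a few lines of Python. This is only an illustrative sketch; the sample notice string is made up for the purpose.

```python
# The same bytes, read under two different assumptions about the encoding.
notice = "© 2010 A. Yert"  # hypothetical copyright notice

utf8_bytes = notice.encode("utf-8")          # UTF-8 form, as the post says Photoshop writes
as_latin1 = utf8_bytes.decode("iso-8859-1")  # what an ISO-8859-1 reader would show

print(utf8_bytes[:2].hex())  # c2a9
print(as_latin1)             # Â© 2010 A. Yert

# A string of ASCII characters only is byte-identical in all three encodings.
plain = "Copyright 2010 A. Yert"
same = plain.encode("ascii") == plain.encode("iso-8859-1") == plain.encode("utf-8")
print(same)                  # True
```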

So, what should Photoshop do? The dilemma is that there is no legitimate way (that is, in conformity with the Exif specification) to embed a '©' in the Exif copyright notice item at all. But of course it would not have been practical for the Photoshop designers just to say "we cannot handle that - it is contrary to the specification".

But, given general practice in this area, it would probably have been more useful for Photoshop, when seeking to embed non-ASCII characters in Exif metadata, to use the ISO-8859-1 encoding.

Metadata reading applications

A "strict" Exif metadata-reading application, encountering the character '©' in ISO-8859-1 encoding in the Exif metadata, would reject it as a non-character (perhaps displaying instead a substitute character).

But in fact, most Exif-metadata reading applications take a practical view, and will display that ISO-8859-1 character as '©'. But, faced with a Photoshop-generated file, they will display in this spot what they think they have seen in the file: 'Â©'.

Some applications, developed by those aware of the Photoshop practice (BreezeBrowser is a good example), will render the byte sequence 0xC2 0xA9 as '©', essentially treating the incoming data as being in Unicode UTF-8.
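The three reader behaviors described in this section can be sketched as follows. The function name and the policy labels are hypothetical, and the "utf8-aware" branch (try UTF-8, fall back to ISO-8859-1) is just one way such a reader might be written, not a description of how BreezeBrowser or any other application actually works.

```python
def decode_exif_text(raw: bytes, policy: str = "utf8-aware") -> str:
    """Decode an Exif text item under one of three illustrative reader policies."""
    if policy == "strict":
        # ASCII only; each non-ASCII byte becomes the substitute character U+FFFD.
        return raw.decode("ascii", errors="replace")
    if policy == "latin1":
        # The common "practical" reading: every byte is an ISO-8859-1 character.
        return raw.decode("iso-8859-1")
    if policy == "utf8-aware":
        # Try UTF-8 first; if the bytes are not valid UTF-8, assume ISO-8859-1.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw.decode("iso-8859-1")
    raise ValueError(f"unknown policy: {policy}")


photoshop_style = b"\xc2\xa9 2010"  # '©' written as UTF-8
latin1_style = b"\xa9 2010"         # '©' written as ISO-8859-1

print(decode_exif_text(photoshop_style, "latin1"))      # Â© 2010
print(decode_exif_text(photoshop_style, "utf8-aware"))  # © 2010
print(decode_exif_text(latin1_style, "utf8-aware"))     # © 2010
```

Note that the fallback approach cannot tell a UTF-8 '©' from an ISO-8859-1 string that really was meant to read 'Â©'; both are the same two bytes.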

What if the file was not generated by Photoshop, and the user has placed in the copyright notice string the character 'Â' ('Copyright 2010 Â. Yert') (encoded as ISO-8859-1)?

Typical Exif readers render that byte sequence as intended.

BreezeBrowser renders the resulting byte sequence as 'Copyright 2010 ® Yert'. (I'll spare the reader the analysis of how that happens.)

Photoshop renders it as intended (but then, if we let it rewrite the file, puts it into UTF-8 form).

(By the way, 'Â' in Unicode UTF-8 has a two-byte representation.)
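One plausible account of the '®' result, offered as an assumption rather than a confirmed description of BreezeBrowser's internals: a UTF-8 decoder that sees the lead byte 0xC2 ('Â' in ISO-8859-1) and combines it with the next byte without checking that the next byte is a valid continuation byte. The following byte here is 0x2E ('.'), and the arithmetic then lands on 0xAE, which is '®'. The deliberately sloppy decoder below is hypothetical and handles only one- and two-byte sequences.

```python
def lenient_utf8_decode(raw: bytes) -> str:
    """Decode two-byte UTF-8 sequences WITHOUT validating the continuation byte."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(raw):
            # Two-byte lead: 5 bits from the lead byte, plus the low 6 bits
            # of whatever byte follows (a strict decoder would insist that
            # the following byte be in the range 0x80-0xBF).
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            # Everything else is passed through as a single character.
            out.append(chr(b))
            i += 1
    return "".join(out)


raw = "Copyright 2010 Â. Yert".encode("iso-8859-1")  # 'Â' is 0xC2, '.' is 0x2E
print(lenient_utf8_decode(raw))  # Copyright 2010 ® Yert

# Properly encoded in UTF-8, 'Â' occupies two bytes.
print("Â".encode("utf-8").hex())  # c382
```

A strict decoder (Python's own `raw.decode("utf-8")`, for instance) would reject this byte sequence instead of producing '®'.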

What should happen?

• CIPA (the issuer of the DCF file specification, essentially the form of the Exif file format used by most digital cameras today) should amend the specification to provide for general text items in Exif metadata to be encoded in ISO-8859-1 form.

• Adobe should arrange Photoshop to encode text data in the Exif metadata area in ISO-8859-1 form.

• There should be peace between Israel and Palestine.

So, what should we do?

There is no foolproof solution at present to this problem. Perhaps best would be for those who embed a copyright notice in their files via Photoshop to not include the character '©'. It is not a mandatory part of the copyright notice as prescribed under US copyright law. (However, it does play a role in the international scheme of copyright protection.)
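For those who script their workflow, a minimal sketch of a check for whether a notice contains only ASCII characters (and so will read the same in every application) might look like this. The function name is hypothetical.

```python
def is_exif_safe(notice: str) -> bool:
    """Return True if the notice contains only ASCII characters."""
    return notice.isascii()


print(is_exif_safe("Copyright 2010 A. Yert"))  # True
print(is_exif_safe("© 2010 A. Yert"))          # False
```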

#​

Best regards,

Doug
 

Asher Kelman

OPF Owner/Editor-in-Chief
Doug,

In simple terms, what is, in practice, wrong with adding a © notice in the copyright fields of File Info in PS or in the other catalog programs?

Does it get destroyed somehow or is it merely non-conforming like premarital sex in the Catholic Church?

Asher
 

Doug Kerr

Well-known member
Hi, Asher,

Doug,

In simple terms, what is, in practice, wrong with adding a © notice in the copyright fields of File Info in PS or in the other catalog programs?

Does it get destroyed somehow or is it merely non-conforming like premarital sex in the Catholic Church?
The "only" problem is that many Exif metadata reading programs, examining a file created by Photoshop, present this character, as encoded by Photoshop, as 'Â©', which some of us consider nikulturniy.

This typographic curiosity has none of the redeeming value of premarital sex.

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Rachel,

So Alt 0 1 6 9 doesn't work? I'm confused.
Let me give a simple example.

In Photoshop, you enter in the Copyright notice field of the File Info panel your copyright notice, using Alt+0169 to include the character '©'.

When you look at the finished file with many applications that read and display the Exif (and other) metadata, you will see 'Â©' instead of '©'.

Here is an example with the Opanda Iexif metadata reader (which I use as a browser plugin to examine the metadata of posted images):

Photoshop_copyright_Opanda_01.gif


Best regards,

Doug
 

Doug Kerr

Well-known member
Just for completeness of the record, let me describe here a collateral issue of character encoding in Exif metadata which, to the best of my knowledge, does not lead to any widespread difficulty nor need for attention from us in our usual work.

At issue is the Exif metadata item UserComment.

The Exif specification (as well as the DCF specification, an elaborated form of the Exif specification that governs common digital camera image files) provides that this data item can be in any of three coded character sets:

• ASCII (meaning just that, not "ASCII or ISO-8859-1")
• JIS, referring to a 16-bit Japanese-language character set
• Unicode (no initial mention of UTF-8 vs. UTF-16BE vs. UTF-16LE)

The character set used is declared by a text prefix to the data item itself, 'ASCII', 'JIS', or 'UNICODE', in ASCII, built out to 8 characters/bytes with NULs.
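As a sketch of how a reader might separate that 8-byte prefix from the comment text itself (the function and table names are hypothetical, and only the three character sets listed above are handled):

```python
from typing import Tuple

# The three prefixes named above, each padded with NULs to exactly 8 bytes.
CHARSET_CODES = {
    b"ASCII\x00\x00\x00": "ASCII",
    b"JIS\x00\x00\x00\x00\x00": "JIS",
    b"UNICODE\x00": "UNICODE",
}


def split_user_comment(value: bytes) -> Tuple[str, bytes]:
    """Split a raw UserComment value into (declared character set, payload bytes)."""
    prefix, payload = value[:8], value[8:]
    if prefix not in CHARSET_CODES:
        raise ValueError(f"unrecognized character set prefix: {prefix!r}")
    return CHARSET_CODES[prefix], payload


charset, payload = split_user_comment(b"ASCII\x00\x00\x00Shot at dusk")
print(charset, payload)  # ASCII b'Shot at dusk'
```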

My experience is that typical Exif editors, when adding or modifying this item, will apply:

• The ASCII encoding if the string entered by the user contains only ASCII characters.

• The Unicode encoding if the string entered by the user contains any characters beyond ASCII.

Although not mentioned in the standard, evidently it is the practice to use UTF-16 encoding in the Unicode mode. In the case of characters in the Basic Multilingual Plane (all those we are likely to encounter), this uniformly uses a 16-bit number (recorded as two bytes) to represent a character.

With regard to the little-endian vs. big-endian distinction (the order of the two bytes representing a 16-bit number), that is declared in an Exif file on behalf of the whole file. In the header, there is a two-character ASCII byte order indicator, either 'II' (evocative of "Intel") for little-endian or 'MM' (evocative of "Motorola") for big-endian.
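Putting the editor practice and the byte order rule together, a UserComment value might be built along these lines. This is a sketch under the assumptions stated above (ASCII when possible, otherwise UTF-16 in the file's declared byte order); the function name is hypothetical, and real editors may differ.

```python
def build_user_comment(text: str, byte_order: str) -> bytes:
    """Build a UserComment value; byte_order is the file's 'II' or 'MM' indicator."""
    if text.isascii():
        # Only ASCII characters: use the ASCII prefix and one byte per character.
        return b"ASCII\x00\x00\x00" + text.encode("ascii")
    # Otherwise use the UNICODE prefix and UTF-16 in the file's byte order.
    codec = {"II": "utf-16-le", "MM": "utf-16-be"}[byte_order]
    return b"UNICODE\x00" + text.encode(codec)


print(build_user_comment("abc", "II"))       # b'ASCII\x00\x00\x00abc'
print(build_user_comment("©", "II")[8:])     # b'\xa9\x00'  (little-endian)
print(build_user_comment("©", "MM")[8:])     # b'\x00\xa9'  (big-endian)
```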

Photoshop neither displays the Exif metadata item UserComment nor allows us to enter or manipulate it. Neither does it seem to tamper with existing data in that area.

For example, if the data item is declared to be encoded in ASCII but in fact contains a non-ASCII character (never mind how that got there), Photoshop does not, for example, rewrite the item in Unicode to attain orthodoxy.

Typical Exif metadata reading applications seem to interpret non-ASCII characters found in a UserComment declared as ASCII as if they were in ISO-8859-1 (a reasonable accommodation of something we should, however, not expect to encounter). I have not done a comprehensive survey in this regard.

As I said at the outset, I cannot just now imagine a situation in which the details of this are of any concern to our operations.

Best regards,

Doug
 