Not-so-confidential confidantes

Jerome Marot

Well-known member
Published: Wednesday, December 8, 2010 here.

(The original publication is: Crandall, David, Lars Backstrom, Dan Cosley, Siddharth Suri, Daniel Peter Huttenlocher, Jon M Kleinberg. 2010. "Inferring Social Ties from Geographic Coincidences." PNAS 107 (52): 22436-22441. DOI: 10.1073/pnas.1006155107, text here)

Comparing the locations of photos posted on the Internet with social network contacts, Cornell University computer scientists have found that as few as three "co-locations" for images at different times and places could predict with high probability that two people posting photos were socially connected. The results have implications for online privacy, the researchers said, but also suggest a quantitative answer to a very old psychological question: What can we conclude from observing coincidences?

"This is a kind of question that goes way back," said Jon Kleinberg, Cornell professor of computer science, who conducted the study with Dan Huttenlocher, dean of the Faculty of Computing and Information Science, and colleagues. "Online data gives us new ways to address it," he said.

"Inferring Social Ties from Geographic Coincidences," is reported online in the Proceedings of the National Academy of Sciences (Dec. 8, 2010). David Crandall, a former Cornell student and postdoctoral researcher now at Indiana University, is the lead author on the study.

The researchers used a database of some 38 million photos uploaded to the Flickr photo-sharing website by about half a million people. The time and place where photos were taken were provided by GPS-equipped cameras or by people who used Flickr's online interface to indicate the location on a map. Anyone can read this information from a Flickr page.

Flickr also offers a social networking service, and computer analysis showed that when two people posted photos several times from the same locations (often famous landmarks) and at about the same times, this was a good predictor that those people would have a social network link.

"It's not that you know with certainty, but it's a high likelihood that these people know each other," Huttenlocher said. As expected, the probability increases as the analysis moves to smaller areas and shorter time spans.
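To make the idea of a "co-location" concrete, here is a minimal Python sketch of one way such coincidences could be counted. It is only an illustration under stated assumptions, not the researchers' actual procedure: the grid-and-time-window bucketing, the names `cell_key` and `co_location_counts`, and the sample photos are all made up for the example. Each photo is dropped into a cell of `cell_deg` degrees and `window_days` days, and every pair of users found in the same cell scores one co-location.

```python
from collections import defaultdict
from itertools import combinations


def cell_key(lat, lon, day, cell_deg, window_days):
    """Map a photo to a (latitude bucket, longitude bucket, time bucket) cell."""
    return (int(lat // cell_deg), int(lon // cell_deg), int(day // window_days))


def co_location_counts(photos, cell_deg=1.0, window_days=1):
    """Count, for each pair of users, the distinct space-time cells they share.

    photos: iterable of (user, latitude, longitude, day_number) tuples.
    Returns a dict mapping a sorted (user_a, user_b) tuple to a count.
    """
    users_in_cell = defaultdict(set)
    for user, lat, lon, day in photos:
        key = cell_key(lat, lon, day, cell_deg, window_days)
        users_in_cell[key].add(user)

    counts = defaultdict(int)
    for users in users_in_cell.values():
        # Every pair of users present in the same cell scores one co-location.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return dict(counts)


# Invented sample data: (user, lat, lon, day number).
photos = [
    ("alice", 48.858, 2.294, 10), ("bob", 48.859, 2.295, 10),
    ("alice", 40.689, -74.044, 40), ("bob", 40.687, -74.045, 40),
    ("alice", 51.503, -0.142, 90), ("bob", 51.504, -0.141, 90),
    ("carol", 48.858, 2.294, 10),
]
counts = co_location_counts(photos, cell_deg=0.1, window_days=1)
print(counts)
```

A plain grid like this misses two people standing on opposite sides of a cell boundary, so treat it as a rough approximation. Shrinking `cell_deg` or `window_days` corresponds to the "smaller areas and shorter time spans" mentioned above.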

Flickr is just a convenient place to study the phenomenon, the researchers said. The same conclusions might be drawn from credit card purchases, fare card transactions on the bus and subway, and cell phone records, they suggested.

"It's surprising – and not in a reassuring way – that so much information comes from so little," Kleinberg said. "You go through life and leave all sorts of records. You're conveying information you deliberately wrote but also conveying broader information. Our research is trying to provide a way of quantifying these risks."

"While it's obvious that a photo you post online reveals information about what is pictured in the photo, what is less obvious is that as you post multiple photos you are probably revealing information which may not be pictured anywhere," Huttenlocher added.
 

Michael Nagel

Well-known member
Although it is not the same field of work, one of the first uses of 'big data' was during the Cold War, when the KGB officer Yuri Totrov used behavioural patterns to identify CIA agents.
There it was used to identify the occupation of individuals, but the method employed is actually similar to the one described for the pictures on Flickr (Salon, Schneier, Philly).

Plus ça change...
 

Jerome Marot

Well-known member
It is an interesting article, but there are essential differences. Yuri Totrov looked for specific indicators in the available data, indicators that the CIA failed to obfuscate. The method described in my article makes no assumption whatsoever and applies to the entire dataset of photographers. It calculates the probability that a relationship is true as a function of the number of coincident photographs. It finds that a surprisingly small number of coincidences is sufficient for a high probability.
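As a rough illustration of "probability as a function of the number of coincidences", here is a small Python sketch. It assumes you already have a co-location count for each pair of users and a set of known social links; the function name `tie_probability_by_count` and all the data are invented for the example, and this is a simple empirical estimate, not necessarily the model used in the paper.

```python
from collections import Counter


def tie_probability_by_count(co_counts, friendships):
    """Estimate P(social link | k co-locations) as a simple observed fraction.

    co_counts: dict mapping a (user_a, user_b) tuple to a co-location count k.
    friendships: set of frozensets, one per known social link.
    Returns a dict mapping k to the fraction of pairs with that k that are linked.
    """
    totals, linked = Counter(), Counter()
    for pair, k in co_counts.items():
        totals[k] += 1
        if frozenset(pair) in friendships:
            linked[k] += 1
    return {k: linked[k] / totals[k] for k in sorted(totals)}


# Invented toy data, for illustration only.
co_counts = {
    ("a", "b"): 3, ("c", "d"): 3,
    ("a", "c"): 1, ("b", "c"): 1, ("a", "d"): 1, ("b", "d"): 1,
}
friendships = {frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("a", "c"))}

probs = tie_probability_by_count(co_counts, friendships)
print(probs)
```

With real data the interesting part is how quickly this fraction rises as k grows, which is the point of the article.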

Human intelligence of the kind the KGB used and big data operate in different modes. The KGB had to be relatively sure of their assumptions and dealt with only a small number of diplomats. Big data only needs to be true in the vast majority of cases and can afford many more false negatives, but operates on the data of billions of people.

That, and my article is related to photography... ;)
 

Doug Kerr

Well-known member
Hi, Jerome,

A very interesting topic. Thanks for presenting it.

Your note gives me a chance to comment that in the excerpt you quote, "coincidence" is used in its proper sense: to refer to two things that "coincide" - period. Of course the possible significance of a "coincidence" is another matter, the topic of the discussion.

Sadly, the term has, in popular use, come to mean "two things coinciding when that is not very 'likely'". That leads to such statements as, "Well, it's probably not a coincidence that your husband and his secretary were both in Ralph's Restaurant yesterday at lunch time." If they were both there, then it is by definition a coincidence.

Best regards,

Doug
 