The importance of metadata | Centro de Autonomía Digital

January 22, 2022

When we talk about privacy, the main focus is usually on encrypting regular data, but to what extent does metadata expose us? What risks come from being unaware of this type of data and of its disproportionate use?

To grasp these implications, we need to understand metadata as a set of data that describes and gives information about other data, with the following characteristics:

It is usually present in the background - far from view.
It always accompanies data, which it describes or about which it provides information (such as the latitude and longitude of the place where a photograph was taken with your phone’s camera).
It can’t always be encrypted, since it is often necessary to be able to process it (for example, the email address of the recipient of a message).
It is often simply not considered part of the data that compromises our privacy and, therefore, is usually not encrypted.
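To make the email case concrete, here is a minimal sketch in Python using only the standard library. The addresses, the date and the placeholder ciphertext are invented for illustration, and the helper name visible_metadata is hypothetical. It assumes a setup where only the message body is encrypted, and shows that the routing headers remain readable by anyone who handles the message.

```python
from email import message_from_string
from email.message import EmailMessage

# Build a message whose body is already encrypted (placeholder text only;
# no real encryption is performed in this sketch).
msg = EmailMessage()
msg["From"] = "alice@example.org"   # hypothetical sender
msg["To"] = "bob@example.org"       # hypothetical recipient
msg["Date"] = "Sat, 22 Jan 2022 10:15:00 +0000"
msg.set_content(
    "-----BEGIN PGP MESSAGE-----\n"
    "(ciphertext placeholder)\n"
    "-----END PGP MESSAGE-----\n"
)


def visible_metadata(raw_message):
    """Return the headers an intermediary can read without any key."""
    parsed = message_from_string(raw_message)
    return {
        name: parsed[name]
        for name in ("From", "To", "Date")
        if parsed[name] is not None
    }


raw = msg.as_string()
meta = visible_metadata(raw)
print(meta)
```

Even though the body is unreadable without the key, who wrote to whom, and when, is available in plain text, because mail servers need those fields to deliver the message.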

In the context of this definition, metadata originates from our everyday actions. Some of it is generated in a fully conscious and intentional way, as with the title, abstract and keywords of a paper, where we know that this information allows the paper to be indexed in databases. Some is generated automatically, without us being aware of it or knowing who collects it or how it is used. For example, think of your trips to the supermarket: on more or less the same date each month, or with the same frequency (every two weeks, for example), you buy the same products and even use the same means of payment. This simple routine generates a large amount of metadata that allows an observer to identify you and build a detailed profile of you. It also grants enormous power to the supermarket, which now knows your tastes and knows which products to offer you and on what date, since you are very likely to buy them the next time.

As another example, when you take a photo with your cellphone without realizing that the GPS sensor is activated (and, for some of these items, even when it isn't), you are associating your photo, among other things:

With the coordinates of your position at the time of taking the photograph.
With the date and time it was taken.
With the settings of your camera and therefore, of your device.
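As a rough illustration of how little work it takes to turn photo metadata into a place and a moment, the sketch below converts GPS values stored the way EXIF stores them (degrees, minutes and seconds as numerator/denominator pairs, plus a hemisphere letter) into decimal coordinates. The tag values and the device name are invented, and the photo_tags dictionary stands in for whatever an EXIF reader would return. No image file is read here.

```python
from datetime import datetime
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) rationals to decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical metadata, shaped like the tags embedded in a phone photo.
photo_tags = {
    "GPSLatitude": ((0, 1), (10, 1), (4800, 100)),    # 0 deg 10 min 48 sec
    "GPSLatitudeRef": "S",
    "GPSLongitude": ((78, 1), (28, 1), (1200, 100)),  # 78 deg 28 min 12 sec
    "GPSLongitudeRef": "W",
    "DateTimeOriginal": "2022:01:22 10:15:00",        # EXIF date format
    "Model": "ExamplePhone X",                        # invented device name
}

latitude = dms_to_decimal(photo_tags["GPSLatitude"], photo_tags["GPSLatitudeRef"])
longitude = dms_to_decimal(photo_tags["GPSLongitude"], photo_tags["GPSLongitudeRef"])
taken_at = datetime.strptime(photo_tags["DateTimeOriginal"], "%Y:%m:%d %H:%M:%S")

print(latitude, longitude, taken_at.isoformat(), photo_tags["Model"])
```

The resulting pair of numbers can be pasted into any map service, which is why sharing an unedited photo can reveal where you were standing when you took it.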

In an even more everyday example, every time you turn on your TV to watch content from your streaming provider (Netflix, Disney+, etc.), you generate metadata that can be used to determine:

The time of day you are at home.
Your taste in television content.
Whether there are minors in your home.
The type of TV you have.
Information about your internet connection.
Your physical location, and much more.

But does this metadata represent some kind of threat to you?

At least for our privacy, yes. The study Evaluating the privacy properties of telephone metadata, carried out at Stanford University and published in 2016, set out to determine how sensitive the metadata generated by calls and text messages is. More than 800 volunteers participated, and only metadata was collected, from more than 250,000 calls and more than 1,200,000 text messages. The metadata made it possible not only to identify people, their movements and how often they visited particular places, but also to build detailed profiles of the participants with a very high degree of precision. For example, during the period of their participation in the research, it was determined that one of the volunteers suffered from a heart condition. In another case, it was concluded that a volunteer was interested in purchasing an AR-type semi-automatic rifle. When individual profiling information of this kind is placed in the “right” hands (insurance companies or governments), it becomes a true attack against our privacy and even our security.
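The kind of inference described above can be illustrated with a toy sketch. This is not the method used in the Stanford study. It only assumes that an observer holds a call log without any content and a directory that maps phone numbers to types of organization. All numbers, categories and names here (call_log, directory, infer_interests) are invented for the example.

```python
from collections import Counter

# Metadata only: who was called, when, and for how long. No call content.
call_log = [
    {"to": "+1-555-0101", "start": "2022-01-03T09:05", "seconds": 420},
    {"to": "+1-555-0101", "start": "2022-01-10T09:02", "seconds": 380},
    {"to": "+1-555-0199", "start": "2022-01-11T18:30", "seconds": 95},
    {"to": "+1-555-0101", "start": "2022-01-17T09:10", "seconds": 510},
    {"to": "+1-555-0142", "start": "2022-01-18T12:00", "seconds": 60},
]

# Hypothetical public directory linking numbers to kinds of organization.
directory = {
    "+1-555-0101": "cardiology clinic",
    "+1-555-0142": "pharmacy",
}


def infer_interests(calls, known_numbers):
    """Count calls per organization type, ignoring numbers not in the directory."""
    counts = Counter()
    for call in calls:
        category = known_numbers.get(call["to"])
        if category is not None:
            counts[category] += 1
    return counts


inferred = infer_interests(call_log, directory)
print(inferred.most_common())
```

Three weekly calls to a cardiology clinic, followed by one to a pharmacy, already suggest a health condition, without a single word of any conversation being recorded.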

Research results of this kind put into perspective statements such as those made in 2013 by the then US president, Barack Obama, or by the Director of National Intelligence, James Clapper: they show that it is not necessary to access the content of our calls or messages to identify us and profile us in detail.

For this reason, transparency in national security programs about the tracking and reading of the metadata we generate is essential so that the citizens of any nation can understand the true reach that companies and governments have when they collect and analyze metadata, instead of making us believe that it is trivial information.

At this point, we wonder: what can we do about metadata in order to strengthen our already-violated privacy?

The answer necessarily points to training and awareness-raising at large events, such as Privacy Week, about the implications of metadata for our privacy. But knowing what metadata we generate, and how, should not be left as an individual responsibility. Since, as mentioned before, some metadata is generated automatically and we cannot take part in that process, the government must play a role in shielding our fundamental right to privacy, with regulations that control who can access, collect and analyze metadata, and for what purposes.