Research: Predicting crime with anonymized mobile data vs. easily re-identifying users

In dueling research, one group claims that anonymized mobile data can predict crime, while another claims that re-identifying users via their location data is easy.

Javier Brosch

Talk of predicting crime often invites comparisons to Minority Report, but that hasn’t stopped police departments from embracing predictive analytics technology. In fact, data mining to predict crime is increasingly popular; examples include the predictive policing system known as PredPol and efforts to predict crime by using Twitter. “This is not Minority Report,” the professor who helped develop PredPol was quick to point out. “This is about predicting where and when crime is most likely to occur, not who will commit it.” With about 6.8 billion active mobile phones in use, it shouldn’t be surprising that researchers have now used the data collected from those devices to predict crime.

Using anonymized and aggregated mobile network data from Telefonica Digital’s Smart Steps, which provides “hour by hour insight into the movement and behavior of crowds,” and combining it with datasets for the geolocation of historical criminal cases and London borough profiles, researchers came up with “a methodology to automatically predict with almost 70% of accuracy whether a given geographical area of a large European metropolis will have high or low crime levels in the next month.”

This is an example of Smart Steps data:

[Image: Telefonica Digital’s Smart Steps example. Credit: “Once upon a crime: Towards crime prediction from demographics and mobile data”]

In the paper “Once upon a crime: Towards crime prediction from demographics and mobile data” (pdf), researchers explained how they exploited big data such as “aggregated and anonymized human behavioral data derived from mobile network activity to tackle the crime prediction problem.” Their findings “provide evidence that aggregated and anonymized data collected by the mobile infrastructure contains relevant information to describe a geographical area in order to predict its crime level.”

They mapped London crime hotspot predictions such as the one below:

[Image: “Once upon a crime” crime hotspot predictions. Credit: Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, Alex Pentland]

They added, "The proposed approach could have clear practical implications by informing police departments and city governments on how and where to invest their efforts and on how to react to criminal events with quicker response times."

While that sounds good, when you talk about predicting crime, people tend to worry that the technology could be abused and end up being applied to individuals rather than geographic areas. Although we are assured that the mobile data was anonymized, other researchers have continued to find ways to de-anonymize data and link it back to real people. Too often, anonymized data doesn’t actually mean users are anonymous.

Such is the case in a new location privacy study, “Not so unique in the crowd: A simple and effective algorithm for anonymizing location data” (pdf). Singapore researchers found that “human mobility trajectories are highly re-identifiable and the privacy risk is high.”

The researchers looked at a “human mobility dataset for more than half a million individuals over a period of one week. The location of an individual is specified every fifteen minutes.” That sequence of locations, or GPS coordinates, traces each user’s path, which the researchers refer to as a trajectory. Even if that data is anonymized, the longer a single user is tracked, the easier it is to re-identify that person. In fact, they found that “individuals are highly re-identifiable with only a few spatio-temporal points.”
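To make the re-identification claim concrete, here is a minimal sketch of how a handful of known points can single out one pseudonymous user. It is not the researchers' code: the `matching_users` helper, the `(time slot, location id)` encoding, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding: (index of the 15-minute time slot, location id).
Point = Tuple[int, str]


def matching_users(trajectories: Dict[str, Set[Point]],
                   known_points: Iterable[Point]) -> Set[str]:
    """Return the pseudonymous users whose trajectory contains every known point."""
    known = set(known_points)
    return {user for user, points in trajectories.items() if known <= points}


# Toy data (invented): three users, each sampled in three time slots.
trajectories = {
    "u1": {(0, "A"), (1, "B"), (2, "C")},
    "u2": {(0, "A"), (1, "B"), (2, "D")},
    "u3": {(0, "A"), (1, "E"), (2, "C")},
}

# One known point is shared by everyone, so it identifies nobody.
one_point = matching_users(trajectories, [(0, "A")])

# Two known points already narrow the candidates down to a single user.
two_points = matching_users(trajectories, [(1, "B"), (2, "C")])
```

In this toy dataset, an observer who knows only where someone was in slot 0 learns nothing, but knowing two later points leaves exactly one candidate. That is the effect the study describes, on a far smaller scale.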

They proposed a new anonymization method for trajectory data that reduces privacy risks. In the researchers’ own words:

Our method is based on the insight that the uniqueness of a user's trajectory increases with the length of the trajectory. Take, for example, the trajectory of a single user over a duration of 24 hours. For the trajectory to be not unique, there has to be at least one other user who has been in the same location as the first user for every point in time during that 24 hour interval. It is obvious that the chance of such a set of other users existing is low and that the chance is diminishing the longer the trajectory is.

On the other hand, for a short period of time, let's say a few hours, we can expect that there is a good chance of other users being in the same location, at least in a densely-populated urban environment. Instead of reducing the resolution of the location information, we disintegrate the trajectories into a set of shorter sub-trajectories for different time windows by "cutting" the original trajectories into shorter sub-trajectories that we expect to have lower uniqueness. Note that our method provides a simple mechanism to balance privacy and utility of the trajectories.

Source: “Not so unique in the crowd,” Yi Song, Daniel Dahlmeier, Stephane Bressan
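The "cutting" idea can be illustrated with a short sketch. This is a simplified reading of the description above, not the paper's algorithm: the `cut` and `unique_fraction` helpers, the fixed window size, and the toy trajectories are assumptions, and the sketch further assumes the resulting pieces cannot be linked back to one another.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A sub-trajectory: (starting time slot, locations visited in that window).
Piece = Tuple[int, Tuple[str, ...]]


def cut(trajectory: List[str], window: int) -> List[Piece]:
    """Split one trajectory into consecutive sub-trajectories of `window` samples."""
    return [(start, tuple(trajectory[start:start + window]))
            for start in range(0, len(trajectory), window)]


def unique_fraction(trajectories: Dict[str, List[str]], window: int) -> float:
    """Fraction of sub-trajectories that no other user shares in the same window."""
    pieces = [piece for traj in trajectories.values() for piece in cut(traj, window)]
    counts = Counter(pieces)
    return sum(1 for piece in pieces if counts[piece] == 1) / len(pieces)


# Toy data (invented): one location id per time slot for four users.
trajectories = {
    "u1": ["A", "B", "C", "D"],
    "u2": ["A", "B", "E", "F"],
    "u3": ["G", "H", "C", "D"],
    "u4": ["G", "H", "E", "F"],
}

full_length = unique_fraction(trajectories, window=4)  # whole trajectories
half_length = unique_fraction(trajectories, window=2)  # cut into two pieces each
```

With the whole four-slot trajectories, every user is unique. Once each trajectory is cut into two-slot pieces, every piece is shared with another user. The window size is the knob that trades privacy against utility: shorter pieces blend into the crowd more easily but say less about longer movement patterns.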

If that method were used, would that mean crime predictors could not track specific individuals and lead to Minority Report abuses? No, but it might be a step in the right direction.
