We often hear about the pressure traditional media is under today. Falling ad revenues, the rise of citizen journalism and shrinking attention spans mean that deep investigative journalism is increasingly rare. So it is interesting to read a story built on genuinely deep research. That was the case last year, when two French journalists from Le Monde gained access to a highly complex dataset detailing more than 100,000 clients and their bank accounts at the Swiss branch of HSBC. The data pointed to unethical and fraudulent practices.
The problem was that the volume and complexity of the data meant traditional investigative approaches simply wouldn't work. Reporters have typically had to spot relationships in Excel files, run manual Internet searches and sometimes physically draw out the connections between people and entities to get the facts for their stories. It would have taken the two journalists years to unravel the data. This is where the International Consortium of Investigative Journalists (ICIJ) came in.
The ICIJ is a global network of more than 190 investigative journalists in more than 65 countries who work together on in-depth investigative stories. Founded in 1997 by American journalist Chuck Lewis, the ICIJ was launched as a project of the Center for Public Integrity to extend the center’s style of watchdog journalism, focusing on issues like cross-border crime, corruption and the accountability of power. Backed by the center and its computer-assisted reporting specialists, public records experts, fact-checkers and lawyers, ICIJ reporters and editors provide real-time resources and the latest tools and techniques to journalists around the world.
The leaked data covered account holders in more than 200 countries, with a combined total of more than $100 billion across the accounts. The challenge was to find a way to analyze and visualize that data without relying on data scientists. The answer was a graph database.
“While working on stories like Offshore Leaks, I learned how important graph analysis is when investigating financial corruption,” said Mar Cabra, editor of the Data and Research Unit at the ICIJ. “Connections are key to understanding what the real story is: they show you who’s doing business with whom. We decided early on that we needed to use a graph-based approach for the HSBC Leaks.”
Cabra re-created the Excel files in a database and connected every name to one or more countries. The team then converted the data into a graph format to explore the connections between nodes. The resulting graph database held more than 275,000 nodes with 400,000 relationships among them. A Web application served as the user interface, giving reporters access to the data and visualizing it. This enabled the journalists to trace the links between people and bank accounts and, over time, uncover instances of fraud, corruption and tax evasion.
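To make the spreadsheet-to-graph step more concrete, here is a minimal Python sketch of the general idea. It is not the ICIJ's actual pipeline, which the article does not describe in detail. The row layout, the sample names and the helper functions build_graph, related_clients and shortest_path are all hypothetical. Each spreadsheet row becomes a set of typed nodes (client, account, country) joined by relationships, and a breadth-first search then surfaces the chain of links between two entities.

```python
from collections import defaultdict, deque

# Hypothetical rows, as they might appear in a spreadsheet export:
# (client name, account number, country). All values are made up.
ROWS = [
    ("Client A", "ACC-001", "Country X"),
    ("Client B", "ACC-001", "Country Y"),
    ("Client B", "ACC-002", "Country Y"),
    ("Client C", "ACC-003", "Country X"),
]


def build_graph(rows):
    """Turn flat rows into an undirected adjacency map of typed nodes."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for client, account, country in rows:
        c = ("client", client)
        link(c, ("account", account))  # client holds this account
        link(c, ("country", country))  # client is tied to this country
    return graph


def related_clients(graph, client):
    """Clients who share at least one account with the given client."""
    start = ("client", client)
    found = set()
    for neighbor in graph.get(start, ()):
        if neighbor[0] != "account":
            continue
        for other in graph[neighbor]:
            if other != start:
                found.add(other[1])
    return found


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    g = build_graph(ROWS)
    print(related_clients(g, "Client A"))
    print(shortest_path(g, ("client", "Client A"), ("client", "Client C")))
```

With this sample data, related_clients(g, "Client A") returns {"Client B"} because the two share ACC-001, and the shortest path from Client A to Client C runs through Country X. A dedicated graph database performs this kind of traversal across hundreds of thousands of nodes; the in-memory dictionary here only illustrates the data model.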
After Cabra’s team shared the tool on the ICIJ’s virtual newsroom, journalists worldwide tapped into the dataset and the graph analysis tool from their respective regions, querying the data on a global scale. Because they could easily visualize the networks around clients and accounts, they found many more connections than before, which led to new stories that later made front pages around the globe. Previously, lone reporters had to establish connections by hand across dozens of files, a time-consuming task that could yield inaccurate results.
The results of applying these technologies to the raw data speak for themselves: in February 2015, more than 50 news organizations worldwide (including Le Monde) revealed how HSBC had helped criminals, traffickers and tax evaders, and profited from doing business with them, by sheltering more than 100,000 clients with accounts worth $100 billion in Switzerland.
This article is published as part of the IDG Contributor Network.