City of Chicago develops big data platform to improve the lives of citizens

Government data stockpiles continue to proliferate but their power often remains trapped in unexploited silos. The City of Chicago has unleashed the potential of its numerous data sources, from emergency calls to inspection records, in an in-house application that adds deep data analysis to a simple map interface built on MongoDB.

WindyGrid is a real-time situational awareness application that city staff use to see where and when events are happening in the city, so they can respond faster and improve the quality of life of residents.

A selection of 36 data sources is integrated into the MongoDB database to understand all manner of issues in the city as they develop over time. They include 911 calls, the non-emergency 311 line, business licenses, building violations, Tweets, city traffic, weather, emergency vehicles, and environmental complaints.

"It covers almost every single dimension of the city," says City of Chicago chief data officer Tom Schenk. "When you work with a department and they want to add their data, we say: 'That's no problem. What do you need to do? How do we use this to shape your operations?' We're very mindful of modifying the application so it's actually servicing and approving our operations."

The data is used to identify issues around everything from marathon routes, traffic accidents and criminal activity to disease outbreaks and opioid overdoses, and to plan how the government should manage them.

The city also used the application to improve food safety inspection. There are 16,000 restaurants in Chicago, but only 36 inspectors to check them. The predictive analytics in WindyGrid were used to plan which establishments to check.

The city's restaurants are all displayed as dots on a map, with information on the type of violation available by hovering over each. Rules can be added to reveal which failed their last inspection, and then display them on a heatmap to understand the areas where failures are most common.
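
The rule-then-heatmap step described above can be sketched in a few lines of Python. This is an illustration only, not WindyGrid's code: the records, field names and grid size are all assumptions made for the example. It keeps the restaurants that failed their last inspection and counts them per map grid cell, which is the tally a heatmap would colour.

```python
import math
from collections import Counter

# Hypothetical inspection records; names, fields and values are illustrative.
restaurants = [
    {"name": "A", "lat": 41.8812, "lon": -87.6305, "last_result": "fail"},
    {"name": "B", "lat": 41.8817, "lon": -87.6312, "last_result": "fail"},
    {"name": "C", "lat": 41.8793, "lon": -87.6291, "last_result": "pass"},
    {"name": "D", "lat": 41.9505, "lon": -87.6503, "last_result": "fail"},
]

CELL_DEGREES = 0.01  # assumed grid size, roughly 1 km north to south


def grid_cell(lat, lon, size=CELL_DEGREES):
    """Return the integer (row, column) index of the grid cell for a point."""
    return (math.floor(lat / size), math.floor(lon / size))


# The "rule": keep only restaurants that failed their last inspection.
failed = [r for r in restaurants if r["last_result"] == "fail"]

# The "heatmap": number of failures falling in each grid cell.
heat = Counter(grid_cell(r["lat"], r["lon"]) for r in failed)

for cell, count in heat.most_common():
    print(cell, count)
```

Cells with higher counts correspond to the areas where failures are most common.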

Additional data sources such as weather, burglaries, sanitation issues, complaints, and alcohol or tobacco licenses in the vicinity can all be incorporated to understand how different factors influence inspection results.

Rudimentary machine learning then helps categorise which restaurants are most likely to fail. The sanitation manager can use this information to prioritise resources and send inspectors only to the places most at risk.
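
As a rough illustration of how such a ranking could work, the sketch below fits a small logistic regression by gradient descent and orders restaurants by predicted probability of failure. It is not the city's model: the two features (whether the restaurant failed its last inspection, and a 0 to 1 score for nearby sanitation complaints), the training data and the restaurant names are all hypothetical.

```python
import math


def predict(weights, features):
    """Return the modelled probability of failure for one restaurant."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=2000, learning_rate=0.1):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradients = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict(weights, features) - label
            gradients[0] += error
            for i, value in enumerate(features):
                gradients[i + 1] += error * value
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradients)]
    return weights


# Hypothetical history: (failed_last_inspection, nearby_complaint_score).
history = [(1, 0.9), (1, 0.7), (1, 0.6), (0, 0.2), (0, 0.1), (0, 0.4)]
outcomes = [1, 1, 1, 0, 0, 0]  # 1 = a critical violation was found

weights = train(history, outcomes)

# Hypothetical restaurants awaiting inspection.
candidates = {"X": (1, 0.8), "Z": (0, 0.5), "Y": (0, 0.1)}

# Highest predicted risk first, so inspectors can visit those sooner.
ranked = sorted(
    ((name, predict(weights, features)) for name, features in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, risk in ranked:
    print(name, round(risk, 2))
```

The point of the sketch is the ordering rather than the exact probabilities: the list tells a manager where to send a limited number of inspectors first.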

A double-blind study was conducted to measure the impact that the predictions had on the speed at which the team could discover critical violations.

"That increased by 25 percent," says Schenk. "Over an eight-week period we were able to shave down the average time it takes to find a critical violation by over a week."

Such testing is essential to ensure that the plans built on predictive analytics are effective.

"We fail every single time the first time around," admits Schenk. "Inevitably we misunderstand something and our experiment doesn't go well, so we go back, we tweak it, we understand what the miscommunication was and we go back to it."

The system has also been used to mitigate public health risks. In summer, mosquitoes arrive in Chicago keen to spread West Nile Virus, a disease that can cause headaches, vomiting, diarrhoea, and even death. WindyGrid was used to show on a bubble map which of the city's 189 mosquito traps caught the most mosquitoes, and that information was used to prepare a plan to limit their spread.

Developing WindyGrid

The idea of WindyGrid emerged when Chicago was slated to host both the NATO and G8 summits in 2012. The incumbent Mayor Rahm Emanuel needed to ensure the city's high-profile guests were taken care of by fully understanding what was happening in the city. WindyGrid was developed to provide this information in a single central location.

MongoDB offered a database that let the government add new data easily, without spending too much time on data modelling, and plot it on a map using latitude and longitude coordinates.
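
To make the idea concrete, here is a minimal sketch in plain Python of how records from unrelated sources can share one location field while keeping their own fields. The article does not describe WindyGrid's schema, so the field names, sources and values here are assumptions. The location follows the GeoJSON point format that MongoDB's geospatial features accept, with longitude listed before latitude. The distance filter is written by hand so the example runs without a database; in MongoDB, a geospatial index such as `2dsphere` would serve this kind of query.

```python
import math


def to_document(source, lat, lon, **fields):
    """Wrap a record from any source in a common shape with a GeoJSON point."""
    return {
        "source": source,
        # GeoJSON orders coordinates as [longitude, latitude].
        "location": {"type": "Point", "coordinates": [lon, lat]},
        **fields,  # each source keeps whatever extra fields it has
    }


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def within_km(docs, lat, lon, radius_km):
    """Return the documents whose location lies within radius_km of a point."""
    matches = []
    for doc in docs:
        doc_lon, doc_lat = doc["location"]["coordinates"]
        if haversine_km(lat, lon, doc_lat, doc_lon) <= radius_km:
            matches.append(doc)
    return matches


# Hypothetical records from three sources, each with different fields.
docs = [
    to_document("311_call", 41.8820, -87.6280, category="pothole"),
    to_document("building_violation", 41.8830, -87.6300, status="open"),
    to_document("weather", 41.9740, -87.9070, temperature_c=27.5),
]

nearby = within_km(docs, 41.8819, -87.6278, radius_km=1.0)
print([doc["source"] for doc in nearby])
```

Because no fixed set of columns is imposed, adding another source means writing another `to_document` call rather than redesigning tables.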

The option of using a relational database was rejected because the extra data modelling would have multiplied the time spent adding new data sources.

"It's very important for us to be very lightweight, to be able to iterate and experiment," says Schenk. "We're able to play around with MongoDB without having to go through a procurement process, which of course is a killer.

"It allows us to experiment with open source technology, and if it works - as it has - we're able to scale it out."

The first iteration took six months to implement and they’ve since built a second version. There have been bumps along the way, but Schenk has been able to count on the support of the MongoDB development team to deal with them.

"At one point Mongo actually released a Chicago-specific patch," he says. "We discovered a bug very early on and they released a patch just for us to be able to get past a bit of a hitch that we had. They later incorporated that into future releases and it's part of our environment now."

The City of Chicago also collects information on every single taxi, Uber and Lyft ride in the city. Schenk next wants to add that transit data to WindyGrid to understand where trips begin and end and manage the transport infrastructure around them. Privacy concerns are mitigated through authentication and user control levels, and by avoiding the collection of information about individual citizens.

The code driving the predictions has been released as open source so that others can use it and share their own improvements.

The food inspection model has been piloted in five further US cities, and Chicago has recently signed up to a data alliance with London. Schenk also chairs a group of 20 cities called the Civic Analytics Network established to collaborate and share data learnings for mutual benefit.

"We know that if we work together as a network we can do that," he says. "By using and producing open source solutions it makes it that much easier to pick up projects from somewhere else and then to adopt them."

Copyright © 2017 IDG Communications, Inc.
