UN tackles socio-economic crises with big data

June 3, 2013 06:00 AM ET

Going forward, Baron says OAAI is seeking additional funding to continue the program, which is focused on low-income students who often drop out of school for economic reasons.

Bringing Social Issues to Light

Socially sensitive issues such as domestic violence, foeticide and child sexual abuse are taboo as topics of discussion in much of India. But these are precisely the issues that Bollywood actor and filmmaker Aamir Khan took on in his TV show Satyamev Jayate, which translates to Truth Alone Prevails. The show's goal was to prompt discussion about these rarely discussed societal problems and to learn more about how Indian people thought and felt about them, as a first step toward resolving them.

To achieve this goal, Persistent Systems Ltd., a 2013 Computerworld Honors Laureate, set about monitoring and analyzing a massive amount of data it collected from social media channels immediately after each 90-minute episode of the program aired.

"The show is a cross between Oprah and 60 Minutes," explains Mukund Deshpande, head of BI/analytics at Persistent Systems. "The goal was to use social media to connect directly with people and close the loop as a way to have a conversation with viewers."

The show was carried on 13 TV channels in India, and each episode was posted to YouTube within 30 minutes of its airing. Each show immediately elicited millions of messages on Facebook, Twitter and other online discussion forums. The challenge, says Deshpande, was to make sense of long, complex messages that were very emotional and often contained stories of people's personal encounters with abuse.

This created a big-data problem in terms of both volume and network performance. The show was flooded with a staggering 1.09 billion impressions across social channels. All structured and unstructured data was analyzed in real time, and the results were displayed on a so-called impact dashboard that conveyed the show's effect on legislation, society and individuals.

Persistent Systems designed and developed the custom end-to-end analytics process in three weeks. The project was implemented using the latest distributed computing technology and Hadoop.

Adding to the unstructured data challenge, social media responses were in "Hinglish" (Hindi words in Roman script embedded in English). This ruled out using existing tools to handle messages, which is why developers created a customized system to understand response sentiment.

Deep analytics extracted valuable insights, Deshpande says. The new system aggregated all unstructured data, then automatically filtered it to weed out spam and unrelated messages. Valid messages were tagged and rated, with short messages praising the show rated lower than longer messages and personal stories. Final selection of the top content was done manually using triangulation.
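For readers who want a concrete picture of that filter-and-rate step, the following Python sketch shows one way it could look. It is not Persistent Systems' code: the keyword lists, the scoring weights and the function names (is_valid, rate, shortlist) are all assumptions made up for illustration, and the real system also had to handle Hinglish text and ran on Hadoop.

```python
from dataclasses import dataclass

# Hypothetical keyword lists, invented for this sketch only.
SPAM_MARKERS = {"click here", "free recharge", "http://"}
TOPIC_WORDS = {"show", "episode", "abuse", "violence", "satyamev"}
STORY_MARKERS = {"i was", "my mother", "my sister", "happened to me"}


@dataclass
class RatedMessage:
    text: str
    score: float


def is_valid(text: str) -> bool:
    """Drop spam and messages that never mention the show or its topics."""
    lowered = text.lower()
    if any(marker in lowered for marker in SPAM_MARKERS):
        return False
    return any(word in lowered for word in TOPIC_WORDS)


def rate(text: str) -> float:
    """Score longer messages and personal stories above short praise."""
    lowered = text.lower()
    word_count = len(lowered.split())
    # Length contributes up to 1.0 (assumed cap of 100 words).
    score = min(word_count, 100) / 100.0
    # Assumed bonus for phrases that suggest a first-person account.
    if any(marker in lowered for marker in STORY_MARKERS):
        score += 1.0
    return score


def shortlist(messages, top_n=3):
    """Filter, rate and rank messages; the top few go to human reviewers."""
    rated = [RatedMessage(m, rate(m)) for m in messages if is_valid(m)]
    rated.sort(key=lambda r: r.score, reverse=True)
    return rated[:top_n]
```

In this sketch the automated pass only narrows the field; the final choice is still left to people, as it was on the show.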

Deshpande says that social scientists have expressed interest in using a similar process to conduct a new kind of social research. "Usually, they form a small group of people and study them intensely for three to six months," he notes. "But what we have here is exactly the opposite of that. We don't have rich data about a small number of individuals but data about millions of people, including their age and gender and how they feel about particular issues. It would be a new way to do social science research."
