Overcoming the big performance challenge of Big Data in finance

In the world of enterprise computing, Big Data isn't a choice anymore -- it's the new reality.

Digital content is expected to grow to 2.7 zettabytes this year, according to a recent IDC projection. Ninety percent of that information will be unstructured web-based content such as videos, text files and social media.

Businesses use Big Data tools to analyze and derive value from those massive data sets, gaining competitive advantage in the process. That advantage can only be realized, however, if the data is processed intelligently and efficiently and the results are delivered in a timely manner.

Big Data means big money

The financial industry is flooded with data, and the flood keeps rising. Trading volume for options grew to 4.55 billion contracts in 2011, an increase of nearly 17 percent from 2010 and the ninth straight year that options activity increased. The American Bankers Association estimated that around the world, there are 10,000 credit card transactions per second. Forrester projects that 66 million U.S. households will use online banking by 2014.

These are just a few of the never-ending streams of Big Data being generated in the financial industry, all of which need to be analyzed. Being able to process this data intelligently and act on it quickly can be worth billions of dollars. Investment firms and financial services companies use Big Data in different ways. To name a few use cases:

  • Banks and personal-finance websites collect and analyze customer data in order to deliver personalized products and services, leading to higher customer satisfaction and retention.
  • Analytics are now being used to improve the recovery of bad debt: by taking specific customer circumstances into account, firms can improve recovery rates while also reducing recovery costs.
  • Payment platforms and firms are using Big Data capabilities to better detect fraudulent activity, moving away from traditional sampling techniques to being able to process all transactions and thereby more accurately assessing risk.
  • Enterprises are using Big Data to see how their IT systems are performing and behaving, analyzing and indexing the data generated by all the IT infrastructure, allowing improved uptimes and overall operational efficiencies.

In memory we trust

Most Big Data deployments today use traditional batch processing techniques to parallelize the analysis of huge data sets. A job is typically divided into small tasks, those tasks are distributed across hundreds or thousands of servers for computation, and the partial results are then combined into a final result. This approach is similar to the "Grid" architectures that many financial firms have experience deploying. It works and it scales, but it is inherently limited in its ability to generate real-time results.
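
The shape of that computation can be sketched in a few lines of Java. The example below is a single-machine analogue only, using the standard fork/join framework; real batch frameworks distribute the sub-tasks across many servers, but the split, compute and combine pattern is the same.

    import java.util.Arrays;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Illustrative only: the split/process/combine pattern described above,
    // run on one machine with Java's fork/join framework.
    public class ParallelSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // below this, compute directly
        private final long[] data;
        private final int lo, hi;

        ParallelSum(long[] data, int lo, int hi) {
            this.data = data; this.lo = lo; this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {               // small task: process it
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;                // split into two sub-tasks
            ParallelSum left = new ParallelSum(data, lo, mid);
            ParallelSum right = new ParallelSum(data, mid, hi);
            left.fork();                              // schedule the left half
            return right.compute() + left.join();     // combine partial results
        }

        public static void main(String[] args) {
            long[] data = new long[10_000_000];
            Arrays.fill(data, 1L);
            long total = ForkJoinPool.commonPool()
                    .invoke(new ParallelSum(data, 0, data.length));
            System.out.println("total = " + total);   // prints 10000000
        }
    }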

To deliver speed and real-time results, a new approach is emerging in which subsets of the data are held and processed in a server's fast local memory rather than fetched from slow disk storage or from another server. Such "in-memory computing" enables dramatically faster computation and analysis. In the world of finance, where speed is everything, the use of in-memory computing can make or break a firm's competitive advantage.
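
As a minimal sketch of the idea -- the Trade record and position book here are hypothetical placeholders, not any particular product's API -- a working set held in ordinary Java collections can answer queries with no disk I/O on the hot path:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch: keep the working set in the JVM heap so repeated
    // queries never touch disk. Trade and the position book are hypothetical.
    public class InMemoryPositions {
        record Trade(String account, String symbol, long quantity) {}

        // account -> (symbol -> net position), held entirely in memory
        private final Map<String, Map<String, Long>> positions = new ConcurrentHashMap<>();

        public void ingest(Trade t) {
            positions
                .computeIfAbsent(t.account(), a -> new ConcurrentHashMap<>())
                .merge(t.symbol(), t.quantity(), Long::sum);
        }

        // A memory-resident query: no disk I/O on the hot path.
        public long netPosition(String account, String symbol) {
            return positions
                .getOrDefault(account, Map.of())
                .getOrDefault(symbol, 0L);
        }

        public static void main(String[] args) {
            InMemoryPositions book = new InMemoryPositions();
            book.ingest(new Trade("ACCT-1", "XYZ", 100));
            book.ingest(new Trade("ACCT-1", "XYZ", -40));
            System.out.println(book.netPosition("ACCT-1", "XYZ")); // prints 60
        }
    }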

The use of in-memory computing does have its share of challenges. In the financial world, many key applications, from trading platforms to financial services websites, are built in Java and therefore run on a Java Virtual Machine (JVM). Traditional JVMs, however, were not designed to scale to the memory sizes required for real-time Big Data analytics. Above a certain heap size (5 GB, for example), applications running on traditional JVMs will pause or suffer performance "hiccups" caused by Java's garbage collection process. Garbage collection happens inside the JVM and is responsible for reclaiming memory from 'dead' objects that the program is no longer using. It happens at unpredictable intervals and stops the Java program from executing until it completes. These "stop-the-world" garbage collection pauses are often considered the Achilles' heel of Java because of the poor response or query times that can result. In-memory computing may be the right approach for real-time Big Data analysis, but only if this garbage collection problem can be solved.
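
Those pauses are easy to observe on a stock JVM. The sketch below is illustrative only (exact numbers depend heavily on the JVM, the collector and the heap settings): it allocates at a steady rate while timing a loop that should tick every millisecond, then prints what the standard GarbageCollectorMXBeans report.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Rough pause observer: gaps much larger than the 1 ms tick usually
    // line up with the GC activity reported by the management beans.
    public class PauseObserver {
        public static void main(String[] args) throws InterruptedException {
            byte[][] retained = new byte[4096][];     // churn to keep the collector busy
            long worst = 0;
            long prev = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                retained[i & 4095] = new byte[64 * 1024]; // allocate, overwriting old slots
                Thread.sleep(1);
                long now = System.nanoTime();
                worst = Math.max(worst, now - prev - 1_000_000L);
                prev = now;
            }
            System.out.printf("worst stall beyond the 1 ms tick: %.1f ms%n", worst / 1e6);
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }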

Keep the world turning

Programmers can attempt to mitigate those problematic garbage collection pauses by keeping the heap small. But that largely defeats the purpose of using in-memory computing. Commodity servers can now be purchased with a terabyte of local memory (an amount that continues to double roughly every 18 months); instead of limiting memory size, financial firms should be able to architect their Big Data systems to use large amounts of memory and still perform consistently.

Pauseless execution, in which the JVM collects garbage continuously, is the antidote to these nasty stop-the-world pauses. A pauseless-execution JVM enables elastic use of memory, scaling easily and performing consistently regardless of the amount of memory in use. Used with in-memory computing, a pauseless-execution JVM can utilize hundreds of gigabytes of Java heap (and beyond) without the threat of freezing, and is therefore a key enabler of real-time Big Data analysis.
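
Whichever JVM is chosen, it is worth verifying at runtime which collector is actually in use and how much heap the process can address. The snippet below uses only the standard java.lang.management API; the -Xmx and -XX:+UseZGC flags in the comment are standard OpenJDK options (a large heap plus, on recent JDK builds, a mostly-concurrent collector), shown as one example rather than as any specific vendor's product.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Launched, for example, as:
    //   java -Xmx64g -XX:+UseZGC WhichCollector
    // the bean names will reflect the mostly-concurrent collector in use
    // instead of a stop-the-world one.
    public class WhichCollector {
        public static void main(String[] args) {
            long maxHeap = Runtime.getRuntime().maxMemory();
            System.out.printf("max heap: %.1f GB%n", maxHeap / 1e9);
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println("collector: " + gc.getName());
            }
        }
    }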

Getting bigger all the time

Companies serving the Big Data market are poised to grow at a compound annual rate of 58 percent over the next five years, according to a 2011 Wikibon survey, making it a $50 billion industry by 2017. There is no question that Big Data is a big market. When combined with in-memory computing, Big Data holds big promise for real-time analytics -- but only if the underlying system platforms are architected appropriately.

Scott Sellers provides strategic leadership and visionary direction as the CEO and co-founder of Azul Systems, which delivers high-performance and elastic Java Virtual Machines (JVMs) so enterprises can effectively scale while meeting business objectives.
