Tokutek boosts MySQL scalability for big data applications

Company claims new data structure delivers better performance than conventional database tools

The relative inability of MySQL database technology to deal with large data sets is pushing companies to consider so-called NoSQL alternatives such as Hadoop when they need to analyze massive amounts of data.

One company that's hoping to stem the tide is Tokutek, a Lexington, Mass.-based vendor that for two years has been quietly pitching technology that's designed to scale MySQL well beyond its current limitations. The technology was developed at MIT, Rutgers and SUNY Stony Brook.

On Tuesday, Tokutek introduced a new version of its TokuDB database storage technology that features hot indexing and hot column-addition capabilities that it claims boost the usability of MySQL for "big data" jobs. The new features in TokuDB v5.0 are designed to allow companies to add indexes and columns without having to bring down the database.

"Big data" is a term that's increasingly being used to describe very large volumes of unstructured and structured content -- usually in amounts measured in terabytes or petabytes -- that a growing number of companies want to harness and analyze.

Conventional relational database management technologies, which use indexing for speedy data retrieval and complex query support, have been hard pressed to keep up with the data insertion speeds required for big data analytics. This is especially true of MySQL, whose ability to quickly accept new data begins to fall off once a database gets bigger than about half a terabyte.

"The Achilles' heel of relational database management systems has been indexing," said John Partridge, CEO of Tokutek. Typically, MySQL starts hitting disk when table sizes start exceeding more than 100 million rows or so, he said. "So if you live outside of that 100 million-row size, you need to really look at TokuDB," he said.

TokuDB's query acceleration and index scalability for MySQL come from its data structure, Partridge said. Commercial RDBMS products, such as Oracle, SQL Server and MySQL, all use a 1970s data structure called the B-tree for storing and organizing data on disk.

The structure is optimized for quick data retrieval and complex querying but can be a problem in situations where large volumes of rapidly arriving data -- such as clickstream data, log files and social media data -- need to be stored and quickly queried.

In contrast, TokuDB is based on a data structure called Fractal Tree, which supports the same query performance as B-tree but features data insertion speeds that are two orders of magnitude faster than MySQL storage engines, Partridge said. TokuDB's Fractal Tree indexes data at near disk bandwidth rates and is designed specifically for large multiterabyte databases, he said. Tokutek claims that TokuDB was 19 times faster than MySQL's InnoDB in a benchmarking test involving the insertion of 1 billion rows into a table.

Jawa, a 200-employee Phoenix-based company that delivers a range of mobile and gaming applications, is an early user of TokuDB. The company uses the technology for analyzing millions of log files every day.

"Our in-house expertise is primarily with MySQL, so we were looking for a product that would allow us to leverage those skill sets," said Ernie Souhrada, chief IT architect at Jawa.

One of the reasons the company chose Tokutek's technology was that TokuDB is able to maintain a more constant insertion rate over a large number of rows than InnoDB, he said. "We were also looking at TokuDB as a replacement for traditional MySQL/InnoDB in our main OLTP systems," because it offered an easy migration path, said Souhrada.
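An insertion rate that holds steady, or falls off, as a table grows can be checked by timing inserts batch by batch. The sketch below is a minimal illustration, not Tokutek's or Jawa's benchmark: it uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere, and the table name, column names and row shape are hypothetical. To test MySQL storage engines, the connection and SQL would need to be adapted to a MySQL driver (which typically uses a different parameter placeholder than sqlite3's `?`), and the data should be as close to real workloads as possible.

```python
import random
import sqlite3
import time


def insertion_rates(conn, total_rows, batch_size, seed=0):
    """Insert rows in batches and return a list of (rows_so_far, rows_per_second)."""
    rng = random.Random(seed)
    # Hypothetical log table with one secondary index, so every insert
    # also has to update an index on randomly distributed keys.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_events ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_user ON log_events (user_id)")

    rates = []
    inserted = 0
    while inserted < total_rows:
        n = min(batch_size, total_rows - inserted)
        batch = [(rng.randrange(1_000_000_000), "x" * 64) for _ in range(n)]
        start = time.perf_counter()
        with conn:  # one transaction per batch, committed on exit
            conn.executemany(
                "INSERT INTO log_events (user_id, payload) VALUES (?, ?)", batch
            )
        elapsed = time.perf_counter() - start
        inserted += n
        rate = n / elapsed if elapsed > 0 else float("inf")
        rates.append((inserted, rate))
    return rates


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for rows, rate in insertion_rates(conn, 200_000, 50_000):
        print(f"{rows:>10,} rows: {rate:,.0f} rows/s")
```

Plotting the per-batch rate against the cumulative row count shows whether throughput stays flat or degrades as the table and its index grow. An in-memory run like this one says nothing about disk-bound behavior, which only appears once the data set is larger than available memory.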

Tokutek's pricing model also made TokuDB substantially cheaper than an in-memory database technology that Jawa was considering, he said. Because Tokutek's pricing is based on increments of 100GB, it is more granular as well as cheaper, he said.

The price is $2,500 per year for every 100GB of data up to a maximum of 5 terabytes. TokuDB is available for free for up to 50GB of data.
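As a rough worked example of that pricing, the sketch below computes an annual cost from a data size. The figures ($2,500 per 100GB per year, free up to 50GB, 5-terabyte ceiling) come from the article; everything else is an assumption made for illustration: that partial increments are rounded up to the next 100GB, that a database over 50GB is billed on its full size, and that 5 terabytes means 5,000GB. The function name is hypothetical.

```python
import math

PRICE_PER_INCREMENT_USD = 2500  # per year, from the article
INCREMENT_GB = 100              # billing increment, from the article
FREE_TIER_GB = 50               # free up to this size, from the article
MAX_GB = 5000                   # 5 terabytes, assuming 1TB = 1,000GB


def annual_cost_usd(data_gb):
    """Estimate the yearly cost for a database of data_gb gigabytes."""
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_gb > MAX_GB:
        raise ValueError("above the 5-terabyte ceiling quoted in the article")
    if data_gb <= FREE_TIER_GB:
        return 0
    # Assumption: a partial increment is billed as a full 100GB increment.
    increments = math.ceil(data_gb / INCREMENT_GB)
    return increments * PRICE_PER_INCREMENT_USD


if __name__ == "__main__":
    for size in (50, 100, 750, 5000):
        print(f"{size:>5} GB -> ${annual_cost_usd(size):,} per year")
```

Under these assumptions, a 750GB database would be billed as eight increments, or $20,000 per year, and a database at the 5-terabyte ceiling would cost $125,000 per year.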

"The best advice I can give to anyone considering TokuDB is to benchmark it with as close to a real-world scenario as you can," Souhrada said. "Remember that it isn't InnoDB, so it won't behave the same way," he said.

Profile Technology, a U.K.-based company that develops applications for Facebook, MySpace, Bebo and others, is another user of TokuDB. The company offers an advanced search capability on Facebook that requires it to maintain regularly updated public information on more than 420 million Facebook profiles.

Profile Technology started using TokuDB when its MyISAM and InnoDB storage engines began faltering under the load and database updates began taking longer and longer. "With over 420 million profiles, we have to run searches on a database of similar scale to Facebook itself," said Chris Claydon, managing director of Profile Technology.

"However, the searches we run on it are far more complex and powerful than Facebook's own search tools allow," he said. "TokuDB was one of a number of technologies that we combined in order to [run] our entire operation from a single high-speed database server," for better cost-efficiencies.

Joseph Martins, managing director of IT consultancy Data Mobility Group, said that while Tokutek technology can be very useful for MySQL shops, the company's performance claims still need to be fully vetted. "Everything sounds great, but we need to have more data" from actual customer installations before the performance claims can be verified, he said.

Jaikumar Vijayan covers data security and privacy issues, financial services security and e-voting for Computerworld. Follow Jaikumar on Twitter at @jaivijayan, or subscribe to Jaikumar's RSS feed. His e-mail address is jvijayan@computerworld.com.
