MarkLogic ties its database to Hadoop for 'Big Data' support
The XML-powered data store specializes in handling unstructured information
IDG News Service - You can add MarkLogic to the growing list of database vendors rushing to embrace the open-source Hadoop programming framework for large-scale data processing.
MarkLogic 5, which became generally available on Tuesday, includes a Hadoop connector that will allow customers to "aggregate data inside MarkLogic for richer analytics, while maintaining the advantages of MarkLogic indexes for performance and accuracy," the company said.
MarkLogic is a "real, enterprise-class database, but it uses XML and XQuery instead of SQL, so it's well-suited for certain classes of applications," said analyst Curt Monash of Monash Research. "They have a nice scale-out story and they're dotting some i's and crossing some t's on industrial-strength performance."
The database's calling card has been its ability to manage, index and serve up large amounts of unstructured data, from text documents to media files.
It makes sense for MarkLogic to support Hadoop, Monash said.
"There are some multi-structured data use cases that are an obvious fit for MarkLogic over Hadoop and vice versa," he said. "Any integration lets you straddle them and get broader reach."
For example, an insurance company may have a set of documents numbering in the billions that it wants to pull up one by one and perform analytics on each, he said. "That would be a great use case for the combination," with MarkLogic handling the first part and Hadoop the second, he said.
The Hadoop tie-in reflects the broader trend around "Big Data," an industry buzzword for the ever-increasing amount of unstructured information coming from sources other than traditional enterprise applications, such as social networking sites and sensors.
Meanwhile, another new feature in MarkLogic 5 tries to make the most of the mix of storage customers might have, said CTO Ron Avnur. "We realized people have rotational drives and network-attached storage, and are starting to play more seriously with solid-state. These have different performance profiles."
System administrators tell MarkLogic what storage options are available and where, and the system will "do all the optimization." This way, more frequently used data can be kept in flash, while older or less frequently accessed information is held elsewhere.
The new release also adds dashboards for overseeing multiple MarkLogic clusters. Customers may have development, test and production systems, and "they want to understand what's going on across those," Avnur said.
Also new are tie-ins to the Nagios open-source monitoring framework and Hewlett-Packard's Operations Manager software, as well as an API (application programming interface) that can be used to integrate with other management systems.
In addition, MarkLogic 5 features the ability to keep a "hot copy" of the database in another data center for quick failover in the event of a disaster, as well as a journal-archiving function that allows a database to be restored to a particular point in time.
The company is also rolling out a new version of its developer edition, with the chief change being that customers can now use it in production. It's limited to a single two-CPU node and 40GB of data.
The company is small compared to database giant Oracle, with $50 million in revenue through the end of last year, but is growing quickly, according to Bill Veiga, vice president of solutions marketing.
It has 275 distinct customers and more than 500 implementations, Veiga added.
Chris Kanaracus covers enterprise software and general technology breaking news for The IDG News Service. Chris's e-mail address is Chris_Kanaracus@idg.com