MongoDB competes on speed and flexibility
IDG News Service - While debate rages on over the value of nonrelational, or NoSQL, databases, two case studies presented at a New York conference this week point to the benefits of using the MongoDB NoSQL data store instead of a standard relational database.
Representatives from both The New York Times and social networking service Foursquare, speaking at the MongoNYC conference held Wednesday in New York, explained why they used MongoDB. They praised MongoDB's ability to scale up and ingest lots of data, as well as its ease of reconfiguration.
"SQL databases have grown into these weird monstrosities. They don't really map to the problems you actually have, so you try to work around their warts," said Harry Heymann, the Foursquare engineer who oversees the company's servers, during his presentation. "MongoDB is a practical database for problems that engineers in the real world have. It was developed by people who built large-scale Web apps."
For The New York Times, MongoDB has been "awesome for flexible research and development," said Jake Porway, a New York Times data scientist, in his talk. Porway works for the news organization's research and development group, which looks at ways digital technology can enhance the presentation of news.
Porway also praised MongoDB's ability to ingest large amounts of data. "Mongo eats this data up," he said.
The New York Times used MongoDB, the open-source NoSQL data store developed by 10gen, for its experimental Cascade data visualization tool.
Cascade visually demonstrates how links to New York Times stories get copied by multiple Twitter users, showing how messages get passed from one user to the next. "This was an exploratory tool that helps us understand how people share" information, Porway said.
Cascade depicts how many people pass a story link on to others, as well as how long the link takes to spread.
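The two measures Cascade depicts, how many people pass a link on and how long it takes to spread, can be illustrated with a small sketch. This is not the Times' code. It assumes, hypothetically, that each share is recorded with the sharing user, the user they got the link from, and a timestamp; the `Share` record and `cascade_summary` function are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Share:
    user: str              # hypothetical field: who posted the link
    parent: Optional[str]  # hypothetical field: whom they got it from (None = original post)
    at: datetime           # when this share was posted


def cascade_summary(shares: List[Share]) -> Tuple[int, timedelta]:
    """Return (number of users who passed the link on, time from first to last share)."""
    if not shares:
        return 0, timedelta(0)
    # Anyone named as a parent forwarded the link to at least one other user.
    passers = {s.parent for s in shares if s.parent is not None}
    times = [s.at for s in shares]
    return len(passers), max(times) - min(times)


t0 = datetime(2011, 6, 1, 9, 0)
shares = [
    Share("nytimes", None, t0),
    Share("alice", "nytimes", t0 + timedelta(minutes=5)),
    Share("bob", "alice", t0 + timedelta(minutes=20)),
    Share("carol", "nytimes", t0 + timedelta(minutes=30)),
]
print(cascade_summary(shares))
```

In this made-up example, two accounts ("nytimes" and "alice") passed the link on, and the cascade spanned 30 minutes. Storing each share as its own record with a parent pointer keeps the full chain of who-passed-to-whom available for later digging.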
The New York Times publishes about 600 pieces of content every day and often posts links to them on Twitter. Links to these stories get rebroadcast across Twitter an average of about 25,000 times a day, Porway said. The Cascade system saves all the Twitter messages, as well as the number of times each story link was forwarded and clicked on. All told, it produces about 100 GB of data each month.
"This allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?" Porway said.
The three-dimensional visualizations can show huge spikes in activities, which the user can then dig into to find more details, such as the actual messages.