Zoom zoom: Upstarts speed past big BI vendors in data warehouse loading speeds
They're boasting load speeds of 4TB an hour or more
Computerworld - One of the more prosaic parts of data warehousing is, well, getting the data into the warehouse.
This has long been handled by vendors that are expert in the field of extract, transform and load (ETL). Even there, innovation focused more on the problem of transforming the data. Loading the data seemed a piece of cake by comparison.
That is, until business intelligence (BI) and analytics started becoming a round-the-clock affair. Also, today's biggest BI users -- banks, telecommunications providers, Web advertisers -- operate data warehouses larger than a petabyte in size and import huge swaths of data -- 50TB of data per day, as in the case of one of Teradata Inc.'s customers.
BI and ETL vendors are responding. The past several months have seen a number of start-ups and lesser-known firms touting screaming-fast data-loading speeds, both in the lab and in the field.
- Database start-up Greenplum Inc. said it has a customer routinely loading 2TB of data in half an hour, for an effective throughput of 4TB per hour.
- Rival database start-up Aster Data Systems Inc. claimed that its nCluster technology can enable customers to reach almost 4TB (specifically, 3.6TB) per hour.
- Data-integration vendor Syncsort Inc. said third-party-validated lab tests show its software can load 5.4TB of data into a Vertica Systems Inc. columnar data warehouse in under an hour.
- Not to be outdone, semantic data integration start-up Expressor Software Corp. claimed that in-house tests show its data-processing engine able to scale to nearly 11TB per hour.
"If they are really performing at this rate, it's quite significant and really impressive," said Jim Kobielus, an analyst at Forrester Research Inc., since "anything above a terabyte per hour is good."
Blazing past the incumbent BI and ETL vendors
What about the established firms? SAS Institute Inc. and Sun Microsystems Inc. two years ago demonstrated a SAS data warehouse running on Sun Microsystems hardware with StorageTek arrays that pushed through 1.7TB in 17 minutes, or the equivalent of about 6TB per hour.
But apart from SAS, other big-name vendors have posted data-integration performance benchmarks that fall well short of the figures these upstarts are claiming.
- Three years ago, Informatica Corp. claimed its PowerCenter 8 software loaded data at a rate of 1.33TB per hour. The company, which declined to comment today, hasn't posted any updated performance benchmarks.
- Oracle Corp. and Hewlett-Packard Co. last fall released the BI-oriented HP Oracle Database Machine, which they said loads data at up to 1TB per hour.
- Microsoft Corp. claimed at the launch of SQL Server 2008 a year ago that its SQL Server Integration Services 2008 had loaded the equivalent of 2.36TB in an hour.
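Several of the figures above are quoted as a volume loaded over some stretch of time rather than as an hourly rate, so comparing them takes a little arithmetic. The sketch below is only an illustration of that conversion. The function name `tb_per_hour` and the labels are made up for this example, and treating the Teradata customer's 50TB daily intake as if it were spread evenly over 24 hours is an assumption, since a daily import volume is not the same thing as a measured load speed.

```python
def tb_per_hour(terabytes: float, minutes: float) -> float:
    """Convert a volume loaded over a duration into an hourly rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return terabytes * 60.0 / minutes


# Figures quoted in the article: (label, TB loaded, minutes taken).
# The labels are illustrative; the numbers come from the text above.
claims = [
    ("Greenplum customer", 2.0, 30.0),
    ("SAS/Sun demonstration", 1.7, 17.0),
    # Assumption: 50TB per day spread evenly across 24 hours.
    ("Teradata customer (daily intake)", 50.0, 24 * 60.0),
]

rates = {label: tb_per_hour(tb, mins) for label, tb, mins in claims}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} TB/hour")
```

Averaged over a full day, 50TB works out to only a little more than 2TB per hour, which suggests the headline rates matter most when loads have to fit into short windows.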

