Auto market researcher revs up Oracle grid for massive data warehouse
R.L. Polk plans to bring all 2.5 petabytes of data into Oracle 10g
Computerworld - Like a muscle car driving 55 mph on the freeway, R.L. Polk & Co.'s new grid-based data warehouse boasts gobs of untapped power under the hood, according to Kevin Vasconi, the company's CIO.
In May, the Southfield, Mich.-based automotive industry market research company finished moving its main 4TB customer-facing data warehouse to an Oracle 10g grid composed of Dell PowerEdge servers running Linux.
The move has helped R.L. Polk save money and improve data redundancy, availability and access time. It also supports Polk's new service-oriented architecture, which is improving customer service, Vasconi said.
"We are getting more bang for our buck," he said. The data warehouse is doing 10 million transactions a day "without any issues."
Encouraged by the experience so far, R.L. Polk is bringing onto the grid other databases, both domestic and overseas, that total 2.5 petabytes of actively managed data. It's a process that will take at least 18 months, Vasconi said. And the amount of data is expected to grow 30% per year for the foreseeable future.
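To put the 30% growth figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the growth compounds annually from the 2.5-petabyte starting point quoted above; the function name and the chosen time horizons are illustrative only and do not come from R.L. Polk.

```python
def projected_petabytes(start_pb: float, annual_growth: float, years: float) -> float:
    """Compound growth: start * (1 + rate) ** years.

    Assumes the quoted rate compounds once per year (an assumption,
    not something the company has stated).
    """
    return start_pb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # 2.5 PB today, 30% per year, at a few illustrative horizons.
    for years in (1, 1.5, 3, 5):
        size = projected_petabytes(2.5, 0.30, years)
        print(f"after {years} years: {size:.2f} PB")
```

Under that assumption, the data set would already be noticeably larger than 2.5 petabytes by the time the 18-month migration is finished.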
Founded in 1870 -- the same year the automobile's predecessor, a motorized handcart, was invented in Germany -- R.L. Polk started as a publisher of business directories. It became a car information supplier in 1921 and began using computer punch cards in 1951. The company is best known to consumers for its Carfax database of car histories.
Before its recent move to Oracle grid technology, R.L. Polk stored most of its data in Oracle 9 or 10 databases running on Sun Solaris servers, connected to EMC gear running in storage-area networks.
Now, R.L. Polk's grid comprises 100 two- and four-way servers, all running Red Hat Enterprise Linux. It also serves up applications and powers the rule processing engine. It can "easily double" to 200 servers, providing room for growth.
Only a tiny portion of the grid -- four four-way servers -- is apportioned now to the data warehouse. Much of the grid is devoted to running R.L. Polk's new Web-based applications, which both import data into the data warehouse from 260 discrete sources, such as car dealers or state licensing boards, and stream it out to paying customers, such as carmakers, car dealers and parts suppliers.
The data warehouse serves as R.L. Polk's "single source of truth" on a massive database that includes 500 million individual cars, or almost 85% of all cars in the world as of 2002. It also includes data on 250 million households and 3 billion transactions.
R.L. Polk cleanses the names and addresses of all incoming records, adds location data such as latitude and longitude, and, in the case of the 17-character vehicle identification numbers unique to every car, extrapolates each car's individual features and styling. It's a complicated process, but as his team continues to tweak the Oracle grid engine, Vasconi expects to be able to shorten the importation time to less than 24 hours.
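As an illustration of what the VIN step of such a cleansing pass can involve, the Python sketch below normalizes a raw VIN, verifies the check digit used on North American vehicles (position 9), and splits the number into the three standard sections: world manufacturer identifier, vehicle descriptor and vehicle identifier. This is not R.L. Polk's code. The function names are hypothetical, and mapping the descriptor section to actual features and styling would require manufacturer lookup tables that are not shown here.

```python
# Letter-to-number transliteration for the North American check-digit rule.
# The letters I, O and Q never appear in a VIN.
TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def normalize_vin(raw: str) -> str:
    """Uppercase the input, drop spaces and hyphens, and validate its shape."""
    vin = raw.upper().replace(" ", "").replace("-", "")
    if len(vin) != 17 or any(ch not in TRANSLIT for ch in vin):
        raise ValueError(f"not a well-formed VIN: {raw!r}")
    return vin


def check_digit_ok(vin: str) -> bool:
    """Return True if position 9 matches the weighted sum modulo 11."""
    total = sum(TRANSLIT[ch] * w for ch, w in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def split_vin(vin: str) -> dict:
    """Split a normalized VIN into its three standard sections."""
    return {
        "wmi": vin[0:3],   # world manufacturer identifier
        "vds": vin[3:9],   # vehicle descriptor section (includes check digit)
        "vis": vin[9:17],  # vehicle identifier section
    }


if __name__ == "__main__":
    vin = normalize_vin(" 1m8gdm9a-xkp042788 ")
    print(vin, check_digit_ok(vin), split_vin(vin))
```

A record whose VIN fails normalization or the check-digit test could be flagged for manual review rather than loaded, though the check digit is mandatory only for vehicles sold in North America, so a failure on a VIN from elsewhere is not necessarily an error.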
Looking forward, Vasconi said data already stored on vehicles' on-board computers -- such as engine-trouble history, GPS-based location history, average speeds and so on -- will soon be imported into the data warehouse, too, if privacy issues can be resolved.
"The car is a gold mine of consumer information," Vasconi said.