QuickStudy: Data Cubes
Computerworld - When we try to extract information from a stack of data, we need tools to help us find what's relevant and what's important and to explore different scenarios. A report, whether printed on paper or viewed on-screen, is at best a two-dimensional representation of data, a table using columns and rows. That's sufficient when we have only two factors to consider, but in the real world we need more powerful tools.
But data cubes aren't restricted to just three dimensions. Most online analytical processing (OLAP) systems can build data cubes with many more dimensions; Microsoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions. We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing such higher-dimensional entities in spatial or geometric terms can be a problem.
In practice, therefore, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
Relational or Multidimensional?
Since data cubes are such a useful interpretation tool, most OLAP products are built around a structure in which the cube is modeled as a multidimensional array. These multidimensional OLAP, or MOLAP, products typically run faster than other approaches, primarily because it's possible to index directly into the data cube's structure to collect subsets of data.
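As a minimal sketch of what indexing directly into a multidimensional array looks like, the following Python fragment stores a tiny three-dimensional cube as nested lists. The dimension members and sales figures are invented for illustration and are not taken from any OLAP product.

```python
# Hypothetical dimension members; none of these names come from a real system.
PRODUCTS = ["small widget", "large widget"]
STORES = ["store A", "store B", "store C"]
REGIONS = ["east", "west"]

# Dense 3-D array of unit sales: cube[product][store][region].
# The figures are made up for illustration.
cube = [
    [[10, 0], [4, 7], [0, 3]],   # small widget
    [[2, 5], [0, 0], [8, 1]],    # large widget
]

def cell(product, store, region):
    """Look up one cell by translating member names to array positions."""
    p = PRODUCTS.index(product)
    s = STORES.index(store)
    r = REGIONS.index(region)
    return cube[p][s][r]

def slice_by_region(region):
    """Fix the region dimension and return the product x store table that remains."""
    r = REGIONS.index(region)
    return [[cube[p][s][r] for s in range(len(STORES))]
            for p in range(len(PRODUCTS))]

print(cell("small widget", "store B", "west"))  # 7
print(slice_by_region("east"))                  # [[10, 4, 0], [2, 0, 8]]
```

Because each member maps to a fixed position, collecting a subset of the cube is a matter of computing positions rather than searching records. Note that the empty combinations still occupy cells holding zero.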
However, for very large data sets with many dimensions, MOLAP solutions aren't always so effective. As the number of dimensions increases, the cube becomes sparser; that is, many cells representing specific attribute combinations are empty, containing no aggregated data. As with other types of sparse databases, this tends to increase storage requirements, sometimes to unacceptable levels. Compression techniques can help, but using them tends to destroy MOLAP's natural indexing.
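To put rough numbers on sparsity, the sketch below compares how many cells a dense array would have to allocate with how many actually hold data. The dimension sizes and populated cells are assumptions chosen for illustration; the dictionary keyed by coordinates is just one simple way to store only the populated cells.

```python
from math import prod

# Hypothetical dimension sizes (number of distinct members per dimension).
dimension_sizes = {"product": 50, "store": 20, "region": 5, "month": 12}

# Sparse storage: only populated cells are kept, keyed by their coordinates.
# The coordinates and figures are invented for illustration.
populated_cells = {
    (0, 3, 1, 0): 120,
    (0, 3, 1, 1): 95,
    (7, 11, 4, 6): 40,
}

def total_cells(sizes):
    """Number of cells a dense array would have to allocate."""
    return prod(sizes.values())

def density(cells, sizes):
    """Fraction of the dense array that actually holds data."""
    return len(cells) / total_cells(sizes)

print(total_cells(dimension_sizes))  # 60000
print(density(populated_cells, dimension_sizes))
```

Every added dimension multiplies the dense cell count by that dimension's size, while the number of populated cells usually grows far more slowly. The dictionary saves space, but a lookup now goes through the key instead of a computed array position, which echoes the trade-off described above.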
Data cubes can be built in other ways. Relational OLAP (ROLAP) uses the relational database model. The ROLAP data cube is implemented as a collection of relational tables (up to 2^n of them for n dimensions, one for each combination of dimensions) instead of as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.
Because the cuboids are conventional database tables, we can process and query them using traditional RDBMS techniques, such as indexes and joins. This format is likely to be efficient for large data collections, since the tables must include only data cube cells that actually contain data.
However, ROLAP cubes lack the built-in indexing of a MOLAP implementation. Instead, each record in a given table must contain all attribute values in addition to any aggregated or summary values. This extra overhead may offset some of the space savings, and the absence of an implicit index means that we must provide one explicitly.
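The relational approach can be sketched with Python's built-in sqlite3 module. The table and column names below are hypothetical, and the example builds only a single cuboid (the product-by-region view) from a handful of invented transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical transaction table; column names are illustrative.
conn.execute(
    "CREATE TABLE sales (product TEXT, store TEXT, region TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("small widget", "store A", "east", 10),
        ("small widget", "store A", "east", 5),
        ("small widget", "store B", "west", 7),
        ("large widget", "store C", "east", 8),
    ],
)

# One cuboid: the (product, region) view. Each row carries its attribute
# values plus the aggregated measure, and only populated combinations appear.
conn.execute(
    """
    CREATE TABLE cuboid_product_region AS
    SELECT product, region, SUM(units) AS total_units
    FROM sales
    GROUP BY product, region
    """
)

# The table has no implicit positional index, so we add one explicitly.
conn.execute(
    "CREATE INDEX idx_product_region ON cuboid_product_region (product, region)"
)

rows = conn.execute(
    "SELECT product, region, total_units "
    "FROM cuboid_product_region ORDER BY product, region"
).fetchall()
print(rows)
```

The cuboid holds three rows, one per combination that actually occurred. Combinations with no sales, such as large widgets in the west, simply have no row.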
From a structural perspective, data cubes are made up of two elements: dimensions and measures. I've already explained dimensions; measures are simply the actual data values.
It's important to keep in mind that the data in a data cube has already been processed and aggregated into cube form. Thus we normally don't perform calculations within a data cube. This also means that we're not looking at real-time, dynamic data in a data cube.
The data contained within a cube has already been summarized to show figures such as unit sales, store sales, regional sales, net sale profits and average time for order fulfillment. With this data, an analyst can efficiently analyze any or all of those figures for any or all products, customers, sales agents and more. Thus data cubes can be extremely helpful in establishing trends and analyzing performance. In contrast, tables are best suited to reporting standardized operational scenarios.
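The point about pre-aggregated, non-real-time data can be illustrated with a short sketch. The build_cube function and the transaction records are assumptions made for this example. Aggregation happens once, when the cube is built, and later queries only read the stored totals.

```python
from collections import defaultdict

def build_cube(transactions):
    """Aggregate raw (product, store, units) records into summed cells."""
    cells = defaultdict(int)
    for product, store, units in transactions:
        cells[(product, store)] += units
    return dict(cells)

# Hypothetical raw transactions.
transactions = [
    ("small widget", "store A", 10),
    ("small widget", "store A", 5),
    ("large widget", "store B", 8),
]

key = ("small widget", "store A")

cube = build_cube(transactions)   # aggregation happens once, up front
before = cube[key]

# A sale recorded after the build is invisible until the cube is rebuilt.
transactions.append(("small widget", "store A", 3))
stale = cube[key]

cube = build_cube(transactions)   # rebuild to pick up the new sale
after = cube[key]

print(before, stale, after)  # 15 15 18
```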
Building a Data Cube
This example uses sales figures from XYZ Co., which makes many kinds of widgets. For each sales transaction, we know four pieces of data:
Which types of widget were involved (style, color, size and so on)
Store or sales agent
Geographic region or territory
In a real-world situation, we would also know many other data items, including:
Cost to XYZ for each widget
Method and cost of shipping
Any of these pieces of data can function as a dimension in a data cube. We can take any two dimensions and produce a 2-D table (1). Thus we can correlate or track sales against individual stores or sales agents. Add in a third factor, such as price, and we can produce a 3-D data cube (2) that allows us to see how much each store or sales agent is selling in addition to which type of widget. Swap in geography (3), and we can now see who is selling where.
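The idea of picking dimensions and swapping them can be sketched in a few lines of Python. The records, field names and the aggregate helper are hypothetical, and region stands in as the third dimension to keep the sample data small.

```python
from collections import defaultdict

# Hypothetical transactions for XYZ Co.; field names and figures are illustrative.
SALES = [
    {"widget": "small", "store": "A", "region": "east", "units": 10},
    {"widget": "small", "store": "B", "region": "west", "units": 7},
    {"widget": "large", "store": "A", "region": "east", "units": 2},
    {"widget": "large", "store": "B", "region": "west", "units": 5},
    {"widget": "small", "store": "A", "region": "east", "units": 4},
]

def aggregate(records, dimensions, measure="units"):
    """Sum the measure over every combination of the chosen dimensions."""
    cells = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        cells[key] += rec[measure]
    return dict(cells)

# Two dimensions give a 2-D table: which store sells which widget.
table_2d = aggregate(SALES, ["widget", "store"])

# A third dimension turns the table into a 3-D cube.
cube_3d = aggregate(SALES, ["widget", "store", "region"])

# Swapping dimensions answers a different question: who is selling where.
by_region = aggregate(SALES, ["store", "region"])

print(table_2d[("small", "A")])         # 14
print(cube_3d[("small", "A", "east")])  # 14
print(by_region[("A", "east")])         # 16
```

The same records feed every view. Only the list of dimensions passed to aggregate changes.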