When we try to extract information from a stack of data, we need tools to help us find what's relevant and what's important and to explore different scenarios. A report, whether printed on paper or viewed on-screen, is at best a two-dimensional representation of data, a table using columns and rows. That's sufficient when we have only two factors to consider, but in the real world we need more powerful tools.
Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of one another.
But data cubes aren't restricted to just three dimensions. Most online analytical processing (OLAP) systems can build data cubes with many more dimensions—Microsoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions. We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing such higher-dimensional entities in spatial or geometric terms can be a problem.
In practice, therefore, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
Relational or Multidimensional?
Since data cubes are such a useful interpretation tool, most OLAP products are built around a structure in which the cube is modeled as a multidimensional array. These multidimensional OLAP, or MOLAP, products typically run faster than other approaches, primarily because it's possible to index directly into the data cube's structure to collect subsets of data.
However, for very large data sets with many dimensions, MOLAP solutions aren't always so effective. As the number of dimensions increases, the cube becomes sparser—that is, many cells representing specific attribute combinations are empty, containing no aggregated data. As with other types of sparse databases, this tends to increase storage requirements, sometimes to unacceptable levels. Compression techniques can help, but using them tends to destroy MOLAP's natural indexing.
Data cubes can be built in other ways. Relational OLAP uses the relational database model. The ROLAP data cube is implemented as a collection of relational tables (up to twice as many as the number of dimensions) instead of as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.
Because the cuboids are conventional database tables, we can process and query them using traditional RDBMS techniques, such as indexes and joins. This format is likely to be efficient for large data collections, since the tables must include only data cube cells that actually contain data.
However, ROLAP cubes lack the built-in indexing of a MOLAP implementation. Instead, each record in a given table must contain all attribute values in addition to any aggregated or summary values. This extra overhead may offset some of the space savings, and the absence of an implicit index means that we must provide one explicitly.
From a structural perspective, data cubes are made up of two elements: dimensions and measures. I've already explained dimensions; measures are simply the actual data values.
It's important to keep in mind that the data in a data cube has already been processed and aggregated into cube form. Thus we normally don't perform calculations within a data cube. This also means that we're not looking at real-time, dynamic data in a data cube.
The data contained within a cube has already been summarized to show figures such as unit sales, store sales, regional sales, net sale profits and average time for order fulfillment. With this data, an analyst can efficiently analyze any or all of those figures for any or all products, customers, sales agents and more. Thus data cubes can be extremely helpful in establishing trends and analyzing performance. In contrast, tables are best suited to reporting standardized operational scenarios.
|Building a Data Cube|
Kay is a Computerworld contributing writer in Worcester, Mass. You can reach him at firstname.lastname@example.org.
Are there technologies or issues you'd like to learn about in QuickStudy? Send your ideas to email@example.com.
To find a complete archive of our QuickStudies, go online to computerworld.com/quickstudies.