QuickStudy: Data Cubes
Computerworld - When we try to extract information from a stack of data, we need tools to help us find what's relevant and what's important and to explore different scenarios. A report, whether printed on paper or viewed on-screen, is at best a two-dimensional representation of data, a table using columns and rows. That's sufficient when we have only two factors to consider, but in the real world we need more powerful tools.
But data cubes aren't restricted to just three dimensions. Most online analytical processing (OLAP) systems can build data cubes with many more dimensionsMicrosoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions. We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing such higher-dimensional entities in spatial or geometric terms can be a problem.
In practice, therefore, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
Relational or Multidimensional?
Since data cubes are such a useful interpretation tool, most OLAP products are built around a structure in which the cube is modeled as a multidimensional array. These multidimensional OLAP, or MOLAP, products typically run faster than other approaches, primarily because it's possible to index directly into the data cube's structure to collect subsets of data.
However, for very large data sets with many dimensions, MOLAP solutions aren't always so effective. As the number of dimensions increases, the cube becomes sparserthat is, many cells representing specific attribute combinations are empty, containing no aggregated data. As with other types of sparse databases, this tends to increase storage requirements, sometimes to unacceptable levels. Compression techniques can help, but using them tends to destroy MOLAP's natural indexing.
Data cubes can be built in other ways. Relational OLAP uses the relational database model. The ROLAP data cube is implemented as a collection of relational tables (up to twice as many as the number of dimensions) instead of as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.
Because the cuboids are conventional database tables, we can process and query them using traditional RDBMS techniques, such as indexes and joins. This format is likely to be efficient for large data collections, since the tables must include only data cube cells that actually contain data.
However, ROLAP cubes lack the built-in indexing of a MOLAP implementation. Instead, each record in a given table must contain all attribute values in addition to any aggregated or summary values. This extra overhead may offset some of the space savings, and the absence of an implicit index means that we must provide one explicitly.
From a structural perspective, data cubes are made up of two elements: dimensions and measures. I've already explained dimensions; measures are simply the actual data values.
It's important to keep in mind that the data in a data cube has already been processed and aggregated into cube form. Thus we normally don't perform calculations within a data cube. This also means that we're not looking at real-time, dynamic data in a data cube.
The data contained within a cube has already been summarized to show figures such as unit sales, store sales, regional sales, net sale profits and average time for order fulfillment. With this data, an analyst can efficiently analyze any or all of those figures for any or all products, customers, sales agents and more. Thus data cubes can be extremely helpful in establishing trends and analyzing performance. In contrast, tables are best suited to reporting standardized operational scenarios.
Building a Data Cube This example uses sales figures from XYZ Co., which makes many kinds of widgets. For each sales transaction, we know four pieces of data:
Which types of widget were involved (style, color, size and so on)
Store or sales agent
Geographic region or territory
In a real-world situation, we would also know many other data items, including:
Cost to XYZ for each widget
Method and cost of shipping
Any of these pieces of data can function as a dimension in a data cube. We can take any two dimensions and produce a 2-D table 1. Thus we can correlate or track sales against individual stores or sales agents. Add in a third factor, such as price, and we can produce a 3-D data cube 2 that allows us to see how much each store or sales agent is selling in addition to which type of widget. Swap in geography 3, and we can now see who is selling where.
Are there technologies or issues you'd like to learn about in QuickStudy? Send your ideas to firstname.lastname@example.org.
To find a complete archive of our QuickStudies, go online to computerworld.com/quickstudies.
Read more about Business Intelligence/Analytics in Computerworld's Business Intelligence/Analytics Topic Center.
- Software Asset Management: Ensuring Today's Assets Today's trends like BYOD and SaaS are new and exciting in terms of how they will help make our jobs more productive but...
- Trends Shaping Software Management: 2014 Most IT executives recognize the relationship between mobile computing and worker productivity, and have long issued notebook computers and other mobile devices to...
- Software Asset Management: Pay Attention or Pay Up There is a wide range of options for managing software assets, from in-house solutions to the cloud to managed services providers. Read this...
- 13 Reasons to Move to Adobe Creative Cloud One of the big advantages Adobe Creative Cloud for teams offers over Adobe Creative Suite 6 perpetual software is the ability to continually...
- Capturing Data in Motion: Delivering Real-Time Insight from Data Streams This webcast will help organizations of all types and sizes learn about a technology and business strategy for tapping into the wealth of...
- The Next Generation of Big Data: New IBM Information Management Cloud Solutions Learn about IBM's new and expanded Information Management capabilities now delivered in the cloud, including: Hadoop based analytics, stream processing, in-memory computing, data... All Business Intelligence/Analytics White Papers | Webcasts
Our new bimonthly Internet of Things newsletter helps you keep pace with the rapidly evolving technologies, trends and developments related to the IoT. Subscribe now and stay up to date!