QuickStudy: Data Cubes
Computerworld - When we try to extract information from a stack of data, we need tools to help us find what's relevant and what's important and to explore different scenarios. A report, whether printed on paper or viewed on-screen, is at best a two-dimensional representation of data, a table using columns and rows. That's sufficient when we have only two factors to consider, but in the real world we need more powerful tools.
But data cubes aren't restricted to just three dimensions. Most online analytical processing (OLAP) systems can build data cubes with many more dimensionsMicrosoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions. We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing such higher-dimensional entities in spatial or geometric terms can be a problem.
In practice, therefore, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
Relational or Multidimensional?
Since data cubes are such a useful interpretation tool, most OLAP products are built around a structure in which the cube is modeled as a multidimensional array. These multidimensional OLAP, or MOLAP, products typically run faster than other approaches, primarily because it's possible to index directly into the data cube's structure to collect subsets of data.
However, for very large data sets with many dimensions, MOLAP solutions aren't always so effective. As the number of dimensions increases, the cube becomes sparserthat is, many cells representing specific attribute combinations are empty, containing no aggregated data. As with other types of sparse databases, this tends to increase storage requirements, sometimes to unacceptable levels. Compression techniques can help, but using them tends to destroy MOLAP's natural indexing.
Data cubes can be built in other ways. Relational OLAP uses the relational database model. The ROLAP data cube is implemented as a collection of relational tables (up to twice as many as the number of dimensions) instead of as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.
Because the cuboids are conventional database tables, we can process and query them using traditional RDBMS techniques, such as indexes and joins. This format is likely to be efficient for large data collections, since the tables must include only data cube cells that actually contain data.
However, ROLAP cubes lack the built-in indexing of a MOLAP implementation. Instead, each record in a given table must contain all attribute values in addition to any aggregated or summary values. This extra overhead may offset some of the space savings, and the absence of an implicit index means that we must provide one explicitly.
From a structural perspective, data cubes are made up of two elements: dimensions and measures. I've already explained dimensions; measures are simply the actual data values.
It's important to keep in mind that the data in a data cube has already been processed and aggregated into cube form. Thus we normally don't perform calculations within a data cube. This also means that we're not looking at real-time, dynamic data in a data cube.
The data contained within a cube has already been summarized to show figures such as unit sales, store sales, regional sales, net sale profits and average time for order fulfillment. With this data, an analyst can efficiently analyze any or all of those figures for any or all products, customers, sales agents and more. Thus data cubes can be extremely helpful in establishing trends and analyzing performance. In contrast, tables are best suited to reporting standardized operational scenarios.
Building a Data Cube This example uses sales figures from XYZ Co., which makes many kinds of widgets. For each sales transaction, we know four pieces of data:
Which types of widget were involved (style, color, size and so on)
Store or sales agent
Geographic region or territory
In a real-world situation, we would also know many other data items, including:
Cost to XYZ for each widget
Method and cost of shipping
Any of these pieces of data can function as a dimension in a data cube. We can take any two dimensions and produce a 2-D table 1. Thus we can correlate or track sales against individual stores or sales agents. Add in a third factor, such as price, and we can produce a 3-D data cube 2 that allows us to see how much each store or sales agent is selling in addition to which type of widget. Swap in geography 3, and we can now see who is selling where.
Are there technologies or issues you'd like to learn about in QuickStudy? Send your ideas to email@example.com.
To find a complete archive of our QuickStudies, go online to computerworld.com/quickstudies.
Read more about Business Intelligence/Analytics in Computerworld's Business Intelligence/Analytics Topic Center.
- AWS and Jaspersoft Disrupt BI TIBCO Jaspersoft for AWS and Amazon Redshift disrupt business intelligence with cost-effective analytics in the cloud. Available by the hour in a pay-as-you-go...
- Cloud BI Demo: Jaspersoft for AWS Watch a 5-minute overview demo of Jaspersoft for AWS and see how to easily and affordably build business intelligence solutions as well as...
- Big Data, Big Mess: Sound Risk Intelligence Through Complete Context This paper examines the insecurity of the small businesses in the supply chain and offers tips to close those backdoors into the enterprise.
- Automating Cost Transparency By making all of the costs of running IT transparent, IT can change the way business units consume IT resources, drive down total...
- Cloud BI in Action: Recorded Webinar of Customer, Kony, Inc. See how Kony, Inc., a leading enterprise mobility company, is using TIBCO Jaspersoft for Amazon Web Services and Redshift to achieve embedded analytics...
- Cloud BI Overview: Jaspersoft for AWS Check out this overview of Jaspersoft for AWS, to easily and affordably build business intelligence solutions as well as embed visualizations and analytics... All Business Intelligence/Analytics White Papers | Webcasts
Our new bimonthly Internet of Things newsletter helps you keep pace with the rapidly evolving technologies, trends and developments related to the IoT. Subscribe now and stay up to date!