October 8, 2002

Relating to OLAP

OLAP and ROLAP are a continuum, not competitors

By Joy Mundy

Virtually all data warehouses use a relational data store. As a relational designer until a few years ago, I assumed that online analytic processing (OLAP) was merely a technology for small-scale applications. I now believe that perception is outdated and will become only more so as OLAP servers evolve to be a major component of the data warehouse.

Desktop vs. Server-Side OLAP

The phrase online analytic processing has become synonymous with "slice and dice." In desktop OLAP, a SQL query executes against the database and returns a result set to the desktop, which you can then pivot and manipulate locally. Desktop OLAP is useful but not very scalable. Server-side OLAP, where the tool issues a non-SQL query against a much larger remote data store, is more scalable and supports deeper analytics than its desktop cousins. Four products — Microsoft SQL Server Analysis Services, Hyperion Essbase in its various incarnations, Oracle Express, and MicroStrategy — dominate this market, along with a dozen or so smaller players.

An OLAP server enables intuitive data browsing and querying, supports analytic complexity, and provides great query performance through transparent navigation of pre-computed aggregations. The requirement to support analytic complexity on the server implies a language other than SQL, such as MDX or Calc Scripts. The most effective designs store the definition of these complex calculations on the server, where they're transparently available to all users. The recommended architecture for most purposes feeds the OLAP server from a dimensional data warehouse in the relational DBMS.

If OLAP technology delivers complex analytics and great query performance, why doesn't it dominate the market? The main reasons are market fragmentation, scalability, price, and flexibility. Market fragmentation has ill-served the customer. Until recently there has been no agreement even on client-access APIs. This situation is changing with the recent widespread adoption of XML for Analysis by most of the OLAP server market.

Historically, OLAP systems haven't been as scalable as relational systems. Concern about scalability remains, but it's become more a matter of perception than reality; terabyte-scale OLAP case studies and references are available from several vendors (notably MicroStrategy and Microsoft). Price is a harder objection to refute because an OLAP server is almost always added to a relational environment. OLAP systems are indeed flexible, as long as you're willing to adjust your relational thinking a little. I will probe these flexibility issues in a little more depth in the following paragraphs.

Dimensional Similarities

An experienced dimensional data modeler should feel comfortable reading the documentation for any of the OLAP server products. They all talk about dimensions, hierarchies, and facts or measures. Most use the concept of a cube, which is directly analogous to a relational fact table schema with associated dimensions and aggregate tables.

OLAP dimensions consist of hierarchies, or summarization paths. The relationship between a data hierarchy and an OLAP dimension is very important. Many OLAP products will let you define multiple hierarchies on a dimension, such as a fiscal and a standard calendar. If you've built your OLAP application from a relational dimensional data warehouse, you should have already implemented surrogate keys, which provide the same benefits for OLAP as for relational.

Like the relational database, OLAP servers have physical and calculated facts. However, the analytic engine in OLAP servers supports a much wider variety of complex calculated facts defined in the server, and this is often the key defining benefit of an OLAP implementation.

Dimensional Differences

The key difference between OLAP dimensions and simple relational dimensions is the central role played by hierarchies in OLAP implementations. An OLAP dimension is strongly structured around its hierarchies, and the metadata of a cube definition includes the hierarchical levels. This is one of the great strengths of an OLAP implementation. An OLAP query tool picks up the information about the hierarchical structure from the OLAP server and presents the structure to the user in an intuitive way.

Equally important, OLAP tools use strong hierarchies to define aggregations. An OLAP server enforces a kind of referential integrity between hierarchical levels in a dimension. A pure relational system does not enforce hierarchical integrity, especially if you use star rather than snowflake dimension designs.

Another difference between OLAP and relational is the ease of implementation of different types of changing dimensions. Type 2 dimensions (tracking history by adding a new dimension member) are very easy to implement in an OLAP system, assuming you have already implemented them in the relational database. By contrast, Type 1 dimensions (restating history by updating in place) are a difficult problem for OLAP. If you've ever built and maintained an aggregate table on a relational schema with a Type 1 dimension, you'll identify the problem immediately. When a customer moves from the West to the East, aggregates need to be updated. You'll either have to drop and recreate all aggregate tables, or figure out which sections of which aggregates are affected and fix them. OLAP servers tackle this problem for you, but there's no getting around the fact that significant processing is required. Type 1 changing dimensions can be expensive in an OLAP server.

Arbitrary Ranges of Time

There are some things that are utterly trivial to accomplish on a pure relational schema, yet are tricky in the OLAP world. An example is to issue a query that returns a total over an arbitrary time span. A query such as total sales for Q1 2002 is simple to formulate and should return from an OLAP server nearly instantaneously.

But woe betides the user who wants total sales for an arbitrary period such as January 3 through March 12, 2002, for which no predefined hierarchy exists. Part of the blame belongs to the client query tools, some of which won't even let you formulate the query except by bringing back daily data for January, February, and March. If your business users really need to do this a lot, you should include this functionality in your product review matrix.

Great Advantages

I just discussed some things that are really easy in the pure relational world but are difficult for some OLAP servers to handle. But OLAP is brimming with advantages compared to relational systems. Here's my list of OLAP's advantages:

	It provides an intuitive user interface for browsing data.
	It gives you spectacular query performance, primarily owing to the intelligent navigation of aggregates and partitions.
	Parent-child dimension structures are easy and intuitive to implement.
	It gives you server-defined rules for handling semiadditive and nonadditive measures.

As a simple example, consider inventory balances: The inventory balance for January and February is certainly not the sum of inventories in January and February. You can train users not to sum inventory balances over time, but will they always remember? Will all users use the same aggregation method, such as ending or average balance? OLAP systems can handle this problem transparently.

An OLAP system lets you have server-defined calculations of great complexity. SQL's limitations as an analytic language were outlined in a previous column. SQL is not an analytic or report-writing language: You need an analysis server to support statistics, data mining algorithms, or even simple rule-based business calculations such as allocations and distributions. The OLAP server acts as a friendly interface to the data cube, letting users consume server-defined analytics without worrying about how and where they are defined and computed.

Server-defined, high-performance queries and calculations can be performed over multiple fact tables or cubes. Combining data from multiple fact tables is a difficult problem in the pure relational world, but can be made easy and intuitive in certain OLAP servers.

Calculations can be defined once and used many times. The more calculations you can define on a central server, the more flexibility your users have in accessing the data. Even a simple slice-and-dice tool can use complex analytics previously defined on the OLAP server. This capability is not generally found in relational environments. And of course, power users can define complex calculations on the server so all users benefit.

OLAP is just plain fun. Most of the time.

Designed for Analysis

Recent trends in the OLAP market are toward lower cost, improved performance and scalability, increased functionality in the core analytic space, and extensions to neighboring spaces such as data mining. These trends will continue over the next few years as the major database vendors bet more heavily on OLAP servers and more tightly integrate those servers with other data management and analytic software.

OLAP servers present dimensional data in an intuitive way, enabling a broad range of analytic users to slice and dice data to uncover interesting information. OLAP is a sibling of dimensional models in the relational database, with intelligence about relationships and calculations defined on the server, that enable faster query performance and more interesting analytics from a broad range of query tools. You shouldn't think of an OLAP server as a competitor to a relational data warehouse, but rather an extension. Let the relational database do what it does best: provide storage and management. Don't torture yourself forcing the RDBMS — and its clunky query language SQL — do something they were not designed for: analysis.

Guest columnist Joy Mundy [joy@microsoft.com] evangelizes data warehousing and business intelligence best practices for Microsoft SQL Server.