Briefing Paper: Web OLAP (OLAP & thin-client computing)

Page: 3912
Issue Date: 06/13/2001
Category: Query Tools

Traditionally, online analytical processing (OLAP) tools have been firmly rooted in the world of Windows client-server technology. Hence, these tools suffer from the same problems and limitations as other traditional client-server applications, namely a relatively large (fat) client footprint and sometimes difficult deployment and maintenance.

The World Wide Web (Web) has now introduced thin-client computing to the OLAP world. The architecture is still technically 'client-server' because it involves a client machine where data is delivered and a server machine where data is stored and processed. Web-enabled online analytical processing (Web OLAP) applications are maturing fast. The latest wave of thin-client applications provides virtually all the functionality of their traditional client-server counterparts, including drill down, pivot, interactive reporting, and graphing capabilities. While the Web 'client' may be slightly less sophisticated than a traditional client-server tool, the cost of deployment to local and remote sites is lower, and for applications that fit a certain profile, development can also be extremely rapid.

Vendor response
Vendor implementations of Web OLAP technology come in a variety of flavors, and depend heavily on the architectural goals of the OLAP vendor. For example:

* A vendor that has elected to provide limited OLAP functionality over the Web may think that distribution of HTML (hypertext markup language) reports provides a large amount of the functionality needed. These HTML reports provide some degree of user interaction and can often be 'pushed' or emailed to selected individuals.

* A vendor that believes that the user interface is an important differentiator may opt for Java applets to maintain the same level of functionality and appearance over the Web. But this alternative represents a more difficult development path as the release of Java came after the development of many OLAP tools; Java development tools were simply unavailable when these applications were built.

* A vendor with a significant investment in legacy code - such as Visual Basic - may wish to reuse large portions of the legacy code in the form of a server-side object or control. This object or control would then be invoked at runtime from the client.

However, thin-client computing is not a straightforward two-tier client-server architecture; Web applications run from an HTML browser and provide a data delivery mechanism requiring no other software on the client machine. In most cases, processing that cannot be accomplished using HTML and an HTTP (hypertext transfer protocol) server requires a middle processing tier.

OLAP vendors therefore face some hard decisions in making the move to thin-client computing. These decisions are partly a reflection of their previous investments in legacy technology, and the need to re-architect their tools and applications to function in the Web world. But despite the costs, the benefits of moving their products to the Web have been too great to ignore.

The Web browser is now being positioned as the primary OLAP client for most decision support applications. However, Web browsers are radically different to the traditional fat clients that have thus far been used to provide OLAP functionality.

Web browsers use HTTP to communicate with Web servers. HTTP is a 'stateless' protocol without persistent connections; the browser opens a connection using the URL (uniform resource locator) provided, retrieves the HTML page and the components of the page (such as graphics and text), and closes the connection. OLAP, however, presents a challenge to this mode of operation. Browsing OLAP database records requires a persistent connection and the ability to retain state (that is, to remember where the application is within a data set), so it is difficult (nigh impossible) to browse a data set using this protocol alone.
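To make the state problem concrete, the following is a minimal Python sketch of how a server might remember a user's position in a data set between otherwise unrelated HTTP requests. It is an illustration under stated assumptions, not a description of any vendor's product: the SESSIONS store and the start_session, drill_down and drill_up functions are hypothetical names, and the store is a plain in-memory dictionary.

```python
import uuid

# Hypothetical in-memory session store: token -> state for one user.
# A real deployment would need expiry and shared storage; both are omitted.
SESSIONS = {}


def start_session():
    """Create a session and return the token the browser sends back
    with every later request (for example in a cookie or a URL)."""
    token = uuid.uuid4().hex
    SESSIONS[token] = {"drill_path": []}
    return token


def drill_down(token, member):
    """Record one drill-down step and return the user's new position."""
    state = SESSIONS[token]
    state["drill_path"].append(member)
    return list(state["drill_path"])


def drill_up(token):
    """Step back one level (a no-op at the top) and return the position."""
    state = SESSIONS[token]
    if state["drill_path"]:
        state["drill_path"].pop()
    return list(state["drill_path"])
```

Each call stands in for one separate HTTP request. The protocol itself carries no memory, so the token is the only thing that ties the requests together.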

OLAP vendors have subsequently addressed the problem of database access via the Web by extending the capabilities of HTML in a way that is 'transparent' to the Web browser. These solutions retrieve data from the database, format the data into an HTML document, and return the HTML page to the Web browser.
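The retrieve, format and return cycle described above can be sketched in a few lines of Python. This is only an illustration: it uses an in-memory SQLite database as a stand-in for the OLAP data source, and the sales table, the demo_connection helper and the function names are assumptions made for the example.

```python
import sqlite3
from html import escape


def rows_to_html(cursor):
    """Render the result set of a DB-API cursor as an HTML table."""
    headers = [col[0] for col in cursor.description]
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>")
    for row in cursor:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


def sales_report(conn):
    """Query the (hypothetical) sales table and return an HTML fragment."""
    cur = conn.execute(
        "SELECT region, SUM(amount) AS total FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return rows_to_html(cur)


def demo_connection():
    """Build a small in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100), ("West", 250), ("East", 50)],
    )
    return conn
```

The browser receives ordinary HTML and has no knowledge that a database was involved, which is the sense in which the approach is 'transparent'.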

One solution is to use CGI (common gateway interface) scripts. However, CGI suffers from the same shortcomings as HTML, in that it cannot maintain a connection to the database and does not provide a persistent database state. Additionally, CGI programs must be loaded and run by the operating system each time they are invoked, and they incur that overhead on every request. Though there are tricks and techniques to avoid some of these problems, CGI is not the most efficient and useful method of extending the features of HTML.
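As a rough illustration of the CGI model, the Python sketch below builds a complete response from the request environment. The region parameter and the page content are invented for the example. The point to notice is that the whole script runs from the top on every request, so any database connection would have to be opened and closed inside that single run.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a full CGI response (header block plus body) for one request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    region = params.get("region", ["all"])[0]
    # A real script would connect to the database here, on every request,
    # because nothing survives from one invocation to the next.
    body = f"<html><body><h1>Sales for {escape(region)}</h1></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The HTTP server starts this program once per request and passes
    # request details (such as QUERY_STRING) in environment variables.
    print(handle_request(os.environ), end="")
```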

A more common solution is to provide additional functionality to HTML in the form of HTML extensions that are parsed and executed by the HTTP server. The prominent HTTP server vendors (Microsoft and Netscape) both support this functionality through their respective server technologies: Microsoft's Active Server Pages (built on its ISAPI interface) and Netscape's NSAPI. Using such extensions to access server-side controls opens up a very wide range of possibilities. They also provide HTML page developers with access to programming languages such as C and Visual Basic.

A significant development in the Web browser world was undoubtedly Netscape's decision to support Java as applets (albeit in a 'security sandbox'), thereby providing the full functionality of an object-oriented programming language in HTML. Initially, the GUI provided by Java lacked the richness of the Windows graphical user interface (GUI), but later releases have closed the gap; Java applets now provide a robust, efficient approach to extending HTML, but they do present a number of restrictions in terms of platform/browser-specific support.

The need for a 'middle-tier'
Effective thin-client computing is dependent on a mid-tier component to manage processing requests. By providing the capability to process the complex logic required of the OLAP tool on a mid-tier server, it is possible to drastically reduce the amount of code that must be downloaded and processed on the client. Thus the thin-client paradigm is realized through an architecture that places the bulk of the processing on a middle-server tier, and requires the client to perform only simple HTML rendering.

In many (but not all) OLAP implementations, a scalable middle-tier is a must. In this manner, the middle tier becomes not simply the second tier, but also part of a multi-tier application architecture. IT departments can scale to additional user load by spawning additional instances of middleware components (described below). With more middleware instances in operation, more requests can be processed at the same time. The number of instances can be increased to a level appropriate to the user load. Not all middleware vendors provide the ability to scale the middle tier to a user load, however. Note that without this capability, the middleware can be a bottleneck in the processing cycle.
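The idea of spreading user load over several middleware instances can be sketched as a simple round-robin dispatcher. This Python example is a toy under stated assumptions: MiddlewareInstance and Dispatcher are hypothetical classes, the instances live in one process, and real products may balance load in quite different ways.

```python
from itertools import cycle


class MiddlewareInstance:
    """Stand-in for one running copy of a middleware component."""

    def __init__(self, name):
        self.name = name
        self.handled = 0

    def process(self, request):
        self.handled += 1
        return f"{self.name} processed {request}"


class Dispatcher:
    """Hands each incoming request to the next instance in turn."""

    def __init__(self, instance_count):
        self.instances = [
            MiddlewareInstance(f"instance-{i}") for i in range(instance_count)
        ]
        self._next = cycle(self.instances)

    def dispatch(self, request):
        return next(self._next).process(request)
```

With three instances, six requests are shared out two to each instance. With a single instance, every request queues behind the same component, which is the bottleneck noted above.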

Vendor implementations of Web OLAP technology come in various flavors, all of which represent variations of a thin-client computing architecture. The predominant approaches used to bring client-server tools to the Web are:

* Pure HTML.

* Java applets.

* The use of middleware & software components (ActiveX or CORBA).

Many of the early attempts at Web OLAP took the position that most Web users simply want the ability to share analytical results with other users. For these vendors, complete OLAP functionality for every user was unnecessary - instead, distributing results or output remotely was their primary goal. Generating report output in the form of HTML accomplished this goal quickly, with little pain and as cost effectively as possible.

HTML report output can provide a relatively consistent report format across different platforms. And in some cases, a certain degree of user interaction is possible. For example, HTML report developers can insert a button or hot link into the HTML report output to provide the ability to drill down.
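A drill-down hot link of this kind is simply a hyperlink whose URL names the next level of detail. The Python sketch below shows one possible way to generate such a link; the /report address and the path query parameter are assumptions made for the example.

```python
from html import escape
from urllib.parse import urlencode


def drill_link(base_url, label, path):
    """Return an HTML anchor that requests the report one level down.

    `path` is the list of members already drilled into, for example
    ["1999", "Q1"]; it is sent to the server as one query parameter.
    """
    query = urlencode({"path": "/".join(path)})
    href = escape(f"{base_url}?{query}")
    return f'<a href="{href}">{escape(label)}</a>'
```

For example, drill_link("/report", "Q1", ["1999", "Q1"]) yields a link to /report?path=1999%2FQ1. All of the work still happens on the server; the browser only follows the link.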

But using HTML also has several limitations:

* The presentation of the report can vary slightly among platforms and Web browsers.

* It does not provide the flexible and dynamic GUI controls that OLAP users have come to expect from the current OLAP offerings.

* Interactivity is limited - standard OLAP functions (drill up, drill down, and pivot) are usually restricted, and complex functions are often missing.

* Most significantly, remote Web users are limited to output provided by other users; the true user empowerment of ad hoc analysis to enterprise data cannot be supplied with HTML alone.

* HTML is a text-rendering language with additional features such as document links, form input, and other basic functions. It was never intended to provide explicit document formatting capabilities or the fine-grained control of a programming language.

Java applets
Most popular Web browsers provide the ability to download Java applets as part of an HTML page. These applets provide virtually all the functionality of popular OLAP tools running under Windows or other popular GUIs. While early versions of the Java GUI library offered only limited functionality, the library was aggressively improved in later revisions.

To provide functionality and control comparable to that of traditional client-server OLAP tools requires a rather large applet. But because users must download the entire applet before it can begin executing, network bandwidth and download speed can be a problem.

For complex applications such as OLAP tools, distributing complexity and logic across a middle tier allows for smaller client-side applications. Having to code less logic into a Java applet leads to a smaller applet download at runtime, and a faster overall execution. For this reason, Java applet OLAP solutions use a middleware server, or execute distributed software components on the middle tier, to avoid the overhead of downloading an entire applet. Furthermore, distributed objects spread logic and complexity across available resources, theoretically making applications scale better. But good scaling implies adequate resources on the middle-tier server and adequate bandwidth.

The shortcomings of HTML are fuelling the drive to use middleware and software components to provide improved Web OLAP functionality. In fact, the increasing move toward component middleware is largely driven by the pressure on the OLAP vendors to attain client-server functionality in their Web OLAP tools - and not necessarily by the integration and reuse benefits of component architecture itself.

Middleware - the 'glue' between BI tools and data sources
Web OLAP, with its need for persistent database connections and complex processing requirements, is a prime candidate for middleware in all its forms. Middleware, in the form of software components or an application server, is adding capabilities to Web OLAP applications that would otherwise have been impossible with Web browsers.

Middleware is also the 'glue' that connects Web OLAP applications to the data they are tasked with analyzing. For example, middleware application servers establish a network connection to a client and then use a pre-established protocol to transfer messages and data.

Middleware provides two main services for Web OLAP applications:

* It provides features and functionality that HTML and Java applets do not provide.

* It reduces the amount of application code that must be downloaded to the client by placing and executing program logic on the server.

Middleware can extend static HTML to provide capabilities closer to those of a conventional programming language than a text markup language. When processing HTML, you can place middleware between the HTTP server and a database. The middleware can parse the HTML and strip specific statements from the text. It also processes these statements and can be used to maintain persistent database connections and program variables. In combination, these capabilities provide the ability to iterate through a set of rows returned from a database.
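The parse, strip and process cycle can be illustrated with a small Python sketch. The tag syntax used here (a QUERY comment, a row template with {column} placeholders, and an END comment) is invented for the example and does not correspond to any particular product. The database connection is passed in, so it can be opened once and reused across many pages.

```python
import re
import sqlite3
from html import escape

# Hypothetical extension syntax, invented for this sketch:
#   <!--QUERY sql-->row template with {column} fields<!--END-->
BLOCK = re.compile(r"<!--QUERY (.*?)-->(.*?)<!--END-->", re.DOTALL)
FIELD = re.compile(r"\{(\w+)\}")


def expand(template, conn):
    """Replace each QUERY block with one copy of its body per result row."""

    def run_block(match):
        sql, row_template = match.group(1), match.group(2)
        cur = conn.execute(sql)  # the template is server-side and trusted
        names = [col[0] for col in cur.description]
        out = []
        for row in cur:  # iterate through the rows returned
            values = dict(zip(names, row))
            out.append(
                FIELD.sub(lambda m: escape(str(values[m.group(1)])), row_template)
            )
        return "".join(out)

    return BLOCK.sub(run_block, template)
```

The special statements never reach the browser. They are removed from the text and replaced by plain HTML generated from the query results.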

OLAP vendors have specific needs for middleware that can go well beyond extending HTML to provide database connectivity. OLAP vendors use middleware to filter and aggregate the rows returned from a query, format text, and insert graphs into the HTML to be returned to the browser. More specifically, in many cases, they use middleware to understand and use the multidimensional database being analyzed.
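The filtering and aggregation step might look something like the following Python sketch, which rolls flat query rows up into a two-dimensional cross-tabulation. The pivot function, its parameters and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict

# Sample rows as they might come back from a relational query.
SAMPLE_ROWS = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q2", "amount": 50},
    {"region": "West", "quarter": "Q1", "amount": 250},
    {"region": "East", "quarter": "Q1", "amount": 25},
]


def pivot(rows, row_dim, col_dim, measure, where=None):
    """Sum `measure` for each (row_dim, col_dim) pair.

    `where` is an optional predicate used to filter rows first.
    Returns a nested dict: {row member: {column member: total}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if where is not None and not where(r):
            continue
        table[r[row_dim]][r[col_dim]] += r[measure]
    return {key: dict(cols) for key, cols in table.items()}
```

Swapping the row_dim and col_dim arguments gives the pivoted view of the same data, without another trip to the database.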

Software components - a more 'elegant' approach
Thin-client computing virtually requires a middle tier for database access and is a steady, powerful driver for the use of software components. Software components are considered 'middleware' - but there is a difference. Middleware performs many of the same functions as a software component, but with a less sophisticated (and some would say less elegant) interface. For example, middleware application servers establish a network connection to a client and then use a pre-established protocol to transfer messages and data. In contrast, a software component would use a facility similar to a function call or a procedure invocation, with parameters passed to the middleware software component and then data returned via a pointer or buffer.
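The difference between the two styles can be sketched in Python. In the first style the caller builds and decodes protocol messages itself; in the second, a proxy object hides the messaging behind what looks like an ordinary function call. The JSON message format, the fetch_total stand-in and the SalesProxy class are all hypothetical and chosen only to make the contrast visible.

```python
import json


def fetch_total(region):
    """Stand-in for middle-tier logic; returns fixed demo figures."""
    return {"East": 150, "West": 250}.get(region, 0)


# Style 1: application server with a pre-established message protocol.
def handle_message(raw):
    """Decode one request message and return an encoded reply message."""
    msg = json.loads(raw)
    if msg.get("op") == "total":
        return json.dumps({"status": "ok", "value": fetch_total(msg["region"])})
    return json.dumps({"status": "error", "reason": "unknown op"})


# Style 2: component-style interface that reads like a local call.
class SalesProxy:
    """Client-side stub; `transport` sends a message and returns the reply."""

    def __init__(self, transport):
        self._transport = transport

    def total(self, region):
        request = json.dumps({"op": "total", "region": region})
        reply = json.loads(self._transport(request))
        if reply["status"] != "ok":
            raise RuntimeError(reply["reason"])
        return reply["value"]
```

A caller using the second style writes SalesProxy(handle_message).total("West") and never sees the message format. Here the transport is a direct function call; in a distributed setting it would be a network hop.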

Importantly, software components provide a natural programming interface to the middleware, albeit through what is considered by some people to be a complex interface. Tools and manual procedures help take existing code and turn it into a distributed software component that resides on a middle tier.

Increasingly, OLAP vendors are now deploying component architectures based on a middle-tier using two main competing software component models:

* Microsoft's Distributed Component Object Model (DCOM) and ActiveX controls.

* Common Object Request Broker Architecture (CORBA).

Several tools are available that build components for both standards - such as IBM's Component Broker Toolkit, Iona's Orbix for CORBA, and Microsoft's ActiveX SDK for DCOM/ActiveX. And a number of vendors (including OLAP vendors) are shipping products that use software components based on either DCOM/ActiveX or CORBA as a middle tier.

DCOM/ActiveX
Microsoft has created and endorses the Distributed Component Object Model (DCOM), which is currently specific to Windows and the Intel platform, although Microsoft has announced plans to support it on Unix and other platforms. This model allows controls, such as ActiveX controls, to be loaded and run on the client, giving the browser Windows-like GUI functionality - functionality that may not currently be available using HTML.

The DCOM model benefits from the considerable weight of the ActiveX control. Based on the old object linking and embedding (OLE) standard used widely across Microsoft products, ActiveX controls can provide extensive functionality to an application with minimal programming effort. With the installed base of Windows as a potential sales platform, the number of developers for these controls is huge, as is the number of available controls.

But the critics of DCOM/ActiveX are numerous and vocal. Microsoft's arguments to the contrary notwithstanding, DCOM is specific to the Microsoft platform - that is, an ActiveX control can run only on Windows platforms. This limitation presents serious problems in an Internet environment where the target platform is completely unknown, but presents less of a problem in an intranet environment where the target platform is manageable - and in most business enterprises, that platform is currently Windows. Security is another issue with ActiveX. ActiveX controls are built on a 'trust' security model - if users choose to let ActiveX controls run on their machines, this choice implies that the control is allowed to use the resources of those machines. This 'trust' model could be problematic if a malicious control were downloaded. Microsoft, however, considers the trust model necessary if components are to perform useful work.

The main competition for DCOM/ActiveX is CORBA, a standard developed by the Object Management Group (OMG) and endorsed by most Unix vendors. CORBA provides a language-independent, platform-independent framework for software modules to communicate. CORBA applications make requests to an object request broker (ORB), which directs the request to the server containing the object and then returns the result to the client that requested the object. CORBA ORBs communicate through the Internet Inter-ORB Protocol (IIOP).
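The request flow through a broker can be pictured with a toy Python sketch. This is emphatically not the CORBA API: ToyBroker, CubeService and their methods are hypothetical names, everything runs in one process, and there is no IIOP, no interface definition and no marshalling. It only illustrates the idea that the client names an object and the broker locates the server that holds it.

```python
class ToyBroker:
    """Toy stand-in for an object request broker (not real CORBA)."""

    def __init__(self):
        # object name -> (name of the server holding it, the object itself)
        self._registry = {}

    def register(self, object_name, server_name, obj):
        """Record which server holds the named object."""
        self._registry[object_name] = (server_name, obj)

    def invoke(self, object_name, method, *args):
        """Direct a request to the holding server and return its result."""
        if object_name not in self._registry:
            raise LookupError(f"no such object: {object_name}")
        server_name, obj = self._registry[object_name]
        result = getattr(obj, method)(*args)
        return server_name, result


class CubeService:
    """Hypothetical server-side object exposing one OLAP-style method."""

    def total(self, region):
        return {"East": 150, "West": 250}.get(region, 0)
```

The client asks the broker for "SalesCube" and never needs to know which machine the object lives on. That location transparency is the property the broker architecture is designed to provide.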

The primary benefit of CORBA relative to DCOM/ActiveX is in its cross-platform capabilities. In many heterogeneous environments, CORBA can help provide interoperability and code reuse across multiple development platforms. The ability to run CORBA middleware on Unix platforms also provides access to a number of high-performance servers. But CORBA has its limitations, not the least of which is that it is not being pushed by a single, large vendor but is, rather, subject to the multiple interpretations and efforts of several vendors. And there are technical issues with the CORBA standard; for example, it does not currently provide a security model, and there is no specific facility for providing naming across a network.

Which one will win?
Although middleware architectures that tout distributed software components are common, their actual use has trailed interest. The reasons for this slow uptake involve issues relating to new technology - namely, competing standards, security, network bandwidth and required support infrastructure.

As is so often the case in the computer industry, a clear, dominant software component standard has been elusive. Hence, for the near term at least, CORBA and DCOM/ActiveX will be forced to coexist. Neither technology has a significant technical edge. CORBA has a firm footing in the open systems world, while DCOM/ActiveX has the significant power and installed base of Microsoft behind it. They will coexist through the increasing use of application 'bridging' technology and cross-platform development tools. OLAP is one possible 'killer application' that may make most users or developers choose one over the other. But in the long term, the choice may be 'neither of the above'. There is the potential for a new standard (instead of standards) to evolve that encompasses and improves on a number of the required services and technologies.

Web OLAP will continue to proliferate because many OLAP users need easy-to-deploy analytical applications that do not demand a long learning curve. As a result, the functions and features of Web OLAP tools will no doubt continue to improve. The Web browser is increasingly being considered as the delivery mechanism of choice; it represents a development and delivery tool that has shaken the client-server tool industry to its core.

But while the Web browser does not yet support as rich a GUI as Windows environments, many developers and users have found that the benefits of platform independence, speed of development, and 'zero' deployment make the browser the platform of choice.

Expectations are that with the continued march towards Web-based delivery of information, Web OLAP tools will continue to mature and close the functionality gap between traditional client-server tools and Web-based tools.

However, Web OLAP clients require both a rich GUI and fast, efficient database access - they push the envelope of what the Web browser can deliver. Needing much more than what HTML alone can provide, OLAP vendors have started looking towards middleware to provide the expanded capabilities they need. With the advent of Java applets and software component solutions, they have now begun to implement these components as their 'middle-tier' - primarily to provide expanded GUI capabilities and better and faster database connectivity.