esProc, A Script Language for Data Analytics with Parallel Mechanism: What Kind of OLAP We Actually Need

OLAP is an important part of BI (business intelligence).

Taken literally, OLAP means online analytical processing: users perform analytical operations on live business data.
Today, however, the concept of OLAP has been narrowed considerably. It usually refers only to operations such as drilling, aggregating, pivoting and slicing on multi-dimensional data, that is, multi-dimensional interactive analysis.

To use this kind of OLAP, a set of topic-specific data cubes must be built in advance. Users can then display the data as crosstabs or charts and apply various real-time transformations (pivoting and drilling) to them, hoping to discover along the way some pattern in the data or some evidence supporting a conclusion, and thereby achieve the aim of the analysis.
Is this the kind of OLAP we need?
To answer this question, we need to look carefully at how OLAP is actually used in practice, and so find out what technical problem OLAP really has to solve.
Employees with years of experience in any industry usually have some educated guesses about their business, for example:
A stock analyst may guess that stocks meeting certain conditions are likely to rise.
An airline employee may guess which kinds of passengers tend to buy which kinds of flights.
A supermarket operator may guess which price level for a product best suits the people living around the store.
…
These guesses are the basis for forecasting. A business system that has been running for some time accumulates large quantities of data, against which these guesses can be evaluated: a guess found to be true can be used for forecasting, while a guess found to be false is replaced by a new one.
Note that these guesses are made by the users themselves, not by the computer system! What the computer should do is help the user evaluate, against the existing data, whether a guess is true or false; in other words, perform online data queries (including some aggregation). This is precisely how OLAP is applied. Online analysis is needed because many queries only become necessary after the user has seen some intermediate result. Throughout this process, modeling in advance is neither possible nor necessary.
We call this process the evaluation process. Its purpose is to find in historical data patterns or evidence for conclusions, and its means is interactive query computation on that historical data.
The following are a few examples of computations (or queries) that are actually needed:
The top n customers whose purchases together account for half of the company's sales in the current year;
Stocks that hit the daily limit-up on three consecutive trading days within one month;
Supermarket products that were already sold out at 5 P.M. on three occasions within one month;
Products whose sales this month dropped by more than 20% compared with the previous month;
…
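As an illustration of how such a question is answered step by step, here is a minimal Python sketch of the first example. The order records and the function name `top_customers_for_half` are hypothetical, and "the top n customers" is read as the smallest group of largest customers whose combined purchases reach half of total sales.

```python
from collections import defaultdict

# Hypothetical order records for the current year: (customer, amount).
orders = [
    ("A", 500), ("B", 300), ("C", 120), ("A", 100),
    ("D", 80), ("E", 50), ("B", 100), ("F", 30),
]

def top_customers_for_half(orders):
    """Return the smallest list of top customers whose combined
    purchases reach at least half of total sales (hypothetical helper)."""
    # Step 1: total purchases per customer.
    totals = defaultdict(float)
    for customer, amount in orders:
        totals[customer] += amount
    half = sum(totals.values()) / 2
    # Step 2: walk customers from largest to smallest, accumulating a running total.
    result, running = [], 0.0
    for customer, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        result.append(customer)
        running += total
        if running >= half:
            break
    return result

print(top_customers_for_half(orders))  # ['A', 'B']
```

The computation needs an aggregation, a sort and a running total that stops at a data-dependent point, which is more than a single filter-and-summarize step.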
Clearly, demands of this kind are ubiquitous in business analysis, and all of them can be computed from the historical database.
So can the narrowed OLAP perform the computations listed above?
Of course NOT!
Current OLAP systems have two key disadvantages:

The multi-dimensional cube is prepared in advance by the application system, and users cannot design or restructure a cube on the fly, so whenever a new analysis requirement appears, the cube has to be rebuilt.
The analysis actions a cube supports are rather limited. Only a few are defined, such as drilling, aggregating, slicing and pivoting, so complex analyses that require multiple steps are hard to carry out.

Although current OLAP products look impressive, they actually offer little online analysis capability that is genuinely powerful.
So what kind of OLAP do we need?
The answer is simple: we need an online analytical system that supports the evaluation process!
Technically speaking, each step of the evaluation process can be regarded as a computation on data (a query can be understood as a filter computation). Users should be able to define such computations freely and decide the next step at any time based on the intermediate results obtained so far, without modeling beforehand. In addition, since the data source is usually a database system, these computations must handle large volumes of structured data well, rather than only simple numeric computation.
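A minimal Python sketch of such an evaluation process, using the month-over-month example from the list of needed computations. The sales figures, month labels and the function name `dropped` are hypothetical. The second step is written only after looking at the result of the first, which is the kind of ad hoc follow-up a pre-built cube does not anticipate.

```python
# Hypothetical monthly sales per product: {product: {month: amount}}.
sales = {
    "milk":   {"2012-03": 1000, "2012-04": 700},
    "bread":  {"2012-03": 500,  "2012-04": 480},
    "coffee": {"2012-03": 900,  "2012-04": 500},
    "tea":    {"2012-03": 0,    "2012-04": 50},
}

def dropped(sales, prev, cur, pct=0.20):
    """Products whose sales in `cur` fell by more than `pct` versus `prev`.
    Products with no sales in `prev` are skipped (no base to compare)."""
    return [p for p, m in sales.items()
            if m.get(prev, 0) > 0 and m.get(cur, 0) < m[prev] * (1 - pct)]

# Step 1: a filter computation.
step1 = dropped(sales, "2012-03", "2012-04")
print(step1)  # ['milk', 'coffee']

# Step 2, decided after seeing step 1: rank those products by absolute loss.
loss = {p: sales[p]["2012-03"] - sales[p]["2012-04"] for p in step1}
ranked = sorted(loss, key=loss.get, reverse=True)
print(ranked)  # ['coffee', 'milk']
```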
Can SQL (or MDX) play this role?
SQL was indeed invented for this purpose: it has complete computing capability and a syntax that resembles natural language.
However, because SQL's computation model is too basic, complex computations such as the problems listed above are very difficult and verbose to express in it. This is not easy even for professionally trained programmers, so ordinary users can use SQL only for the simplest queries and aggregations (filtering and summarizing a single table). As a result, SQL in practice has drifted far from its original purpose and has become almost exclusively a programmer's tool.
We should follow the line of thinking behind SQL, study its specific shortcomings carefully and find ways to overcome them, so as to develop a new generation of computation system that implements the evaluation process, which is the real OLAP.
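The stock question shows one source of the difficulty: "consecutive" depends on row order, while a query built only from filtering, grouping and joining has no direct notion of "the previous trading day", so it typically needs self-joins or nested subqueries. Procedurally it is a single ordered pass, as in this minimal Python sketch. The quote data and the helper `has_limit_up_run` are hypothetical, and treating a daily change of at least 9.9% as limit-up is an assumption; actual limit rules vary by market.

```python
from datetime import date

# Hypothetical daily records: stock -> list of (trading_date, pct_change).
quotes = {
    "S1": [(date(2012, 5, d), p) for d, p in
           [(2, 10.0), (3, 10.0), (4, 9.95), (7, -1.2), (8, 3.0)]],
    "S2": [(date(2012, 5, d), p) for d, p in
           [(2, 10.0), (3, 2.1), (4, 10.0), (7, 10.0), (8, 0.5)]],
}

def has_limit_up_run(records, year, month, run=3, threshold=9.9):
    """True if the stock hit limit-up on `run` consecutive trading days
    in the given month (threshold of 9.9% is an assumed limit-up cutoff)."""
    streak = 0
    for day, pct in sorted(records):  # order by trading date
        if (day.year, day.month) != (year, month):
            continue  # only trading days in the requested month count
        streak = streak + 1 if pct >= threshold else 0
        if streak >= run:
            return True
    return False

hits = [s for s, recs in quotes.items() if has_limit_up_run(recs, 2012, 5)]
print(hits)  # ['S1']
```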


May 27, 2012
