esProc, A Script Language for Data Analytics with Parallel Mechanism: The Disadvantages of SQL Computation (I)

SQL is invented primarily to provide a method to access structured data in order to transparentise the physical storage scheme, so a lot of various types of English vocabularies and syntaxes are used in SQL to reduce the difficulty in understanding and writing it. And the relational algebra as the basic theory of SQL is a complete computation system, which can compute everything in principle. In terms of this, we certainly should use SQL to satisfy various demands for data computation.

But, though relational database has achieved a huge success, evidently SQL fails to realize its original aim of invention. Except very few simple queries can be completed by end user using SQL, most of SQL users are still technical personnel, and even many complex queries are no easy job for technical personnel.

Why? We inspect the disadvantage of SQL in computation through a very simple example.

Suppose there is a sales performance table consisting of three fields (to simplify the problem, date information is omitted):

sales_amount	Sales performance table
sales	Name of salesman, suppose there is no duplicate name.
product	Products sold
amount	Sales amount of the salesman on the product

Now we want to know the name list of the salespersons whose sales amounts rank among the top 10 places both in air-conditioners and TV sets.

This question is rather simple and people will very naturally design out the computation process as follows:

1) Arrange the sequence according to the sales amount of air-conditioner and find out the top 10 places.

2) Arrange the sequence according to the sales amount of TV and find out the top 10 places.

3) Get the intersection of the results of 1 and 2 and obtain the answer.

Now we use SQL to do it.

1) Find out the top 10 places of the sales amount of air-conditioner. This is very simple:

select top 10 sales from sales_amount where product='AC' order by amount desc

2) Find out the top 10 places of the sales amount of TV. The action is the same:

select top 10 sales from sales_amount where product='TV' order by amount desc

3) Seek the intersection of 1 and 2. This is somewhat troublesome, as SQL does not support computation by steps. The computation result of the above two steps cannot be saved, and thus it is necessary to copy it once again:

select * from

( select top 10 sales from sales_amount where product='AC' order by amount desc )

intersect

( select top 10 sales from sales_amount where product='TV' order by amount desc )

A simple 3-step computation has to be written like this using SQL, and daily computations of more than 10 steps are in great numbers. So this evidently goes beyond the acceptability of many people.

In this way, we know the first important disadvantage of SQL: Do not support computation by steps. Dividing complex computation into several steps can reduce the difficulty of a problem to a great extent. On the contrary, completing many steps of computation into one step can increase the difficulty of a problem to a great extent.

It can be imagined that, if a teacher requires pupils to create only one calculation formula to complete the calculation in solving application problems, how distressed the pupils will feel (of course, there are certain clever children who can solve the problem)!

SQL query cannot by conducted by steps, but the stored procedure written out with SQL can operate by steps. Then, is it possible to use the stored procedure to conveniently solve this problem?

For the time being, we just ignore how complex is the technical environment in which the stored procedure is used (this is enough to make most people give it up) and the incompatibility caused by differences of databases. We only try to know theoretically whether it is possible to use SQL to make this computation simpler and faster.

1) Compute the top 10 places sales amount of air-conditioners. The statement is still the same, but we need to save the result for use by Step 3, while in SQL, it is only possible to use table to store set data. So we need to create a temporary table:

create temporary table x1 as

select top 10 sales from sales_amount where product='AC' order by amount desc

2) Compute the top 10 places of the sales amount of TV. Similarly

create temporary table x2 as

select top 10 sales from sales_amount where product='TV' order by amount desc

3) Seek the intersection, the preceding steps are troublesome but this step is simpler.

select * from x1 intersect x2

After the computation is done in steps, the working thought becomes clear, but it still appears over-elaborate to use a temporary table. In the computation of mass structured data, temporary set, as intermediate result, is rather common. If the temporary table is created for storage in all cases, the computation efficiency is low and it is not intuitive.

Related topics：

The Disadvantages of SQL Computation (II)

The Disadvantages of SQL Computation (III)

Basic Data Type in Data Processing Programming Language：SQL/R/esProc/Power Builder

Thinking of Set in esProc

menu

May 27, 2014

The Disadvantages of SQL Computation (I)

No comments:

Post a Comment