esProc, A Script Language for Data Analytics with Parallel Mechanism: The Disadvantages of SQL Computation (II)

Moreover, SQL does not allow the value of a certain field to be a set (namely temporary table), so in this way, it is impossible to implement some computations even if we tolerate the over elaborate.

If we change the problem into computing the salespersons whose sales amounts of all products rank among the top 10 places, try thinking how to compute it. By continuing to use the above-mentioned working thought, it is very easy to get the below points:

1) Group the data according to products, arrange the sequence of each group, and get the top 10 places;

2) Get the intersection of the top 10 places of all products;

As we do not know beforehand how many products there are, so it is necessary to also store the grouping result in a temporary table. There is a field in this table that needs to store the corresponding group members, which is not supported by SQL, so the method is unfeasible.

If supported by window function (SQL2003 standard), it is possible to change the working thought. After grouping by product, compute the number of times each salesman appears in the top 10 places of the sales amounts of all product category group. If the number of times is the same as the total number of the product categories, it indicates this salesman is within the top 10 places regarding the sales amounts of all product categories.

select sales

from ( select sales,

rank() over (partition by product order by amount desc ) ranking

from sales_amount)

where ranking <=10 )

group by sales

having count(*)=(select count(distinct product) from sales_amount)

How many people can write such complex SQL?

Moreover, in many databases, the window functions are not supported. Then, it is only possible to use the stored procedure to develop a loop, according to the sequence, the top 10 places of each product, and seek the intersection of the result of the preceding time. This process is not very much simpler than using high level language to develop, and it is also necessary to cope with the triviality of the temporary table.

Now, we know the second important disadvantage of SQL: Set-lization is not complete. Though SQL has the concept of set, it fails to provide set as a kind of basic data type, which makes it necessary to transform a lot of natural set computations in thinking and writing.

In the above computation, we have used the keyword top. In fact there is not such a thing (it can be combined out by other computation computations) in the theory of relational algebra, and this is not the standard writing style of SQL.

Let us see how difficult it is to look for the top 10 places when there is no top.

Rough working thought: Seek out the number of members whose sales amount are higher than itself to rank the sales person, and then get the members whose places do not exceed 10, and the SQL is written as follows:

select sales

from ( select A.sales sales, A.product product,

(select count(*)+1 from sales_amount

where A.product=product AND A.amount<=amount) ranking

from sales_amount A )

where product='AC' AND ranking<=10

select sales

from ( select A.sales sales, A.product product, count(*)+1 ranking

from sales_amount A, sales_amount B

where A.sales=B.sales and A.product=B.product AND A.amount<=B.amount

group by A.sales,A.product )

where product='AC' AND ranking<=10

Professional technical personnel may not necessarily write such SQL statement well! And only the first ten places are computed.

menu

May 29, 2014

The Disadvantages of SQL Computation (II)

No comments:

Post a Comment