SQL (Structured
Query Language) was initially developed in early 70s. To date, SQL is still a powerful,
simple, and widely-applied database query language. However, the drawbacks of
SQL are evident: non-stepwise computation, incomplete set-lization, no support
for ordered set, and lack of object reference. Although several producers has developed
a number of incompatible solutions, such as various stored procedures like
PL-SQL®, and T-SQL®, these improvement solutions cannot solve the inherited SQL
drawbacks.
As a query language of the new generation, esProc over-perform SQL
in respect of computational capability, usability, and applicability.
Case Description
A transnational retail enterprise needs to collect statistics on the
newly opened retail store, that is, how many new retail stores will open in
this year? Of which how many enterprises will exceed 1 million dollars? How
many enterprise open overseas?
This question is progressive. The three questions are mutually
related, the next question can be regarded as on the further explore on the
current question, fit for stepwise computation.
The original data is from the database of stores table with the main
fields: storeCode, storeName, openedTime, profit, and nation. Let’s see the SQL
solution first.
SQL Solution
To solve such problem with SQL, you will need to compose 3
statements as given below.
l SELECT COUNT(*) FROM stores WHERE to_char (openedTime, 'yyyy') = to_char
(sysdate,'yyyy');
l SELECT COUNT(*) FROM stores WHERE to_char (openedTime, 'yyyy') = to_char
(sysdate,'yyyy') and profit>1000000;
l SELECT COUNT(*) FROM stores WHERE to_char (openedTime, 'yyyy') = to_char
(sysdate,'yyyy') and profit>1000000 and nation<>’local’;
SQL1: Get the results of problem 1.
SQL2: Get the result of problem 2. Because the stepwise computation
is impossible (that is, the results of previous computation cannot be utilized),
you can only solve and take it as an individual problem.
SQL3: Solve the problem 3, and you are not allowed to compute in
steps either.
esProc Solution
A1 cell: Get the record meeting the problem 1.
A2 cell: Stepwise
computation. Operate on the basis of cell A1, and get the record
meeting the conditions of problem 2.
A3 cell: Proceed with the stepwise computation, and get the record
meeting the problem 3.
B1, B2, B3 cells: It is still the stepwise computation. Count the
corresponding records.
Comparison
For the SQL, there are 3 associations for you to compute in steps,
and explore progressively. However, because stepwise computation is hard to
implement with SQL, this problem have to be divided into 3 individual problems.
esProc is to compute in steps following the natural habit of
thinking: Break the general objective into several simple objective; Solve
every small objective step by step; and complete the final objective.
In case that you proceed with the computation on the basis of the original
3 problems, for example, seek “proportion of problem 3 taken in the problem 2”,
or “group by country for problem 2”, then, in the esProc, you can simply write ”=A3/A2”,
and ”A3.group(nation)”. In each step, there is a brief and clear expression of
highly readable, not requiring any advanced technical ability. By comparison,
SQL requires redesigning the statement. The redesigned statement will undoubtedly
become longer and longer, and more and more complex. Ultimately, such job can only
be assigned to those who have the advanced technical ability in SQL.
esProc can break the complex problem down to simple computation procedure
based on the descriptions from the business perceptive. This is the advantage
of the stepwise computation. By comparison, SQL does not allow for computation
by step or problem breakdown, and thus it is against the scientific methodology,
and not fit for the complex computation.
II Complete Set-lization
Case
Description
A certain advertisement agency needs to compute
the clients whose annual sales values are among the top 10.
The data are from the sales table, which
records the annual sales value of each client with the fields like: customer,
time, and amount.
SQL
Solution
SELECT customer
FROM (
SELECT customer
FROM (
SELECT customer,RANK() OVER(PARTITION BY time ORDER BY amount DESC)
rankorder
FROM
sales )
WHERE rankorder<=10)
GROUP BY customer
HAVING COUNT(*)=(SELECT COUNT(DISTINCT time) FROM sales)
Such Problem requires ranking the sets in a set, that is, group by “time”
and then rank by “customer” in the group. Since the currently popular SQL-92
syntax is unable to represent this, the SQL-2003 standard, for which some
vendors gradually provide the incomplete support, will be used to solve this
problem barely.
A tip can be adopted when computing the customer intersections in
the last step: the count of year equals to the count of clients.
esProc Solution
A1: Group the original dataset by year. A1 is a set of sets.
B1: Get the serial number of records whose sales values are among
the top 10 of each group. The rank() is used to rank in every group, and pselect()
can be used to retrieve the serial number on conditions. ~ is used to represent
every member in the set. B1 is the “set of set”.
A2: Retrieve the record from A1 according to the serial number
stored in B2, and get the customer field of the record.
A3: Compute the intersection of sets.
Comparison
The SQL set-lization is incomplete and can only be used to represent
the simple result set. Developers cannot use SQL to represent the concept of “set
of set”. Only the queries of 3-level-nested-loops are available to barely
perform the similar computations. In addition, SQL cannot be used to perform
the intersection operation easily that developers with advanced techniques can
only resort to the unreadable statements to perform the similar operations,
such as count of years equal to the count of clients. This equals to the
intersection of clients.
The set is the base of mass data. esProc can achieve set-lization
completely, represent the set, member, and other related generic or object reference
conveniently, and perform the set operations easily, such as intersection,
complement, and union.
When analyzing the set-related data, esProc can greatly reduce the
computation complexity. By taking the advantage of set, esProc can solve many
problems agilely and easily that are hard to solve with SQL.
III Ordered Set
Case
Description
Suppose that a telecommunication manufacturer needs to compute the link
relative ratio of sales value compared with that of last month (i.e. the
increase percent of sales value of every month compared with that of last
month). The sales data is stored in the sales table with the main fields
including salesMonth, and salesAmount.
SQL
Solution
select salesAmount,
salesMonth,
(case
when
prev_price !=0
then ((salesAmount)/prev_price)-1
else 0
end) compValue
from (select salesMonth,
salesAmount,
lag(salesAmount,1,0)
over(order by salesMonth) prev_price
from sales) t
The currently popular SQL-92 does not provide the concept of serial
number, which adds many difficulties to the computation. Considering this, the
designer of SQL-2003 partly makes up this disadvantage. For example, the window
function lag() of SQL-2003 is used to retrieve the next record in this example.
In addition, in the above statement, the “case when” statement is
used to avoid the error of division by zero on the first record.
esProc Solution
sales.derive(salesAmount / salesAmount [-1]-1: compValue)
The derive() is an esProc function to insert the newly computed
column to the existing data. The new column is compValue by name, and the algorithm
is “(Sales value of this month/Sales value of previous month)-1”. The “[n]” is
used to represent the relative position, and so [-1] represents the data of the
previous month.
On the other hand, for the data of the first record, the additional
procedure for division by zero is not
required in esProc.
Comparison
From the above example, even if using SQL-2003, the solution to such
problem is lengthy and complex, while the esProc that support the ordered set
is simple and clear.
Moreover, SQL-2003 only provides the extremely limited computation
capability. For example, esProc user can simply use the ”{startPosition,endPosition}”
to represent the seeking of a range, and simply use ”(-1)” to represent the
seeking of the last record. Regarding the similar functionality, it will be
much harder for SQL user to implement.
In the practical data analysis, a great many of complex computations
are related to the order of data. SQL users are unable to handle such type of
computations as easily as esProc users because SQL lacks of the concept of
Being Ordered.
IV Object Reference
An insurance enterprise has the below analysis demands: to pick out
the annual outstanding employees (Employee of the Year) whose Department Manager
has been awarded with the President Honor. The data are distributed in two
tables: department table (main fields are deptName, and manager), and employee
table (main fields are empName, empHonor, and empDept).
empHonor has three types of values: null value; ”president's award”,
PA for short; and ”employee of the year”, EOY for short. There are 2 groups of correspondence
relations: empDept and deptName, and Manager and empName.
SQL
Solution
SELECT A.*
FROM employee A,department B,employee C
WHERE
A.empDept=B.deptName AND B.manager=C.empName AND
A.empHonor=‘EOY’ AND C.empHornor=‘PA’
SQL users can use the nested query or associated query to solve such
kind of problems. In this case, we choose the association query that is both concise
and clear. The association statement behind the “where” has established the
one-to-many relation between deptName and empDept, and the one-to-one relation
between manager and empName.
esProc Solution
employee.select(empHonor:"EOY",empDept.manager.empHornor:"PA")
esProc solution is intuitive: select out the employee of “EOY” whose
Department Manager has be awarded with “PA”.
Comparison
The SQL statement to solve such kind of question is lengthy and not
intuitive. In fact, the complete association query language is “inner join…on…”
style. This statement is simplified in the above example. Otherwise it will be
much hard to understand.
esProc users can use ”.” for object reference. Such style is
intuitive and easy to understand. The complex and lengthy association statement
for multiple tables can thus be converted to the simple object access, which is
unachievable for SQL. When there are more and more tables, the complexity of
SQL association query will rise in geometric series. By comparison, the esProc
user can always access the data intuitively and easily by taking the advantage
of object reference.
Regarding the multi-table associations of complex computation,
esProc can handle it more intuitively and conveniently than SQL.
From the comparison of the above four examples, we can see that
esProc is not only characterized with stepwise computation, complete
set-lization, sorted sets, and object reference. The analysis style is
intuitive, the syntax style is agile, and the function is powerful. esProc is a
specific tool for mass data computation, and a query language of the new
generation over-performing SQL.
No comments:
Post a Comment