esProc, A Script Language for Data Analytics with Parallel Mechanism: Bid Farewell to Stored Procedure

A stored procedure is a collection of SQL queries and flow control statements. The stored procedure is usually used to break a complex computation task down to several simple computation steps. Although the stored procedure can improve the flexibility at the cost of complexity, it is always hindered by the inherent drawbacks of SQL, such as incomplete stepwise mechanism, unordered data, and lack of sets and references.
esProc not only overcomes the above drawbacks of SQL, but also greatly lowers the requirements on the technical background of developers. It is an ideal tool to substitute the stored procedure.

Case and Comparison

A telecommunications manufacturer develops a report for analyzing the sales value, sales amount, and link relative ratio of the Superior Products. The Superior Products refers to “products whose sales amounts are among the top 10 in every state”, and the data is mainly stored in the stateSales table. The data structure of this table is as given follows:

The data in stateSales table is mainly about the sales amount information of each product in each state. However, it is not easy for the reporting tool to directly use these data. This is because:

The data is still unarranged with lots of duplicate data of the same state and product.
The algorithm is complicated. It is hard to compute the Superior Product.

In order to make it easy for external use, we will need to process the data in the stateSales table. This procedure is called Data Preparation. Let’s compute the Superior Product in this case, and compare the respective performance of data preparation between the stored procedure and the esProc.

Stored Procedure Solution

1	create or replace package salesPkg
2	as
3	typesalesCur is ref cursor;
4	end;
5	CREATE OR REPLACE PROCEDURE topPro(io_cursor OUT salesPkg.salesCur)
6	is
7	varSql varchar2(2000);
8	tb_count integer;
9	BEGIN
10	select count(*) into tb_count from dba_tables where table_name=’TOPPROTMP’;
11	if tb_count=0 then
12	strCreate:=’CREATE GLOBAL TEMPORARY TABLE TOPPROTMP ( stateTmp NUMBER not null, productTmp varchar2(10) not null, amountTmp NUMBER not null ) ON COMMIT PRESERVE ROWS’;
13	execute immediate strCreate;
14	end if;
15	execute immediate ‘truncate table TOPPROTMP’;
16	insert into TOPPROTMP(stateTmp,prodductTmp,amountTmp) select state,product,amount from stateSales a where not( (a.state,a.product) in ( select state,product from stateSales group by state,product having count()>1 ) and rowid not in ( select min(rowid) from stateSales group by state,product having count()>1 ) ) order by state,product;
17	OPEN io_cursor for
18	SELECT productTmp FROM ( SELECT stateTmp,productTmp,amountTmp,rankorder FROM (SELECT stateTmp,productTmp,amountTmp,RANK() OVER (PARTITION BY stateTmp ORDER BY amountTmp DESC) rankorder FROM TOPPROTMP ) WHERE rankorder<=10 order by stateTmp ) GROUP BY productTmp HAVING COUNT(*)=(SELECT COUNT (DISTINCT stateTmp) FROM TOPPROTMP); END;

01-04: Define the ”cursor”
05-09: Define the stored procedure and variables
10-15: Define the ”temporary table”, check whether to redefine, and empty the temporary data.
16: Filter out the duplicate data and write the filtered data to the temporary table. Since it is not easy to retrieve the non-duplicate data, this tip may be helpful: find the duplicate data, and then use the Not operator to get the required data in a reverse way, then the remaining data is the non-duplicate data. These operations could be implemented by embedding two subqueries.
17: Open the cusor, and prepare for the data export.
18: Get the products that is among top 10 in every states: Firstly, rank the product in each state (i.e. rankorder) with the window function; Secondly, filter out the top 10 products in each state; Finally, get the top 10 products in every state. Because SQL lacks the function to compute the intersection set, a tip could be used in this case: Group by product. Then, check wheather the number of the same products equals to the number of states. If so, then it indicates that this product is among top 10 in every state. These operations can be implemented with a three-nested query plus a subquery. The SQL-2003 window functions can greatly reduce the difficulty.
Regarding the stored procedure, there are several ways to achieve this goal. In the above procedure, the computation is broken down to 2 clear steps. Of course, in the practice, more temporary tables and more SQL statements could be used to split the procedure into more steps at the cost of a much more lines of codes.

esProc Solution

A1 Cell: Retrieve data
A2 Cell: Filter the duplicate data in the A1
A3 Cell: Group the data in A2 by state
B3 Cell: Of each group (state) in the A3, compute the serial number of records whose sales amounts are among the top 10.
A4 Cell: Get the product from A3 with serial number
A5 Cell: Compute the intersection set of each group of product in the A4.
In this case, the function rank() is used to compute the rankings, taking amount as the criteria for comparison; the function pselect() is used to compute the serial number of records ranking below 10; the function isect() is used to compute the intersection set of data from multiple groups.

Comparison

To solve such problems with the stored procedure, the developer must have a relatively much higher technical competence. Even so, the highly capable developers with strong technical background may still often find themselves in a dilemma of either composing brief statements of bad readability or a great many lines of codes. On one hand, composing brief statements requires well-grasped SQL techniques, and composing a great many lines of codes requires great programming skills. On the other hand, the stored procedure is based on the SQL statement and thus hard to root out these inherent drawbacks. Considering this, it defeats the purpose for introducing the stored procedure to overcome the drawbacks of SQL statements.
As we can see, esProc users can solves this problem from a rather intuitive and descriptive business perceptive, not requiring any so-called “tips” to compose any statement hard to read. Every step is concise and easy-to-read. The whole procedure goes step by step to the analysis target easily.
Compared with the stored procedure, esProc by nature is definitely more suitable to solve the complex computation.

Feature: a JDBC Interface

JDBC is a widely-applied standard interface that is often used by the application of Java architecture to call various databases. esProc is built with Java totally and provide JDBC interface for external use. When being called, esProc will be recognized as a database, allowing the external application, such as reporting tools, to access it with SQL statements.

Feature: a Professional Analysis Tool for Mass Data

esProc is a tool ideal for mass data computation, with same analysis ability capable for SQL statement and stored procedure: On the one hand, esProc can be used to query, filter, group, and collect statistics on data like SQL statement; On the other hand, esProc can implement the loop and branch judgment for the analysis procedure like the stored procedure.

Feature: a Better Performance than that of Stored Procedure

Compared with esProc, both SQL statements and stored procedures are also the analysis tool for mass data but have some obvious defects: the incomplete stepwise mechanism, incomplete set-lizing, lack of serial number, and lack of reference. Therefore, when confront with complex computation, it is usually quite hard to design the SQL statement or stored procedures, and requires extremely strong technical background to be a competent developer.
esProc has overcome all drawbacks of stored procedure inherited from SQL, and thus is much more powerful than the stored procedure on the respect of analysis. Furthermore, esProc enable those developers of weak technical background to implement the complex computation easily

esProc Values

esProc boasts a powerful analysis ability and low requirements on developer’s technical background, which saves them from the complex and hard-to-understand stored procedure.
esProc is not bond to any database and supports most common datasources, allowing developers to migrate conveniently. It is an analysis tool that is not only highly efficient but also easy to maintain.
esProc is specially built for the mass data computation as a smarter choice for data analysis.

menu

May 8, 2012

Bid Farewell to Stored Procedure