September 1, 2015

esProc Simplifies SQL-style computations – Ungrouping

During database application development, we often need to handle complicated SQL-style computations. One of them is the ungrouping, that is, the opposite process of grouping and summarizing which requires, for example, splitting each record into multiple records. We cannot realize the operation in SQL without taking great trouble. The code is so complicated and difficult to understand.

By contrast, esProc provides easier way and thus simpler code for realizing the ungrouping. The following are two examples.

In packGather, a table that holds the summarized data of packages of various products, productID field represents the ID numbers of products, packing field represents the number of each product’s packages, all of which have the same number of products, and quantitySum represents the total quantity of a product. Some of the data are as follows:

We are asked to split packGather into tables of packages, that is, list packages separately and number each of them. Take product b as an example, there are 3 records in the table after it is split.

The following esProc code can be used to solve the problem:

A1:Select all the data from the database. They are as follows:

A2= A1.conj(packing.new(~:seq, productID:product, quantitiySum/packing:quan))

This line of code first splits each record in A1 into multiple records to create two-dimensional tables. Every two-dimensional table has different number of records but the same structure, with three fields: seq, product and quantity. Then it concatenates these two-dimensional tables to form a general two-dimensional table.


The function of conj function is to concatenate data. For example, split the first record in A1. The corresponding code and result are as follows:

A1(1).(packing.new( ~:seq,productID:product,quantitiySum/packing:quan))


A1.conj(…) is equal to [A1(1),A1(2),A1(3)…].conj(…). The final result is as follows:

Please note the expression packing.new(…), which means creating a new table sequence according to the packing field of each record in A1. new function is used to create a new table sequence based on an existing sequence or table sequence, like ["a","b","c"].new() or [1,2,3…N].new(). The latter can be abbreviated to N.new(). If, for example, the value of packing field in the first record is 2, this expression will be parsed as [1,2].new(…). While creating a new table sequence using new function, we can use “~” to represent members of the original sequence. So ~:seq in the expression in A2 means using the original sequence as the first field of the new table sequence, with seq being the field name.

A2 represents the final result of this example.

Now let’s look at another example that computes liquidated damages in a database. Here is a table – contract – that has multiple fields, three of which are ID (contract number), enddate (ending date) and amount (contract amount). Please compute how much liquidated damages should be paid each day for each breach of contract, on the assumption that the required liquidated damages per day is one thousandth of the contract amount.

Some of the data of contract are as follows:


esProc code: 

This piece of code uses periods function to generate a sequence of dates starting from the ending date of a contract to the current date. # represents the current sequence number in the time sequence. The final result is as follows: 

Note: esProc program can be called by a reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a result of the form of ResultSet to the Java main program. Please refer to the related documents for details.

August 31, 2015

esProc Simplifies SQL-style Computations – Data Sorting by Specified Order

Data sorting by a specified order is one of the complicated SQL-style computations we often need to deal with during the database application development. The computation requires that data be sorted by a specified order rather than in ascending or descending order. For example, sort data according to the order of 2,3,1, or of “DSGC”,”TAS”,”GC”,”HU”. If a table can be created to store the criteria in the database, we can sort data using JOIN statement. But in many cases the sorting criteria are dynamic, or in the database a table is not allowed to be created, thus we should perform UNION before JOIN, or use decode function. We can do so when the sorting criteria are short. Sometimes they are quite long, and we must write long UNION and decode strings manually, which causes mistakes frequently.
esProc, however, can perform this kind of computation easily. Let’s look at how it works through an example.

There is a table – LIQUORS – that holds inventory of a great many liquors. Now the records need to be sorted according to a specified LID order, like “2, 1 , 3, 4, 5…”. Some of the data in table LIQUORS are as follows:

esProc code: 

A1:Select all the data from LIQUORS. Some of the data are as follows: 

A2=A1.align@s([2,1,3,4,5], LID). This line of code sorts LID field in the object of A1 according to the order of 2, 1, 3, 4, 5. The result is as follows: 

The sorting criterion can also be an external parameter. For example, define a parameter – arg1, modify the code in A2 into =A1.align(arg1, LID), execute the script (which can be called by a Java program or a reporting tool as well) and enter [2,1,3,4,5] in the Input argument window, as shown below: 
The computed result is completely the same as that got by executing the previous code:

In addition to solving problems of data sorting by a specified order, esProc can provide good solution to problems of data grouping with fixed criteria. As to more information in this respect, please refer to esProc Simplifies SQL-style computations – Data Grouping with Fixed Criteria.

esProc program can be called by a reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a result of the form of ResultSet to the Java main program. For more details, please refer to the related documents.


August 28, 2015

esProc Simplifies SQL-style Computations – Transpose Rows and Columns

During database application development, we often need to deal with complicated SQL-style computations. The transposition of rows and columns is one of them. Oracle uses pivot function to realize this computation. The other databases haven’t the counterparts to realize it directly, which makes the corresponding code difficult to write, understand and maintain. Besides, even the pivot function can only transpose the fixed columns, but is powerless about the unfixed ones. So are the other databases. Generally all of them must resort to the high level programming languages to realize the dynamic SQL.

However, coding this computation with esProc will be concise and easy to understand. We’ll use an an example to illustrate this.

The following figure shows part of the SALES – a database table where order data are stored.

It is required to compute the total order amount, the maximum and minimum order amount, and the total number of orders of each month of the year 2013, and then transpose the data into a table with thirteen columns and four rows, in which the four operations occupy the first column, with subtotal being the column name, and every month covers a column, with the column names being 1, 2, 3, 4… The first five fields are as follows:

esProc code:

A1:Execute the SQL statement of selecting the data of the year 2013 and grouping and summarizing the data by the month. Result is as follows:

This simple SQL statement for data grouping and summarizing is supported by any database. The difficulty is the transposition of rows and columns following it.

A2=create(subtotal).record(["OSum","OMAX","OMIN","OCount"])

This line of code creates an empty table sequence where there is only one field: subtotal, as shown below:


Note: A table sequence is a data type in esProc. It is a structured two-dimension table similar to the data table of in SQL, but with more powerful function and more flexible usage. By the way, the result of A1 is a table sequence as well.

B2=A2.derive(${to(A1.len()).string()}).

This line of code adds twelve columns to the table sequence in A2 and thus forms the data structure after the transposition, as shown below:


derive function is used to add new columns to an existing table sequence so as to form a new one. For example, derive(1) means adding one column, where 1 is the field name and the field value is the same as the column name. derive(0:field1, null:field2) means adding two columns, where, respectively, field names are field1 and field2 and field values are 0 and null.

According to the requirement of transposition, twelve columns should be added here, for which the code should be derive(1,2,3,4,5,6,7,8,9,10,11,12). A macro, that is ${}, whose role is to convert a string into an expression, is used here in order to generate the code dynamically. to(A1.len()) in the macro is a sequence, whose value is [1,2,3,4,5,6,7,8,9,10,11,12]. The function string() is used to convert the sequence into the string “1,2,3,4,5,6,7,8,9,10,11,12”.

A3-A5:Perform loop on A1, accessing one record each time, rearranging it vertically and, at the same time, modifying the corresponding column in the table sequence in B2. Please note the working range of the loop statement can be represented by the indentation, with no need of using braces ({}), or begin/end. So both B4 and B5 are in the working range and neither A4 nor A5 is in it.

Note: In esProc’s loop body, the loop variable is the cell where for statement is entered. In other word, A3 can be used to reference the current record and A3.MONTH can be used to reference the MONTH field of the current record.

B4=A3.OSum | A3.OMAX | A3.OMIN | A3.OCount

This line of code concatenates the summarized fields of the current record in columns. The operator “|” represents concatenation. For example, the records of December in A1 should be like this after being concatenated:


A3.OSum in the code represents the OSum field of the current record. Since OSum is the second field of the records, it can be referenced by its sequence number, thus the code shall be written as A3.#2. 
Equally, the above line of code can be put as B4=A3.#2 | A3.#3 | A3.#4 | A3.#5.

B5=eval(“B2.run(B4(#):#”+ string(#A3+1)+ “)”).

This line of code means modifying the fields in B2 based on the result of B4.

eval function parses strings into expressions dynamically. For example, the computed result of eval(“2+3″) is 5; and here B2.run(B4(#): #13), the loop code for December, in eval function means inserting members of B4 in order into the 13th column (i.e. December) in B2 according to the sequence numbers of the records in B2.

run function is used to modify the fields. For instance, run(field1+field2:field1, 0:#2) means modifying the value of field1 into field1+field2 and the value of the second field (i.e. #2) into 0.

#A3 means the current loop number. Its value is 1 when the first loop is being executed, and the value is 2 when the second one is being executed, and so on and so forth.

When the loop statement in A3-B5 is executed, the final result will be got in B2. The first several columns are as follows:
In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to the Java main program. For more details, please refer to the related documents。

August 27, 2015

esProc Simplifies SQL-style Computations– Get Top N Rows of Each Group

The operation of getting the top N rows of data is common, such as the biggest salary raise of each employee, the three lowest scores in playing golf and the five days of each month when each product has its biggest sales. SQL resorts to advanced techniques, like window functions and keep/top/row number, to solve this kind of problem. Therefore the code is quite difficult. And as many databases (like MYSQL) doesn’t support these advanced techniques, they can only realize this operation using complicated JOIN statement and nested subquery. If multi-layered groups or multi-level relationships are involved, the computation will become even more complicated.

Yet esProc’s top function can select the top N rows of data of each group according to the row numbers as well as the maximum and minimum values, thus the computation will be made simpler. The following example will teach you how esProc works.  

golf, a database table, stores scores members have got in playing golf. Please select the three highest scores each member has had. Some of the data are as follows:

esProc Code: 

A1:Select data from the database. Click the cell and you can see the result in detail:

A2:=A1.group(User_ID). The code groups the selected data in A1. Result is as follows:

As shown in the above figure, the selected data have been divided into multiple groups by User_ID, with each row representing a group. Click the hyperlinks in blue and you can see members of each group: 
A3:=A2.(~.top(-Score;3)). The code gets the top three records of Score field of each group. Here “~” represents each group of data and ~.top() represents that top function is used to compute each group of data in order. top function can get the top N records of a data set. For example, top(Score;3) means sorting the records in ascending order by Score field and getting the top 3 records (i.e. the three smallest values); top(-Scroe;3) means sorting the records in descending order by Score field and getting the top 3 records (i.e. the three biggest values); and top(#;3) means getting the top three records according to the original order. The computed result of this step is as follows: 

A4:=A3.union(). It performs union operation on each group of data. Result is as follows: 
The above four steps can also be combined into one step for the convenience of maintenance and debugging. Thus the code will be db.query(“select * from golf”).group(User_ID). (~.top(-Score;3)).union().
In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result in the form of ResultSet to the Java main program. Please refer to related documents for more details.

August 26, 2015

esProc Simplifies SQL-style Computations – Data Grouping with Fixed Criteria

During database application development, we often need to handle complicated SQL-style computations. Data grouping with fixed criteria is just one type of these. It refers to the computation whose grouping criteria not come from the to-be-grouped data, but from the outside, such as another table, the external parameters or a list of conditions. For the operation of data grouping with fixed criteria in certain cases, like the one in which the grouping criteria are defined by another table and no specific grouping order is required, it can be realized easily in SQL. But for the other cases, SQL has difficulty in realizing the operation. esProc, however, provides easy solutions to various problems of this operation, which will be illustrated through the following examples.

A table – sales - stores the records of orders, in which column CLIENT contains names of clients and column AMOUNT contains order amounts. Now group the table according to the “list of potential clients” and sum up column AMOUNT of each group. Some of the data of sales are displayed in the following figure:

Example 1:The “list of potential clients” comes from the Std field of another table - potential – and only has four records, which are TAS, DSGC, GC and ZTOZ in order. Client ZTOZ is not in sales. It is required to group and summarize the data according to these four records in their order.
If no specific order is required, this operation can be realized easily with SQL. The corresponding code is as follows:

select potential.std as client, sum(sales.amount) as amount from potential left join client on potential.std=sales.client group by potential.std.

But since a specific order is required in this example, a field for sorting the data must be added if SQL is used to realize the operation and this field should finally be removed using subquery . By contrast, it will be much easier to perform the operation in esProc, as the following code shows:


A1,B1:Query data from the database, and name the two groups of selected data respectively as sales and potential, as shown below:

A3:=sales.align@a(potential:Std,Client) This line of code uses align function to divide the Client field of sales into four groups according to the Std field of potential, as shown in the following figure:

It can be seen that the data of the first three groups have existed in sales, while those in the fourth group are not included in it. So the value of the fourth group is empty. The @a option of align function means fetching all the eligible data of each group; without it only the first eligible data of each group will be fetched.

A4:=pjoin(potential.(Std),A3.(~.sum(Amount)))

This line of code joins two groups of data using pjoin function. One part is potential.(Std), which represents the Std field of potential, and the other part is A3.(~.sum(Amount)), which means summing up the Amount field of each group in A3. The final result of this example is as follows:


Example 2:The list of potential clients has fixed values but there are many clients in it many. If there are only a small number of clients, union statement can be used in SQL to combine all the clients into a pseudo table. But the method is not desirable if there are a large number of clients, and a new table must be created to store the data permanently. Yet the use of esProc can save us the trouble of creating the table. The esProc code is as follows:

In the above code, A2 contains strings which are separated from each other by commas and which represent a great number of fixed values conveniently.

Example 3:The list of potential client is the external parameters, like TAS, BON, CHO, ZTOZ.
As the parameters change frequently, it is inconvenient to create a pseudo table using union statement in SQL. Alternatively, we have to create a temporary table, parse the parameters and insert the parsed parameters into this table for later computation. In esProc, it is unnecessary to create a temporary table for this case. The realization process is as follows:

First, define a parameter arg1, as shown in the following figure:


Then modify the script file, as shown below:

Run the s esProc_sql_fixedcriteria_cript and enter the parameter values, which are assumed to be “TAS, BON, CHO, ZTOZ“, as shown below:

Because the grouping criteria are the same as those in example 1, their final results are the same.
Note: The code in A2 converts the strings “TAS, BON, CHO, ZTOZ” into the sequence ["TAS","DSGC","GC","ZTOZ"]. But this step of conversion can be omitted if the parameters are entered and ["TAS","DSGC","GC","ZTOZ"] is got directly.

Example 4:The criteria for the operation of data grouping with fixed criteria can be either the numerical values or conditions. For example, divide the order amounts into four sections by 1000, 2000 and 4000, with each section being a group of orders, and sum up the total amount of each group.
If the criteria have been given, they can be fixed in SQL; if the criteria are the dynamic external parameters, high-level programming languages, like Java, should be used to write a program to compose a SQL statement and the process will be rather complicated. By supporting dynamic expressions, esProc can realize the operation in this example easily. The following shows the esProc code:


In the above code, the variable byFac represents the grouping criteria in this example, which include four string conditions. byFac can be external parameters, or a view or a table in a database as well. The final result of A4 s as follows:

Example 5:In the previous operations of conditional grouping, it happens that there are no overlapped conditions. But actually it is common to find that conditions overlap with each other. For example, group the order amounts according to the following conditions:
1,000~4,000:Common orders - r14
Less than 2000:Non-important orders - r2
Above 3000:Important orders - r3

Here both r2 and r3 overlap with r14. In this case, if we don’t want the overlapped data, we can first select the data satisfying r14 and then select from the rest of the data those satisfying r2, and so forth.
esProc provides enum function to perform the conditional grouping whose conditions overlap with each other, as shown below:


The grouping result of A3 is as follows:

Computed result is as follows:

But sometimes overlapped data is needed. For this purpose, data satisfying r14 will be selected from sales – one of the two groups of selected data mentioned above, and then data satisfying r2 will be selected from the original table of sales, and so forth. Here @r option will be used in enum function to modify the code in A3 into =sales.enum@r(byFac,Amount). Now the grouping result is as follows: 

In the above figure, the data in red boxes overlap with each other. The final computed result is as follows: 

In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to the Java main program. For more details, please refer to the related documents.