esProc, A Script Language for Data Analytics with Parallel Mechanism: Simplifies SQL-style Computations

In real-world business, many computing tasks require transposing rows and columns dynamically. There are a lot of discussions around the operation in online IT groups and forums. Below lists some of them:

http://www.dbforums.com/showthread.php?1628876-SQL-question-how-to-move-two-column-data-to-one-column-data

https://www.linkedin.com/grp/post/137774-5981739289539526658?trk=groups-post-b-title

http://stackoverflow.com/questions/29756836/mysql-pivot-table-and-numbered-columns

http://www.linkedin.com/groups/need-help-86080.S.5796995228699222017?trk=groups_items_see_more-0-b-ttl

SQL implements row-to-column transposition in these ways:

1. Using functions for transposing rows and columns

Oracle 11g and above versions, as well as MSSQL2005+, provide row/column transposition operators – pivot and unpivot – for switching rows to columns and columns to rows. They require specifying specific targeted columns and thus cannot handle scenarios with dynamic columns straightforwardly.

2. Using CASE expression

For some databases that don’t support pivot, like MySQL and DB2, you can use the conditional expression case when to handle the transposition. Similar to pivot, case when also requires the fixed targeted rows. You cannot code dynamically switching rows to columns in a straightforward way.

To deal with transposition to dynamic columns, the only option is:

3. Composing dynamic SQL

To transpose rows to dynamic columns, you need to compose dynamic SQL statements in the stored procedure. The ways the coding is actually done and their levels of difficulty vary according to different databases. Therefore it is impossible to write universal SQL statements for the computation.

In real-world cases, the transposition from rows to columns is often accompanied by inter-column calculations. This further complicates the problem.

Usually the aim of transposing rows to columns is to further display data. That means a main program (like the reporting tool) will receive the transposing result and use it to proceed to the next operation. Use Raqsoft esProc (free version is available) to help to handle the transposition if the main program is in Java. esProc script is written for dynamic interpretation and execution, and, therefore, more universal for coding row/column transposition. esProc provides JDBC interface that acts as the middleware between Java application and the database to let Java application execute esProc script as it accesses a database, without changing the application structure.

A simple example will be used to illustrate how esProc handles the row-to-column transposition and integrates with Java main program.

1. Simple row-to-column transposition

Generally, a row-to-column transposition operation simply transposes rows of data to columns of data, without involving the complex inter-column calculations. For example, transposing the student score table into sets where data is displayed by subjects:

The targeted result:

esProc script for implementing the task:

A1: Execute SQL to retrieve data, and sort data by ID and SUBJECT.

A2-A3: Group data by ID and SUBJECT. esProc retains the grouping result – the subsets – for later use.

A4: Create a desired, dynamic empty result set.

A5-B5: Loop through A2’s student groups, and write student IDs, names and scores of subjects into the empty result set according to A3’s grouping result.

A6: Return the result set.

Basic steps for transposing rows to columns with esProc: Generate the empty targeted result set (A4); then compute and append every row of data to the result set (A5, B5). With the stepwise computing mechanism supporting data table object, you can code the transposition from rows to columns in a natural thinking pattern.

The result set of esProc script can be returned to Java main program or the reporting tool via JDBC interface. Below is the code for Java to call the esProc script:

Class.forName("com.esproc.jdbc.InternalDriver");

con= DriverManager.getConnection("jdbc:esproc:local://");

// Call esProc script (which is similar to the stored procedure); the script file name is p1.

st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call p1 ()");

// Execute the script

st.execute();

// Get the result set

ResultSet rs = st.getResultSet();

……

The returned value is a JDBC standard ResultSet object. The way of calling esProc script is the same as that of accessing a database. Programmers who are familiar with JDBC can master it fast.

About deploying esProc JDBC and calling esProc script in it, see esProc Integration & Application: JavaInvocation.

2.Transposing rows to dynamic columns

In the preceding example, column names (values of SUBJECT) can be determined directly. In that case, it is not so difficult to code the transposition problem using static syntax like pivot (or case when). If they need calculations to be determined dynamically, it is hard to handle the problem with pivot. For example, manufacturing workshops manufacture different kinds and numbers of product.

You need to determine the number of resulting columns according to the one with the greatest length among groups divided by code. The targeted result:

esProc script for implementing the task:

A1: Execute SQL to retrieve data from the output table.

A2: Group data by code. esProc retains the grouping result (members in every group) for use in later computations.

A3: Calculate the maximum number of members among the groups, so as to determine the number of columns in the result set.

A4-A5: Create an empty, dynamic result set.

A6-B7: Loop through A2’s grouping results, and write product and amount of each group into A5’s result table sequence.

Similar to the preceding script, this script first generates an empty, dynamic result set, and then calculates desired data and appends it to the result set.

This computational task requires writing dynamic SQL statements to compose the result set. As the number of columns can only be determined by getting the group with the greatest members, and, unlike the simple pivot operation, field values cannot be directly used as the column names in composing a result set, the most effective way is writing stored procedure step by step.

Compared with the complicated stored procedure programming, esProc produces more concise code in a more easily way with step-by-step computing model.

3. Row-to-column transposition involving inter-column calculations

As mentioned at the beginning of the article, transposing rows and columns often involves inter-column calculations. Below is such an example:

Output the payment of each month in the specified year (say, 2014). If a month lacks the related data, the payment for this month is the same as that for the previous month.

The targeted result:

esProc script for implementing the task:

A1: Execute SQL to retrieve data of the specified year.

A2: Create an empty table sequence with 12 months for the result set.

A3: Group the data by customer name.

A4-B7: Loop through A3’s groups to calculate. B5 specifies the payment for the current month; B6 specifies null as the payment value of the previous month; and B7 inserts the resulting records into the empty table sequence.

Likewise, the implementation first creates an empty result set and then appends data to it. Difference is that the appended data is got through a series of calculations.

esProc supports order-related computing, so it is easy for it to reference the value of the previous record. Compared with complex SQL approach or stored procedure programming, esProc script is clearer and easier to understand in performing inter-column calculations for dynamic row-to-column transposition.

4. Column-to-row transposition

On top of the transposition scenarios mentioned in the above, there are others requiring switching multiple columns in one row to multiple rows (column-to-row transposition). The following source data has unfixed number of columns:

The targeted result:

esProc script for implementing the task:

A1: Execute SQL to retrieve data.

A2: Create an empty table sequence for the targeted result.

A3: Calculate the number of rows into which each record is to be split, based on the number of A1’s columns.

A4-B4: Loop through A1’s data sets to dynamically get data from each column and insert it into A2’s resulting table sequence.

menu

June 8, 2015

Simplifies SQL-style Computations – Dynamically Transposing Rows and Columns

1. Simple row-to-column transposition

2.Transposing rows to dynamic columns

3. Row-to-column transposition involving inter-column calculations

4. Column-to-row transposition

No comments:

Post a Comment