August 13, 2012

Free Record Acess: Serial Number, Locating and Sorting In esProc


In esProc, the data is ordered. You can access the data freely by locating, ranking, sorting, and with other methods.


"Being ordered" means the data are stored in a specific order. Every piece of data of each record has its own relative or absolute serial number. You can access the data according to its serial number and feel free to perform the order-relating actions on data, such as locating, ranking, and sorting.
The typical algorithms involving "being ordered" include the top N records, year-on-year, and link relative ratio statistics. The ordered data with esProc can help you solve a great number of challenges easily regarding the data analysis.
Check the below examples for further explanation.

I Case and Comparison

Case

Assume that there is a telecommunications product manufacturer which needs to analyze the superior products: to find the products whose sales values are always among the Top 10 in each state. The sales value data are stored in the sales table, and the main fields are amount, product, and state.

SQL Solution

SELECT product
FROM (
SELECT product
FROM (
SELECT product,RANK() OVER(PARTITION BY state ORDER BY amount DESC) rankorder
FROM sales)
WHERE rankorder<=10)
GROUP BY product
HAVING COUNT(*)=(SELECT COUNT(DISTINCT state ) FROM sales)

This type of problems requires running statistics on data group rankings. The currently popular SQL-92 syntax is unable to implement it. Therefore, we use SQL-2003 standard, which is gradually supported by more and more vendors, to achieve this goal. Despite the currently incomplete and imperfect support for SQL-2003, the problem can still be solved. However, the HAVING sub-queries of bad readability are used to compute the intersection of sets. The average developers may find it a bit difficult to understand and leverage.

esProc Solution


A1 Cell: Group the data by state, and each group consists of all products and sales volume of a state
A2 Cell: Compute the Top 10 records of the highest sales value in each group (state) in A1.
A3 Cell: Based on A2, use isect to compute the intersection of product sets of each group, that is, the top 10 products enjoying the highest sales in each state.

Comparison

As we can see in the above example, SQL statements lack the concept of “being ordered” and thus are not fit for this type of analysis relating to serial numbers. Moreover, it is difficult to wade through the SQL statements cobbled together. esProc statements are based on serial numbers and follow the natural pattern of human thought, which empowers the user with a free access to records.

II Function Description:

Access with Basic Serial Number

Get the first three records of sales table: sales([1,2,3]), which is same to sales.m([1,2,3]).
Get the sales field value of the last record: sales.m(-1).(amount)
Get the serial number of record whose sales is above 1000: sales.pselect@a(amount>1000)
Get the serial number of record whose sales is the highest: sales.pmax@a(amount)

Serial Numbers of Mass Data

The serial number of mass data is quite common and thus a specific sign of # is assigned to represent it. For example, with the number of clients increase over time, we need to allocate the clients to 2 departments. In order to allocate it evenly, we need to sort the records by client contribution and then group. One group is for clients with odd number, and the other group is for the clients with even number: customer.group(#%2==1).
The relative number of the mass data is represented with [n]. For example, the link relative ration of sales value increase of this month compared with that of last month:
sales.(amount-amount[-1]).

Rank and Sort

The rank action can be used to compute rankings. For example, to compute the sales value rankings of each record, the esProc user can compose like this: sales.rank(amount).
The sort action is to arrange the data according to a specific rule for reviewing and further-processing. For example, by sorting the data by the alphabetical order of state names in ascending order, put the data from a same state together. The esProc user can compose like this: sales.sort(state). They can also review the sales value sorted in descending order in each state and compose like this: sales.sort(state,amount: -1).
Although sorting may disturb the serial number of existing data, sometimes we do need to know the formal serial number of a record after sorting. For example, regarding the three months of the highest sales value in this year, compute the link relative ratio on a monthly basis. In this case, the esProc user can use psort for a much more convenient computation.

In the Cell A1, records are sorted by month in ascending order.
In the cell A2, sort the data by sales value and compute the serial numbers, then get the first three serial numbers. At this time, psort is a “mimic” sorting, and the formal sequence of record in A1 is not changed. This equals to “Let’s find out the formal serial number of the newly sorted data in the old data sequence by supposing such sorting was already implemented”.
In the cell A3, retrieve the record from the formal data according to the resulting serial number from A2, that is, the top 3 months of the highest sales volumes. At last, execute the “this month subtract previous month” computation to obtain the final results.
The same computation will be a great challenge to SQL. As for the intuitive and convenient esProc, it is just a piece of cake.

III Advantages

Convenient to Locate, Rank and Sort

The data in esProc are all ordered, offering the underlying natural support for the locating, ranking, sorting, and other statistics computations of this type.

Easy to Handle the Complex Computation

In the practical data analysis, a great number of complex computations are related to data order. Compared with esProc, SQL lacks the concept of order, and the order-related computation is comparatively harder for SQL to handle.

Fit for Mass Data Computation

Regarding the mass data access by serial number, both esProc and senior language can implement it, but this is impossible for SQL. Although the senior languages allow for record access by serial number, they are by no means convenient once compared with esProc. Just name a few, the esProc user can represent the serial number so easily and access to multiple records by a set of serial numbers in a quite simple way. esProc also provides lots of other similar convenience to implement the mass data access additionally.

Leave a Reply

if you have any questions, please feel free to leave your ideas below by comments.