"Being ordered" means the data are stored in a specific order. Every
piece of data of each record has its own relative or absolute serial
number. You can access the data according to its serial number and feel
free to perform the order-relating actions on data, such as locating,
ranking, and sorting.
The typical algorithms involving "being ordered" include the top N
records, year-on-year, and link relative ratio statistics. The ordered
data with esProc can help you solve a great number of challenges easily regarding
the data analysis.
Check the below examples for further explanation.
I Case and Comparison
Case
Assume that there is a telecommunications product manufacturer which
needs to analyze the superior products: to find the products whose sales
values are always among the Top 10 in each state. The sales value data
are stored in the sales table, and the main fields are amount, product,
and state.
SQL Solution
SELECT product
FROM (
SELECT product
FROM (
SELECT product,RANK() OVER(PARTITION BY state ORDER BY amount DESC) rankorder
FROM sales)
WHERE rankorder<=10)
GROUP BY product
HAVING COUNT(*)=(SELECT COUNT(DISTINCT state ) FROM sales)
This type of problems requires running statistics on data group
rankings. The currently popular SQL-92 syntax is unable to implement it.
Therefore, we use SQL-2003 standard, which is gradually supported by
more and more vendors, to achieve this goal. Despite the currently
incomplete and imperfect support for SQL-2003, the problem can still be
solved. However, the HAVING sub-queries of bad readability are used to
compute the intersection of sets. The average developers may find it a
bit difficult to understand and leverage.
esProc Solution
A1 Cell: Group the data by state, and each group consists of all products and sales volume of a state
A2 Cell: Compute the Top 10 records of the highest sales value in each group (state) in A1.
A3 Cell: Based on A2, use isect to compute the intersection of product
sets of each group, that is, the top 10 products enjoying the highest
sales in each state.
Comparison
As we can see in the above example, SQL statements lack the concept
of “being ordered” and thus are not fit for this type of analysis
relating to serial numbers. Moreover, it is difficult to wade through
the SQL statements cobbled together. esProc statements are based on
serial numbers and follow the natural pattern of human thought, which
empowers the user with a free access to records.
II Function Description:
Access with Basic Serial Number
Get the first three records of sales table: sales([1,2,3]), which is same to sales.m([1,2,3]).
Get the sales field value of the last record: sales.m(-1).(amount)
Get the serial number of record whose sales is above 1000: sales.pselect@a(amount>1000)
Get the serial number of record whose sales is the highest: sales.pmax@a(amount)
Serial Numbers of Mass Data
The serial number of mass data is quite common and thus a specific
sign of # is assigned to represent it. For example, with the number of
clients increase over time, we need to allocate the clients to 2
departments. In order to allocate it evenly, we need to sort the records
by client contribution and then group. One group is for clients with
odd number, and the other group is for the clients with even number:
customer.group(#%2==1).
The relative number of the mass data is represented with [n]. For
example, the link relative ration of sales value increase of this month
compared with that of last month:
sales.(amount-amount[-1]).
Rank and Sort
The rank action can be used to compute rankings. For example, to
compute the sales value rankings of each record, the esProc user can
compose like this: sales.rank(amount).
The sort action is to arrange the data according to a specific rule for
reviewing and further-processing. For example, by sorting the data by
the alphabetical order of state names in ascending order, put the data
from a same state together. The esProc user can compose like this:
sales.sort(state). They can also review the sales value sorted in
descending order in each state and compose like this:
sales.sort(state,amount: -1).
Although sorting may disturb the serial number of existing data,
sometimes we do need to know the formal serial number of a record after
sorting. For example, regarding the three months of the highest sales
value in this year, compute the link relative ratio on a monthly basis.
In this case, the esProc user can use psort for a much more convenient
computation.
In the Cell A1, records are sorted by month in ascending order.
In the cell A2, sort the data by sales value and compute the serial
numbers, then get the first three serial numbers. At this time, psort is
a “mimic” sorting, and the formal sequence of record in A1 is not
changed. This equals to “Let’s find out the formal serial number of the
newly sorted data in the old data sequence by supposing such sorting was
already implemented”.
In the cell A3, retrieve the record from the formal data according to
the resulting serial number from A2, that is, the top 3 months of the
highest sales volumes. At last, execute the “this month subtract
previous month” computation to obtain the final results.
The same computation will be a great challenge to SQL. As for the intuitive and convenient esProc, it is just a piece of cake.
III Advantages
Convenient to Locate, Rank and Sort
The data in esProc are all ordered, offering the underlying natural
support for the locating, ranking, sorting, and other statistics
computations of this type.
Easy to Handle the Complex Computation
In the practical data analysis, a great number of complex
computations are related to data order. Compared with esProc, SQL lacks
the concept of order, and the order-related computation is comparatively
harder for SQL to handle.
Fit for Mass Data Computation
Regarding the mass data access by serial number, both esProc and
senior language can implement it, but this is impossible for SQL.
Although the senior languages allow for record access by serial number,
they are by no means convenient once compared with esProc. Just name a
few, the esProc user can represent the serial number so easily and
access to multiple records by a set of serial numbers in a quite simple
way. esProc also provides lots of other similar convenience to implement
the mass data access additionally.