The inter row computation is quite common, such
as the aggregate, comparison with same period of any previous year, and link
relative ratio. Both R language and esProc provides the pretty good inter-row
computation ability with slight difference to each other. In the below case, the
utilization of some basic inter-row computations is demonstrated to study on the differences between the two methods:
A sales department of a company wants to
make statistics on the outstanding sales persons, that is, the sales persons achieve
half of the total sales of the company. The data are mainly from the order
table of MSSQL database salesOrder. The main fields include the ID of order: ordered,
Name of sales person: name, Sales amount: sales, and date of order: salesDate.
The straightforward solution is as shown
below:
1.Group by sales
person to calculate the sales amount of each sales person.
2.Sort by sales
amount in reverse order on the basis of the data from the previous step.
3.According to the
previous step, calculate the aggregate value of each record, and calculate the
standard of comparison: the half of total sales of this company.
4.Of the
aggregate values calculated in the previous step, select out the list of sales persons
whose sales achievement meet the below conditions: lower or equal to the
standard of comparison; or although higher than the standard of comparison, the
sales achievement of previous sales person is lower than the standard of comparison.
The detailed solution of R language is as
shown below:
01
library(RODBC)
02
odbcDataSources()
03
conn<-odbcConnect("sqlsvr")
04
originalData<-sqlQuery(conn,'select * from salesOrder')
05
odbcClose(conn)
06
nameSum<-gNameMonth<-aggregate(originalData$sales,list(originalData$name),sum)
07
names(nameSum)<-c('name','salesSum')
08
orderData<-nameSum[rev(order(nameSum$salesSum)),]
09
halfSum<-sum(orderData$salesSum)/2
10 orderData$addup<-cumsum(orderData$salesSum)
11 subset(orderData,addup<=halfSum
| (addup>halfSum& c(0,addup[-length(addup)])<halfSum))
Please find the detailed solution of esProc
below:
Then, let us study on the differences between
aggregate values:
The R language uses cumsum to compute the aggregate value in the line 10.
esProc uses cumulate in A4 to calculate the aggregate value.
Both writing styles are very convenient for
users. However, the operation principle of esProc is aimed to each record: firstly,
calculate the cumulate, then, get
the aggregate value corresponding to this record according to the #row number. By comparison, R language
enjoys a higher efficiency than esProc on this respect since the computation
will be only carried out once if using R language.
Dividing one statement of esProc into two
statements can solve the efficiency issue, that is, firstly, calculate the list
of aggregate value separately, and then insert it to the original data set.
However, such writing style is not as concise as the R language that only
requires one line of code.
Then, let us check the qualified sales
person and the differences:
The R language completes the computation at the Line 11, mainly by
moving the line, and using c(0,addup[-length(addup)])to
construct a column for the new data. Compared with the column addup, the new column just moves down one
column, and the last entry of data is removed and filled with 0 of the first
entry. Then, you can compare whether the aggregate value is lower than the
standard of comparison, or although it is higher than the standard of
comparison, its previous record is lower than the standard.
The R language does not provide the ability
to access the data at the relative position. Therefore, the method of “move the
data in the relative position to the current position” is adopted. Though the
result is still the same, the style of writing is not intuitive enough, and it
requires the analyst a relatively higher ability in logic thinking.
The writing style of esProc is select(addup<=B3 || (addup>B3
&&addup[-1]<B3)). Simple and clear indeed! This is the
expression of relative position featured by esProc. Users can use the method of
[-1] to represent the record in a position one record before or several records
after the current record. For example, the aggregation value calculation in A4
can also be rewritten to A3.derive(addup[-1]+salesSum:addup).
Unlike the fixed algorithm of aggregate
value, the algorithm of this step is relatively much freer. You may find that
the style of expression regarding the relative position of esProc is very agile
with great advantages.
Compared with the fixed algorithms, this
step of algorithm is much freer.
As we can see from the above case, the
computations of relative position and inter-row computations can solve many problems
which are apparently complex. esProc is more flexible in expressing the
relative positions. Therefore, esProc users can feel more relax when
calculating the complex problems.
About esProc: http://www.raqsoft.com/product-esproc
No comments:
Post a Comment