**analytical computation**based on sales database is necessary for every marketer. How can marketers do analytics easily by themselves? For those who have IT background, this is off course not a problem. But for marketers who have not enough technical background, such as SQL skills, modeling or software experiences, the sales data analytics can of much a problem. For example, to find out top 3 salesmen by monthly sales.

Both R language and esProc have the outstanding ability to perform the step by step computations. And they are great tools for statistical computing. However, in the particulars they differ from each other. To demonstrate the difference, a case for statistical computing is designed as follows:

A company's Sales department wants to select out the outstanding salespersons through

**statistics**, that is, the salesmen whose sales amounts are always among the top 3 in each month from the January this year to the previous year. The data is mainly from the order table of database: salesOrder, and the main fields include the ID of order:

**ordered**, name of salesperson:

**name**, sales amount:

**sales**, and date of order:

**salesDate**.

**The solution is like this substantially**:

Compute the beginning dates of this year and this month, and filter the data by date.

Group by month and salesperson, and compute the sales amount of each salesperson in each month.

Group by month, and compute the rankings of sales amount in each group.

Filter out the top 3 salespersons from each group.

Compute the set of intersections of each group, that is, salesmen always among the top 3 in each month.

**:**

The solution of R language is shown as below

The solution of R language is shown as below

01 library(RODBC

)

02 odbcDataSources()

03 conn<-odbcConnect("sqlsvr")

04 originalData<-sqlQuery(conn,'select * from salesOrder')

05 odbcClose(conn

)

06 starTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),'-01-01',sep=''))

07 endTime<-as.POSIXlt(paste(format(Sys.Date(),'%Y'),format(Sys.Date(),'%m'),'01',sep='-'))

08 fTimeData<-subset(originalData,salesDate>=starTime & salesDate

09 gNameMonth<-aggregate(fTimeData$sales,list(fTimeData$name,format(fTimeData$salesDate,'%m')),sum)

10 names(gNameMonth)<-c('name','month','monthSales')

11 gNameMonth$rank <- do.call(c, tapply(gNameMonth$monthSales, gNameMonth$month,function(x) rank(-x)) )

12 rData<-subset(gNameMonth,rank<=3)

13 nameList <- split(rData$name, rData$month)

14 Reduce(intersect, nameList)

**:**

The solution of esProc is shown as below

The solution of esProc is shown as below

Then, let's compare the two solutions by checking the database access:

Firstly, R solution implements the data access from Line01 to 05 through relatively a few more steps, and this is acceptable considering it as the normal operations.

esProc solution allows for directly inputting SQL statements in the cell A1, which is quite convenient.

In field of database access, R language and esProc differ to each other slightly. Both solutions are convenient.

Secondly, compare the time function:

R solution computes the beginning dates of this year and this month through line 06-07. Judging from this point, R is abundant in the basic functions.

esProc solution completes the same computation in A2 and B2, in which pdate function can be used to compute the beginning date of this month directly, which is very convenient.

In field of date function, it seems that esProc is slightly better, while R has a huge amount of 3rd-party-function library, and maybe there is any date function that is easier to use. Therefore, on the whole, esProc plays R to a draw in this respect.

The focal point is step by step computation:

Firstly, filter by date, group by month and sales person and then summarize by sales amount. The above functionalities are implemented respectively in line 8-9 for R and cell A3-A4 for esProc. The difference is not great.

Proceed with the computation. According to the a bit straightforward thought of analysis, the steps followed should be: 1 Group by month; 2 Add the field of ranking in the group, and compute the rankings; 3 Filter by ranking, and only keep the salespersons that achieved the sales amounts ranking the top 3 in each group; 4. Finally, compute the set of intersection on the basis of the data in each group.

The corresponding codes of R are from line 10 – 14 in the order of 2->3->1->4. In order words, rank the data in each group throughout the whole table, and then group. Have you noticed anything awkward? Although it is the ranking within the group, R users have to sort first and then group! This is because R is weak in the ability to group first and then process. To barely compose the statements following the train of thought of 1->2->3->4, R users must have a strong technical background to handle the complex iteration statement expressions. The style of reverse thinking on this condition will greatly simplify the codes.

esProc solution completes the similar computation in the cell A5 – A8, not requiring any reverse thought. esProc users can simply follow their intuitive thinking of 1->2->3->4. This is because that esProc provides the ingenious representing style of ~. The symbol of ~ represents the current member that takes part in the computation. For this case, the ~ is each 2-dimension table in the group (corresponds to the data.frame of R or the resutSet of SQL). In this way, ~.monthSales can be used to represent a certain column of the current 2-dimension table. By compassion, R users can only resort to some rather complicated means like loops to access the current member, which is more troublesome for sure.

With regard to this comparison, esProc is more intuitive with relatively more advantages.

To this point, I guess that IT pros like R while the business people would prefer esProc.

Next, let's study on their abilities in computing the intermediate results.

R allows users to view the result of each step by clicking the variable name at any time, with RStudio and other tools.

esProc provides only one official tool, that is, click the cell to view the result of this step.

Regarding this ability, esProc does not differ from R much. Considering that R supports for a great many of 3rd party tools, maybe there is any tool capable of providing the better observed results. R has the potential advantages to some extent in this respect.

Then, let compare their abilities to reference the result.

R users are only required to define a variable for the result of computation in each step to conveniently reference the result in the steps followed with regard to the R solution.

esProc users can also define variables to reference, and according to my findings, using the cell name as the variable name is more convenient and saves the trouble of finding a meaningful name.

In this field, their abilities are at almost the same level.

Next, let's compare their performances on set of intersection.

In the last step, the intersection set of data of every group are to be computed. R provides the intersect function at the bottom layer, using together with Reduce function, the intersection set of multiple groups of data can be computed.

esProc provides isect function to compute the set of intersection on multiple sets, which is quite convenient.

Apparently, esProc and R solutions draw a tie in the computation of set, but R provides the Reduce function of greater imaginary space that is a plus point to the advantages of R.

As it can be seen from the above case, R boasts the abundant fundamental functions and a huge amount of library functions from the 3rd party.

In field of data member access, esProc provides the excellent representing style, in particular the grouping at multi-levels. By comparison, R relies more on the loop statements.

Both esProc and R solutions have excellent performances in respect of interaction.

Related News from Raqsoft:

2012 End of the World: Is This Prediction Based on Correct Analysis?

Various Data Environments Support of esProc Makes Statistical Computing more Flexible

## No comments:

## Post a Comment