esProc, A Script Language for Data Analytics with Parallel Mechanism: Vector Computing Comparison: R Language vs. esProc

One of the most attractive features of both R and esProc is their agile coding style: a few lines of code can implement a lot of functionality. For example, both allow vector computing expressions, simplify conditional statements, extend basic functions into advanced ones, and support generic types. Vector computing in particular processes bulk data through functions and operators, so loop statements can be avoided. Users gain two advantages from this: the code is easy for developers to grasp, which keeps the learning cost low, and the computation is easy to implement, which improves performance.
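To make the loop-versus-vector point concrete, here is a minimal R sketch (the vector x is made up for illustration) that adds 2 to every member twice: once with an explicit loop and once with a single vector expression.

```r
# Hypothetical input vector, used only for illustration
x <- c(1, 2, 3, 4, 5)

# Loop version: visit each member explicitly
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 2
}

# Vector version: the operator is applied to the whole vector at once
y_vec <- x + 2

# Both approaches give the same values: 3 4 5 6 7
print(y_vec)
```

The vector form says the same thing in one expression, which is the style both R and esProc encourage.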

Let’s compare the subtle differences between R and esProc in vector computing with the examples below.

First, let’s check the most basic operations: getting and assigning vector values. For example, get the 5 members of a vector whose subscripts run from 6 to 10, and replace them with 5 other values.

R language:

A1<-c(51,52,53,54,55,56,57,58,59,60)

A2<-A1[6:10]

A1[6:10]<-seq(1,5)

esProc:

A1 =[51,52,53,54,55,56,57,58,59,60]

A2 =A1(to(6,10))

A3 >A1(to(6,10))=to(1,5)

Comments: Both languages let users get and assign values easily, with almost the same usage. However, R's “:” notation for an interval (6:10) is arguably more intuitive than esProc's to(6,10).
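A self-contained R version of the indexing example, with the intermediate values spelled out, shows that `6:10` is simply an integer index vector, playing the same role as esProc's `to(6,10)`:

```r
A1 <- c(51, 52, 53, 54, 55, 56, 57, 58, 59, 60)

# 6:10 is shorthand for the index vector 6, 7, 8, 9, 10
idx <- 6:10

# Get: members 6..10, i.e. 56 57 58 59 60
A2 <- A1[idx]

# Assign: overwrite members 6..10 in place with 1..5
A1[idx] <- seq(1, 5)

# A1 now holds the values 51 52 53 54 55 1 2 3 4 5
print(A1)
```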

Next, let’s compare their arithmetic operations on vectors.

R language:

A4<-c(1,2,3)

A5<-c(2,4,6)

A4*A5 #multiply the vectors member by member; the result is: [1] 2 8 18

A4+2 #add a constant to every member; the result is: [1] 3 4 5

ifelse(A4>1,A4+2,A4-2) #conditional evaluation; the result is: [1] -1 4 5

sum(A4) #aggregation: sum up the vector members; the result is: [1] 6

sort(A4,decreasing = TRUE) #sort in descending order; the result is: [1] 3 2 1

esProc:

A4 =[1,2,3]

A5 =[2,4,6]

A6 =A4**A5 ‘multiply the vectors member by member; the result is: 2 8 18

A7 =A4.(~+2) ‘add a constant to every member; the result is: 3 4 5

A8 =A4.(if(~>1,~+2,~-2)) ‘conditional evaluation; the result is: -1 4 5

A9 =A4.sum() ‘aggregation: sum up the vector members; the result is: 6

A10 =A4.sort(~:-1) ‘sort in descending order; the result is: 3 2 1

Comments: As the examples show, both R and esProc handle the four arithmetic operations, aggregation and sorting on vectors well, and their syntaxes are very close. One difference worth noticing is that esProc code looks more “object-oriented” on the surface, while R is truly object-oriented at the bottom layer. The former is better suited to direct use in everyday business settings; the latter is better suited to programmers who write their own extension packages, and is more familiar to people with a scientific background.
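One detail behind R's `A4+2` is the recycling rule: the shorter operand is repeated to the length of the longer one before the operation is applied member by member. A minimal sketch (the vector v is made up for illustration):

```r
A4 <- c(1, 2, 3)

# The scalar 2 is recycled to c(2, 2, 2) before adding
stopifnot(all(A4 + 2 == A4 + c(2, 2, 2)))

# The same rule applies to any shorter vector whose length divides the longer one:
# c(10, 20) is recycled to c(10, 20, 10, 20), giving 11 22 13 24
v <- c(1, 2, 3, 4) + c(10, 20)
print(v)

# ifelse() computes both branch vectors in full, then picks member by member:
# the result is -1 4 5
w <- ifelse(A4 > 1, A4 + 2, A4 - 2)
print(w)
```

esProc expresses the same per-member logic explicitly with `~` inside `A4.( )`, which is why its code reads more like a loop body.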

Now let us check vector computing on structured data, using computations on the Orders table from the Northwind database:

1. Query the orders with freight between 200 and 300.

2. Query the orders placed in 1997.

3. Compute the intersection of the two sets above, i.e. orders whose freight is between 200 and 300 and that were also placed in 1997.

4. Group the result of the previous step by EmployeeID, and compute the average freight for each employee.

R language (result holds the Orders table; the line that loads it is not shown):

A2<-result[result$Freight>=200 & result$Freight<=300,]

A3<-result[format(result$OrderDate,'%Y')=="1997",]

A4<-result[result$Freight>=200 & result$Freight<=300 & format(result$OrderDate,'%Y')=="1997",]

A5<-tapply(A4$Freight,INDEX=A4$EmployeeID,FUN=mean)

esProc (A1 holds the Orders table):

A2 =A1.select(Freight>=200 && Freight<=300)

A3 =A1.select(year(OrderDate)==1997)

A4 =A2^A3

A5 =A4.group(EmployeeID;~.avg(Freight))

Comments: R is good at querying and at computing statistics by group. For set operations, however, R falls behind esProc. In the R example above, step 3 is obtained indirectly by repeating both query conditions, not by any set operation; esProc simply intersects the two previous results with A2^A3.
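Because the Northwind data is not reproduced here, the following sketch runs the four R steps on a small made-up stand-in for the Orders table (all values are hypothetical; only the column names follow the example):

```r
# Hypothetical stand-in for the Northwind Orders table (values made up)
result <- data.frame(
  OrderID    = c(1, 2, 3, 4, 5, 6),
  EmployeeID = c(1, 1, 1, 2, 2, 3),
  Freight    = c(250, 210, 120, 290, 230, 310),
  OrderDate  = as.Date(c("1997-01-05", "1997-04-02", "1997-02-10",
                         "1997-03-15", "1996-07-04", "1997-09-01"))
)

# Step 1: freight between 200 and 300
A2 <- result[result$Freight >= 200 & result$Freight <= 300, ]

# Step 2: orders placed in 1997
A3 <- result[format(result$OrderDate, "%Y") == "1997", ]

# Step 3: "intersection" written as one combined query, not a set operation
A4 <- result[result$Freight >= 200 & result$Freight <= 300 &
             format(result$OrderDate, "%Y") == "1997", ]

# Step 4: average freight per employee over the step-3 rows
# (employee 1: 230, employee 2: 290)
A5 <- tapply(A4$Freight, INDEX = A4$EmployeeID, FUN = mean)
print(A5)
```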

R can only perform set operations on simple vectors, for example intersect(A2$OrderID,A3$OrderID); base R's intersect() cannot directly operate on structured data such as a data.frame.
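In practice, R users work around this through a key column or a join. A minimal sketch, using two small made-up data frames in place of the step-1 and step-2 results (values are hypothetical):

```r
# Hypothetical stand-ins for the step-1 and step-2 results (values made up)
A2 <- data.frame(OrderID = c(1, 2, 3), Freight = c(250, 210, 230))
A3 <- data.frame(OrderID = c(2, 3, 4), Freight = c(210, 230, 120))

# intersect() on the key column returns only the shared keys: 2 3
ids <- intersect(A2$OrderID, A3$OrderID)

# To get whole records back, filter one table by the shared keys
A4 <- A2[A2$OrderID %in% ids, ]

# merge() without a 'by' argument joins on all common columns;
# with identical column sets this keeps only rows present in both
A4_merge <- merge(A2, A3)
```

Both are indirect routes through vectors or joins rather than a set operator applied to the tables themselves, which is the difference the comparison points out.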

Of course, this is not to say that R is weak in vector computing. In fact, R is easier to use than esProc for matrix-related computation. For example, to compute the eigenvalues of matrix A, R users can simply call eigen(A), while esProc provides no function that expresses this directly. From this perspective, esProc is more suitable for business computing, while R is better at scientific computation.
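As an illustration of how direct this is in R, here is a sketch using a small symmetric matrix chosen for the example (its eigenvalues are 3 and 1):

```r
# A small symmetric matrix chosen for illustration
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(A)

# Eigenvalues, returned in decreasing order: 3 1
print(e$values)

# Corresponding unit-length eigenvectors, one per column
print(e$vectors)

# Check the defining property A v = lambda v for the first eigenpair
stopifnot(isTRUE(all.equal(as.vector(A %*% e$vectors[, 1]),
                           e$values[1] * e$vectors[, 1])))
```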

In conclusion, both R and esProc perform well in basic vector computing. More specifically, R is second to none in matrix computation, while esProc is superior to R in handling structured data.


July 15, 2014
