esProc, A Script Language for Data Analytics with Parallel Mechanism: Syntax Agility Comparison: R Language vs. esProc

By definition, a truly agile syntax asks users to memorize only a small number of basic functions, and lets them build a great many advanced functions by lightly combining those basics. This kind of combination is a lightweight programming style that is far easier to grasp than ordinary programming. The advantage of an agile syntax is that fewer basic functions have to be learned, which lowers the learning effort and cost while still giving users a simple way to implement more advanced functions.

People need statistical tools for analytics and statistical computing. Business experts and technicians alike need an application with an agile syntax. Both R and esProc do well in this respect. Their differences can be illustrated with the examples below.

Take computing the sum of squares of a vector's members as an example. Although both R and esProc provide functions that compute the sum of squares, we are not going to use them here. Instead, only the most fundamental functions will be used to build it.

R solution:

　　A4<-c(1,2,3)
　　Reduce(function(x,y) x+y*y, c(0,A4))

esProc solution:

　　A1 =[1,2,3]

　　A2 =A1.loop@s( 0; ~~+~*~ )

Comments:

In the R example, Reduce is a very useful function that can be used to build a great many advanced functions. The lambda syntax also lets R users construct simple functions concisely. To make the computation work, however, an extra 0 has to be prepended to the original vector.
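As a side note on the R version, here is a minimal sketch, assuming the vector holds 1, 2, 3 as on the esProc side. It shows the padded form next to a variant that passes the starting value through the init argument of Reduce instead of prepending it. The variable names are made up for illustration.

```r
# Sum of squares built from Reduce only; the vector is assumed to be 1, 2, 3.
A4 <- c(1, 2, 3)

# Padded form: prepend 0 so the first call sees x = 0, y = 1.
padded <- Reduce(function(x, y) x + y * y, c(0, A4))

# Variant: pass the starting value as the init argument of Reduce.
with_init <- Reduce(function(acc, y) acc + y * y, A4, 0)

# accumulate = TRUE keeps every intermediate total, including the start.
running <- Reduce(function(acc, y) acc + y * y, A4, 0, accumulate = TRUE)

print(padded)     # 14
print(with_init)  # 14
print(running)    # 0 1 5 14
```

Either way a starting 0 is needed; the only difference is whether it lives in the data or in the call.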

esProc users can use the basic looping function loop to do the same, without needing the lambda syntax (although it is supported) and without adding an extra 0 to the vector (although a 0 is still required as the initial value). Comparatively, the esProc code is much easier to understand. In addition, esProc can simply use ~ to represent the "member in computation" directly, with no additional function needed to represent it. R users cannot represent it directly; they have to construct a function and use its parameters x and y to stand for the "member in computation", which is clearly less convenient.

The moving average computation below is another example.

Based on the Orders table from Northwind, compute the three-day moving average of freight.

R solution:

　　filter(result$Freight/3, rep(1, 3), sides = 1)

esProc solution:

　　=A1.( ~{-1,1}.( Freight).avg())

Comments:

Because R cannot represent the "member in computation", it is hard to compute the moving average with basic functions alone. To reduce the difficulty, we have to turn to the more specialized function filter, at the cost of users spending extra time studying a new function.
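To make the R side concrete, here is a small sketch with made-up freight values standing in for result$Freight. It calls stats::filter with sides = 1, as in the R solution, which averages the current value and the two before it. It also calls it with sides = 2, which centers the window on the current value and is closer to how the {-1,1} range in the esProc expression reads. The last part writes the centered window with basic functions only; clipping the window at the ends of the vector is a choice made for this sketch, not something the original states.

```r
# Made-up freight values, standing in for result$Freight.
freight <- c(10, 20, 30, 40, 50)

# Trailing window (current value and the two before it), as with sides = 1.
trailing <- stats::filter(freight / 3, rep(1, 3), sides = 1)

# Centered window (previous, current, next), as with sides = 2.
centered <- stats::filter(freight / 3, rep(1, 3), sides = 2)

# The centered window written with basic functions only:
# for each index i, average the members from i - 1 to i + 1,
# clipped to the ends of the vector (an assumption of this sketch).
n <- length(freight)
by_hand <- sapply(seq_len(n), function(i) {
  mean(freight[max(1, i - 1):min(n, i + 1)])
})

print(as.numeric(trailing))  # NA NA 20 30 40
print(as.numeric(centered))  # NA 20 30 40 NA
print(by_hand)               # 15 20 30 40 45
```

The hand-written version has to spell out index arithmetic that a relative-position notation would hide, which is the point the comparison is making.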

As for esProc users, to build the advanced moving-average function they only need the basic averaging function avg (similar functions are of course also available in R) plus esProc's distinctive member representations.

Here, {-1,1} represents relative positions in esProc. The -1 indicates the previous member and the 1 indicates the next member, so {-1,1} indicates a range of 3 members in total: the previous, the current, and the next. Representing relative positions in this way is easy to understand and use. By comparison, R users can represent an absolute position directly but find it hard to represent a relative position. This is not a minor obstacle. Let's check the example below.

It is an example of period-on-period comparison from another essay of mine, Computation after Grouping - R Language vs. esProc. To keep the demonstration simple, let's suppose the data is already sorted by time, and only compute the change in freight compared with the previous period.

esProc solution:

　　=A1.((Freight-Freight[-1])/Freight[-1])

R solution:

　　c(0,(result$Freight[-1]-result$Freight[-length(result$Freight)])/result$Freight[-length(result$Freight)])

Comments: The esProc solution is quite intuitive and straightforward, that is, "(current-previous)/previous", in which [-1] indicates the previous one.

R users can write [-1] too, but there it means "what remains after removing the first element of the vector", not "the previous one". Because no representation of relative position is available, the row-shifting method is adopted: construct a new column and move the data of each row down by one row. R users thus need to convert a computation between rows into a computation between columns. As can be seen, although the result is correct, the algorithm is quite perplexing to those who are not R experts.
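The row-shifting idea can be unpacked step by step. The sketch below uses made-up freight values and hypothetical variable names. It marks the first period with NA, since it has no predecessor, whereas the R solution puts 0 there.

```r
# Made-up freight values, assumed to be sorted by time.
freight <- c(100, 110, 99, 198)
n <- length(freight)

# Negative indices drop elements in R:
current  <- freight[-1]  # everything except the first element
previous <- freight[-n]  # everything except the last element

# The two shifted vectors line up pairwise, so this is
# (current - previous) / previous for periods 2..n.
change <- c(NA, (current - previous) / previous)

# The same result using diff() and head().
change_alt <- c(NA, diff(freight) / head(freight, -1))

print(change)  # NA 0.1 -0.1 1.0
```

Naming the two shifted vectors makes it easier to see that a comparison between neighboring rows has been turned into a comparison between two aligned columns.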

These examples show that the "current member" and "relative position" notations of esProc are distinctive and simplify data computation greatly. Users only need to memorize a few basic functions to implement a great many advanced functions, which saves a lot of learning effort. This is good news for lazybones like me. There are quite a few similar syntax features as well, such as function options and cascade parameters.

A function option uses a symbol to extend the behavior of a function, for example to change the type of the return value or to determine whether the results are sorted. For example, group is the grouping function; group@o indicates that the result is not sorted, group@z indicates sorting in reverse order, and group@1 indicates that only the first record of each group is retrieved, in which case the result is not a set but a TSeq (which corresponds to the list and data frame of R). These options are common in the sense that other functions, for example the filter function select, can be extended in the same way by adding @o, @z, or @1.

R uses different function names and parameters to extend the behavior of functions, such as tapply, sapply, lapply, by, and others. Although they are in effect just extensions of the looping function apply, you have to grasp at least 5 functions before you can claim to have mastered loop computation. At this point it is self-evident which approach requires fewer functions to memorize and demands less learning.
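For readers less familiar with this family, a brief sketch of how several of these functions differ, using a made-up table in place of Orders (the column names are hypothetical):

```r
# Made-up data standing in for an orders table.
orders <- data.frame(
  customer = c("A", "B", "A", "C", "B"),
  freight  = c(10, 20, 30, 40, 50)
)

# lapply: loop over members, always returns a list.
squares_list <- lapply(orders$freight, function(f) f * f)

# sapply: same loop, but simplifies the result to a vector when it can.
squares_vec <- sapply(orders$freight, function(f) f * f)

# tapply: loop over groups defined by a second vector.
total_by_customer <- tapply(orders$freight, orders$customer, sum)

# apply: loop over the rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)

print(squares_vec)        # 100 400 900 1600 2500
print(total_by_customer)  # A = 40, B = 70, C = 40
print(row_sums)           # 9 12
```

Each variant is a separate name to learn, whereas the option style described above attaches the variation to a single function name.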

As for cascade parameters, this document will not discuss them in detail. Interested readers can look into them on their own.

Finally, R excels in matrix computation and offers a great many ready-made analysis algorithms. Its user-contributed library functions are also excellent. I am fascinated by these advantages. However, I must admit that, among statistical computing tools, esProc beats R on the basic characteristics of being easy to learn and use and on syntax agility.

Author: Jim King

BI technology consultant for Raqsoft

10+ years of experience in BI/OLAP applications, statistical computing and analytics

Email: Contact@raqsoft.com

Website: www.raqsoft.com

Blog: datakeyword.blogspot.com/

1 comment:

Michael, November 1, 2012 at 10:17 PM
I like "the {-1,1} is to represent the relative position in esProc. The -1 indicates the previous member, and the 1 indicates the next member. The {-1,1} indicates the range of members: there are 3 in total. Representing the relative positions in this way is easy to understand and use."

It's really quite easier to understand and operate, good invention!

November 1, 2012