November 28, 2012

R Project and esProc: Save Time & Money for Debugging in Business

As is well known, in program development the time spent finding and fixing errors is usually greater than the time spent writing code, so a friendly debugging environment can save a lot of time. In this respect, VB.NET and SQL are two extremes: the former provides an almost perfect debugging environment, while the latter provides nearly no debugging tools at all.

As powerful statistical computing tools, both R and esProc support debugging, which saves time and effort and brings great convenience to business experts and analysts.

As development tools for computation and analytics, the two differ in how capable their debugging is. Let us examine those differences.

Let's kick off by getting familiar with the debugging environments of R (taking RStudio as an example) and esProc through their respective interfaces:

RStudio Debugging Environment



esProc's Debugging Environment



Let's compare the basic functions.

Breakpoints: In R, a breakpoint is set by inserting browser() into the code, and these statements have to be removed manually once debugging is done. This brings us back to the cherished old days of coding in BASIC before Windows existed, when removing the stop/breakpoint statements was an important job before release. By comparison, the breakpoint style of esProc is similar to that of VB.NET and other modern programming languages: clicking a button or pressing a shortcut key sets a breakpoint on the cell where the cursor is located. This style is so common nowadays that it poses no challenge or surprise to anyone.

Debug commands: in the same spirit as its breakpoints, R's debug commands are typed at the console, including c to resume running, n to step to the next statement, and Q to exit debug mode. There are also functions such as trace, setBreakpoint, debug, undebug, and stop. Note that it is best not to have variables named c, n, or Q in the code; otherwise accidental conflicts can occur.
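
A minimal sketch of this workflow, using a hypothetical function f; the commands at the Browse[1]> prompt are typed interactively:

  f <- function(x) {
    y <- x * 2
    browser()      # execution pauses here and the console enters debug mode
    y + 1
  }
  f(10)            # at the Browse[1]> prompt: n steps, c continues, Q quits
                   # remember to delete the browser() line before release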

Regarding execution control, esProc is no different from VB.NET and similar languages: everything is done with buttons or shortcut keys, so users do not have to memorize any commands.

Variable watch: R's variable watch window is on the right and lists all current variables; clicking one opens a new window displaying its value. Alternatively, R users can enter fix(variable name) at the command line, as shown below. In the bottom right corner of the esProc interface there is a similar variable list, but esProc users seldom need it: because esProc does not require variables to be specially named (the cell name serves as the variable name by default), users can simply click a cell to view its value.
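
For the fix() route mentioned above, a minimal sketch with a hypothetical data frame:

  x <- data.frame(name = c("a", "b"), value = c(1, 2))
  fix(x)    # opens x in an editor window for viewing or editing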

One thing to note is that R displays data.frame variables in a friendly way. However, its display of irregularly structured variables, such as the typical list below, is so unfriendly as to be nearly unreadable:




esProc does a much better job in this respect. The same data is presented in esProc by drilling down through hyperlinks:





Now let's compare some more advanced features, starting with immediate execution.

In esProc (download), a cell is calculated immediately and automatically once code is entered into it, so developers can view the result at once and adjust the code for a re-run if needed. This style speeds up development, lowers the probability of errors, and lets newcomers become familiar with the tool quickly. RStudio provides something similar that more closely resembles the "immediate window" of VB: users type code at the command line, run it immediately, and copy it into the formal script once it runs correctly. On the whole, R is less convenient than esProc in this respect.

Finally, let us discuss debugging functions separately.

R users can call debug(functionName) to debug a function separately and directly, which helps modularize development and supports large-scale testing. esProc users, by contrast, cannot debug a function separately, which is something of a pity. That said, R's debug does not implement a truly "separate" test: it effectively places a browser() at the entry of the function to be debugged, so all the code leading up to the function still has to run before debugging begins.
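
As a rough sketch with a hypothetical function mySum:

  mySum <- function(v) {
    s <- 0
    for (i in v) s <- s + i
    s
  }
  debug(mySum)      # flag the function for debugging
  mySum(1:5)        # stepping starts as soon as the function is entered
  undebug(mySum)    # remove the flag when done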

From another perspective, such computational analysis software is rarely used for large-scale development and testing, so the ability to debug functions separately is of limited value.

The comparison above shows that both R and esProc provide sufficiently powerful debugging. R is better at debugging functions separately, while esProc is more convenient and easier to use.




Related News about Raqsoft:

Creative Spreadsheet Software for Data Processing in Development

Business Intelligence Suppliers: Are You Ready for 2013?

Made-in-China IT Products Emerge with Outstanding Capability

 

November 23, 2012

Vector Computing: Which Is More Powerful, R Language or esProc?

Do you find vector computing tiresome while using statistical computing tools? Here is a vector computing comparison: R language vs. esProc. To me, one of the most attractive features of both R and esProc is how agile their code is: only a few lines are needed to implement a great deal of functionality. For example, both allow vector computing expressions, simplify judgment statements, extend the basic functions into advanced ones, and support generic types. In vector computing specifically, both process large amounts of data through functions and operators, avoiding loop statements. Users benefit in two ways: first, the syntax is easy for business experts to grasp, keeping the learning cost low; second, it is easy to implement parallel computation and improve performance.

To show the subtle differences between R and esProc in vector computing, let's walk through several examples.

First, let's check the most basic operations: getting and assigning vector values. For example, get the 5 values whose subscripts are 6 to 10, and replace them with another 5 values.

R solution:
01    A1<-c(51,52,53,54,55,56,57,58,59,60)
02    A2<-A1[6:10]
03    A1[6:10]<-seq(1,5)

esProc solution:
A1    =[51,52,53,54,55,56,57,58,59,60]
A2    =A1(to(6,10))
A3    >A1(to(6,10))=to(1,5)

Comments: Both enable users to get and assign values easily, with almost identical usage. Subjectively, though, I prefer R's ":" for representing ranges; it looks more intuitive and agile.

Then, let's compare them on the arithmetical operations of vector.

R solution:
04    A4<-c(1,2,3)
05    A5<-c(2,4,6)
06    A4*A5 # multiplying the vector, and the result is: [1] 2 8 18
07    A4+2    #adding the vector to the constant, and the result is: [1] 3 4 5
08    ifelse(A4>1,A4+2,A4-2) #conditional evaluate, and the result is: [1] -1 4 5
09    sum(A4)    #aggregate, sum up the vector member, and the result is:6
10    sort(A4,decreasing = TRUE)    #sort reversely, and the result is: 3 2 1

esProc solution:
A4    =[1,2,3]
A5    =[2,4,6]
A6    =A4**A5    'multiplying the vectors, and the result is: 2 8 18
A7    =A4.(~+2)    'adding the vector to the constant, and the result is:3 4 5
A8    =A4.(if(~>1,~+2,~-2))    'conditional evaluate, and the result is:-1 4 5
A9    =A4.sum()    'aggregating, vector member sum up, and the result is:6
A10    =A4.sort(~:-1)    'reverse sorting, and the result is:3 2 1

Comments: As can be seen above, whether for arithmetic, aggregation, or sorting on vectors, both R and esProc handle things well, and their syntaxes are very close. One thing worth noticing is that esProc code merely looks more "object-oriented", while R is truly object-oriented underneath. The former is better suited to direct use by business experts and is popular in the general business sector; the latter is better suited to programmers who write extension packages themselves and is more readily accepted in scientific circles.

Let us check vector computing on structured data, such as computations based on the Orders table from the Northwind database:
1. Query the data with freightage from 200 to 300.
2. Query the orders dated 1997.
3. Compute the intersection of the two sets above, i.e. data with freightage from 200 to 300 and with orders placed in 1997.
4. Group the result from the previous step by EmployeeID, and average the freightage for each employee.

R solution:
02    A2<-result[result$Freight>=200 & result$Freight<=300,]
03    A3<- result[format(result$OrderDate,'%Y')=="1997",]
04    A4<-result[result$Freight>=200 & result$Freight<=300 & format(result$OrderDate,'%Y')=="1997",]
05    A5<-tapply(A4$Freight,INDEX=A4$EmployeeID,FUN=mean)

esProc solution:
A2    =A1.select(Freight>=200 && Freight<=300)
A3    =A1.select(year(OrderDate)==1997)
A4    =A2^A3
A5    =A4.group(EmployeeID;~.avg(Freight))

Comments: R is good at querying and at grouped statistics. For set operations, however, R is weaker than esProc: in the R example above, the result is obtained indirectly by querying rather than by any set operation.

R can only perform set operations on simple vectors, for example intersect(A2$Orderid,A3$Orderid); it cannot directly perform set operations on structured data such as a data.frame.
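
As a hedged sketch of the indirect route, assuming both data frames carry an Orderid column as above:

  commonIds <- intersect(A2$Orderid, A3$Orderid)   # the set operation works only on the key vector
  A4 <- A2[A2$Orderid %in% commonIds, ]            # then subset to rebuild the "intersection" of rows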

Of course, this is not to say that R is weak at vector computing. In fact, R is easier to use than esProc for matrix-related computation. For example, to find the eigenvalues of a matrix A, R users can simply call eigen(A), while esProc provides no function to express this directly. From this angle, esProc is better suited to business computing, while R is better at scientific computation.
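
For example, a quick sketch with a small hypothetical matrix:

  A <- matrix(c(2, 1, 1, 2), nrow = 2)   # a 2x2 symmetric matrix
  eigen(A)$values                        # eigenvalues: 3 and 1
  eigen(A)$vectors                       # the corresponding eigenvectors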

In conclusion, for vector computing both R and esProc perform well on the basics. More specifically, R is second to none in matrix computation, while esProc (download) beats R in handling structured data.

More news from Raqsoft:

Made-in-China IT Products Emerge with Outstanding Capability

Raqsoft Organizes Training to Better Serve Customers

Business Intelligence Suppliers: Are You Ready for 2013?



November 20, 2012

Simple Interrow Computation: esProc Keeps It Simple!

Original posted on: http://ezinearticles.com/?Simple-Interrow-Computation:-Keep-It-Simple!&id=7379887


Inter-row computations are quite common in business statistics and analytics, for example running totals, year-on-year comparisons, and link relative ratios. Both the R language and esProc provide good inter-row computation abilities, with slight differences between them. The case below uses some basic inter-row computations to illustrate those differences:


Christmas is coming, and the sales department of a company wants statistics on its outstanding salespeople after the big Christmas promotion, for example the salespeople who together achieve half of the company's total sales. The data come from a database table, salesOrder, whose main fields are the order ID (orderId), the salesperson's name (name), the sales amount (sales), and the order date (salesDate).


The straightforward solution is as follows:

1. Group by salesperson to calculate each salesperson's total sales amount.
2. Sort the result of the previous step by sales amount in descending order.
3. For each record of the sorted data, calculate the running total, and compute the benchmark: half of the company's total sales.
4. From the running totals of the previous step, select the salespeople whose running total is lower than or equal to the benchmark, or whose running total exceeds the benchmark while the previous salesperson's running total is still below it.


The detailed solution of R is shown as below:


01 library(RODBC)
02 odbcDataSources()
03 conn<-odbcConnect("sqlsvr")
04 originalData<-sqlQuery(conn,'select * from salesOrder')
05 odbcClose(conn)
06 nameSum<-aggregate(originalData$sales,list(originalData$name),sum)
07 names(nameSum)<-c('name','salesSum')
08 orderData<-nameSum[rev(order(nameSum$salesSum)),]
09 halfSum<-sum(orderData$salesSum)/2
10 orderData$addup<-cumsum(orderData$salesSum)
11 subset(orderData,addup<=halfSum | (addup>halfSum & c(0,addup[-length(addup)])<halfSum))


Please find the detailed solution of esProc (download) below:



Now let us study the differences in computing the aggregate (running total) values:


R uses cumsum on line 10 to calculate the aggregate value.
esProc uses cumulate in A4 to calculate the aggregate value.


Both writing styles are very convenient. However, esProc evaluates the expression per record: for each record it computes cumulate and then picks out the aggregate value corresponding to that record by its # row number. By comparison, R is more efficient here, since cumsum is executed only once.


Splitting the esProc statement into two solves the efficiency issue: first calculate the list of aggregate values separately, then attach it to the original dataset. However, this style is not as concise as R's single line of code.


Next, let us check how the qualified salespeople are selected and where the two tools differ:

R completes the computation on line 11, mainly by shifting the column: c(0, addup[-length(addup)]) constructs a new column which is the addup column moved down by one row, with the last entry dropped and a 0 filled in as the first entry. You can then test whether the aggregate value is below the benchmark, or above the benchmark while its previous record is still below it.


R does not provide a way to access data at a relative position, so the trick of "moving the data from the relative position to the current position" is used instead. The result is the same, but the writing style is less intuitive and demands more logical gymnastics from the analyst.
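
Taken in isolation, the trick looks like this (a minimal sketch with hypothetical cumulative values):

  addup <- c(120, 210, 260, 300)             # hypothetical running totals
  halfSum <- 150                             # hypothetical benchmark
  prevAddup <- c(0, addup[-length(addup)])   # shift the column down one row, pad the first entry with 0
  addup <= halfSum | (addup > halfSum & prevAddup < halfSum)   # TRUE TRUE FALSE FALSE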


The esProc expression is select(addup<=B3 || (addup>B3 && addup[-1]<B3)). Excellent indeed! This is esProc's expression of relative position: with [-1] and the like, users can refer to the record one or more positions before or after the current record. For example, the aggregate value calculation in A4 can also be rewritten as A3.derive(addup[-1]+salesSum:addup).

Unlike the fixed algorithm for the aggregate value, the algorithm of this step is much freer, and the esProc style of expressing relative positions proves very agile, with great advantages.




As the case above shows, relative-position and inter-row computations can solve many problems that look complex. esProc is more flexible at expressing relative positions, so esProc users can feel more relaxed when tackling complex calculations.


As for the R language, operating on whole columns at once and fixed algorithms like cumsum are comparatively concise and quite impressive.


For industries like marketing and sales, healthcare and pharmaceuticals, education, finance, and telecommunications, statistical computing tools like R or esProc are usually helpful for easing workloads and improving efficiency.


Author: Jim King
BI technology consultant for Raqsoft
10 + years of experience on BI/OLAP application, statistical computing and analytics
Email: Contact@raqsoft.com
Website: www.raqsoft.com

November 15, 2012

Grouping Function Comparison: R Language vs. esProc


Recently, I took part in analyzing several marketing questionnaires in a row, and various typical grouping issues came up during the process. To share them with interested colleagues in the same line of business, I have classified and summarized these issues.

Grouping allocates samples into several groups according to a specific flag, so that groups differ from one another while members of a group share some commonality. Grouping plays an important role in statistical analytics: type grouping differentiates types of economic, social, scientific, and other phenomena; structural grouping studies internal structure; and analytical grouping analyzes the interdependence between data.

As mainstream structured-data analysis languages, both R and esProc (download) provide rich grouping functions. Let's use some examples to get an idea of their differences. In these cases we will use the Orders table from the Northwind database as sample data.

Basic grouping: group by a certain column, for example, view data by employee.

R: orderByEmp<-split(result,result$EmployeeID)
esProc: A2=A1.group(EmployeeID)

Both R and esProc implement the basic function well. In addition, users can extend basic grouping in several ways, such as grouping by multiple columns, grouping and summarizing at the same time, grouping first and summarizing later, grouping level by level, and inter-row computations on the grouped data. For details on these extensions, interested readers can refer to another essay, Computation after Grouping: R Language vs. esProc.

Basic grouping can be characterized as follows: every original member is assigned to exactly one group, and no duplicate membership is allowed. This is the completely-partitioned grouping provided by relational algebra (i.e. SQL). In some cases the conditions are more complicated. For example, the marketing department sends over a list of advertising regions (named AdCountry), the locations where advertising campaigns are launched intensively, and we need to analyze the orders in these regions. The situation has the following characteristics:

The advertising list covers only some of the countries that appear in the Orders table, since ads are placed in only part of the countries.
The advertising list may also contain countries that do not appear in the Orders table at all, since it is quite normal for some advertised countries to have no orders.

This type of grouping can be referred to as "incompletely-partitioned grouping". It is not supported by SQL and is hard to implement there. Let's check whether R and esProc can outperform SQL in this respect:

R solution:
AdCountry<-c("USA","Finland","Canada","NotInOrders")
AdCountryFac<-factor(result$ShipCountry,levels=AdCountry)
groupByAd<-split(result,AdCountryFac)

esProc solution:
A3=["USA","Finland","Canada","NotInOrders"]
A4=A1.align(A3,ShipCountry)

Comments:
Both R and esProc solve this problem well. The data is split into 4 groups; the first 3 contain data and the last one is empty, as expected.
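
A quick way to confirm this in R, assuming the groupByAd list built above:

  sapply(groupByAd, nrow)   # orders per advertising country; the "NotInOrders" group shows 0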

Next, let us check grouping on simple conditions: classify the freightage into 3 categories of 0-30, 30-100, and above 100.

R solution:
Frekind<-cut(result$Freight,breaks=c(0,30,100,Inf))
orderByFei<-split(result,Frekind)


esProc solution:
A5=["?<=30", "?<=100 && ?>=30" ,"?>100"]
A6=A1.enum(A5,Freight)

Comments: Both solutions solve the problem perfectly. However, you may have noticed that the esProc representation is more flexible: esProc users can apply boolean conditions to character strings or data of other types, and can evaluate boolean expressions involving two fields at once. Implementing the same things in R is much more complicated, because cut can only group a single numeric field into non-overlapping ranges. The limitations are not few.

Now let us check a more complicated example of grouping on conditions. Say the freightage falls into 3 categories: $5-$15 is the range most easily accepted by customers, the low range covers everything below $50, and the high range covers everything above $50. Here categories 1 and 2 overlap, so a record with, say, a $10 freight must appear in both groups.

R solution:
subset(result,Freight>=5 & Freight<=15)->g5to15
subset(result,Freight<=50 )->g0to50
subset(result,Freight>50 )->g50toinf
gFeight<-list(g5to15=g5to15,g0to50=g0to50,g50toinf=g50toinf)

esProc solution:
A5=["?<=5 &&?>=15", "?<=50 " ,"?>50"]
A6=A1.enum@r(A5,Freight)

Comments: R does not provide a feature for grouping on complicated conditions; the grouping above is pieced together awkwardly and hardly resembles a "grouping action" at all. The esProc solution is the same as in the previous example, with @r indicating that duplicates are allowed across groups. This syntax style is flexible: lots of features can be built on a limited number of functions without naming any new ones. I am considering discussing this topic in my next essay.

As the examples above show, R can implement advanced grouping and is much more powerful than SQL in this respect. However, it is still less flexible and less easy to use than esProc.


Computation after Grouping: R Language vs. esProc

Original post:  http://it.toolbox.com/blogs/data-analytics/computation-after-grouping-r-language-vs-esproc-53756

In SQL, grouping and summarizing are inseparable and must be performed at the same time, which compromises its ability to support interactive analysis. In R and esProc, by comparison, users can group first and then decide whether to summarize or carry out more complex computations. For example, without summarizing, R or esProc users can perform inter-row computations within a group, or pick one group and regroup it after studying the summarized values.


Some smart readers may have noticed that the latter is just OLAP drilling. We will discuss that in detail in another essay. For now, let's focus on the subject at hand and check the respective characteristics of R and esProc.


The examples below use the Orders table from the typical Northwind database.


Example 1: Group by year without summarizing.


R solution: orderByYear<-split(result,format(result$OrderDate,'%Y'))
esProc solution: A2=A1.group(year(OrderDate))


Comments: Regarding the basic computation functionalities performed in steps, both R and esProc are perfect in achieving the goal.


Example 2: On the basis of example 1, summarize the freightage of the data from each group by totaling up.

R solution: sumByYear<-mapply(function(x) sum(x$Freight),orderByYear)
esProc solution: A3=A2.(~.sum(Freight))


Comments: Both solutions are perfect.

Example 3: On the basis of example 1, regroup the data by month this time, and then sum up the freight.

R solution:

orderBymonth<-orderByYear
for(i in seq(orderByYear)){
  orderBymonth[[i]]<-aggregate(orderByYear[[i]]$Freight,list(format(orderByYear[[i]]$OrderDate,'%m')),sum)
}

esProc solution: A4=A2.(~.group(month(OrderDate);~.sum(Freight)))


Comments: At first glance the R solution looks fairly complicated, but it actually follows the same procedure as esProc. Both take the data of each year (orderByYear[[i]] in R, ~ in esProc) and perform a grouping operation on it (aggregate in R, group in esProc). The difference is that esProc uses ".()" to express looping over a sequence, whereas R expresses looping with for/while constructs. One point worth noticing is that R offers several different representations of the grouping action, each with its own usage, which increases the learning difficulty quite a bit; even the SQL of 10 years ago beats R in this respect. We will discuss syntax flexibility in a dedicated essay. Let's proceed with the examples below.
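
To illustrate the point, here is a hedged sketch of three R idioms that all express "sum the freight by year" on the result data frame used above; a learner has to meet all of them sooner or later:

  y <- format(result$OrderDate, '%Y')
  tapply(result$Freight, y, sum)                   # returns a named vector
  aggregate(result$Freight, list(year = y), sum)   # returns a data frame
  sapply(split(result$Freight, y), sum)            # split first, then summarize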


Example 4: From the data in Example 3 we can see that only 1997 has complete statistics for all 12 months. Let's compute the month-on-month values for 1997. The data of year 1997 is represented as orderBymonth$"1997" in R and as A4(2) in esProc.


R solution: orderBymonth$"1997"$lrr<-(orderBymonth$"1997"$x-c(0,orderBymonth$"1997"$x[-length(orderBymonth$"1997"$x)]))/c(0,orderBymonth$"1997"$x[-length(orderBymonth$"1997"$x)])
esProc solution: A5=A4(2).derive((#1-#1[-1])/#1[-1])


Comments: Here the two solutions differ completely. Take esProc first: #1 represents the first field, that is, the summarized Freight value (corresponding to orderBymonth$"1997"$x in R), and #1[-1] represents the previous record's value in the loop. The period-on-period expression is therefore simply (this month - previous month) / previous month. This style of presentation is really simple and clear.


R users can adopt similar practices, but loops become a must. In fact, if the month-on-month comparison is to be done for every year, esProc users only need to replace the 2 in A4(2).derive((#1-#1[-1])/#1[-1]) with ~, while R users would have to resort to nested loops.


Considering that business experts are less skilled in IT than professional developers, I removed the inner loop with a trick so that business people can handle it smoothly. The actual R solution moves the original x column down one row to act as a computed column; computing from this new column and the original one converts the relative-position computation into an inter-column computation. Some may feel the code becomes simpler at the cost of being harder to understand. Because R lacks a means to express relative positions, R users have no choice: they must compromise either on the complexity of the code or on that of the comprehension.
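
A hedged sketch of that trick applied to every year, assuming the orderBymonth list from Example 3 with its summarized column x:

  for (i in seq_along(orderBymonth)) {
    x <- orderBymonth[[i]]$x
    prev <- c(0, x[-length(x)])                  # move x down one row, pad the first entry with 0
    orderBymonth[[i]]$lrr <- (x - prev) / prev   # (this month - previous month) / previous month; the first month has no previous period
  }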


Judging from the above examples, both R and esProc are highly capable when computing after grouping. However, esProc provides a much simpler representation that is easier to understand.

November 7, 2012

Group the subtables: SQL and esProc Comparison

We often need to group subtables during business analytics and statistics. Grouping a parent table is easy, but grouping subtables is not nearly as convenient. What is your way to group a subtable? Let's see how SQL and esProc do it.




Group the subtables: SQL vs. esProc, for example:



List each employee and count the cities where that employee has worked for more than one year.



Database tables: staff and resume.

And their main fields:

Staff: name

Resume: name,city,workingDays



Check the SQL solution:



SELECT name, count(*) cityCount
FROM (SELECT staff.name name, resume.city city
      FROM staff, resume
      WHERE staff.name = resume.name
      GROUP BY name, city
      HAVING sum(workingDays) >= 365)
GROUP BY name




SQL processes the subtable by joining it to the primary table as in any multi-table query. The first grouping still leaves the result at subtable granularity (one record per employee-city pair), so it has to be grouped a second time before it matches the primary table, one record per employee.





Check the esProc solution:


A

1 =staff.new(name,resume.group(city).count(~.sum(workingDays)>=365):cityCount)




esProc treats the set of subtable records as a field of the primary table, so it can be grouped and filtered like any ordinary set.

Do you know any other way to group a subtable? You are welcome to discuss it with me!



Author: Jim King

BI technology consultant for Raqsoft

10 + years of experience on BI/OLAP application, statistical computing and analytics

Email: Contact@raqsoft.com

Website: www.raqsoft.com



November 1, 2012

Syntax Agility Comparison: R Language vs. esProc

By definition, a truly agile syntax requires users to memorize only a small number of basic functions, from which a great many advanced functions can be built through simple processing. That simple processing is a lightweight programming style, far easier to grasp than ordinary programming. The advantage of agile syntax is that the number of basic functions is kept small, reducing the learning effort and cost while still giving users a simple, easy way to express more advanced functionality.



People need statistical tools for analytics and statistical computing, and whether they are business experts or technicians, they all need an application with an agile syntax. Both the R language and esProc do well here; their differences can be illustrated with the examples below:



Take computing the sum of squares of the vector members as an example. Although both R and esProc provide functions for this, we will not use the ready-made ones here; instead, the most fundamental functions will be used to build it.



R solution:

  01 A4<-c(1,2,3)
  02 Reduce(function(x,y) x+y*y, c(0,A4))


esProc solution:

  A1 =[1,2,3]

  A2 =A1.loop@s( 0; ~~+~*~ )


Comments:

In the R code, Reduce is a very useful function for building many advanced operations. The lambda-style anonymous function also lets R users construct simple functions concisely. Note, however, that an extra 0 has to be prepended to the original vector to make the computation work.



esProc users can implement the same thing with the basic looping function loop, without needing lambda syntax (although it is supported) and without appending an extra 0 to the vector (although a 0 is still required as the initial value). The esProc code is comparatively easier to understand. esProc can also use ~ to refer to the "member in computation" directly, with no extra function needed, whereas R users cannot refer to it directly: they have to construct a function and use x and y to stand for the members, which is clearly inconvenient. Let's take the moving average computation below as an example:




Based on the Orders table from Northwind, compute the three-day moving average of the freightage.




R solution:

   filter(result$Freight/3, rep(1, 3),sides = 1)


esProc solution:

=A1.( ~{-1,1}.( Freight).avg())



Comments:

Because R cannot refer to the "member in computation", it is hard to express the moving average with only the basic functions. To ease the difficulty, we have to resort to the more specialized function filter, at the cost of the extra time users spend learning yet another function.
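
For comparison, a hedged sketch of the same window (previous, current, and next member, like esProc's {-1,1}) written with only basic building blocks such as sapply and mean, assuming result$Freight as above; it works, but it is noticeably clumsier:

  n <- length(result$Freight)
  sapply(1:n, function(i) mean(result$Freight[max(1, i - 1):min(n, i + 1)]))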



esProc users, by contrast, can compose the moving average from the basic averaging function avg (R of course has similar functions) plus esProc's characteristic member representations.



Here {-1,1} expresses a relative position range in esProc: -1 is the previous member, 1 is the next, so {-1,1} covers three members in total. Representing relative positions this way is easy to understand and use. R users, by comparison, can address absolute positions directly but struggle to express relative ones, and this is no minor obstacle. Let's check the example below.




This is the period-on-period comparison example from another essay of mine, Computation after Grouping: R Language vs. esProc. To keep it simple, suppose the data is already sorted by time, and we only compute the freightage change relative to the previous period.



esProc solution:

  =A1.((Freight-Freight[-1])/Freight[-1])


R solution:

  c(0,(result$Freight[-1]-result$Freight[-length(result$Freight)])/result$Freight[-length(result$Freight)])

Comments: The esProc solution is intuitive and straightforward: "(current - previous) / previous", where [-1] denotes the previous record.



R also has a [-1], but it means "the vector with its first element removed", not "the previous record". Since no relative-position notation is available, the row-moving method is used again: construct a new column by shifting the data down one row, converting the between-row computation into a between-column one. Although the result is correct, the algorithm is quite perplexing to anyone who is not an R expert.


These examples show that esProc's "current member" and "relative position" notations are distinctive and greatly simplify data computation. Users need to memorize only a few basic functions to build a great many advanced ones, saving a lot of learning effort, which is good news for lazy bones like me. There is plenty of similar syntax besides, such as function options and cascade parameters.



A function option is a symbol appended to a function to modify its common behavior, such as changing the type of the return value or deciding whether to sort the result. For example, group is the grouping function: group@o means do not sort the result, group@z means sort in reverse order, and group@1 means keep only the first record of each group, in which case the result is not a set of groups but a TSeq (corresponding to R's list and data.frame). These options are "common" in the sense that the same @o, @z, and @1 can be attached to other functions as well, for example the filtering function select.




R, by contrast, extends functionality through different function names and parameters, such as tapply, sapply, lapply, by, and others. Although these are in effect just extensions of the looping-and-computing function apply, you have to grasp at least 5 functions before you can claim to have mastered cyclic computation. It is self-evident which approach requires memorizing fewer functions.



As for cascade parameters, this article will not discuss them in detail; interested readers can explore them on their own.



Finally, R excels at matrix computation with a great many ready-made analysis algorithms, and its contributed library functions are excellent too; I am genuinely fascinated by these strengths. Still, I must admit that among statistical computing tools esProc beats R on the basics of being easy to learn and use, as well as on syntax agility.


Author: Jim King

BI technology consultant for Raqsoft

10 + years of experience on BI/OLAP application, statistical computing and analytics

Email: Contact@raqsoft.com

Website: www.raqsoft.com

Blog: datakeyword.blogspot.com/