1. Generating data
Generate odd numbers between 1 and 10.
esProc:
x=to(1,10).step(2)
In this code, to(1,10) generates the consecutive integers from 1 to 10, and the step function then takes every second member of that result, so the final result is [1,3,5,7,9]. This type of data is called a sequence in esProc.
A simpler version of the code is x=10.step(2).
R language:
This piece of code takes integers from 1 to 10 directly and non-consecutively. The computed result is c(1,3,5,7,9). This type of data is called a vector in R language.
A simpler version of this piece of code is x<-seq(1,10,2).
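As a runnable sketch of the R side, the snippet below generates the odd numbers with seq and reuses the same index pattern to pick the members at odd positions of a vector. The sample vector A1 is a hypothetical stand-in with invented values.

```r
# Odd numbers from 1 to 10, generated directly with seq(from, to, by).
x <- seq(1, 10, 2)
print(x)

# The same index pattern selects the members at odd positions of any vector.
# A1 is hypothetical sample data, not taken from the article.
A1 <- c("a", "bc", "def", "ghij", "klmno")
odd_members <- A1[seq(1, length(A1), 2)]
print(odd_members)
```

Because seq only produces positions here, the same expression works for numeric and string vectors alike.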
Comparison:
1. Both can solve the problem in this example. esProc needs two steps, which in theory means poorer performance, while R language solves it in a single step and should therefore perform better.
2. esProc's approach is to get members from a set by sequence number, which is a general method. For example, given a string sequence A1=["a", "bc", "def"……], get the strings in odd positions. The style of the code does not need to change; the code is x=A1.step(2).
R language generates the data directly, so it performs better. It can write general expressions, too. For example, to get the strings in odd positions from the string vector A1=c("a", "bc", "def"……), the expression in R language can be x=A1[seq(1,length(A1),2)].
3. esProc loop functions have something that R language lacks: built-in loop variables and operators. “~” represents the loop variable (the current member), “#” represents the loop count, “[]” represents a relative position and “{}” represents a relative interval. By using these variables and operators, esProc can produce general and concise expressions. For example, compute the square of each member of the set A2=[2,3,4,5,6]:
A2.(~*~) /Result is [4,9,16,25,36]. This can also be written as A2**A2, but the latter is less intuitive and less general. R language can only use A2*A2 to express this.
Get the first three members:
A2.select(#<=3) / Result is [2,3,4]
Get each member’s previous member and create a new set:
Growth rate:
A2.((~ - ~[-1])/ ~[-1]) /Result is [null,0.5,0.33333333333,0.25,0.2]
Moving average:
Summary:
In this example, R language generates the data directly and can also produce general expressions, which shows that it is more flexible and uses less memory than esProc.
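For comparison with the esProc expressions built on "~", "#" and "[]", here is a minimal R sketch of the same four computations on A2. The helper names (squares, first_three, previous, growth) are illustrative only, and NA stands in for the null that esProc reports for the first member.

```r
# A2 mirrors the sample set used in this section.
A2 <- c(2, 3, 4, 5, 6)

# Square of each member (element-wise multiplication).
squares <- A2 * A2

# First three members, selected by position (the role "#" plays in esProc).
first_three <- A2[seq_along(A2) <= 3]

# Previous member for each position; the first member has none, so NA is used.
previous <- c(NA, head(A2, -1))

# Growth rate relative to the previous member.
growth <- (A2 - previous) / previous

print(squares)
print(first_three)
print(previous)
print(growth)
```

R has no built-in symbol for the previous member, so the shifted vector has to be constructed explicitly before the arithmetic.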
2. Filtering records
The computational objects of a loop function can be an array or a set whose members are single values, or a two-dimensional structured data object whose members are records. In fact, loop functions are mainly used to process the latter. For example, from the order records in sales, select the orders dated on or after January 1, 2010 whose amount is greater than 2,000. Note: sales originates from a text file; some of its data are as follows:
esProc:
sales.select(ORDERDATE>=date("2010-01-01") && AMOUNT>2000)
Some of the results are:
R language:
Some of the results are:
Comparison:
1. Both esProc and R language can realize this function. The difference is that esProc uses the select loop function while R language uses an index directly, but there is no essential distinction between the two. In addition, R language can further simplify the expression by using the attach function:
sales[as.POSIXlt(ORDERDATE)>=as.POSIXlt("2010-01-01") & AMOUNT>2000,]
With this, the two become even more similar.
2. Besides querying, loop functions can be used to find sequence numbers, sort, rank, get the Top N, group and summarize, etc. For example, find the sequence numbers of the matching records:
sales.pselect@a(ORDERDATE>=date("2010-01-01") && AMOUNT>2000) /esProc
which(as.POSIXlt(sales$ORDERDATE)>=as.POSIXlt("2010-01-01") & sales$AMOUNT>2000) #R language
For example, sort records by SELLERID in ascending order and by AMOUNT in descending order:
sales.sort(SELLERID,AMOUNT:-1) /esProc
sales[order(sales$SELLERID,-sales$AMOUNT),] #R language
For example, get the top three records by AMOUNT:
sales.top(-AMOUNT;3) /esProc
head(sales[order(-sales$AMOUNT),],n=3) #R language
3. Sometimes R language computes with an index, as in filtering; sometimes it computes with functions, as in finding the sequence numbers of records; sometimes it programs in the form of “data set + function + data set”, as in sorting; and at other times it works in the form of “function + data set + function”, as in getting the Top N. This programming style seems flexible but can easily confuse programmers. By comparison, esProc always adopts the object-style form “data set + function + function …” for access. This form has a simple and uniform structure and is easy for programmers to grasp.
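Here is an example of performing continuous computations: filter the records and then get the Top N. esProc computes like this:
sales.select(ORDERDATE>=date("2010-01-01") && AMOUNT>2000).top(AMOUNT;3)
And R language computes in this way, using an intermediate variable Mid for the filtered records:
Mid<-sales[as.POSIXlt(sales$ORDERDATE)>=as.POSIXlt("2010-01-01") & sales$AMOUNT>2000,]
head(Mid[order(Mid$AMOUNT),],n=3)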
As you can see, esProc is better at programming multi-step continuous computations.
Summary: In this example, esProc gains the upper hand in syntax consistency and in performing continuous computations, and is more beginner-friendly.
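To make the R side of this section easy to try, the following sketch runs the filter, sequence-number, sort and Top N expressions on a small hypothetical sales table. The column names follow the article, but every value is invented, and as.Date is used in place of as.POSIXlt as an assumption to keep the example independent of time zones.

```r
# Hypothetical sample data; column names follow the article, values are invented.
sales <- data.frame(
  ORDERID   = 1:6,
  CLIENT    = c("A", "B", "A", "C", "B", "C"),
  SELLERID  = c(13, 18, 13, 2, 18, 2),
  AMOUNT    = c(1500, 2600, 4100, 900, 3300, 2050),
  ORDERDATE = as.Date(c("2009-11-03", "2010-02-14", "2010-05-20",
                        "2010-07-01", "2009-12-30", "2010-09-09"))
)

# Filter condition shared by the steps below.
cond <- sales$ORDERDATE >= as.Date("2010-01-01") & sales$AMOUNT > 2000

# Filtering with an index.
recent <- sales[cond, ]

# Sequence numbers of the matching records.
positions <- which(cond)

# Sort by SELLERID ascending, then AMOUNT descending.
sorted <- sales[order(sales$SELLERID, -sales$AMOUNT), ]

# Top three records by AMOUNT.
top3 <- head(sales[order(-sales$AMOUNT), ], n = 3)

# Continuous computation: filter first, then take the two largest amounts.
top2_recent <- head(recent[order(-recent$AMOUNT), ], n = 2)

print(recent)
print(top2_recent)
```

The last step needs the intermediate variable recent, which is the point the comparison above makes about multi-step computations in R.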
3. Grouping and summarizing
Loop functions are often employed in grouping and summarizing records. For example, group by CLIENT and SELLERID, and then compute the sum and the maximum of AMOUNT.
esProc:
sales.groups(CLIENT,SELLERID;sum(AMOUNT),max(AMOUNT))
Some of the results are as follows:
R language:
result1<-aggregate(sales[,4],sales[c(3,2)],sum)
result2<-aggregate(sales[,4],sales[c(3,2)],max)
result<-cbind(result1,result2[,3])
Some of the results are as follows:
Comparison:
1. In this case, more than one summarizing method is required. esProc completes the task in one step. R language has to go through two steps, one to compute the sum and one to find the maximum, and finally combine the results with cbind, because its built-in library function cannot directly apply multiple summarizing methods at once. Besides, R language uses more memory in completing the task.
2. Another issue is an inconsistent design in R language. With sales[c(3,2)], the grouping order in the code puts SELLERID ahead of CLIENT, while the order in the business requirement is the opposite. In the result, the order follows the code rather than the business requirement. In short, there is no unified standard across the business logic, the code and the computed result.
Summary: In this example, esProc has the advantages of high efficiency, low memory usage and a unified standard.
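The two-step R approach described in this section can be sketched on a small hypothetical table as follows. The data values and the output column names SUM_AMOUNT and MAX_AMOUNT are assumptions added for illustration; columns are selected by name here rather than by position.

```r
# Hypothetical sample data; values are invented for illustration.
sales <- data.frame(
  ORDERID  = 1:5,
  CLIENT   = c("A", "A", "B", "B", "A"),
  SELLERID = c(1, 1, 1, 2, 2),
  AMOUNT   = c(100, 300, 200, 500, 50)
)

# Grouping columns in the same order as sales[c(3,2)]: SELLERID, then CLIENT.
by_cols <- sales[c("SELLERID", "CLIENT")]

# Step 1: sum of AMOUNT per group.
result1 <- aggregate(sales$AMOUNT, by_cols, sum)

# Step 2: maximum of AMOUNT per group.
result2 <- aggregate(sales$AMOUNT, by_cols, max)

# Step 3: combine both summaries; the third column of result2 holds the maximum.
result <- cbind(result1, result2[, 3])
names(result) <- c("SELLERID", "CLIENT", "SUM_AMOUNT", "MAX_AMOUNT")

print(result)
```

Both aggregate calls group on the same columns, so their rows line up and cbind can join them side by side.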
4. Seeking quadratic sum
Use a loop function to compute the quadratic sum (sum of squares) of the set v=[2,3,4,5].
Please note that both esProc and R language have functions that compute the quadratic sum directly, but a loop function will be used here to perform this task.
esProc:
v.loops(~~+~*~;0)
R language:
Reduce(function(x,y) x+y*y, c(0,v))
Comparison:
1. Both esProc and R language can realize this function easily.
2. By using the loops function, esProc sets zero as the initial value, computes every member of v in order and returns the final result. In the code, "~" represents the member being computed and "~~" represents the computed result of the last step. For example, the arithmetic in the first step is 0+2*2, that in the second step is 4+3*3, and so forth. The final result is 54.
By using the Reduce function, R language computes the members of [0,2,3,4,5] in order, and passes the computed result of the current step into the next one to continue the computation. As in esProc, the arithmetic in the first step is 0+2*2, that in the second step is 4+3*3, and so forth.
3. R language employs a lambda expression to perform the operation. This is a way of programming with anonymous functions, which can be executed directly without specifying a function name. In this example, function(x,y), the specification, defines two parameters; x+y*y, the body, performs the operation; c(0,v) combines 0 and v into [0,2,3,4,5], in which every member takes part in the operation in order. Because a complete function can be passed in, this programming method is quite flexible and can perform operations containing complicated functions.
The esProc programming method can be regarded as an implicit lambda expression, which is essentially the same as the explicit expression in R language. But it is a bare expression without a function name, specification or variables, so its structure is simpler. In this example, "~" is the built-in loop variable, which needs no definition; ~~+~*~ is the expression that performs the operation; v is a fixed parameter in which every member takes part in the operation in order. Because a function cannot be passed in, it is theoretically not as good as R language in flexibility and expressive power.
4. Despite being less flexible in theory, the esProc programming method boasts convenient built-in variables and operators, like ~, ~~, #, [], {}, etc., and is more expressive in practical use. For example, esProc uses “~~” to directly represent the computed result of the last step, while R language needs the Reduce function and extra variables to do this. esProc can use “#” to directly represent the current loop number, which is difficult to do in R language. Also, esProc can use “[]” to represent a relative position. For example, ~[1] represents the value of the next member and Close[-1] represents the value of the field Close in the previous record.
In addition, esProc can use “{}” to represent a relative interval. For example, {-1,1} represents the three members from the previous member to the next member. Therefore, the general expression v.(~{-1,1}.avg()) can be used to compute a moving average, while R language needs specific functions to do this. For example, the expression filter(v/3, rep(1,3), sides = 1) does not even contain a function for “seeking average”, which makes it difficult for beginners to understand.
Summary: In this case, the lambda expression in R language is more powerful in theory but is a little difficult to understand. By comparison, the esProc programming method is easier to understand.
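The step-by-step behaviour of Reduce and the moving-average expressions mentioned in this section can be checked with the sketch below. Passing 0 as the init argument is an equivalent alternative to prepending it with c(0,v). The centered window that shrinks at both ends is an assumption about how ~{-1,1} treats boundary members, not a documented esProc rule.

```r
v <- c(2, 3, 4, 5)

# Sum of squares with an explicit accumulator; the third argument is the
# initial value, playing the same role as the 0 in c(0, v).
total <- Reduce(function(acc, member) acc + member * member, v, 0)

# accumulate = TRUE keeps every intermediate result: 0, 0+2*2, 4+3*3, ...
steps <- Reduce(function(acc, member) acc + member * member, v, 0,
                accumulate = TRUE)

# Trailing three-member moving average, as in the article's filter expression.
trailing <- as.numeric(stats::filter(v / 3, rep(1, 3), sides = 1))

# Centered three-member moving average written with mean(), so the averaging
# is visible. Assumption: windows at both ends use only the available members.
centered <- sapply(seq_along(v), function(i) {
  window <- v[max(1, i - 1):min(length(v), i + 1)]
  mean(window)
})

print(total)
print(steps)
print(trailing)
print(centered)
```

The accumulate option is the closest R counterpart to reading "~~" at each step, since it exposes the running result after every member.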
5. Inter-row and inter-group operations
Here is a table stock containing daily trade data of multiple stocks. Please compute the daily increase in the closing price of each stock.
Some of the original data are as follows:
esProc:
A10=stock.group(Code)
A11=A10.(~.sort(Date))
A12=A11.(~.derive((Close-Close[-1]):INC))
R language:
A10<-split(stock,stock$Code) #group by Code
for(i in 1:length(A10)){
  #sort by Date in each group
  A10[[i]]<-A10[[i]][order(as.numeric(A10[[i]]$Date)),]
  #add a column, increased price
  A10[[i]]$INC<-with(A10[[i]],Close-c(0,Close[-length(Close)]))
}
Comparison:
1. Both esProc and R language can achieve the task. esProc uses only loop functions in the computation, achieving high performance and concise code. R language requires writing code manually with the for statement, which brings poor performance and readability.
2. To complete the task, two layers of loops are required: loop over each stock, and then loop over each record of that stock. Although good at expressing the innermost loop, the loop functions of R language (including the lambda syntax) have no built-in loop variables and are hard to use for multi-layer loops. Even if the code can be worked out, it is hard to understand.
The loop functions of esProc can not only use “~” to represent the loop variable, but can also be nested, so they are well suited to expressing multi-layer loops. For example, A10.(~.sort(Date)) in the code is in fact the abbreviation of A10.(~.sort(~.Date)). The first “~” represents the current stock, and the second "~" represents the current record of this stock.
3. This is a typical ordered operation: the closing price of the previous day must be subtracted from the current closing price. With useful built-in variables and operators such as #, [] and {}, esProc expresses this type of ordered operation easily. For example, Close-Close[-1] represents the amount of increase. R language can also perform the ordered operation, but its syntax is much more complicated due to the lack of facilities like loop number, relative position, relative interval and so on. For example, the expression for the amount of increase is Close-c(0,Close[-length(Close)]).
It is hard enough for the loop functions in R language to perform the relatively simple ordered operation in this example, let alone more complicated operations. In those cases, a multi-layer for loop is usually needed. For example, find out how many days each stock has been rising:
A10<-split(stock,stock$Code)
for(i in 1:length(A10)){
  #sort by Date in each group
  A10[[i]]<-A10[[i]][order(as.numeric(A10[[i]]$Date)),]
  #add a column, increased price
  A10[[i]]$INC<-with(A10[[i]],Close-c(0,Close[-length(Close)]))
  if(nrow(A10[[i]])>0){ #add a column, continuous increased days
    A10[[i]]$CID[[1]]<-1
    #seq_len(n)[-1] is empty when the group has only one record
    for(j in seq_len(nrow(A10[[i]]))[-1]){
      if(A10[[i]]$INC[[j]]>0){
        A10[[i]]$CID[[j]]<-A10[[i]]$CID[[j-1]]+1
      }else{
        A10[[i]]$CID[[j]]<-0
      }
    }
  }
}
The code in esProc is still concise and easy to understand:
A10=stock.group(Code)
A11=A10.(~.sort(Date))
A12=A11.(~.derive((Close-Close[-1]):INC, if(INC>0,CID=CID[-1]+1,0):CID))
Summary: In performing multi-layer loops or inter-row and inter-group operations, esProc loop functions have higher computational performance and more concise code.
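For readers who want to run the R side of this section, here is a sketch on hypothetical stock data. It is an alternative formulation, not the code from this post: it replaces the explicit for loops with lapply and Reduce while keeping the same rules (INC uses 0 as the value before the first record, and CID starts at 1). The helper name add_columns and all data values are invented for illustration.

```r
# Hypothetical sample data, deliberately unsorted within stock "A".
stock <- data.frame(
  Code  = c("A", "A", "A", "A", "B", "B", "B"),
  Date  = as.Date(c("2013-01-04", "2013-01-02", "2013-01-03", "2013-01-07",
                    "2013-01-02", "2013-01-03", "2013-01-04")),
  Close = c(12, 10, 11, 11.5, 20, 19, 21)
)

# Hypothetical helper: adds INC and CID to the records of one stock.
add_columns <- function(g) {
  g <- g[order(g$Date), ]                      # sort by Date within the group
  g$INC <- g$Close - c(0, g$Close[-nrow(g)])   # increase over the previous record
  # Consecutive rising days: the first record starts at 1, later records add 1
  # when INC is positive and reset to 0 otherwise.
  g$CID <- Reduce(function(prev, inc) if (inc > 0) prev + 1 else 0,
                  g$INC[-1], 1, accumulate = TRUE)
  g
}

# Outer loop over stocks; the inner loop over records lives inside Reduce.
A10 <- lapply(split(stock, stock$Code), add_columns)

print(A10$A)
print(A10$B)
```

Even in this form, the previous-record logic has to be spelled out by hand with a shifted vector and an accumulator, which is the gap that Close[-1] and CID[-1] fill in esProc.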