A moving average smooths out a time series. Computing one is a typical case of ordered-data computation: take a subset of N consecutive members of the series, compute the average of that subset, then shift the subset forward one member at a time and repeat. The following example shows how to compute a moving average in R.
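The basic method described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the R solution discussed below; the function name moving_average and the sample amounts are made up for the example.

```python
def moving_average(values, n):
    """Return the average of every run of n consecutive values."""
    if n <= 0:
        raise ValueError("n must be positive")
    averages = []
    # Slide a window of length n across the series, one position at a time.
    for start in range(len(values) - n + 1):
        window = values[start:start + n]
        averages.append(sum(window) / n)
    return averages


# Made-up sample amounts, for illustration only.
amounts = [10, 20, 30, 40, 50]
print(moving_average(amounts, 3))  # [20.0, 30.0, 40.0]
```

A series of five values yields three averages here, because only complete windows are kept.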
Case description:
The data frame sales has two fields: salesDate and Amount, the sales amount for that date. Requirement: compute the three-day moving average. For each date, average the sales amounts of the previous day, the current day and the next day, then move forward one date at a time. A part of the source data is as follows:
Code:
filter(sales$Amount/3, rep(1,3))
Computed result:
Code interpretation:
In R, the filter function can compute a moving average, and the resulting code is concise.
Despite its convenience, the filter function is difficult for beginners to understand. For example, sales$Amount/3 on its own divides each value of the field Amount by three, but inside filter the three consecutive scaled values are added together, which amounts to summing three consecutive amounts and dividing the sum by three. The expression rep(1,3) evaluates to [1,1,1]; it supplies the filter weights and thereby sets the window to three values. In addition, because neither the name nor the parameters of filter contain the words “average” or “moving”, even many R developers don’t know that it can compute a moving average.
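To make the arithmetic concrete, here is a rough Python sketch of what a centered weighted sum does when the input has already been divided by three. It is not R's implementation: the name centered_sum is hypothetical, the sample amounts are made up, and the sketch assumes odd-length, symmetric weights.

```python
def centered_sum(values, weights):
    """Weighted sum over a window centered on each position.

    Assumes odd-length, symmetric weights. Positions whose window would
    run past either end of the series get None.
    """
    half = len(weights) // 2
    result = []
    for i in range(len(values)):
        if i - half < 0 or i + half >= len(values):
            result.append(None)  # incomplete window at the edge
            continue
        window = values[i - half:i + half + 1]
        result.append(sum(w * v for w, v in zip(weights, window)))
    return result


amounts = [30, 60, 90, 120]           # made-up sample data
scaled = [a / 3 for a in amounts]     # plays the role of sales$Amount/3
print(centered_sum(scaled, [1, 1, 1]))  # [None, 60.0, 90.0, None]
```

The value 60.0 at the second position equals (30 + 60 + 90) / 3, so summing the pre-divided values with weights of one gives the three-value average.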
In fact, filter is a general-purpose linear filter, and computing a moving average is only one of its uses. Its full signature is filter(x, filter, method = c("convolution", "recursive"), sides = 2, circular = FALSE, init).
Any change to the requirement makes the code harder to understand. For example, the code for the moving average of the current day and the previous two days cannot be written as filter(sales$Amount/3, rep(0,2)); it has to be filter(sales$Amount/3, rep(1,3), sides = 1).
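The difference between the default two-sided window and sides = 1 can be illustrated with a small Python sketch. The function window_mean and its sides argument are hypothetical and only mimic the idea; for an even n, placing the extra slot on the future side is a choice made for this sketch, and the sample amounts are made up.

```python
def window_mean(values, n, sides=2):
    """Mean over n values around each position.

    sides=2 centers the window on the current position;
    sides=1 uses the current position and the n - 1 before it.
    Positions without a complete window get None.
    """
    if sides == 2:
        after = n // 2
        before = n - 1 - after
    elif sides == 1:
        before, after = n - 1, 0
    else:
        raise ValueError("sides must be 1 or 2")
    result = []
    for i in range(len(values)):
        if i - before < 0 or i + after >= len(values):
            result.append(None)  # incomplete window
            continue
        window = values[i - before:i + after + 1]
        result.append(sum(window) / n)
    return result


amounts = [10, 20, 30, 40, 50]  # made-up sample data
print(window_mean(amounts, 3, sides=2))  # [None, 20.0, 30.0, 40.0, None]
print(window_mean(amounts, 3, sides=1))  # [None, None, 20.0, 30.0, 40.0]
```

The averages are the same numbers in both cases; only the position each average is attached to changes.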
Summary:
R can compute a moving average, but the code is rather obscure.
Third-party solutions
We can also use Python, esProc and Perl to handle this case. Like R, all of these languages can perform statistical data analysis and compute a moving average. The following briefly introduces the Python and esProc solutions.
Python (pandas)
pandas is a third-party library for Python. It is powerful at processing structured data, and its basic data type, the DataFrame, is modeled on R's data frame. At the time of writing, the latest version is 0.14. Its code for handling this case is as follows:
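pandas.stats.moments.rolling_mean(sales["Amount"], 3, center=True)
The name of the rolling_mean function is clear; even a developer without pandas experience can understand it easily. Its usage is simple too: the first parameter is the sequence being computed and the second parameter is N, the number of days in the moving average. By default the window covers the current day and the previous N-1 days; center=True centers it on the current day, which matches the requirement of this case. In pandas 0.18 and later the same computation is written sales["Amount"].rolling(3, center=True).mean().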
esProc
esProc is good at expressing business logic freely with an agile syntax. Its relative-position expressions make ordered-data computations easy. The code is as follows:
sales.(Amount{-1,1}.avg())
{-1,1} in the code represents a relative interval, that is, the three days made up of the previous day, the current day and the next day. A relative interval thus expresses the moving average clearly and flexibly. For example, to compute the moving average of the current day and the previous two days, we just need to change the interval to {-2,0}.
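The relative-interval idea can be imitated in plain Python. The sketch below is not esProc code: the function relative_average is hypothetical, the sample amounts are made up, and clipping the window at the ends of the series is an assumption of this sketch.

```python
def relative_average(values, lo, hi):
    """Average the members from offset lo to offset hi around each position.

    Offsets are relative to the current position: -1 is the previous
    member, 0 the current one, 1 the next. Windows are clipped at the
    ends of the series (an assumption of this sketch).
    """
    if lo > 0 or hi < 0:
        raise ValueError("the interval must include offset 0")
    result = []
    for i in range(len(values)):
        start = max(i + lo, 0)
        stop = min(i + hi, len(values) - 1)
        window = values[start:stop + 1]
        result.append(sum(window) / len(window))
    return result


amounts = [10, 20, 30, 40, 50]  # made-up sample data
print(relative_average(amounts, -1, 1))  # [15.0, 20.0, 30.0, 40.0, 45.0]
print(relative_average(amounts, -2, 0))  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

Switching from a centered window to a trailing one only changes the two offsets, which is the flexibility the relative interval is meant to provide.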