It is difficult for
SQL to handle order-related computations, for it does not support ordered sets.
But as these computations are very common in real-world business, there are a
lot of online discussions related to them. For example:
Usually SQL
approaches order-related computations by:
1.Using window functions
In most cases, use of window functions can help
simplify the SQL if the working database (such as Oracle MSSQL) supports window
functions. The dependence on database support limits their working sphere.
2.Using complex statements
If the database does not support window
functions, you need to use very complicated nested subqueries in composing SQL
statements. Of course some databases (such as MySQL) that support variables and
ordered execution could simplify certain part of the algorithm.
3.Composing a stored procedure
But in handling a complicated scenario, it
is really not easy to compile queries for the stored procedure step by step.
If the main program is Java-based, you can
use esProc (free version is available) to assist the computing process. esProc
supports ordered data. It is much simpler to perform order-related calculations
with esProc than with SQL. Here are some examples.
Calculating year-on-year growth rate and link relative
ratio
Inter-row calculations (such as calculating year-on-year growth rate
and link relative ratio) are difficult to be performed if the current database
does not support window functions. In that case you need to change the way of
approaching the problem by performing join operations. But by doing so, the
code becomes unreadable and inefficient. Even if you can use the window
functions, you still have to write the nested subqueries, producing quite
lengthy SQL statements.
esProc script:
A1: Query the database by the specified time period, and group
Amount by the year and month in OrderDate. Both begin and end are
external parameters. Suppose begin="2011-01-01 00:00:00" and end="2014-07-08
00:00:00".
A2: Add a new field lrr to A1’s table for holding the monthly link
relatives and the year-on-year growth rate of each month. The expression representing
the former is mAmount/mAmount[-1].
esProc supports using [N] or [-N] to represent the Nth row after or
before the current row. In the expression, mAmount represents the amount of the current month and mAmount[-1]
represents that of the previous month. Note that the link relative ratio value of
the first month (here it is January 2011) is null.
A3: Sort A2 by the month and the year to calculate year-on-year
growth rate. The complete expression should’ve been =A2.sort(m,y), but as A2 has
been sorted by the year, you can just sort it by the month with A2.sort(m),
which is more efficient.
A4: Calculate year-on-year growth rate of the sales amount based on A3. Note
that the calculation can only be performed between the same months.
The execution result of the esProc script
can be the data source of the report, or can be called through JDBC from the
Java application. The code for Java to call the esProc script is as follows:
Class.forName("com.esproc.jdbc.InternalDriver");
con=
DriverManager.getConnection("jdbc:esproc:local://");
//Call
esProc script (which is similar to the stored procedure); p1 is the name of the
script file
st
=(com. esproc.jdbc.InternalCStatement)con.prepareCall("call p1()");
st.setObject(1,"2011-01-01
00:00:00");
st.setObject(2,"2014-07-08
00:00:00");
//Execute
the script
st.execute();
//Get
the result set
ResultSet
rs = st.getResultSet();
……
The returned value is a ResultSet object in
accordance with JDBC standard. The method of calling an esProc script is the
same as that of accessing a database. Programmers can master it fast as long as
they are familiar with JDBC.
More details about deploying esProc JDBC
and calling script through it can be found in esProc Integration & Application:
Java Invocation.
A
special type of inter-row calculation
There is another type of
inter-row calculation, which will be explained through an example. Below is the
source data:
Target: To add columns store1 and store2.
For the first record, the rule is store1=store and store2=store. From the
second record up, it would be store1=store+store1[-1] and store2=store1+store2[-1]
in which [-1] represents the previous record.
esProc script:
A1: Execute the SQL statement to retrieve
data.
A2: Create the result set and set the rule
that store1=store+store1[-1] and store2=store1+store2[-1]. esProc uses [-1] to
represent a position relative to the current one, that is, the previous record.
Calculating
continuous rising/declining values
Below is the stock records,
based on which you need to count the number of continuous days when Price
values are positive or negative. The targeted result:
esProc script:
A1: Execute the SQL
statement to retrieve data and sort it by Date
A2: Add a result
field. If both the current Price value and the previous Price value are positive
or negative, add 1 to the result value; otherwise the result value remains 1.
esProc
uses [-1] to represent the previous record relative to the current one, (-2) to
reference the second-to-last record and {-1,1} to represent a dynamic interval.
They can help simplify the complicated inter-row calculations.
Querying
continuous same values
Below is the source data:
Target: To get the num values that have
appeared continuously for at least three times.
esProc script:
A1: Execute the SQL statement to retrieve
data.
A2: Create the variable lx to which the
initial value assigned is 1.
A3: Judge if the num value of the current
record is the same as that of the previous record. If the result is true, add 1
to lx. Finally get num values whose corresponding lx value is 3.
Querying
neighboring record
Below is the source data:
Target: To select the record whose
Property2 value is greater than 100 and find its next record.
esProc script:
A1: Execute the SQL statement to retrieve
data.
A2: Get the position of the record where
Property2>100.
A3: Get the record in the desired position
and the one next to it based on A2’s sequence of position.
Counting
number of rows by specified interval
Below is the table data:
Target: To count the number of rows where
value<10 (or whatever value). If the value is less than 10 in multiple
continuous rows, count them one row. Thus the computing result based on the
above data should be 3.
A1: Execute the SQL statement to retrieve
data.
A2: Group rows according to the condition
that value>10 and value<10, in which @o means only comparing the
neighboring rows. Then count the number of groups where value<10.
Different from SQL grouping which must be
followed by an aggregate, esProc group operation can retain the members of each
group for potential use. This design helps make an SQL algorithm simpler.
Filtering
on dynamic intervals
Below is the source data:
Target: To sort the records by ID in
ascending order and find the records where DDATE is non-ordered, that is, DDATE
hasn’t been sorted in ascending order.
esProc script:
A1: Execute the SQL statement to retrieve
data and sort it by ID.
A2: Select the records where DDATE value is
not the biggest compared with all previous records, and where DDATE value is
not the smallest compared with all the following records. In this statement, DDATE{,0}
represents a set consisting of all DDATE values from the first to the current record
and DDATE{0,} represents a set consisting of all DDATE values from the current
record to the last.
Finding
lost values for an interval
Below is the source data:
Target: To find the lost numbers in an
interval of numbers generated by the smallest value and the biggest value in
field A
esProc script:
A1: Execute the SQL statement to retrieve
data and sort it by A.
A2: Generate an interval consisting of continuous
numbers according to the value of the first record, and then get difference of
this interval and A1.
Merging
intervals
Below is the source data:
Target: To judge whether intervals generated
by values of num1 and num2 in each record are overlapped. Merge them if the
result is true.
The way of approaching it is to judge
whether a row can merge with the previous row. If the result is false, make it
a separate group (+1); if true, group them into one (+0). Finally get the biggest
and the smallest values from each group to generate the interval.
A1: Execute the SQL statement to retrieve
data, and sort it by num1.
A2: Create a temporary variable a, and
assign 0 to it as its initial value.
A3: In the sorted table, compare each num1
value with the num2 value in the previous row. If the num1 value is bigger, start
a new group with the row; otherwise merge the row to which num1 belongs with
the previous row.
Generating
intervals
Below is the source data:
Target: To divide RBD_VALUE values into
segments and compose them into strings, which are like A-F,H,J-K,N-O,Q-S,U-V,X-Z.
Rules:
If RBD_VALUE= “ALL”, then discard the
value;
According to the alphabetical order if certain
RBD_VALUE values are consecutive letters , then make them a segment with both
the first and last letter included and “-” between them, like “A-F”;
If two values of RBD_VALUE are
inconsecutive, like K and N between which L and M are absent, then unite the
former with the letter or letters before it and join the latter with the letter
or letters after it. Separate the two segments with a comma. This is like “J-K”,
“N-O”;
If a value in RBD_VALUE has not consecutive
letters both before and after it, then take it as a separate segment, like the
letter H. G and I are absent respectively before and after it.
esProc script:
A1: Execute the SQL statement to retrieve
data.
A2: Align RBD_VALUE field in A1 with the
alphabet.
A3: Group rows in A2. Group rows where
RBD_VALUE is null together and put the other rows where RBD_VALUE is non-null
into another group. @o means performing
merge-style data grouping, which only compares the neighboring values.
A4: Select the group in which RBD_VALUE is
non-null.
A5: Compose strings according to the number of members in each group. ~.m(-1) means getting the last row from the current group.
No comments:
Post a Comment