Java has no real competitive advantage in data computing, particularly in
computing over massive structured data. For example, given order detail data,
suppose we need to find the salespeople whose sales grew by more than 10% in 3
consecutive months.
Java has no built-in high-level function for this kind of computation, so it is
hard to handle with the language alone. The details have to be implemented by
hand, which takes a large amount of time and effort: first, define classes and
represent every record as an object; second, store the records in a List;
third, compute with nested multi-level loops. Apart from sorting, almost every
algorithm involved in the computation, such as aggregating, filtering, and
grouping, has to be implemented manually. Such computations usually involve set
operations and relational operations over massive data, or operations on the
relative positions of objects and their attributes. Implementing the underlying
logic for all of this takes great effort.
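
To make the manual steps concrete, here is a minimal sketch of the hand-written approach in plain Java: a class per record, a List to hold the records, and nested loops for grouping and comparison. The class name `ConsecutiveGrowth`, the `MonthlySales` record, its fields, and the sample data are all hypothetical, chosen only for illustration. The sketch assumes the order details have already been summed into one row per salesperson per month, with no missing months.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ConsecutiveGrowth {

    // Hypothetical record: one salesperson's total for one month.
    static final class MonthlySales {
        final String salesMan;
        final int month;      // assumed sortable and gap-free per salesperson
        final double amount;

        MonthlySales(String salesMan, int month, double amount) {
            this.salesMan = salesMan;
            this.month = month;
            this.amount = amount;
        }
    }

    // Returns the salespeople with at least runLength consecutive
    // month-over-month growth rates strictly above threshold.
    static Set<String> findRisingSalesMen(List<MonthlySales> rows,
                                          double threshold, int runLength) {
        // Step 1: group the rows by salesperson (manual "GROUP BY").
        Map<String, List<MonthlySales>> bySalesMan = new LinkedHashMap<>();
        for (MonthlySales row : rows) {
            bySalesMan.computeIfAbsent(row.salesMan, k -> new ArrayList<>()).add(row);
        }

        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<MonthlySales>> entry : bySalesMan.entrySet()) {
            // Step 2: order each group by month (manual "ORDER BY").
            List<MonthlySales> months = entry.getValue();
            months.sort(Comparator.comparingInt((MonthlySales m) -> m.month));

            // Step 3: compare each month with the previous one and count
            // how many qualifying growth rates occur in a row.
            int run = 0;
            for (int i = 1; i < months.size(); i++) {
                double previous = months.get(i - 1).amount;
                double current = months.get(i).amount;
                boolean grew = previous > 0 && (current / previous - 1) > threshold;
                run = grew ? run + 1 : 0;
                if (run >= runLength) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    // Made-up sample data for illustration only.
    static List<MonthlySales> sampleRows() {
        List<MonthlySales> rows = new ArrayList<>();
        double[] alice = {100, 120, 140, 160}; // grows more than 10% three times in a row
        double[] bob = {100, 120, 110, 130};   // growth is interrupted by a drop
        for (int i = 0; i < alice.length; i++) {
            rows.add(new MonthlySales("Alice", i + 1, alice[i]));
            rows.add(new MonthlySales("Bob", i + 1, bob[i]));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(findRisingSalesMen(sampleRows(), 0.10, 3)); // prints [Alice]
    }
}
```

Even in this small case, the grouping, ordering, and run counting each have to be spelled out by hand, which is the overhead the paragraph above describes.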
That is why Java's computational capability needs to be improved. We need a
tool tailored to making structured data computation easy!
How about SQL? Not every Java application can use a database. In addition, a
lot of data lives in text and Excel files, and computation across databases and
code reuse can also cause problems. Moreover, SQL itself is still inconvenient
for many computations. The computation mentioned above, for example, is by no
means easy to write in SQL:
    -- rising_range is the month-over-month growth rate (ratio minus 1),
    -- so "over 10%" is tested as > 0.1
    WITH A AS
      (SELECT salesMan, month,
              amount / lag(amount) OVER (PARTITION BY salesMan ORDER BY month) - 1
                AS rising_range
       FROM sales),
    B AS
      (SELECT salesMan,
              CASE WHEN rising_range > 0.1
                    AND lag(rising_range) OVER (PARTITION BY salesMan ORDER BY month) > 0.1
                    AND lag(rising_range, 2) OVER (PARTITION BY salesMan ORDER BY month) > 0.1
                   THEN 1 ELSE 0 END AS is_three_consecutive_month
       FROM A)
    SELECT DISTINCT salesMan FROM B WHERE is_three_consecutive_month = 1
In this case, esProc is the better choice.
esProc is a development tool for database computing that specializes in
simplifying complex computations and is easy to integrate with Java. The
corresponding esProc script is shown below:
[Figure: esProc grid script for this computation]
esProc can retrieve data directly from multiple databases, text files, and
Excel sheets and compute across them. Its grid-style scripts and agile syntax
are designed specifically for massive structured data computation. It accepts
external parameters, and results can be returned to Java directly through JDBC.
With esProc, then, Java's computational capability improves dramatically. In
addition, esProc natively supports cross-database computation and code reuse
and offers full-featured debugging, so its development productivity is also
higher than that of SQL.
I am not sure about Java, but SQL does seem to give me headaches! I use STATISTICA, which is excellent with both structured and unstructured data.