esProc, A Script Language for Data Analytics with Parallel Mechanism: Developers Find Innovative Way for Data Process in Java

In Java development, the typical data computation problems are characterized with:

Long computation procedure requiring a great deal of debugging

Data may from database, or Excel/Txt

Data may from multiple databases, instead of just one.

Some computation goals are complex, such as relative position computation, and set-related computation

Just suppose a sales department needs to make statistics on the top 3 outstanding salesmen ranking by their monthly sales in every month from Jan to the previous month, based on the order data.

Java alone is difficult to handle such computations. Although it is powerful enough and also quite convenient in debugging, Java has not directly implemented the common computational algorithms yet. So, Java programmers still have to spend great time and efforts to implement the details like aggregating, filtering, grouping, sorting, and ranking. In the respect of data storage and access, programmers have to use List and other objects to assemble every 2D table and every piece of data, and then arrange the nested multi-level loops. In addition, such computation involves set and relation operations on massive data, or relative position between object and object properties. The underlying logics for these computations demand great efforts, not to mention the Excel or Text data, data from set, and the complex computational goal.

How to improve the data computational capability for Java? How to solve this problem easily?

SQL is an option. SQL implements lots of data computational algorithms and alleviates the workload to some extent. But, it is far from solving the problem due to the below weak points:

First, SQL takes a long query as a basic computation unit. Programmers are only allowed to view the final result but not the details of running. It is awkward to prepare the stored procedure and a great many of stage tables just to debug barely. Using special scripting for debugging? Low cost-efficiency indeed! A lengthy SQL statement will bring about exponential increase in the difficulty of reading or writing, possibility of error, and maintenance cost.

Second, to address the Excel, text, or heterogeneous data computation with SQL, programmers have to establish the data mart or global view with ETL or Linked Server at great cost. In addition, SQL does not support the step-by-step computations for decomposing the complex computation goal. Its incomplete support for the set makes programmers still feel tough to solve some complex problems.

So, we can conclude that SQL has limited impact on improving the computational efficiency for Java.

In this case, esProc is highly recommended – a database computation development tool ideal for simplifying the complex computations and tailored for cross-database computation and explicit sets with convenient debugging, and direct support for JDBC to integrate with the Java apps easily.Still the above example, esProc scripts are as shown below:

esProc boasts the grid-style and agile syntax specially designed for the massive amount of structured data. In addition, esProc can directly retrieve and operate on the data from multiple databases, Text files, and Excel sheets. With the support for external parameters, native support for cross-database computations, and code reuse, esProc boosts the data computing efficiency of Java greatly.

menu

August 13, 2013

Developers Find Innovative Way for Data Process in Java

No comments:

Post a Comment