esProc, A Script Language for Data Analytics with Parallel Mechanism: Computing with Java for Massive Data without Database

Many Java applications do not use a database. How, then, can such an application run queries or other computations on structured data? For example, given an Excel sheet downloaded from a finance website, find the shares that rose for N consecutive days within a certain period.

For computation on structured data, programmers usually embed SQL statements in Java code and access a database server via JDBC. SQL has many structured-data algorithms built in, but Java lacks comparable high-level functions for implementing these operations directly. Without a database, it is therefore quite hard to implement such computations with Java alone.

Programmers must spend a great deal of time and effort implementing every detail of the computation by hand. Apart from sorting, almost every algorithm for massive-data computing, such as aggregating, filtering, and grouping, has to be implemented manually. A typical approach is to define a class, represent each record as an object, store the records in a List, and then compute through nested multi-level loops. Such computations usually also involve set operations and relations among large volumes of data, or computations on the relative positions of objects or their properties. Implementing this underlying logic is quite cumbersome.
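
To make the manual approach concrete, here is a minimal Java sketch of the "shares rising for N consecutive days" example. It is only an illustration under stated assumptions: the class RisingStreaks, the Quote fields, and the method risingFor are hypothetical names, the rows are assumed to have already been read from the Excel sheet into objects, and "rising for N consecutive days" is interpreted as N consecutive day-over-day increases in the closing price.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical example class; none of these names come from the original article.
class RisingStreaks {

    // One row of the sheet: ticker, trading day (ISO yyyy-MM-dd), closing price.
    static final class Quote {
        final String ticker;
        final String date;
        final double close;

        Quote(String ticker, String date, double close) {
            this.ticker = ticker;
            this.date = date;
            this.close = close;
        }
    }

    // Returns the tickers whose closing price rose on at least n consecutive trading days.
    static Set<String> risingFor(List<Quote> quotes, int n) {
        // Hand-written equivalent of "GROUP BY ticker".
        Map<String, List<Quote>> byTicker = new LinkedHashMap<>();
        for (Quote q : quotes) {
            byTicker.computeIfAbsent(q.ticker, k -> new ArrayList<>()).add(q);
        }

        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<Quote>> e : byTicker.entrySet()) {
            List<Quote> rows = e.getValue();
            // Hand-written equivalent of "ORDER BY date"; ISO dates sort correctly as strings.
            rows.sort(Comparator.comparing((Quote q) -> q.date));

            int streak = 0; // consecutive rises ending at the current row
            for (int i = 1; i < rows.size(); i++) {
                // Relative-position logic: compare each row with the previous one.
                streak = rows.get(i).close > rows.get(i - 1).close ? streak + 1 : 0;
                if (streak >= n) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

Even this small case needs a hand-written grouping step, a sort, and a positional comparison loop, and it still leaves out reading the Excel file and filtering by period.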

Embedding a database and then performing ETL is clearly an awkward workaround. Is there a more agile and convenient method?

In this case, esProc is a strong choice. It is a professional development tool for database-style computation on structured data.

esProc is good at simplifying complex computations, and a Java application can retrieve the result from esProc via JDBC. The esProc solution to this case is given below:

esProc can retrieve data directly from multiple databases, text files, and Excel sheets, and compute on it. It offers a grid-style, agile syntax tailored to massive structured-data computation. It supports external parameters, and its results can be exposed via JDBC and consumed by Java code and reporting tools, so esProc can greatly extend Java's computational capability. In addition, it enables cross-database computation and supports code reuse by design, and its debugging support is also solid. Given these advantages, esProc is more efficient than SQL for this kind of task.

June 26, 2013
