July 10, 2013

What’s the Best Way for Structured Data Computing in Java?

When developing Java programs, sometimes, we may find ourselves facing such a challenge: Performing the massive structured data computation according to the data from text or Excel. For example, with the policy data details, how to find out the salesman who sold the most or the least insurance products during a certain period?

How about importing the above mentioned data into database manually and then computing with SQL database? Bad idea! Importing all the data to database just for once cannot ensure the live update of data. Once data changed, users will have to import over again. This will lead to cumbersome procedures and bring about great workload.

How about automatically importing data to database at a specified time with ETL tool? Still bad! This practice will usually require PHP/Perl/VBScript/JavaScript knowledge and massive data update algorithms, which is very costly in time and money. Moreover, the changes of demand cause difficult maintenance and also have great impact on the performance and stability of the existing database. To meet a certain computing demand, it is not worthy to build a system that is similar to the data warehouse.

In addition, not all Java applications are incorporated with a database. What if there is no database? It is obviously a bad idea to install a set of databases for just meeting a computation demand.

Even if the database and the ETL regular update system are established successfully. It is still inconvenient in most cases, for example, the cross-database computation, and sometimes it is hard to reduce the coupling between data computing scripts and Java codes. Take the SQL statements in this case for an example, it is already quite hard to understand and maintain them, let alone composing such complex scripts, as shown below:

SELECT salesMan 
FROM (SELECT salesMan,
             row_number() OVER (ORDER BY isrCount DESC) descOrder, 
             row_number() OVER (ORDER BY isrCount ASC) ascOrder 
      FROM (SELECT salesMan,
                   COUNT(*) isrCount
           FROM insurance  where salesDate>= ? and salesDate<=?
           GROUP BY salesMan
     )
WHERE descOrder=1 OR ascOrder=1 ORDER BY descOrder

How about using the Java language directly? Even worse! Java does not offer the functions to implement query, group, sort, and summarizing directly and straightforwardly, while these are basic requirements for the massive structured data computation. You will have to implement all these details by yourself.Then, no way out? No. esProc is just an ideal tool to accomplish the objective. This is a professional Java development tool for database computing and really good at simplifying the complex computation to integrate with Java easily. For esProc, the scripts are shown below:

esProc can retrieve data directly from and compute on multiple databases\txt files\Excel sheets. esProc offers a grid style and agile syntax specially tailored for massive structured data computation. The result can be exported to JDBC to boost the Java computational capability dramatically. In addition, it enables the cross-database computation and supports code reuse by nature. Even the debug functionality is also quite perfect. Considering all these advantages, it is clear that esProc is more efficient than SQL.

No comments:

Post a Comment