esProc, A Script Language for Data Analytics with Parallel Mechanism: A Middleware for Data Reporting

Traditionally, there are three ways to prepare data source for reporting:
1.Retrieve data by directly connecting the reporting tool to the database and submit SQL statements to the database in which data source is prepared;
2.Partial data preparation is made by the reporting tool using its own computing capabilities (functions for handling grid cells or calculations between them);
3.The reporting tool will use API to call a Java program to accomplish the data preparation.

In this article, let’s compare the three ways and try to find out an optimal solution.

Data preparation with database

Advantages: High data integrity, rich basic algorithms and high-performance in-database computation.

Disadvantages:

1.Despite the high performance the in-database computation achieves, much of it could be canceled by the performance loss – a persistent problem – resulted from retrieving data by the reporting tool via JDBC and conversion from data flow to objects.

2.Simple SQL statements are executed rapidly, but complex ones are not easy to optimize. The stored procedure that uses a for statement for data fetching could be even slower than Java.

3.SQL isn’t thoroughly set-oriented; doesn’t promote stepwise computation and support ordered sets and object reference. These shortcomings make natural thinking in SQL and the stored procedure impossible and programming in them difficult.

4.The building and maintenance of a database is expensive, so it’s reasonable to store only the core business data, instead of the intermediate tables, in it. Unfortunately, most report data sources originate from the intermediate tables.

5.A database is flat-structured, only capable of managing a small number of tables and unable to organize data in multilevel directories. In-database computation will generate a large number of intermediate tables that are related to each other and that may have confusing names, causing difficulty to data administration.

6.It’s difficult to handle multiple/heterogeneous data sources in databases, such as cross-database computations and computations between database and text file.

7.The tightly coupled storage and computing structures reduce portability and scalability.

Data preparation with reporting tools

Advantages: It addresses the shortcomings of in-database computation to some extent.

Disadvantages:

1.In handling a computation, the reporting tool has to hide its rows and columns to make room for storing the intermediate results, increasing memory usage and decreasing performance.

2.Apart from storing values, the grid cells of a report have appearance properties (such as font, size and background color) that may affect performance.

3.Reporting tools can only handle relatively simple computations, such as aggregation and getting computed columns. They lack the ability to deal with most of the common data handling tasks like related computing, set operations and multistep computations.

4.The tightly coupled storage and data presentation structures reduce portability and scalability.

Data preparation with Java program

Advantages:

1.A Java program can access file data through parallel processing to enhance the performance and capacity of a reporting tool.

2.A Java program can handle multiple/heterogeneous data sources.

3.A Java program enables programmers and developers to retrieve core business data from database and store and get the large number of intermediate tables from files, effectively alleviating the database pressure.

4.A Java program permits accessing multiple databases simultaneously to realize real-time big data query (T+0) and display newly-generated data in the report.

Theoretically, a Java program is the optimal solution. Because it works as a middleware between the database and the reporting tool to separate storage, computing and presentation from each other and reduce coupling for the whole system. However, there’s a major drawback to the Java program. It doesn’t possess the capability for structured-data handling and thus can only manage the simplest computing scenarios.

esProc can serve as Java’s extended class library to solve the basic class library issue and still cover the four Java merits. It has these features:

1.esProc provides a wide selection of class libraries for handling semi-structured and structured data. They enable stepwise computing model, complete set orientation, ordered sets and object reference mechanism, making the programming more natural and simpler, and producing concise code.

2.An esProc script can be saved as a dfx file and place in a directory together with the corresponding report file, simplifying the code management process.

3.esProc is an interpreted language and doesn’t need to be compiled and packaged.

4.esProc encapsulates the Java multithreaded programming to simplify and facilitate the development.

5．esProc accepts invocation from reporting tools through integration-friendly and cost-effective JDBC interface.

For more information, please see How to Use esProc to Assist Reporting Tools.

menu

January 7, 2016

A Middleware for Data Reporting

No comments:

Post a Comment