esProc, A Script Language for Data Analytics with Parallel Mechanism: Data Source Preparation Tool Especially for Report Developers

Many report developers may have the experience in presenting the KPIs in a report for those outstanding sales man whose sales has been rising over 10% for consecutive 3 months. The procedure of finding the outstanding salesmen is actually preparing the data source.

Preparing the data source is the key and the tough part of developing a report.

There are multiple ways to prepare the data source. SQL or SP can be used to handle the normal data computation of a single database; R language for the complex data computation; ETL or data warehouse for cross-database computation, by arranging all to a same database and then compute. For the structural data from non-database files or sheets, the senior programing languages can be used to generate result sets, for example, retrieving the data from Text file with Java class. However, these methods all have some drawbacks as discussed below.

Let’s start the discussion with SQL/SP. First, SQL/SP alone can only work on a single database because various cumbersome workarounds are unavoidable for multi-database computation. What’s worse, second, SQL statement is hard to debug. That situation is even worse for the long SQL statement since a more complex computational goal will inevitably give rise to more steps and a longer statement. It is the real nightmare for preparing data source. Third, the inability of SQL statement to run step by step has great impact on the maintenance and re-use. The SQL statement can only run as a whole, and all computational logics must be crammed into a single statement. It is impossible to split one SQL statement into several examinable computational procedures for users to check out the result at each step intuitively. Forth, SQL lacks the explicit sets and the direct support for the ordered computation which are common in the complex computation. So, SQL/SP usually costs several-fold more time and effort than other tools do in the related computation.

R language is quite good at handling the complex data computing, isn’t it a better choice? No. The truth is that R language has not incorporated with a perfect IDE. It is very inconvenient for users to compose and edit the computational scripts, not to mention its poor debugging. The report developers are not the professional coder, so their productivity will suffer if working in such IDE. More importantly, R does not provide JDBC or any output interface for the direct use by reporting tools. In order to use the R in reports, users have to implement a report interface program additionally to process the data and receive the parameters. Too much trouble.

For ETL or data warehouse, first, it usually incurs a great expense on human resources, equipment, maintenance, and training. Second, report developers will have to grasp ETL scripts like PHP, Perl, VBScript, and JavaScrip, and design the massive update algorithms. Considering these troubles, 99% report developers will surely get a headache.

The real trouble for Java and other senior languages is that users will have to implement all the details by themselves: open the Excel file, build a record, generate a List, retrieve with loops, seek the maximum value, group, compute the average, filter the data, sort, and then seek the top N – the greatest flexibility seems to be obtained at the cost of the greatest workload.

In view of all these discussions above, that would have been good news for report developers if there is a data source computation tool specially built for the report, with all advantages of the above methods, and free from their disadvantages.

esProc is such a tool. On the one hand, it is as capable as SQL or SP regarding its professional database computational capability; on the other hand, it offers the convenient debugging and enables the step-by-step computation. Compared with R, esProc also supports the ordered computations and the explicit sets for solving the complex computation problem, while still offering a more user-friendly IDE interface and JDBC output interface to ensure the usability. esProc is as capable as ETL/data warehouse on performing the cross-database computation, but more cost-effectively owing to its low TCO and efficient deployment and usage. esProc allows for the direct data retrieval from Excel and Txt file as Java does, and more superior to Java in the respect of handling the massive structural data directly.

In conclusion, esProc is the ideal tool specially designed to prepare the data source of reports.

menu

September 17, 2013

Data Source Preparation Tool Especially for Report Developers

No comments:

Post a Comment