esProc, A Script Language for Data Analytics with Parallel Mechanism: report tools invoke


September 17, 2013

Data Source Preparation Tool Especially for Report Developers

Many report developers have had to present, in a report, the KPIs of outstanding salesmen whose sales have risen by more than 10% for three consecutive months. Finding those outstanding salesmen is, in fact, the work of preparing the data source.
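Here is a minimal Java sketch of that salesman filter, to make the task concrete. It assumes that "risen by more than 10% for three consecutive months" means three month-over-month growth rates in a row, each above 10%. It also assumes each salesman's monthly sales are already available as an array in chronological order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: finds salesmen whose monthly sales grew by more than
// 10% over the previous month for at least 3 consecutive months.
class OutstandingSalesmen {

    // True if the series contains a run of at least `months` consecutive
    // month-over-month growth rates strictly above `minGrowth`.
    static boolean hasConsecutiveGrowth(double[] monthlySales, double minGrowth, int months) {
        int run = 0;
        for (int i = 1; i < monthlySales.length; i++) {
            double prev = monthlySales[i - 1];
            // Growth is only meaningful relative to a positive previous value.
            if (prev > 0 && (monthlySales[i] - prev) / prev > minGrowth) {
                run++;
                if (run >= months) {
                    return true;
                }
            } else {
                run = 0; // the streak is broken
            }
        }
        return false;
    }

    static List<String> find(Map<String, double[]> salesByName) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, double[]> e : salesByName.entrySet()) {
            if (hasConsecutiveGrowth(e.getValue(), 0.10, 3)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> sales = new LinkedHashMap<>();
        sales.put("Alice", new double[] {100, 115, 130, 150}); // +15%, +13.0%, +15.4%
        sales.put("Bob", new double[] {100, 120, 125, 150});   // +20%, +4.2%, +20%
        System.out.println(find(sales)); // prints [Alice]
    }
}
```

Even this small case needs explicit ordering of the months and a running streak counter. This is the kind of ordered computation the rest of the post discusses.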

Preparing the data source is the key, and the toughest, part of developing a report.

There are several ways to prepare a data source:

- SQL or stored procedures (SP) can handle ordinary computations within a single database.
- The R language can handle complex data computations.
- ETL tools or a data warehouse can handle cross-database computations by first consolidating all the data into one database and then computing there.
- For structured data held in non-database files or spreadsheets, high-level programming languages can generate the result sets, for example a Java class that reads data from a text file.

However, each of these methods has drawbacks, as discussed below.

Let's start with SQL/SP. First, SQL/SP alone can only work within a single database; computing across multiple databases requires various cumbersome workarounds. Second, and worse, SQL statements are hard to debug. This is especially true of long statements, and a more complex computational goal inevitably needs more steps and therefore a longer statement. That makes data source preparation a real nightmare. Third, because an SQL statement cannot run step by step, it is hard to maintain and reuse. A statement runs only as a whole, so all of the computational logic must be crammed into it. It cannot be split into several procedures whose intermediate results users can inspect one step at a time. Fourth, SQL lacks explicit sets and direct support for ordered computation, both of which are common in complex computations. As a result, SQL/SP usually costs several times more time and effort than other tools for such computations.

R is quite good at complex data computation, so isn't it a better choice? No. R lacks a full-featured IDE, which makes composing and editing computational scripts inconvenient, and its debugging support is poor. Report developers are not professional programmers, so their productivity suffers in such an environment. More importantly, R does not provide JDBC or any other output interface that reporting tools can use directly. To use R in reports, users must additionally write an interface program that receives the parameters and returns the processed data. That is too much trouble.

ETL and data warehouses have two problems. First, they usually incur great expense in staffing, equipment, maintenance, and training. Second, report developers would have to master ETL scripting languages such as PHP, Perl, VBScript, and JavaScript, and design algorithms for massive data updates. Given these troubles, almost any report developer would get a headache.

The real trouble with Java and other high-level languages is that users have to implement every detail themselves. They must open the Excel file, build records, generate a List, traverse it with loops, find the maximum value, group, compute averages, filter, sort, and then take the top N. The greatest flexibility comes at the cost of the greatest workload.
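The sketch below illustrates that workload by doing by hand several of the steps just listed, for a simple case. It reads rows from a text source, groups them, computes averages, filters, sorts, and takes the top N. The tab-separated layout (salesman, region, amount) and all names are hypothetical. Reading an actual Excel file would additionally need a third-party library, which is left out here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class RegionTopN {

    // One parsed row of the hypothetical input: salesman, region, amount.
    static final class Sale {
        final String salesman;
        final String region;
        final double amount;

        Sale(String salesman, String region, double amount) {
            this.salesman = salesman;
            this.region = region;
            this.amount = amount;
        }
    }

    // Reads tab-separated rows, skipping the header line and blank lines.
    static List<Sale> read(Reader source) {
        List<Sale> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip the header
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] f = line.split("\t");
                rows.add(new Sale(f[0], f[1], Double.parseDouble(f[2])));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Groups by region, averages the amounts, drops regions below minAvg,
    // sorts by average in descending order, and keeps the first n regions.
    static List<Map.Entry<String, Double>> topRegions(List<Sale> rows, double minAvg, int n) {
        Map<String, Double> avgByRegion = rows.stream()
                .collect(Collectors.groupingBy(s -> s.region, TreeMap::new,
                        Collectors.averagingDouble(s -> s.amount)));
        return avgByRegion.entrySet().stream()
                .filter(e -> e.getValue() >= minAvg)
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String data = "salesman\tregion\tamount\n"
                + "Ann\tEast\t300\n"
                + "Ben\tEast\t500\n"
                + "Cid\tWest\t200\n"
                + "Dan\tNorth\t700\n";
        System.out.println(topRegions(read(new StringReader(data)), 250, 2));
        // prints [North=700.0, East=400.0]
    }
}
```

In a real program, the `StringReader` would typically be replaced by `Files.newBufferedReader(path)`. Even this simplified case needs a parser, a record class, and a multi-stage pipeline.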

Given all of the above, report developers would welcome a data source computation tool built specifically for reports. Such a tool would combine the advantages of all the methods above without their disadvantages.

esProc is such a tool:

- It is as capable as SQL or SP at professional database computation, and it also offers convenient debugging and step-by-step computation.
- Like R, it supports ordered computation and explicit sets for solving complex computation problems, while also offering a more user-friendly IDE and a JDBC output interface for better usability.
- It is as capable as ETL or a data warehouse at cross-database computation, but more cost-effective thanks to its low TCO and quick deployment and adoption.
- Like Java, it can retrieve data directly from Excel and text files, and it is superior to Java at processing large volumes of structured data directly.

In conclusion, esProc is an ideal tool designed specifically for preparing report data sources.

September 10, 2013

Data Source Computation in Advance to Simplify Report Development

According to research, most complex report development can be simplified by computing the data source in advance. For example, consider a report that finds the clients who bought every product in a given list and then presents those clients' details.
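The core of this example is a "bought every listed product" check. Below is a minimal Java sketch. It assumes order data arrives as (client, product) pairs and that the product list is passed in as a report parameter. The names are hypothetical, and this is a generic set-containment approach, not the esProc script used in this post.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ClientsBuyingAll {

    // orders: hypothetical rows of {client, product}.
    // requiredProducts: the product list passed in as a report parameter.
    static Set<String> clientsWhoBoughtAll(List<String[]> orders, Set<String> requiredProducts) {
        // Step 1: collect the distinct products each client has bought.
        Map<String, Set<String>> boughtByClient = new HashMap<>();
        for (String[] order : orders) {
            boughtByClient.computeIfAbsent(order[0], k -> new HashSet<>()).add(order[1]);
        }
        // Step 2: keep clients whose product set covers every required product.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : boughtByClient.entrySet()) {
            if (e.getValue().containsAll(requiredProducts)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> orders = Arrays.asList(
                new String[] {"C1", "P1"},
                new String[] {"C1", "P2"},
                new String[] {"C2", "P1"},
                new String[] {"C1", "P3"});
        Set<String> required = new HashSet<>(Arrays.asList("P1", "P2"));
        System.out.println(clientsWhoBoughtAll(orders, required)); // prints [C1]
    }
}
```

The client details for the report would then be looked up for the returned set. Splitting the work into "collect per client" and "check coverage" mirrors the step-by-step style this post argues for.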

In developing such reports, it is the “computation” part, not the “presentation” part, that causes the major difficulties. At which stage is the computation most cost-effective: in the data retrieval script, or in the report presentation after retrieval?

Report developers usually prefer to compute directly in the report after retrieving the data with SQL or a wizard. There are two reasons:

- Most reporting tools can perform some simple step-by-step computations themselves. SQL, by contrast, requires all of the logic to be packed into one statement that cannot be decomposed into several examinable components.
- Most report developers are more familiar with report functions than with SQL/SP, and SQL/SP scripts are harder to understand.

However, the report alone often cannot deliver a satisfactory result. Many report developers find the computational goal hard to achieve within the report. They are ultimately forced to learn SQL/SP or to ask the database administrator for help. Why?

The root cause is that a report is designed mainly to present data, not to compute it. Computation is a non-core feature of a report, intended to solve the most common and simplest problems. Truly complex computational goals still depend on professional computing scripts such as SQL. So only computing the data source in advance can simplify and streamline the development of such reports.

Stuck in a dilemma? On the one hand, the report provides only limited computing capability. On the other hand, SQL/SP is hard to understand, and its computational procedure is neither intuitive nor step-by-step. This is a real headache for most report developers.

esProc resolves this dilemma. It is a professional development tool for report data sources, offering the computational capability developers need in a user-friendly grid style. It also supports step-by-step computation, showing the result of each step more clearly than a report can. Compared with SQL, esProc is easier for report developers to learn and understand. They can solve complex computations, including the case above, more easily and independently.

esProc scripts:

Like SQL, esProc supports external parameters. A report can call the esProc script directly through the JDBC interface.
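From the Java side, calling esProc should look like using any other JDBC data source. The sketch below uses only the standard `java.sql` API. Two details are assumptions to verify against the esProc JDBC documentation for your version: the connection URL `jdbc:esproc:local://` and the `call scriptName(?)` statement form. The script name `clientsBuyingAll` is hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EsprocJdbcClient {

    // ASSUMPTION: esProc's local JDBC URL; verify against the esProc JDBC docs.
    static final String ASSUMED_URL = "jdbc:esproc:local://";

    // Builds "call scriptName(?,?,...)" with one placeholder per parameter.
    // ASSUMPTION: this call syntax is how an esProc script is invoked over JDBC.
    static String callStatement(String scriptName, int paramCount) {
        return "call " + scriptName + "("
                + String.join(",", Collections.nCopies(paramCount, "?")) + ")";
    }

    // Runs a script over an open JDBC connection and copies each row into a map,
    // much as a reporting tool consumes any JDBC result set.
    static List<Map<String, Object>> runScript(Connection con, String scriptName, Object... params)
            throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (CallableStatement st = con.prepareCall(callStatement(scriptName, params.length))) {
            for (int i = 0; i < params.length; i++) {
                st.setObject(i + 1, params[i]); // JDBC parameter indexes are 1-based
            }
            try (ResultSet rs = st.executeQuery()) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int c = 1; c <= md.getColumnCount(); c++) {
                        row.put(md.getColumnLabel(c), rs.getObject(c));
                    }
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}
```

A caller would open a connection with `DriverManager.getConnection(EsprocJdbcClient.ASSUMED_URL)`, with the esProc driver on the classpath. It would then pass that connection to `runScript`, for example `runScript(con, "clientsBuyingAll", "P1,P2")`. A reporting tool usually makes the equivalent call internally once esProc is configured as its data source.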

In addition, esProc has full debugging functions. It can retrieve and operate on data from multiple databases, text files, and Excel sheets to implement cross-database computation. esProc is a good assistant to reporting tools and an expert at report data source computation.

August 6, 2012

esProc Being Invoked by Report Tool via JDBC

Java reporting tools are robust, platform-independent, and easy to cluster, load-balance, maintain, and extend, so they are widely used in all kinds of applications. esProc therefore provides a JDBC interface so that external programs, including Java reporting tools, can call it. The structure is shown in the figure below:

For a system that uses a Java reporting tool, esProc is an ideal choice for complex computations, computations across multiple data sources, and cleaning dirty data sources. The reporting tool treats esProc as a database and receives the results esProc returns via JDBC. The final data can then be rendered with the reporting tool's native strengths: form and chart design, attractive styles, query interfaces, data entry and submission, export and printing, and more.
Below are some scenarios in which reporting tools invoke esProc:

Data Is Easy to Render, but the Algorithm Is Hard to Implement

esProc is ideal for handling complex computations on massive data in reports.
Different people have different areas of expertise, and the same is true of tools. A reporting tool's key advantage is rendering data in a style that fits the business. However, reporting tools offer only limited support for complex computations on massive data. By comparison, esProc can work with a reporting tool to solve the most complex computations in business reports.

Computation with Multiple Data Sources

Most reporting tools allow only one data source. To compute with multiple data sources, the data must first be pre-processed in the background with complicated SQL or stored procedures. These merge the multiple sources into one that is then submitted for the report to use. esProc is ideal for this situation.
Multiple data sources may mean multiple query results from one database. They may also mean query results from different databases, or even a mix of database data and Txt/Excel files. SQL and stored procedures do not support many computations on such data, or support them only partly. esProc is good at mixed computation across multiple data sources.
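As a rough illustration of mixed-source computation, the Java sketch below joins two sources and aggregates the result. The rows stand for the output of a database query, and the lines stand for the contents of a text file. The data layouts, the "UNKNOWN" fallback, and all names are hypothetical. Both inputs are in-memory lists so the example runs on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CrossSourceJoin {

    // dbOrders: rows as they might come from a JDBC query, {employeeId, amount}.
    // fileLines: lines as they might come from a text file, "id,name,department".
    // Returns the total order amount per department (hypothetical computation).
    static Map<String, Double> salesByDepartment(List<Object[]> dbOrders, List<String> fileLines) {
        // Build an id -> department lookup from the file-based source.
        Map<Integer, String> deptById = new HashMap<>();
        for (String line : fileLines) {
            String[] f = line.split(",");
            deptById.put(Integer.parseInt(f[0].trim()), f[2].trim());
        }
        // Join each database row to the file data and sum by department.
        Map<String, Double> totals = new TreeMap<>();
        for (Object[] row : dbOrders) {
            int id = ((Number) row[0]).intValue();
            double amount = ((Number) row[1]).doubleValue();
            String dept = deptById.getOrDefault(id, "UNKNOWN"); // id missing from the file
            totals.merge(dept, amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Object[]> orders = Arrays.asList(
                new Object[] {1, 100.0},
                new Object[] {2, 50.0},
                new Object[] {1, 25.0},
                new Object[] {3, 10.0});
        List<String> employees = Arrays.asList("1,Ann,Sales", "2,Ben,Support");
        System.out.println(salesByDepartment(orders, employees));
        // prints {Sales=125.0, Support=50.0, UNKNOWN=10.0}
    }
}
```

Keeping unmatched rows under an explicit key makes mismatches between the two sources visible, rather than silently dropping them. Whether to keep or discard such rows is a business rule.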

Dirty Data Sources Need Cleaning

If a data source is too dirty for the reporting tool to use directly, you can use esProc to clean it and output the result to the reporting tool.
Dirty data is very common. Typical examples include:

- duplicate employee records
- a null employee name
- a document with several UIDs
- a date mistakenly entered as “2011-13-01”
- the number “10” mistyped as “IO”

Such dirty data cannot be used directly; it must be examined and cleaned according to certain rules. esProc cleans such data well and ensures that the reporting tool receives clean data.
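Several of these cleaning rules can be expressed as simple checks. The Java sketch below uses a hypothetical row layout of {name, date, number}. It drops rows with a missing name or an invalid ISO date, repairs letter-for-digit mix-ups only in the numeric field, and removes exact duplicates. It does not handle the multiple-UID case. The rules are illustrative assumptions, not esProc's cleaning logic.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EmployeeCleaner {

    // Repairs letter-for-digit mix-ups (O/o -> 0, I/l -> 1) in a numeric field only;
    // applying this to names would corrupt them.
    static String fixDigits(String raw) {
        return raw.trim()
                .replace('O', '0').replace('o', '0')
                .replace('I', '1').replace('l', '1');
    }

    // rows: hypothetical {name, date, number} strings; returns the cleaned rows.
    static List<String[]> clean(List<String[]> rows) {
        Set<String> seen = new HashSet<>();
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            // Rule 1: the name must not be null or blank.
            if (r[0] == null || r[0].trim().isEmpty()) {
                continue;
            }
            String name = r[0].trim();
            // Rule 2: the date must be a valid ISO date; "2011-13-01" is rejected.
            LocalDate date;
            try {
                date = LocalDate.parse(r[1] == null ? "" : r[1].trim());
            } catch (DateTimeParseException ex) {
                continue;
            }
            // Rule 3: repair the numeric field, then require digits only.
            String number = r[2] == null ? "" : fixDigits(r[2]);
            if (!number.matches("\\d+")) {
                continue;
            }
            // Rule 4: drop exact duplicates of a record already kept.
            String[] cleaned = {name, date.toString(), number};
            if (seen.add(String.join("|", cleaned))) {
                out.add(cleaned);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Ann", "2011-12-01", "10"},
                new String[] {"Ann", "2011-12-01", "10"},  // duplicate
                new String[] {null, "2012-01-05", "3"},    // null name
                new String[] {"Ben", "2011-13-01", "5"},   // invalid month
                new String[] {"Cid", "2012-03-15", "IO"}); // "10" mistyped as "IO"
        for (String[] r : clean(rows)) {
            System.out.println(Arrays.toString(r));
        }
        // prints [Ann, 2011-12-01, 10] then [Cid, 2012-03-15, 10]
    }
}
```

Real cleaning rules depend on the business. A rejected row might be logged or routed for correction instead of being dropped.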
As a final example, a telecommunications company needs a report that displays the 10 states with the highest turnover and the ranking of each communication product in those states. The turnover of each city and the administrative divisions are stored in separate physical tables.
The computation for this report is too complicated for the traditional band-style reporting model to handle well. The obvious difficulty is the administrative hierarchy. Its three levels, made of state-county and county-city relations, must be converted into a two-level state-city relation before the turnover table can be summarized. A reporting tool alone can hardly implement such a complex computation; esProc is better suited to this problem.

The computation proceeds in these steps:

1. Reduce the three-level relation to a two-level relation using esProc loop statements.
2. Use the align function to align the turnover data of each city to the two-level relation, and summarize it up to the state level. The result, named detail, is the turnover of each product category in each state.
3. Use the group function to group detail by state and obtain the total turnover of each state. Take the top 10 states and name the result top.
4. Use the align function again to align detail to top. The result, named detailOfTop, consists of the top 10 states in order, together with the turnover of each product in those states.
5. detailOfTop is a result set that is easy to render. Simply return detailOfTop to the Java reporting tool via JDBC.
6. The reporting tool then outputs the result in the required format.
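The computation steps above can be sketched in Java with plain collections. This is not the esProc script; it is a hypothetical equivalent. It assumes the division tables arrive as state-to-counties and county-to-cities maps, and the turnover table as (city, product, amount) rows. Maps stand in for esProc's align and group functions, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TopStatesReport {

    // One row of the hypothetical turnover table.
    static final class Turnover {
        final String city;
        final String product;
        final double amount;

        Turnover(String city, String product, double amount) {
            this.city = city;
            this.product = product;
            this.amount = amount;
        }
    }

    // Step 1: reduce state-county and county-city to a city -> state lookup.
    static Map<String, String> cityToState(Map<String, List<String>> countiesByState,
                                           Map<String, List<String>> citiesByCounty) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, List<String>> s : countiesByState.entrySet()) {
            for (String county : s.getValue()) {
                for (String city : citiesByCounty.getOrDefault(county, Collections.emptyList())) {
                    result.put(city, s.getKey());
                }
            }
        }
        return result;
    }

    // Step 2: summarize city turnover to state level (detail: state -> product -> amount).
    static Map<String, Map<String, Double>> buildDetail(List<Turnover> rows, Map<String, String> c2s) {
        Map<String, Map<String, Double>> detail = new TreeMap<>();
        for (Turnover t : rows) {
            String state = c2s.get(t.city);
            if (state != null) { // skip cities missing from the division tables
                detail.computeIfAbsent(state, k -> new TreeMap<>()).merge(t.product, t.amount, Double::sum);
            }
        }
        return detail;
    }

    // Step 3: rank states by total turnover and keep the first n (top).
    static List<String> selectTop(Map<String, Map<String, Double>> detail, int n) {
        return detail.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, Map<String, Double>> e) -> total(e.getValue())).reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static double total(Map<String, Double> byProduct) {
        double sum = 0;
        for (double v : byProduct.values()) {
            sum += v;
        }
        return sum;
    }

    // Step 4: align detail to the order of top, ranking products within each state.
    static Map<String, List<Map.Entry<String, Double>>> buildDetailOfTop(
            Map<String, Map<String, Double>> detail, List<String> top) {
        Map<String, List<Map.Entry<String, Double>>> result = new LinkedHashMap<>();
        for (String state : top) {
            List<Map.Entry<String, Double>> ranked = new ArrayList<>(detail.get(state).entrySet());
            ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
            result.put(state, ranked);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cs = new HashMap<>();
        cs.put("S1", Arrays.asList("K1", "K2"));
        cs.put("S2", Arrays.asList("K3"));
        cs.put("S3", Arrays.asList("K4"));
        Map<String, List<String>> cc = new HashMap<>();
        cc.put("K1", Arrays.asList("A", "B"));
        cc.put("K2", Arrays.asList("C"));
        cc.put("K3", Arrays.asList("D"));
        cc.put("K4", Arrays.asList("E"));
        List<Turnover> rows = Arrays.asList(
                new Turnover("A", "Phone", 100), new Turnover("B", "Phone", 50),
                new Turnover("C", "Tablet", 80), new Turnover("D", "Phone", 300),
                new Turnover("D", "Tablet", 10), new Turnover("E", "Tablet", 20));
        Map<String, Map<String, Double>> detail = buildDetail(rows, cityToState(cs, cc));
        System.out.println(buildDetailOfTop(detail, selectTop(detail, 2)));
        // prints {S2=[Phone=300.0, Tablet=10.0], S1=[Phone=150.0, Tablet=80.0]}
    }
}
```

In the report itself, `selectTop` would be called with 10 rather than 2. The `LinkedHashMap` preserves the ranking order, which plays the role of aligning detail to top.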
