esProc, A Script Language for Data Analytics with Parallel Mechanism: cross database

January 21, 2015

esProc Helps Process Heterogeneous Data Sources in Java –Cross-Database Relating

JoinRowSet and FilteredRowSet provided by RowSet– Java’s class library for data computing – can perform cross-database related computing, but they have a lot of weaknesses. First, JoinRowSet only supports inner join, it doesn’t support outer join. Second, test shows that db2, mysql and hsql can work with JoinRowSet, yet the result set of join oracle11g to other databases is empty though no error reporting will appear. The fact is there were two users who perform cross-database join using oracle11g database even got the correct result. This suggests that JDBC produced by different database providers will probably affect the result obtained by using this method. Last, the code is complicated.

esProc has proved its ability in assisting Java to perform cross-database relating. It can work with various databases, such as oracle, db2, mysql, sqlserver, sybase and postgresql, to perform a variety of cross-database related computing, like inner join and outer join involving heterogeneous data. An example will teach you the way esProc works. Requirement: relate table sales in db2 to table employee in mysql through sale.sellerid and employee.eid, and then filter data in both sales and employee according to the criterion state=”California”. The way the code is written in this task applies to situations where other types of databases are involved.

The structure and data of table sales are as follows:

The structure and data of table employee are as follows:

Implementation approach: Call esProc script using Java program, join the multiple databases together to realize the cross-database relating, perform filtering and return the result to Java in the form of ResultSet.

The code written in esProc is as follows:

A1: Connect to the data source db2 configured in advance.

A2: Connect to the data source mysql configured in advance. In fact oracle and other types of databases can be used too.

A3, A4: Retrieve table sequences: sales and employee, from db2 and mysql respectively. esProc’s Integration Development Environment (IDE) can display the retrieved data visually, as shown in the right part of the figure in the above.

A5: Relate sales to employee through sellerid=eid using esProc’s object reference mechanism.

A6: Filter the two table sequences according to state="California".

A7: Generate a new table sequence and get the desired fields.

A8,A9: Close the connection

A10: Return the result to the caller of esProc program.

This piece of program is called in Java using esProc JDBC to get the result. The code is as follows (save the above esProc program as test.dfx):

//create a connection using esProcjdbc

Class.forName("com.esproc.jdbc.InternalDriver");

con= DriverManager.getConnection("jdbc:esproc:local://");

// call esProc program (the stored procedure) in which test is the name of file dfx

com.esproc.jdbc.InternalCStatementst;

st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");

// execute esProc stored procedure

st.execute();

// get the result set

ResultSet set = st.getResultSet();

May 11, 2014

Cross Database Computing in Report Data Source integration

Recently a development team met some difficulties in data source computation when developing iReport reports. After the use of esProc for cross database computation, the problem is resolved.

This is a project payment progress report, as part of a project management system. It’s based on an Oracle database. The reports needs to present the project name, payment amount, contract value, payment progress (in percentage), and name of the project manager, etc., for all ongoing projects. Amount these the first 3 items contributes to the difficulties met by the development team.

Note that the project payment record is stored in accounting system, rather than the Oracle database used by the development team. The accounting system is built on MSSQL database. The contract value information is also stored in a separate MSSQL database, used by contract management system. After analysis to the database, the project information record has an one-to-many relationship with payment record, and an one-to-one relationship with contract information.

By simplifying the table and fields other than the hardest part of the problem, the issue can then be described as: we need to join table project，payProcess and contract from three heterogeneous databases.

Main fields from project include: projectNo, projectName.

Main fields from payProcess include: payID, projectNo, amount.

Main fields from contract include: contractNo, projectNo, conAmount.

It’s fairly easy to write the SQL statements if the three tables are in the same database. However they now belong to three heterogeneous databases, the join is then difficult. Meanwhile, the two MSSQL databases are used by two independent commercial software, which the development team can load data from, but not able to modify or administrate. This adds to the difficulty of the development. Although iReport support simple cross database computation with two tables, it’s very difficult to do this with three tables. The development team can not leverage iReport to meet this requirement.

The use of new reporting tool that supports multiple data sources might help. However as iReport is the only one used in the development process so far, a sudden change is not realistic. iReport supports user-defined JavaBean as data source, this can also be used to handle cross database computation. But JAVA is not a language for computation. The “join” computation between multiple sets requires a lot of coding. The workload makes it a non-sense. ETL tool is also a possible way to consolidate tables from multiple databases into one. The ETL approach normally requires incremental update to databases, which means timestamp and triggers need to be used in payProcess and contract table. As we know, commercial software does not allow databases to be modified in this way. ETL cannot provide the real time updates required for payment progress monitoring, which means it’s not suitable for this report.

Under such condition, the use of esProc for cross database computation is a handy approach. Here are the codes：

A1, A2, A3 are for loading data from different databases, which are native syntax for each database. Note the difference in SQL for each heterogeneous database.

A4 is an association statement. From here on the syntax has nothing to do with certain database. The “join” function associate A1, A2, A3 according to projectID. Note that this is inner join. If it’s left join, the syntax of “join@1” should be used, whereas for full join, the syntax is “join@f”.

A5 is to output the required fields from the join, which could be part or all fields from A1, A2 and A3.

The statement of result A5 means to output the computation result by JDBC, so that iReport could rerieve them directly through JDBC. It’s also possible for esProc to be used in JAVA codes directly by means of JDBC call.

The above codes are just a prototype, which can actually run successfully. In reality more table association and data fields need to be added, plus some and parameter filters.

As we could see, the solving of cross database computation issue with esProc is fairly easy. esProc supports JAVA well, and is handy to be used with iReport.

menu

January 21, 2015

esProc Helps Process Heterogeneous Data Sources in Java –Cross-Database Relating

May 11, 2014

Cross Database Computing in Report Data Source integration