August 17, 2015

esProc Assists Report Development – JOIN Operation in MongoDB

Heterogeneous datasources are frequently needed in report development, but they are difficult to realize using the reporting tool, like JasperReport, alone. To present the result of joining two MongoDB collections together, for instance. Though JasperReport has the functions of virtual data source or table join, but they are offered only in the commercial or higher versions and probably won’t appear in free version. Moreover, these two functions support connection to only two datasources. If users need more, the program development becomes complicated. Additionally, they don’t support subsequent structured data computing of the joined data as SQL does, due to providing only a graphical interface.

esProc has a powerful structured data computing engine, supports heterogeneous datasources and is easy to be integrated, thus it is useful for assisting the reporting tool to realize MongoDB join conveniently. The following example will show you how esProc works to create a join in MongoDB.

There are two collections – sales and emp – in MongoDB. Logically, sale’s SellerId field is equivalent to a foreign key that points to emp’s EId field. The task is to query orders in sales by the time range, create a left join with emp and present the result in the report. Some of the source data are as follows:

Collection sales

Collection emp 

esProc script for completing the task: 

A1=MongoDB(“mongo://localhost:27017/test?user=root&password=sa”)

This line of code establishes the connection to MongoDB. User name and password can be specified through the two parameters user and password.

As connecting to an ordinary database, esProc also supports connecting to MongoDB through JDBC. But since the third-party JDBC is not free, and has a weaker function than the official library function – for example, it cannot retrieve the multilayer data, esProc will directly encapsulate the native methods, thereby retaining the function and syntax of MongoDB. find function, for example, can be used in this case.

A2=A1.find(“sales”,”{‘$and’:[{'OrderDate':{'$gte':'"+string(begin)+"'}},{'OrderDate':{'$lte':'"+string(end)+"'}}]}”,”{_id:0}”).fetch()

This line of code finds records during a certain time period from MongoDB’s collection salesfind function’s first parameter is the collection’s name, its second parameter is the query condition that is defined according to MongoDB syntax and the third one specifies the field to be returned. Note that begin and end in the query condition are external parameters passed from the report, respectively representing the beginning date and the ending date of OrderDate.

find function will return a cursor, which means it doesn’t retrieve data into the memory entirely and thus supports processing big data. Data of the result cursor can be further processed with functions like skipsort and conj and data retrieval won’t start until either fetch function or groups function, or for statement appears. fetch() function is used in this example to fetch the data into the memory. 

Suppose the time period is from 2009-01-01 to 2009-12-31, result of A2 is as follows:


A3=A1.find(“emp”,,”{_id:0}”).fetch()

This line of code retrieves all data, except for _id field, from collection emp unconditionally. Result is as follows:


A4=A1.close()

This line of code closes the connection to MongoDB established in A1.

A5=join@1(A2:sales,SellerId;A3:emp,EId)

This line of code creates a left join between A2 and A3. The fields to join them together are A2’s SellerId and A3’s EId. Intuitively, two parts of the joined data are respectively named sales and empjoin function is used to perform the join operation, in which @1 option means left join. The result can be seen in the left part of the following figure:


It can be seen that some of the SellerId in sales cannot find corresponding records in emp because of the left join. We can use @f to perform a full join; without any option, the function will perform an inner join.

A6=A5.new(sales.OrderID:OrderID,sales.Client:Client,sales.Amount:Amount,sales.OrderDate:OrderDate,emp.Name:Name,emp.Dept:Dept,emp.Gender:Gender)

A5 joins the data together. A6 gets the fields we want from the result of joining and creates a two-dimensional table using new function. For example, sales.OrderID:OrderID means getting sales.OrderID field from A5 and rename it OrderID (because reporting tools cannot identify field names like sales.OrderID). Result is as follows:


Now all data are ready for creating the report. The final step is to return A6’s two-dimensional table to the reporting tool with result A6. esProc offers JDBC interface to be integrated with the reporting tool, and the latter will identify it as the ordinary database. Please refer to related documents for the integration solution.

Then design the report using JasperReport, for instance. The appearance and layout is as follows:


Define two parameters – Pbegin and Pend – in the report according to the corresponding esProc parameters. Click Preview to see the report:


The way the reporting tool calls the esProc script is the same as that it calls the stored procedure. Save the esProc script as, say, mongodbJoin.dfx to be called by mongodbJoin $P{pbegin},$P{pend} in JasperReport’s SQL designer. 

No comments:

Post a Comment