December 22, 2014

esProc Helps with Computation in MongoDB – Relationships between Tables

MongoDB doesn't support join. The unity JDBC recommended by its official website can perform join operation after retrieving data out, but charges a fee for the service. Other free JDBC drivers can only support the basic SQL statements, without join included. If you use programming languages, like Java, to retrieve data first and proceed to perform join operation, the process will still be complicated.

However, the join operation can be realized by using esProc, which is free of charge, to help MongoDB with the computation. Here is an example for illustrating the method in detail.

In MongoDB, there is a document - orders – that holds the order data, and another file – employee – for storing employee data, as shown in the following:
MongoDB shell version: 2.6.4
connecting to: test
> db.orders.find();
{ "_id" : ObjectId("5434f88dd00ab5276493e270"), "ORDERID" : 1, "CLIENT" : "UJRNP
", "SELLERID" : 17, "AMOUNT" : 392, "ORDERDATE" : "2008/11/2 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e271"), "ORDERID" : 2, "CLIENT" : "SJCH"
, "SELLERID" : 6, "AMOUNT" : 4802, "ORDERDATE" : "2008/11/9 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e272"), "ORDERID" : 3, "CLIENT" : "UJRNP
", "SELLERID" : 16, "AMOUNT" : 13500, "ORDERDATE" : "2008/11/5 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e273"), "ORDERID" : 4, "CLIENT" : "PWQ",
 "SELLERID" : 9, "AMOUNT" : 26100, "ORDERDATE" : "2008/11/8 15:28" }
> db.employee.find();
{ "_id" : ObjectId("5437413513bdf2a4048f3480"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3481"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3482"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }
The SELLERID in orders corresponds to EID in employee. Please select from employee all order information in which the state is California. orders holds big data and thus cannot be loaded into the memory entirely. But employee has fewer data and so has the filtering result of orders.


The conditional expression for selecting data can be passed to the esProc program as a parameter, as shown in the following figure:

esProc code:

A1: Connect to MongoDB. Both IP and the port number are localhost:27017. The database name, user name and the password all are test.

A2: find function is used to fetch data from MongoDB and create a cursor. orders is the collection, with a filtering condition being null and _id , the specified key, not being fetched. esProc uses the same parameter format in find function as that in find statement of MongoDB. esProc's cursor supports fetching and processing data in batches, thereby avoiding the memory overflow caused by importing big data at once.

A3: Fetch data from employee. Because the data size to be fetched is not big, you can use fetch function to get them all at once.

A4: switch function is used to convert values of SELLERID field in A2’s cursor into the record references in A3’s employee

A5: Select the desired data according to the condition. Here a macro is used to dynamically parse an expression, in which where is the input parameter. In esProc, the expression surrounded by ${…} will be first computed, the result be taken as the macro string value to replace ${…} , and then the code be interpreted and executed. Therefore the actual code executed in this step is =A4.select(SELLERID.STATE=="California"). Since SELLERID has been converted into the references of corresponding records in employee, you can write the code in such a way as SELLERID.STATE. As the data size of the filtering result is not big, you can fetch the data all at once. But if the data size is still rather big, you can fetch the data in batches, say, 10,000 rows per batch, which can be expressed as fetch(10000).

A6Reconvert the values of SELLERID in the filtering result into ordinary ones. 

The result of A6 is as follows:

When the filtering condition is changed, you need not change the whole program, but modifying the parameter where. For example, the condition is changed to "orders information in which state is California or CLIENT is PWQ", then the value of where can be expressed as CLIENT=="PWQ"|| SELLERID.STATE=="California".

The thing is that esProc isn't equipped with a Java driver included in MongoDB. So to access MongoDB using esProc, you must put MongoDB's Java driver (a version of 2.12.2 or above is required for esProc, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc beforehand.

The script written in esProc which is used to help MongoDB with the computation is easy to be integrated into the Java program. You just need to add another line of code - A7 – that is, result A6, for outputting a result in the form of resultset to Java program. For the detailed code, please refer to esProc Tutorial. In the same way, MongoDB's Java driver must be put into the classpath of a Java program before the latter accesses MongoDB by calling an esProc program.  

No comments:

Post a Comment