January 27, 2015

esProc Helps Process Heterogeneous Data Sources in Java - MongoDB

It is convenient to establish and close a connection to MongoDB in esProc , as well as to call the database to query and count the data, perform distinct and aggregate operations. 

1. Preparation for connecting to MongoDB

The MongoDB Java driver (like mongo-java-driver-2.12.2.jar), which esProc does’nt provide, should be put into [esProc installation directory]\common\jdbc beforehand when esProc designer is used to access the database. The MongoDB Java driver’s download link is https://github.com/mongodb/mongo-java-driver/releases.

To access MongoDB by calling the esProc script through esProc JDBC in a Java program, the above MongoDB Java driver needs to be put into the classpath of the Java program. 

2. Create and close a connection to MongoDB

In an esProc script, mongodb(con) function is used to establish a connection to the MongoDB database and mdb.close() function is used to close it, as shown by the following code:

A1: Connect to MongoDB. IP and port are localhost:27017; database name is test, and so are the user name and the password.
A2: Close the MongoDB connection. 

3. Common data query and selection

Let’s take the employee, a collection in the database whose name is test, of MongoDB as an example to illustrate the esProc script for accessing MongoDB. Some of the data of employee are as follows:
{ "_id" : ObjectId("546aea22ee62ea452bdef4fe"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("546aea22ee62ea452bdef4ff"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("546aea22ee62ea452bdef500"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }

The following is a sample of esProc script for querying data of the collection of MongoDB: 

A2: Filter and query the data of employee. find function’s first parameter is employee, the collection’s name; its second parameter is the filtering criterion: STATE=” California” and SALARY >4,000; the third one specifies which properties should be fetched (here GENDER won’t be fetched). Noticed that esProc uses the same parameter format as that used in MongoDB’s find statement.

The function returns a cursor, which can fetch data in batch for computing when the result set is big, so as to avoid the memory overflow. The usage of cursor has been explained in detail in esProc-related documents. Here suppose the data are not big and use fetch function to fetch all the results for observation: 

4.Query and filter nested arrays and documents

Nested arrays and docements are multilevel data structure specific to MongoDB. The Java application doesn’t support this multilevel data structure when it accesses MongoDB through JDBC. Because esProc supports referencing data by fields, like referencing a record sequence by a field, it can fetch the nested arrays and documents of MongoDB.

For a nested array, let’s look at a collection – fruitshop:
{ "_id" : ObjectId("548a58c19cfbc9dd702eb71e"), "name" : "big fruit", "fruits" :
 [ "apple", "pear", "orange" ] }
{ "_id" : ObjectId("548a58c19cfbc9dd702eb71f"), "name" : "good fruit", "fruits"
: [ "banana", "pear", "orange" ] }
{ "_id" : ObjectId("548a58c29cfbc9dd702eb720"), "name" : "my fruit", "fruits" :
[ "banana", "apple", "tomato" ] }

And for a nested docment, let’s look at this collection – blogs:
{ "_id" : ObjectId("548a59fa9cfbc9dd702eb721"), "content" : "It is too hot", "co
mment" : [ { "author" : "joe", "score" : 3, "comment" : "just so so!" }, { "auth
or" : "jimmy", "score" : 5, "comment" : "cool! good!" } ] }
{ "_id" : ObjectId("548a59fa9cfbc9dd702eb722"), "content" : "It is too cold", "c
omment" : [ { "author" : "james", "score" : 1, "comment" : "yes!" }, { "author"
: "jimmy", "score" : 5, "comment" : "cool!" } ] }
{ "_id" : ObjectId("548a59fb9cfbc9dd702eb723"), "content" : "It is windy day tod
ay", "comment" : [ { "author" : "tom", "score" : 3, "comment" : "I do not think
so!" }, { "author" : "jimmy", "score" : 5, "comment" : "cool!" } ] }

esProc script for querying and filtering data of the two collections: 

A2: The filtering criterion of find function includes the pear insead of being equal to pear, for the fruits is an array. The return result of find funciton is a cursor, from which the resulting table sequence is fetched using fetch function and you can see fruits field is a referencing field, which references a sequence consisiting of various fruits names, as shown below: 

Click the blue part of the first row to see the detailed information of the referenced sequence, as shown below: 

Based on this table sequence with a referencing field, esProc can further perform more complicated set operations.

A3: comment is a collection of documents, and the filtering criterion is the comment’s documents whose author is jimmy. find function returns a cursor, from which the table sequence is fetched using fetch function. comment field of the table sequence is a referencing field that references a table sequence, as shown below: 

Click the blue part of the second row to see the detailed information of the table sequence, as shown below: 

Based on this table sequence with a referencing field, esProc can further perform more complicated set operations. Here further discussion is omitted. 

5. Count the data, perfor distinct and aggregate operations

With more powerful ability in computing and flow control, usually esProc is used to perform the complicated multi-step operation and MongoDB, a highly-efficient storage solution, is responsible for providing data without participating in the computation. But considering the performance loss during the data transmission, you can use MongoDB alone to perform some simple aggregate operations.

esProc provides the way of calling MongoDB interface to count the data, perform distinct and aggregate operations. Different from the find function for querying and filtering data, the return results of the three operations are table sequences instead of cursors. A sample code is as follows: 

A2: Count the number of eligible files in employee, a collection, using count function. The function’s first parameter is employee, the collection’s name; its second parameter is the filtering criterion: STATE=” California”. The result is count value: 

A3: Get the distinct values from employee using distinct function. The function’s first parameter is employee, the collection’s name, and the second parameter is one of the properties, DEPT. The return result is a table sequence consisting of employee’s DEPT without duplicate members: 
A4: Group and summarize employee. The first parameter of aggregate function is employee, the collection’s name, and the second one includes the grouping property and aggregate property, which means the data of the collection will be grouped by DEPT and then count the number of files in each group. aggregate function returns a cursor as well, whose data will be fetched all at once using fetch function. 

A5: Group and summarize employee, which is the same as the above. But another aggregate property is added to the second parameter of aggregate function, which means grouping the data of the set by DEPT and computing the average value of SALARY in each group. 


January 21, 2015

esProc Helps Process Heterogeneous Data Sources in Java –Cross-Database Relating

JoinRowSet and FilteredRowSet provided by RowSet– Java’s class library for data computing – can perform cross-database related computing, but they have a lot of weaknesses. First, JoinRowSet only supports inner join, it doesn’t support outer join. Second, test shows that db2, mysql and hsql can work with JoinRowSet, yet the result set of join oracle11g to other databases is empty though no error reporting will appear. The fact is there were two users who perform cross-database join using oracle11g database even got the correct result. This suggests that JDBC produced by different database providers will probably affect the result obtained by using this method. Last, the code is complicated.

esProc has proved its ability in assisting Java to perform cross-database relating. It can work with various databases, such as oracle, db2, mysql, sqlserver, sybase and postgresql, to perform a variety of cross-database related computing, like inner join and outer join involving heterogeneous data. An example will teach you the way esProc works. Requirement: relate table sales in db2 to table employee in mysql through sale.sellerid and employee.eid, and then filter data in both sales and employee according to the criterion state=”California”. The way the code is written in this task applies to situations where other types of databases are involved.


The structure and data of table sales are as follows: 

The structure and data of table employee are as follows:

Implementation approach: Call esProc script using Java program, join the multiple databases together to realize the cross-database relating, perform filtering and return the result to Java in the form of ResultSet.

The code written in esProc is as follows:

A1: Connect to the data source db2 configured in advance.
A2: Connect to the data source mysql configured in advance. In fact oracle and other types of databases can be used too.
A3, A4: Retrieve table sequences: sales and employee, from db2 and mysql respectively. esProc’s Integration Development Environment (IDE) can display the retrieved data visually, as shown in the right part of the figure in the above.
A5: Relate sales to employee through sellerid=eid using esProc’s object reference mechanism.
A6: Filter the two table sequences according to state="California".
A7: Generate a new table sequence and get the desired fields.
A8,A9: Close the connection
A10: Return the result to the caller of esProc program.

This piece of program is called in Java using esProc JDBC to get the result. The code is as follows (save the above esProc program as test.dfx):
//create a connection using esProcjdbc
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call esProc program (the stored procedure) in which test is the name of file dfx
com.esproc.jdbc.InternalCStatementst;
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
// execute esProc stored procedure
st.execute();
// get the result set
ResultSet set = st.getResultSet();

January 14, 2015

esProc Helps with Computation in MongoDB – Subquery

MongoDB doesn’t support the complex subquery which can only be realized by retrieving the data out first and then performing further computation. The operation is the same complex even if Java or other programming languages are used to write the program. In view of this, we can consider using esProc to help MongoDB with the computation, the method of which will be illustrated in detail through an example.

In MongoDB, there are two docments: orders, which stores the orders data, and employee, which stores the employee data. They are as follows:
> db.orders.find();
{ "_id" : ObjectId("5434f88dd00ab5276493e270"), "ORDERID" : 1, "CLIENT" : "UJRNP
", "SELLERID" : 17, "AMOUNT" : 392, "ORDERDATE" : "2008/11/2 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e271"), "ORDERID" : 2, "CLIENT" : "SJCH"
, "SELLERID" : 6, "AMOUNT" : 4802, "ORDERDATE" : "2008/11/9 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e272"), "ORDERID" : 3, "CLIENT" : "UJRNP
", "SELLERID" : 16, "AMOUNT" : 13500, "ORDERDATE" : "2008/11/5 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e273"), "ORDERID" : 4, "CLIENT" : "PWQ",
 "SELLERID" : 9, "AMOUNT" : 26100, "ORDERDATE" : "2008/11/8 15:28" }
> db.employee.find();
{ "_id" : ObjectId("5437413513bdf2a4048f3480"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3481"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3482"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }
It is required to select the orders data in which SELLERID is the EID of records of employees where STATE is California in employee. This can be expressed in SQL like this:
Select * from orders where orders.sellerid in (select eid from employee where employee.state=’ California’)

The data of orders cannot be loaded entirely because their size is big, while the data size of both employee and the final result are not big.


The esProc script used to help MongoDB realize this subquery is as follows:

A1: Connect to MongoDB. localhost:27017 is the IP and the port number. test is the database name, user name as well as the password.

A2: Use find function to fetch the data from MongoDB and create a cursor. orders is the collection, with a filtering condition being null and _id , the specified key, not being fetched. Here esProc uses the same parameter format in find function as that in find statement of MongoDB. As esProc’s cursor supports fetching and processing data in batches, the memory overflow caused by importing big data all at once can thus be avoided.

A3: Fetch the desired data from employee according to the condition that STATE is California.

A4: Sort the EID field of A3’s employee data.

A5: Select the desired data from orders according to the condition requiring that SELLERID must exist in the table sequence in A4. That is, the condition can be expressed as SELLERID in A4. The resulting data can be loaded all at once. But if the data are too big, they can be fetched in batches. To fetch the top 1,000 rows, for example, can be expressed as fetch(1000).

The result is as follows:

NoteesProc isn’t equipped with a Java driver included in MongoDB. So to access MongoDB using esProc, you must put MongoDB’s Java driver (a version of 2.12.2 or above is required for esProc, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc beforehand.

The esProc script used to help MongoDB with the computation is easy to be integrated into the Java program. You just need to add another line of code - result A6 to output a result in the form of resultset to Java program. For the detailed code, please refer to esProc Tutorial. In the same way, MongoDB’s Java driver must be put into the classpath of a Java program before the latter accesses MongoDB by calling an esProc program.

January 7, 2015

esProc Helps with Computation in MongoDB– Query Indexes in an Array

MongoDB can find out elements of a built-in array according to their indexes, but cannot find the indexes through the values of the elements. For example, the elements of an array are names of people stored according to their rankings. In MongoDB, names can be found according to the rankings (indexes of the array), but the values of rankings cannot be determined through names. esProc can help MongoDB in realizing this operation. The following example will teach you how it works in detail.

b–a collection of MongoDB, stores a name and an array of friends. The names in the array of friends are stored in the order of rankings, as shown below:
>db.b.find({"name":"jim"})
{ "_id" : ObjectId("544f3bf8cdb02668db9ab229"), "name" : "jim", "friends" : [ "t
om", "jack", "luke", "rose", "james", "sam", "peter" ] }

In MongoDB, you can find names through specified rankings. For example, find out the name of the first ranking among Jim’s friends:
>db.b.find({"name":"jim"},{"friends":{"$slice":[0,1]}})
{ "_id" : ObjectId("544f3bf8cdb02668db9ab229"), "name" : "jim", "friends" : [ "t
om" ] }


But you cannot find outthe ranking of “luke”,one of Jim’s friends. To solve the problem, esProc has its own script: 


A1: Connect to the MongoDB database. The IP and port number is localhost:27017, the database name is test and both the user name and the password are test. If any other parameters are needed, write them in line with the format mongo://ip:port/db?arg=value&…

A2: Fetch data from the MongoDB database using find function to create a cursor.The collection isb. The filtering criterion is name=jim and the specified keys are name and friends. It can be seen that this find function is similar to the find function of MongoDB. By fetching and processing data in batches, the esProc cursor can avoid the memory overflow resulting from big data importing.

A3: Since the data are not big, fetch function will fetch them all at once.

A4: Find out the position of luke using pos function.

After the code is executed, the result is as follows:

Please note that esProc hasn’t the java driver of MongoDB. To access MongoDB with esProc, the latter (a driver of 2.12.2 version or above is required, i.e. mongo-java-driver-2.12.2.jar) should be put into the [esProc installation directory]\common\jdbc beforehand.

The script for computation in MongoDB with the assistance of esProc is easy to integrate with Java program. By adding another line of code – A5, which is result A4, the result in the form of resultset can be output to Java program. For detailed code, please refer to esProc Tutorial. In the same way, to access MongoDB by calling esProc code with Java program also requires putting the java driver of MongoDB into the classpath of Java program.

January 5, 2015

esProc Helps with Computation in MongoDB – Digital Comparison

MongoDB script has limited computational ability in realizing complicated operations, so it is difficult to solve problems of this kind using it alone. In many cases, you can only perform further computations after retrieving the desired data out. And there is no less difficulty in trying to realizing this kind of set operations with high-level programming languages like Java. In this case, you can use esProc to help with the computation in MongoDB. An example will be provided for explaining how esProc works.

There is a collection – test – in MongoDB, as shown below:
> db.test.find({},{"_id":0})
{ "value" : NumberLong(112937552) }
{ "value" : NumberLong(715634640) }
{ "value" : NumberLong(487229712) }
{ "value" : NumberLong(79198330) }
{ "value" : NumberLong(440998943) }
{ "value" : NumberLong(93148782) }
{ "value" : NumberLong(553008873) }
{ "value" : NumberLong(336369168) }
{ "value" : NumberLong(369669461) }
Specifically, test includes multiple values, each of which is a digital string. It is required that each digital string be compared with all the other digital strings and find the biggest same digit and the biggest different digit in each digital string. If the number 1 exists both in the first row and in the nth row, their same digit will be counted as one. If the number exists only in the first row, and there is no such a number in the nth row, you can count one different digit.

esProc code:
A1: Connect to MongoDB. Both IP and the port number is localhost:27017. The database name, user name and the password all are test.

A2: find function is used to fetch data from MongoDB and create a cursor. orders is the collection, the filtering condition is null and _id , the specified key, won’t be fetched. It can be seen that esProc uses the same parameter format in find function as that in find statement in MongoDB. esProc’s cursor supports fetching and processing data in batches, thereby avoiding the memory overflow caused by importing big data at once. As the data size is not big, fetch function is used to get the records altogether from the cursor.

A3: Add two new columns to A2 for storing the biggest same and different numbers. And, at the same time, convert values into strings.

A4: Perform loop on the collection in A3, the loop body covers an area of B4-D10.

B4: Get the value on the current loop.

C4: Use array@s to split the column value into a sequence consisting of single characters and remove the duplicate values.

B5: Perform an inner loop on the collection in A3. The loop body is C6-D10.

C5: If the loop position of the inner loop is the same as the current one in the outer loop, that is, they hold the same value, skip the current inner loop and move on to the next.

C6: Get the value on the current inner loop.

C7: Define two variables - same and diff – for storing the same numbers and different numbers respectively got through the current comparison. The initial value is defined as zero.

C8: loop function is used to examine one by one in the inner loop the numerical values of the sequence formed by splitting values in the outer loop. If a same value is caught, the value of same will increase by one; otherwise the value of diff will increase by one.

C9, C10: Compare same and diff with those in A4, and reassign the bigger values to the same and diff in A4.

The final result after the code is executed is as follows:
Note: esProc isn’t equipped with a Java driver included in MongoDB. So to access MongoDB using esProc, you must put MongoDB’s Java driver (a version of 2.12.2 or above is required for esProc, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc beforehand.


The esProc script used to help MongoDB in the computation is easy to be integrated into the Java program. You just need to add another line of code – A11 – that is, result A3, for outputting a result in the form of resultset to Java program. For the detailed code, please refer to esProc Tutorial. In the same way, MongoDB’s Java driver must be put into the classpath of a Java program before the latter accesses MongoDB by calling an esProc program.