July 16, 2015

Merge MongoDB Documents in esProc


Below is a selection of documents from collection c1:

{
       "_id" : ObjectId("55014006e4b0333c9531043e"),
       "acls" : {
              "append" : {
                     "users" : [ObjectId("54f5bfb0336a15084785c393") ],
                     "groups" : [ ]
              },
              "edit" : {
                     "groups" : [ ],
                     "users" : [
                            ObjectId("54f5bfb0336a15084785c392")
                     ]
              },
              "fullControl" : {
                     "users" : [ ],
                     "groups" : [ ]
              },
              "read" : {
                     "users" : [ ObjectId("54f5bfb0336a15084785c392"), ObjectId("54f5bfb0336a15084785c398")],
                     "groups" : [ ]
              }
       },
       "name" : "ABC"
}

{
       "_id" : ObjectId("55014006e4b0333c9531043f"),
       "acls" : {
              "append" : {
                     "users" : [ObjectId("54f5bfb0336a15084785c365") ],
                     "groups" : [ ]
              },
              "edit" : {
                     "groups" : [ ],
                     "users" : [
                            ObjectId("54f5bfb0336a15084785c392")
                     ]
              },
              "fullControl" : {
                     "users" : [ ],
                     "groups" : [ ]
              },
              "read" : {
                     "users" : [ ObjectId("54f5bfb0336a15084785c392"), ObjectId("54f5bfb0336a15084785c370")],
                     "groups" : [ ]
              }
       },
       "name" : "ABC"
}


You need to group the collection by name. Each group should contain the members of the users fields from all documents with the same name, with no duplicate members. The expected result looks like this:

{
  result : [
     {
          _id: "ABC",
          readUsers : [
                  ObjectId("54f5bfb0336a15084785c393"),
                  ObjectId("54f5bfb0336a15084785c392"),
                  ObjectId("54f5bfb0336a15084785c398"),
                  ObjectId("54f5bfb0336a15084785c365"),
                  ObjectId("54f5bfb0336a15084785c370")
           ]
      }
  ]
}


esProc code

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…

A2: Use the find function to retrieve data from MongoDB, sort it and create a cursor. c1 is the collection name; no filtering criterion is specified; all fields except _id are retrieved, and the documents are sorted by name. The esProc find function is analogous to a combination of MongoDB's find, sort and limit functions, and its filtering criterion syntax follows the MongoDB rules.

A3: Fetch data from the cursor in a loop, getting one group of documents with the same name field each time. A3’s working range is the indented cells B3 to B5, within which A3 can be used to reference the loop variable.
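
For readers more familiar with general-purpose languages, the group-at-a-time loop in A3 can be sketched in Python. This is only an analogue, not esProc code: the function name iter_name_groups and the simplified sample documents are hypothetical, and the sketch assumes the input is already sorted by name, as the cursor created in A2 is.

```python
from itertools import groupby
from operator import itemgetter


def iter_name_groups(sorted_docs):
    """Yield (name, [docs]) for each run of consecutive documents sharing a name.

    Assumes the input is already sorted by "name"; groupby only merges
    neighbouring items, so unsorted input would split a group.
    """
    for name, run in groupby(sorted_docs, key=itemgetter("name")):
        yield name, list(run)


# Hypothetical, simplified stand-ins for documents read from the cursor.
docs = [
    {"name": "ABC", "doc": 1},
    {"name": "ABC", "doc": 2},
    {"name": "XYZ", "doc": 3},
]

groups = [(name, len(batch)) for name, batch in iter_name_groups(docs)]
print(groups)  # [('ABC', 2), ('XYZ', 1)]
```

Because only one group is materialized at a time, memory use depends on the largest group rather than on the whole collection.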

B3: Retrieve all users fields from the current group of documents, as shown below:

B4: Merge users fields from all documents of the current group and remove duplicate members.
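
The merge and de-duplication done in B3 and B4 can be illustrated with a minimal Python sketch. It is an analogue under stated assumptions rather than the esProc implementation: ObjectId values are represented as plain strings, the helper names perm and merged_users are hypothetical, and the two sample documents are abbreviated copies of the ones shown at the top of this post.

```python
PREFIX = "54f5bfb0336a15084785"  # shared prefix of the sample ObjectIds


def perm(*suffixes):
    """Build one permission entry (users plus empty groups) from id suffixes."""
    return {"users": [PREFIX + s for s in suffixes], "groups": []}


def merged_users(group):
    """Collect every acls.*.users member in a group, keeping first-seen order."""
    seen = {}  # dict keys preserve insertion order and reject duplicates
    for doc in group:
        for permission in doc["acls"].values():
            for user in permission["users"]:
                seen.setdefault(user, None)
    return list(seen)


# The two sample documents named "ABC", with ObjectIds as strings (assumption).
group = [
    {"name": "ABC", "acls": {"append": perm("c393"), "edit": perm("c392"),
                             "fullControl": perm(), "read": perm("c392", "c398")}},
    {"name": "ABC", "acls": {"append": perm("c365"), "edit": perm("c392"),
                             "fullControl": perm(), "read": perm("c392", "c370")}},
]

result = {"_id": group[0]["name"], "readUsers": merged_users(group)}
print([u[-4:] for u in result["readUsers"]])
# ['c393', 'c392', 'c398', 'c365', 'c370']
```

The order of the merged members matches the expected result shown earlier, since users are kept in the order they are first encountered.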

B5: Append B4’s result from each iteration of the loop to B2. Finally B2 becomes this:

B2 is the final result we want. If the result is too big to fit in memory, you can use the export@j function in B5 to convert each of B4’s results to a JSON string and append the strings to a text file one by one.
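
The idea of writing each group's result out as soon as it is produced, instead of accumulating everything in memory, can be sketched in Python as follows. This is not the export@j function itself; the helper append_json_line, the file name merged.jsonl and the sample records are hypothetical and only illustrate the one-JSON-string-per-line approach.

```python
import json
import os
import tempfile


def append_json_line(path, record):
    """Serialize one record as JSON and append it to the file as a single line."""
    with open(path, "a", encoding="utf-8") as out:
        out.write(json.dumps(record) + "\n")


# Hypothetical per-group results, written one at a time as they are produced.
path = os.path.join(tempfile.mkdtemp(), "merged.jsonl")
for record in [
    {"_id": "ABC", "readUsers": ["u1", "u2"]},
    {"_id": "XYZ", "readUsers": ["u3"]},
]:
    append_json_line(path, record)

# Reading the file back line by line recovers the records in order.
with open(path, encoding="utf-8") as src:
    lines = [json.loads(line) for line in src]
print(len(lines))  # 2
```

Only one record is held in memory at any moment, so the size of the output file is not limited by available memory.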

A6: Disconnect from MongoDB.