July 22, 2015

esProc Integrates Heterogeneous Data Sources for Report Development

In addition to conventional databases, the data sources behind a report may include JSON files, MongoDB, text files, Excel files and HDFS files. Reporting tools can normally handle a single data source, but they cannot consolidate several different ones. Even when the data sources are all of the same type, you still need to write a lot of code to develop the report if they come from a database that lacks effective computing capability.

However, esProc (a free edition is available) can solve both problems. It offers a large number of functions for manipulating structured and semi-structured data, and it supports heterogeneous data sources and can integrate them. esProc also provides a simple, easy-to-use JDBC interface, through which a reporting tool calls an esProc script as it would a database stored procedure: it passes parameters to the script, executes it and gets the result set.



Below is the structure of integration of an esProc script and a reporting tool:


This is an example of how esProc queries a multi-level subdocument in a JSON file to create a report:

jsonstr.json has a subdocument field, runners, which has three fields: horseId, ownerColours and trainer. The trainer field contains a subfield, trainerId. The report needs to present the horseId, ownerColours and trainerId fields of each subdocument within the runners field according to its serial number.

The source data:
[
    {
        "race": {
            "raceId": "1.33.1141109.2",
            "startDate": "2014-11-09T13:15:00.000Z",
            "raceClassification": {
                "classification": "Novices'"
            },
            "raceType": {
                "key": "H"
            },
            "raceClass": 4,
            "course": {
                "courseId": "1.33"
            },
            "meetingId": "1.33.1141109"
        },
        "numberOfRunners": 2,
        "runners": [
            {
                "horseId": "1.00387464",
                "trainer": {
                    "trainerId": "1.00034060"
                },
                "ownerColours": "Maroon, pink sleeves, dark blue cap."
            },
            {
                "horseId": "1.00373620",
                "trainer": {
                    "trainerId": "1.00010997"
                },
                "ownerColours": "Black, emerald green cross of lorraine, striped sleeves."
            }
        ]
    },
……
]


esProc script:



A1: Read in the JSON file.

A2: Retrieve the runners field according to the serial number of each of its subdocuments. Here which is a report parameter. The result is like this:

A3: Get the desired fields to generate the result set the report needs. The result is as follows:


The reporting tool calls the esProc script via JDBC in the same manner as it calls a stored procedure from a normal database. The syntax is: call esProc script name (para1…paraN). The result returned by the script takes part in report creation as a normal data set. Details are covered in the following documents: esProc Integration & Application: Integration with JasperReport and esProc Integration & Application: Integration with BIRT.
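
The logic of script steps A1 to A3 can also be sketched outside esProc. The following Python snippet is not the esProc script; it is a minimal illustration of the same idea, assuming the serial number is the 1-based position of a top-level document in the file. The function name runners_report and the parameter name serial_no are hypothetical, and the inline data is a trimmed copy of the sample above.

```python
import json

# Trimmed copy of the jsonstr.json sample (only the fields the report uses).
JSON_TEXT = """
[
  {
    "race": {"raceId": "1.33.1141109.2"},
    "numberOfRunners": 2,
    "runners": [
      {"horseId": "1.00387464",
       "trainer": {"trainerId": "1.00034060"},
       "ownerColours": "Maroon, pink sleeves, dark blue cap."},
      {"horseId": "1.00373620",
       "trainer": {"trainerId": "1.00010997"},
       "ownerColours": "Black, emerald green cross of lorraine, striped sleeves."}
    ]
  }
]
"""


def runners_report(json_text, serial_no):
    """Return one flat row per runner of the document at position serial_no.

    Assumption: serial_no is 1-based and selects a top-level document.
    """
    documents = json.loads(json_text)               # like A1: read the JSON
    runners = documents[serial_no - 1]["runners"]   # like A2: pick the runners
    return [                                        # like A3: keep report fields
        {
            "horseId": runner["horseId"],
            "ownerColours": runner["ownerColours"],
            "trainerId": runner["trainer"]["trainerId"],
        }
        for runner in runners
    ]


rows = runners_report(JSON_TEXT, 1)
for row in rows:
    print(row)
```

The nested trainer.trainerId value is lifted into a flat column so that the result has the shape of an ordinary data set, which is what a reporting tool expects.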

As a professional tool for processing report data sources, esProc can be used in many more scenarios, as the following examples show.

Create a grouped report from a multi-level JSON file

Cells.json is a multi-level nested JSON file that you want to display as a grouped report. The grouping fields are name, type and image."xlink:href". The custom field holds three subdocuments, custom.identifier, custom.classifier and custom.output, which have the same structure but contain different numbers of documents.

The source data:
{
    "cells": [
        {
            "name": "b",
            "type": "basic.Sensor",
            "custom": {
                "identifier": [
                    {
                        "name": "Name1",
                        "URI": "Value1"
                    },
                    {
                        "name": "Name4",
                        "URI": "Value4"
                    }
                ],
                "classifier": [
                    {
                        "name": "Name2",
                        "URI": "Value2"
                    }
                ],
                "output": [
                    {
                        "name": "Name3",
                        "URI": "Value3"
                    }
                ]
            },
            "image": {
                "width": 50,
                "height": 50,
                "xlink:href": "
HNCSVQICAgIfAhkiAAAAAlwSFlzAABEJAAARCQBQGfEVAAAABl0RVh0U29mdHdhcmUAd3Vi8f+k/EREURQtsda2Or/+nFLqP6T5Ecdi0aJFL85msz2Qxy
f4JIumMAx/ClmWt23GmL1kO54CXANAVH+WiN4Sx7EoNVkU3Z41BDHMeXAxjvOxNr7RJjzHX7S/jAflwBxkJr/RwiOpWZ883Nzd+Wpld7tkBr/SJr7ZHZb
HZeuVweSnPfniocMAWYwcGBafH0OoPamFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY7/B94QnX08zxKLAAAAAElFTkSuQmCC"
            }
        },
……
    ]
}


esProc merges the three subdocuments into a single two-dimensional table, adds a new field named ctype to identify which subdocument each row came from, and joins the rows with the grouping fields. This creates a typical “table with subtables”. The esProc code is as follows:


A1: Import the JSON file. The relationships between different fields are shown below:

A2: Convert the multi-level nested JSON file to a simple two-dimensional table. The sign “|” means concatenation. The new function creates a two-dimensional table based on the source data. The conj function performs a calculation on each record of the source table and concatenates the results. A2’s resulting two-dimensional table is what you need to create the report, as shown below:


Then it’s easy for you to build a grouped report according to this esProc result.
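
The flattening described for A2 can be illustrated with a short Python sketch. This is not esProc code and does not use the new or conj functions; it only mirrors the idea of concatenating the three subdocuments, tagging each row with ctype and repeating the grouping fields. The output column names item_name and image are hypothetical, and the image data is replaced by a placeholder string.

```python
# Trimmed copy of the Cells.json sample; the base64 image is a placeholder.
cells_doc = {
    "cells": [
        {
            "name": "b",
            "type": "basic.Sensor",
            "custom": {
                "identifier": [
                    {"name": "Name1", "URI": "Value1"},
                    {"name": "Name4", "URI": "Value4"},
                ],
                "classifier": [{"name": "Name2", "URI": "Value2"}],
                "output": [{"name": "Name3", "URI": "Value3"}],
            },
            "image": {"width": 50, "height": 50,
                      "xlink:href": "<base64 image data>"},
        }
    ]
}


def flatten_cells(doc):
    """Turn the nested cells document into flat rows, one per subdocument item."""
    rows = []
    for cell in doc["cells"]:
        # Concatenate the three subdocuments, remembering where each item came from.
        for ctype in ("identifier", "classifier", "output"):
            for item in cell["custom"].get(ctype, []):
                rows.append({
                    "name": cell["name"],                # grouping field
                    "type": cell["type"],                # grouping field
                    "image": cell["image"]["xlink:href"],  # grouping field
                    "ctype": ctype,                      # source subdocument
                    "item_name": item["name"],
                    "URI": item["URI"],
                })
    return rows


flat_rows = flatten_cells(cells_doc)
for row in flat_rows:
    print(row["name"], row["ctype"], row["item_name"], row["URI"])
```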

Create a report with subreports using different JSON files

You want to create a report containing multiple subreports, where the main report and each subreport use different JSON files as their sources. Below is a selection of the source data:
MainReport.json
{"menu": [
    {
        "id": "A1",
        "value": "File",
        "popup": "Yes"
    },
    {
        "id": "A2",
        "value": "Edit",
        "popup": "No"
    }
  ]
}
SubReport1.json
{"menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"}
  ]
}
SubReport2.json
{"menuitem": [
    {"value": "Undo", "onclick": "onUndo()"},
    {"value": "Redo", "onclick": "onRedo()"},
    {"value": "Copy", "onclick": "onTextCopy()"},
    {"value": "Past", "onclick": "onTextPast()"}
  ]
}

A reporting tool that supports only a single data source, such as Jasper or BIRT, would need Java classes to combine the multiple sources into one, whereas esProc uses a simple script as follows:

Read in the JSON file and get its first field, which is represented by “.#1”. By assigning different file names to the parameter argFileName, the report will receive different data sets, as the following shows:
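
The idea of one parameterised script serving every subreport can be sketched in Python as well. This is only an illustration of the logic, not the esProc script: the function name first_field is hypothetical, and the demo writes a copy of SubReport1.json to a temporary directory so that the snippet is self-contained.

```python
import json
import os
import tempfile


def first_field(file_name):
    """Read a JSON file and return the value of its first top-level field."""
    with open(file_name, encoding="utf-8") as handle:
        document = json.load(handle)
    # Dictionaries keep the key order of the file, so this is the first field.
    return next(iter(document.values()))


# Copy of the SubReport1.json sample shown above.
sample = {"menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"},
]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "SubReport1.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    # The file name plays the role of the argFileName parameter.
    menu_items = first_field(path)

for item in menu_items:
    print(item["value"], item["onclick"])
```

Passing a different file name returns the first field of that file instead, which is how the main report and each subreport can share the same logic.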


Perform a join between MongoDB and MySQL

emp1 is a MongoDB collection whose CityID field is a logical foreign key pointing to the CityID field of cities, a MySQL table that has two fields: CityID and CityName. You need to query employee records from emp1 for a specified time period and replace the CityID field with the corresponding CityName from cities.

esProc script:


A1: Connect to MongoDB.

A2: Query emp1 using MongoDB syntax for the specified time period. The find function returns a cursor. The @x option closes the MongoDB connection automatically after all the data is fetched. The result would be like this:


A3: Execute an SQL statement to query the MySQL database. Here is the result:



A4: Replace A2’s CityID field with the corresponding records in A3. The switch function works as a left join does. To perform an inner join, use the @i option. After the field replacement performed by switch, the fields of the linked table can be accessed through the key field as an object. This object-style access is simple and intuitive, and its merits are especially obvious when performing a multi-level, multi-table join. Here is the result of switch:

A5: Retrieve the desired fields to generate a table as follows:


A7: By default, the esProc script returns the last calculation cell (here, A5) to the reporting tool.
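
The foreign-key replacement performed in A4 can be pictured with a small Python sketch. It is not the esProc switch function and it does not connect to MongoDB or MySQL; the two lists stand in for the query results of A2 and A3, every value in them is made up for illustration, and the helper name switch_field and its inner flag are hypothetical.

```python
# Stand-in for the MongoDB result (A2). All values are invented sample data.
employees = [
    {"EId": 1, "Name": "Alice", "CityID": 10},
    {"EId": 2, "Name": "Bob", "CityID": 20},
    {"EId": 3, "Name": "Carol", "CityID": 99},  # no matching city
]

# Stand-in for the MySQL cities table (A3). Values are invented as well.
cities = [
    {"CityID": 10, "CityName": "Austin"},
    {"CityID": 20, "CityName": "Boston"},
]


def switch_field(rows, key, lookup_rows, lookup_key, inner=False):
    """Replace rows[key] with the matching lookup record (None if unmatched).

    With inner=True, rows without a match are dropped, like an inner join.
    """
    index = {record[lookup_key]: record for record in lookup_rows}
    result = []
    for row in rows:
        target = index.get(row[key])
        if target is None and inner:
            continue
        result.append({**row, key: target})
    return result


linked = switch_field(employees, "CityID", cities, "CityID")

# Like A5: reach through the replaced field to pick up CityName.
report = [
    {
        "EId": emp["EId"],
        "Name": emp["Name"],
        "CityName": emp["CityID"]["CityName"] if emp["CityID"] else None,
    }
    for emp in linked
]
for row in report:
    print(row)
```

Because the key field now holds the whole city record, any city attribute is reachable from the employee row without repeating the join condition.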

Perform joins between MongoDB collections

Both sales and emp are two-dimensional MongoDB collections. The SellerId field of sales is a logical foreign key that points to emp’s EId field. You need to query the orders in sales for the specified time period, associate them with emp through a left join, and then present the result in a report.

esProc script:


A1, A4: Connect to and disconnect from MongoDB.

A2: Query the sales collection using MongoDB syntax and fetch the cursor data into memory with the fetch function (as the data size is small). Here is the result:

A3: Retrieve data from the emp collection. Here is the result:

A5: Join the two collections together. The join function performs the join operation. @1 means a left join and @f means a full join. Without either option, this function performs an inner join. The result is as follows:

A6: Retrieve the fields of interest from the result of join to generate a new two-dimensional table, as shown below:
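
The difference between the three join modes mentioned for A5 can be shown with a plain Python sketch. It is not the esProc join function; the helper join_rows and its how argument are hypothetical, and the sales and emp rows are invented sample data that only reuse the SellerId and EId field names from the text.

```python
# Invented sample data; only the key field names come from the text.
sales = [
    {"OrderID": 1, "SellerId": 1, "Amount": 100},
    {"OrderID": 2, "SellerId": 3, "Amount": 50},   # seller 3 is not in emp
]
emp = [
    {"EId": 1, "Name": "Alice"},
    {"EId": 2, "Name": "Bob"},                      # Bob has no sales
]


def join_rows(left, right, left_key, right_key, how="inner"):
    """Return (left_row, right_row) pairs; how is 'inner', 'left' or 'full'."""
    right_index = {}
    for row in right:
        right_index.setdefault(row[right_key], []).append(row)

    pairs = []
    matched_right = set()
    for left_row in left:
        matches = right_index.get(left_row[left_key], [])
        for right_row in matches:
            pairs.append((left_row, right_row))
            matched_right.add(id(right_row))
        if not matches and how in ("left", "full"):
            pairs.append((left_row, None))          # keep unmatched left rows
    if how == "full":
        # Also keep right rows that never matched.
        pairs.extend((None, row) for row in right if id(row) not in matched_right)
    return pairs


inner_pairs = join_rows(sales, emp, "SellerId", "EId", how="inner")
left_pairs = join_rows(sales, emp, "SellerId", "EId", how="left")
full_pairs = join_rows(sales, emp, "SellerId", "EId", how="full")
print(len(inner_pairs), len(left_pairs), len(full_pairs))
```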


Join an Oracle table and an Excel file

Here are table1, which is stored in an Oracle database, and table2, an .xlsx file. Both have the same structure. Below are selections from them:

You need to group table1 and table2 separately by name, count the number of members in each group, calculate the sum of the active field for each group, and then present the results from the two tables in sequence. The expected report layout is as follows:

esProc script:

A1: Execute the SQL statement to group and aggregate data from table1. Here is the result:


A2: Import the Excel file and make the first row the column headers.

A3: Group and aggregate A2’s data. Here is the result:

A4: Perform a left join between A1 and A3. You’ll get the following result:

A5: Retrieve the fields you want from A4 and rename them. This is the result you’ll get:
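
The group, aggregate and left-join sequence of A1 to A5 can be sketched in Python. The snippet neither queries Oracle nor reads an .xlsx file; the two lists stand in for table1 and table2, their values are invented, and the output column names (count1, active1, count2, active2) are hypothetical. Only the name and active fields come from the text.

```python
from collections import defaultdict

# Invented sample rows standing in for the Oracle table and the Excel sheet.
table1 = [
    {"name": "a", "active": 1},
    {"name": "a", "active": 0},
    {"name": "b", "active": 1},
]
table2 = [
    {"name": "a", "active": 1},
    {"name": "c", "active": 1},
]


def group_stats(rows):
    """Group rows by name; return the member count and the sum of active."""
    stats = defaultdict(lambda: {"count": 0, "active_sum": 0})
    for row in rows:
        group = stats[row["name"]]
        group["count"] += 1
        group["active_sum"] += row["active"]
    return dict(stats)


stats1 = group_stats(table1)   # like A1
stats2 = group_stats(table2)   # like A3

# Like A4 and A5: left join on name, then pick and rename the columns.
report = []
for name, left in stats1.items():
    right = stats2.get(name)
    report.append({
        "name": name,
        "count1": left["count"],
        "active1": left["active_sum"],
        "count2": right["count"] if right else None,
        "active2": right["active_sum"] if right else None,
    })

for row in report:
    print(row)
```

Because this is a left join, group c, which exists only in table2, does not appear in the output.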


Join a txt file and a JSON file

structure.txt is a structured text separated by tabs. json.txt contains unstructured JSON strings. There is a foreign key relationship between the second field of structure.txt and part of the text in json.txt. Below are selections from them:

structure.txt
Name1     BBBBBBBBBBBB     99.40        166 1        0       1       166 334 499 3e-82   302
Name2     DDDDDDDDDDDD 98.80        167 2        0       1       167 346 512 4e-81   298

json.txt
[
    { "Cluster A": { "member": { "Cluster A": "BBBBBBBBBBBB This is Animal A" }, "name": "Cluster A" } },
    { "Cluster B": { "member": { "Cluster B": "DDDDDDDDDDDD This is Animal B" }, "name": "cluster B" } }
]

You need to create a report to present the above relationship. This is the expected report layout:
Name1   BBBBBBBBBBBB    99.40   166 1   0   1   166 334 499 3e-82    302 Cluster A This is Animal A
Name2   DDDDDDDDDDDD    98.80   167 2   0   1   167 346 512 4e-81    298 Cluster B This is Animal B

esProc script:

A1-A3: Read in the JSON file, get the desired data and append a calculated column. Here’s the result:


A4: Import the text file as a two-dimensional table. Note that esProc can import not only a local file but also a file stored on a LAN or in the HDFS file system.

A5: Perform the join operation. The result is as follows:


A6: Retrieve the desired fields to generate a table as follows:
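
The same text-to-JSON join can be pictured in Python. This is a sketch of the logic only, not the esProc script. It assumes that every value in structure.txt is separated by a single tab and that the join key in json.txt is the first word of each member string; the function names cluster_rows and join_report are hypothetical.

```python
import json

# Copy of the structure.txt sample, assuming all 12 values are tab-separated.
STRUCTURE_TXT = (
    "Name1\tBBBBBBBBBBBB\t99.40\t166\t1\t0\t1\t166\t334\t499\t3e-82\t302\n"
    "Name2\tDDDDDDDDDDDD\t98.80\t167\t2\t0\t1\t167\t346\t512\t4e-81\t298\n"
)

# Copy of the json.txt sample.
JSON_TXT = """
[
    { "Cluster A": { "member": { "Cluster A": "BBBBBBBBBBBB This is Animal A" }, "name": "Cluster A" } },
    { "Cluster B": { "member": { "Cluster B": "DDDDDDDDDDDD This is Animal B" }, "name": "cluster B" } }
]
"""


def cluster_rows(json_text):
    """Like A1-A3: one row per cluster, with the key split off the member text."""
    rows = []
    for entry in json.loads(json_text):
        for cluster, body in entry.items():
            text = body["member"][cluster]
            key, _, description = text.partition(" ")  # first word is the key
            rows.append({"key": key, "cluster": cluster,
                         "description": description})
    return rows


def join_report(structure_text, json_text):
    """Like A4-A6: join each text line to its cluster via the second field."""
    by_key = {row["key"]: row for row in cluster_rows(json_text)}
    lines = []
    for line in structure_text.splitlines():
        fields = line.split("\t")
        match = by_key.get(fields[1])
        if match:
            fields += [match["cluster"], match["description"]]
        lines.append("\t".join(fields))
    return lines


report_lines = join_report(STRUCTURE_TXT, JSON_TXT)
for line in report_lines:
    print(line)
```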
