July 28, 2015

esProc Assists BIRT in Handling Irregular Month Grouping

Problem source:
 
Irregular month grouping: If the start date is 2014-01-10, group dates from this date to 2014-02-09 together and dates from 2014-02-10 to 2014-03-9 together. If the start date is 2014-01-31, put dates from this date to 2014-02-27 into a group and take dates from 2014-02-28 to 2014-03-30 as a group.

esPro code:

A1: Query database according to the start date and end date. Both startDate and endDate are external parameters.

A2: Count the number of months between the start date and the end date. For example, there are 6 month between 2014-01-31 and 2014-07-31.

B2: Find the day when each of the irregular months begins according to the initial start date and the intervals. Put the start date before the expression for this computation. The sign “|” represents concatenation and “~” represents the current member in the set, i.e. the numbers from 1 to 6. after function is used to get the irregular months.

A3: Group A1’s data by B2’s intervals and calculate sales amount for each of these irregular months; B2 is appended as the last column. pseg function returns the sequence number of the interval in which the data is held. “~” in ~.sum(Amount)represents the current group. “#” represents the sequence number of the current group. Result is as follows:

A4: Retrieve the second and third column from A3 and return them to the reporting tool. esProc provides JDBC interface outward to be identified by a reporting tool as a normal database.

Create a simple list table with BIRT:

esProc script in the same way as it calls the stored procedure. Save the above script as BirtUnregulMonth.dfx. You can invoke it with call BirtUnregulMonth(?,?) from BIRT’s stored procedure designer. Below is a preview of the appearance and layout of the result report:

July 27, 2015

esProc Assists BIRT in Splitting Fields and Converting Them to Records


To split fields apart and recombine them into records, you can use the stored procedure or a report script. But as the code is complex, an alternative choice is using esProc to help the reporting tool with the job. Let’s look at an example.

Database table data has two fields: ID and ANOMALIES. You need to split ANOMALIES field into strings by the space and combine each string with the corresponding original ID value to generate new records. Below is the original data:

esProc code:

A1: Query the database.

A2: Split ANOMALIES field to create a new two-dimensional table. Then use conj function to concatenate the records generated from the processing of each of A1’s members. “~” represents the set member under processing. A2’s result is like this:

A3: Return A2’s result to the reporting tool. esProc provides JDBC interface outward to be identified by a reporting tool as a normal database.

Create a simple list table with BIRT:

A preview would be like this:
The way a report calls an esProc script is the same as that it calls the stored procedure. Save the above script asBIRTsplitrow.dfx. You can invoke it with call BIRTsplitrow() and input parameters into it from BIRT stored procedure designer. 

July 24, 2015

esProc Assists BIRT in Handling Intragroup Inter-row Calculations

Problem source:

Generally intragroup inter-row calculations are handled with window functions or report scripts. But the code is rather complicated. Instead, you can use esProc to assist reporting tools in dealing with them. Below is such an example.


Database table sample has three fields, among which id is the grouping field. You need to design a grouped table where id is the grouping field and the other fields holding detail data include v1, v2 and the computed field crossline. 

crossline will be calculated like this: v1+v2+previous v1+previous v2. Below is the original data:

esProc code for data preparation:

A1: Query the database and create an additional field with constant values for later use.

A2: Group A1 by id, modify crossline values for each group and combine groups of data together. Result is as follows:
A3: Return A2’s result to the reporting tool. esProc provides JDBC interface outward to be identified by a reporting tool as a normal database.

July 23, 2015

esProcImplementsDynamic Data Sources for Reporting Tools

Sometimes you need to reference data sources dynamically through a parameter, merge data sources into one, or dynamically pass data source name to a subreport/Table control.Often reporting tools - especially those with support for single data source, such as BIRT and Jasper –have to use a high-level language to accomplish these requirements, or trade security for reduced complexity.

However, they can work with esProc (free edition is available) to offset their disadvantages. esProc encapsulates a large number of functions for handling structured data, supports parsing expression dynamically and handling computations involving multiple data sources, and creates dynamic data sources with simple scripts. In addition, it provides simple and easy-to-use JDBC interface through which a reporting tool gets the result of executing esProc script, which will be recognized as a database stored procedure and receive parameters from the reporting tool.


Below is structure of integration of an esProc script and a reporting tool:

Now, through an example, let’s look at the basic process of esProc’s switching data sources using the parameter:

myDB1 and oraDB are data sources that point to different databases. Each holds a sOrder table with the same structure. The report requires connecting to data sources dynamically via a parameter, querying sOrder for orders whose amounts are greater than 1,000, and displaying them.

Below is a selection from the sOrder table in myDB1:

Here is a selection from the sOrder table in oraDB:

esProc script:
=${pSource}.query("select * from sOrder where Amount>?",pAmount)

Both pSource and pAmount are report parameters. pSource represents the data source name; ${…} indicates parsing a string or a string variable into an expression. pAmount stands for the order amount.

When pSource=“myDB1”, A1 has the following result:

When pSource=“oraDB”, A1 gets this result:

The reporting tool calls the esProc script via JDBC, in a same manner as it calls the stored procedure from a normal database. The syntax is this: call esProc script name (para1…paraN). The result returned from the script participates in report creation in the form of a normal data set. Details are covered in the following documents:esProc Integration & Application: Integration with JasperReport and esProc Integration & Application: Integration with BIRT.

As a professional tool for handling data sources for reporting tools, esProchas more abilities, which you’ll see in the following cases. 

Perform a multi-data-source pre-join

The mySQL database stores a Sales table holding orders from different sellers per day. Its SellerID field contains seller numbers. In MSSQL database there is an emp table of seller information in which EID field contains seller numbers. Create a report to display order numbers, order dates, order amounts, seller names and their departments, based on the condition that the orders should be within the last N days (say 30 days) or belong to certain important departments (say Marketing and Finance). Below are selections of the original tables:

Database table sales

Database table emp

esProc script:

A1,A2:Database queries. myDB1 and myDB2 point to MySQL and MSSQL respectively.

A3: Replace A1’s SellerID field with the corresponding records in A2 according to the key field EID. The result is as follows (the data items in blue contain sub-members):

By default, when there is not a corresponding record in A2 for a SellerID value, switch function retains the A1’s record while the SellerID shows an empty value. The effect is like a left join. Use @i option if you want to perform an inner join. The code is A1.switch@i(SellerId,A2:EId).

A4: Filter on the result of join. The first filtering criterion is that orders are within the last N days (this corresponds to parameter days),whose expression isOrderDate>=after(date(now()),days*-1). The second one is that orders belong to several important departments (this corresponds to parameter depts), whose expression is depts.array().pos(SellerId.Dept). The operator || denotes the logical OR operation.

after function calculates the relative time duration. array function splits a string into a set using delimiters. posfunction locates a member in a set. SellerId.Dept means Dept field in the record corresponding to SellerID field.

Both days and deptsare parameters transferred from the reporting tool. Suppose their values are respectively 30 and “Marketing,Finance”, then A4’s result is as follows:

A5: Get the fields of interest from A4. Here is the final result:

Combine result sets with union

Result sets ds1 and ds2 have the same structure, but they come from different data sources – MySQL and a text file respectively. Now concatenate them and display the result in a cross table. Below is the original data:

esProc script:

A3: Concatenate the two data sets. The reporting tool’s work is just to create a simple cross table based on the resulting one data set. 

Handle different data sources for main report and subreport

For a reporting tool that can support only one data source, if the reporting requires different data sources for the main report and the subreport, it needs to pass in the database URL explicitly or use Java classes to combine the different data sources into one. The former approach is vulnerable to security problems and the latter one produces complicated code. esProc, however, is able to cope easily. Here is an example.

Build a report with a subreport to display order information of sellers whose salaries are within a certain range. The main report’s data source is an emp table (in MySQL database) and the subreport’s data comes from a sales table (MSSQL database).

esProc scripts:
empEsProc.dfxfor the main report

A1: Query the emp table in MySQL database by the salary range.

salesEsProc.dfxfor the subreport

A1: Rrtrieve orders records from the sales table in MSSQL database according to employee IDs. Suppose eid=1, the result would be:

You can see that the two data sources have been joined into one source with esProc. The reporting tool just needs to call the corresponding esProc script for the main report and the subreport.

Some reporting systems do support multiple data sources. But it is hard to handle reports with different and changeable data sources. In that case, using esProc to generate a single data source can make the handling work easier.

For the same reason, another reporting problem can be solved through esProc’s single-data-source solution. That is the “multiple subreports, multiple data sources” problem, which means there are many subreports (or table controls) within one report and each has its own data source.

Perform dynamic join between main report and its subreports

A main report may use many subreports whose data sources come from multiple databases. The reporting needs to display the result of dynamic join between these data sources and the one the main report uses. esProc implements the task in a simple way. For example:

The main table org is stored in the data source Master. Every record of the org table corresponds a subtable that has a separate data source. For example when org.org_id=“ORG_S”, the record’s subtable is the User able in the data source S_odaURL; when org.org_id=“ORG_T”, the record’s subtable is the User able in the data source T_odaURL. There are more subtables with names all being User. You need to join all subtables dynamically with the main table and display the result data setin a report. Below is the logical relationship between these tables:

esProc script:

A1:Execute the SQL statement to retrieve data from the org table in Master data source. arg1 is a parameter passed from the reporting tool. When arg1=”ORG”, the result would be:

A2:Loop through A1’s records to associate one subtable each time,concatenatingeach result of join into B2. esProc uses the natural indentation to represent the loop statement’s working range. Here the loop body is B2-B7 where A2 is used to reference a loop variable and #A2 is used to reference the loop number.

B2: Get the data source name for each record according to its org_id field. The result during the first loop is “S_odaURL”.

B3: This is the explicit connectionto corresponding data source.

B4: Filter on data in the User table.

B5: Append three columns, which derive from the main table, to B4’s subtable. The result during the first loop is:

B6:Concatenate B5’s result into B1. The operator “|” is equivalent to union function. When the loop is over, B1 will have collected all data the reporting needs, as shown below:

B7: Close data source connection explicitly.

A8: Return B1 to the reporting tool explicitly. The default execution is to return the result of the last cell.

Display data by different time units specified by parameter

Here is a reporting task that requires using a line graph to show the change of sales over a recent period. unitType is a report parameter, representing time units. If unitType="hour", show sales amount every five minutes in the past one hour. If unitType="day", show sales amount per hour during the past day. And show sales amount per day over the past one week if unitType="week". The data originates from the orders table in which Amount field contains order amount. t is used to reference the order time.

esProc script:

A1:An empty result set used to store the time sequences generated from B2-B4.

A2-B4:Generate different time sequences according to the report parameter unitType. B2 generates 12 time points, with an interval of 5 minutes between each other. B3 generates time points in one day, and B4 generates time points during one week.

A5:Loop through A1 to calculate the sales amount of each time interval. “~” represents the current member and “~[-1]" represents the previous one. In the case of unitType="day", a one-field result set containing 12 records will be generated.

A6:Return A5 to the reporting tool via JDBC. Then you can plot the graph to show the data. 

July 22, 2015

esProc Integrates HeterogeneousData Sources for Report Development

In addition to conventional databases, data sources of a reporting tool could also involve JSON files, MongoDB, txt files, Excel and HDFS files. Normally reporting tools can handle a single data source, but they are unable to manage various data sources requiringconsolidation. Even though the data sources are of the same type, you still need to write a lot of code for the report developmentif they come from a database without effective computability. 

However, esProc (free edition is available)can solve both problems. It offers a large number of functions for manipulating (semi)structured data, and supports heterogeneous data sources with the ability of integrating them. Besides, esProc provides a simple and easy-to-use JDBC interface, through which a reporting tool will call an esProc script as a database stored procedure, pass parameters to it, execute it and get the result set.



Below is the structure of integration of an esProc script and a reporting tool:


This is an example of how esProcimplementsqueryinga multi-level subdocument in a JSON file for creating a report:

jsonstr.json has a subdocument, runners field, which has three fields - horseId, ownerColours and trainer– in which trainer contains a subfield -trainerId. The report needs to present the horseId, ownerColours and trainerId field for each subdocument within runners filed according to its serial number.

The source data:
[
    {
        "race": {
            "raceId": "1.33.1141109.2",
            "startDate": "2014-11-09T13:15:00.000Z",
            "raceClassification": {
                "classification": "Novices'"
            },
            "raceType": {
                "key": "H"
            },
            "raceClass": 4,
            "course": {
                "courseId": "1.33"
            },
            "meetingId": "1.33.1141109"
        },
        "numberOfRunners": 2,
        "runners": [
            {
                "horseId": "1.00387464",
                "trainer": {
                    "trainerId": "1.00034060"
                },
                "ownerColours": "Maroon, pink sleeves, dark blue cap."
            },
            {
                "horseId": "1.00373620",
                "trainer": {
                    "trainerId": "1.00010997"
                },
                "ownerColours": "Black, emerald green cross of lorraine, striped sleeves."
            }
        ]
    },
……
]


esProc script:



A1:Read in the JSON file.

A2:Retrieve runners field according to the serial number of each of its subdocument. Here which is a report parameter. The result is like this:

A3:Get the desired fields to generate the result set the report needs. The result is as follows:


The reporting tool calls the esProc script via JDBC, in a same manner as it calls the stored procedure from a normal database. The syntax is this: call esProc script name (para1…paraN). The result returned from the script participates in report creation in the form of a normal data set. Details are covered in the following documents: esProc Integration & Application: Integration with JasperReport and esProc Integration & Application: Integration with BIRT.

As a professional tool for processing data sources of reports, esProccan be used to implement more scenarios, as shown by the following examples.

Create a grouped report from a multi-level JSON file

Cells.json is a multi-level nested JSON file, which you want to display with a grouped report. The grouping fields are name, type and image."xlink:href". There is also a field with 3 subdocuments: custom.Identifier, custom.Classifier and custom. Output, which are of the same structure but contain different number of documents each.

The source data:
{
    "cells": [
        {
            "name": "b",
            "type": "basic.Sensor",
            "custom": {
                "identifier": [
                    {
                        "name": "Name1",
                        "URI": "Value1"
                    },
                    {
                        "name": "Name4",
                        "URI": "Value4"
                    }
                ],
                "classifier": [
                    {
                        "name": "Name2",
                        "URI": "Value2"
                    }
                ],
                "output": [
                    {
                        "name": "Name3",
                        "URI": "Value3"
                    }
                ]
            },
            "image": {
                "width": 50,
                "height": 50,
                "xlink:href": "
HNCSVQICAgIfAhkiAAAAAlwSFlzAABEJAAARCQBQGfEVAAAABl0RVh0U29mdHdhcmUAd3Vi8f+k/EREURQtsda2Or/+nFLqP6T5Ecdi0aJFL85msz2Qxy
f4JIumMAx/ClmWt23GmL1kO54CXANAVH+WiN4Sx7EoNVkU3Z41BDHMeXAxjvOxNr7RJjzHX7S/jAflwBxkJr/RwiOpWZ883Nzd+Wpld7tkBr/SJr7ZHZb
HZeuVweSnPfniocMAWYwcGBafH0OoPamFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY7/B94QnX08zxKLAAAAAElFTkSuQmCC"
            }
        },
……
    ]
}


esProc merges the three subdocuments into a single two-dimensional table, gives them a new field name ctype to be identified and joins them with the grouping fields. By doing so, a typical “table with subtables” will be created. esProc code is as follows:


A1: Import the JSON file. The relationships between different fields are shown below:

A2: Convert the multi-level nested JSON file to a simple two-dimensional table. The sign “|”means concatenation. new function creates a two-dimensional table based on the source data. conj function calculates based on each record of the source table and concatenates the results. A2’s resulting two-dimensional table is what you need to create the report, as shown below:


Then it’s easy for you to build a grouped report according to this esProc result.

Create a report with subreports using different JSON files

You want to create a report containing multiple subreports, where the main report and each subreport use different JSON files as their sources. Below is a selection of the source data:
MainReport.json
{"menu": [
         {
                   "id": "A1",
                   "value": "File",
                   "popup": "Yes"
    },
         {
                   "id": "A2",
                   "value": "Edit",
                   "popup": "No"
    }
  ]
}
SubReport1.json
{"menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"}
  ]
}
SubReport2.json
{"menuitem": [
    {"value": "Undo", "onclick": "onUndo()"},
    {"value": "Redo", "onclick": "onRedo()"},
    {"value": "Copy", "onclick": "onTextCopy()"},
         {"value": "Past", "onclick": "onTextPast()"}
  ]
}

A reporting tool with support only for a single data source, such as Jasper and BIRT, would combine the multiple sources into one using JAVA classes, while esProc would use a simple script as follows:

Read in the JSON file and get its first field, which is represented by “.#1”. By assigning different file names to the parameter argFileName, the report will receive different data sets, as the following shows:


Perform a join between MongoDBand MySQL

emp1 is a MongoDB collection, whose CityID field is the logical foreign key pointing to CItyID field of cities, a MySQL table that has two fields –CityID and CityName. You need to query employee records from emp1 according to specified time period and switch its CityID field to CityName of cities.

esProc script:


A1:Connect to MongoDB.

A2:Query emp1using MongoDB syntax by the specified time period. find function returns a cursor. @x option means closing the MongoDB connection automatically after the data is all fetched. The result would be like this:


A3:Execute SQL statement to query the MySQL database. Here is the result:



A4: Replace A2’s CityID field with the corresponding records in A3. switch function works as a left join does. To perform an inner join, use @i option. By performing field replacement using switch function, the key field linkingthe two tables can be accessedthrough the object. This object-type access is simple and intuitive, whose merits are especially obvious when performing a multi-level, multi-table join. Here is the result of switch:

A5:Retrieve the desired fields to generate a table as follows:


A7:By default the esProc script will return the last calculation cell (here is A5) to the reporting tool.

Perform joins between MongoDB collections

Both sales and emp are two-dimensional MongoDB collections. sales has SellerId field as its logical foreign key that points to emp’sEId field. You need to query orders in sales by the specified time period and associate with emp through a left join, and then present the result in a report.

esProc script:


A1,A4:Connect to/disconnect from MongoDB.

A2:Query the sales collection using MongoDB syntax and fetch the cursor data into memory using fetch function (as the data size is small). Here is the result:

A3:Retrieve data from the emp collection. Here is the result:

A5:Join the two collections together. join function performs the join operation. @1 means left join and @f means full join. Without any of the options, thisfunction performs an inner join. The result is as follows:

A6:Retrieve the fields of interest from the result of join to generate a new two-dimensional table, as shown below:


Join an Oracle table and an Excel file

Here are table1, which is stored in an Oracle database, and table2, an .xlsx file. Both have the same structure. Below are selections from them:

You need to group table1 and table2 respectively by name, count the number of members in each group, calculate the sum for each group by active field, and then present the resultsfrom the two tables in sequence. The expected report layout is as follows:

esProc script:

A1:Execute the SQL statement to group and aggregate data from table1. Here is the result:


A2:Import the Excel file and make the first row the column headers.

A3:Group and aggregate A2’s data. Here is the result:

A4:Perform a left join between A1 and A3. You’ll get the following result:

A5:Retrieve the fields you want from A4 and rename them. This is the result you’ll get:


Join a txt file and a JSON file

structure.txt is a structured text separated by tabs. json.txt contains unstructured JSON strings. There is a foreign key relationship between the second field of structure.txt and part of the text in json.txt. Below are selections from them:

structure.txt
Name1     BBBBBBBBBBBB     99.40        166 1        0       1       166 334 499 3e-82   302
Name2     DDDDDDDDDDDD 98.80        167 2        0       1       167 346 512 4e-81   298

Json.txt
[
    { "Cluster A": { "member": { "Cluster A": "BBBBBBBBBBBB This is Animal A" }, "name": "Cluster A" } },
    { "Cluster B": { "member": { "Cluster B": "DDDDDDDDDDDD This is Animal B" }, "name": "cluster B" } }
]

You need to create a report to present the above relationship. This is the expected report layout:
Name1   BBBBBBBBBBBB    99.40   166 1   0   1   166 334 499 3e-82    302 Cluster A This is Animal A
Name2   DDDDDDDDDDDD    98.80   167 2   0   1   167 346 512 4e-81    298 Cluster B This is Animal B

esProc script:

A1-A3: Read in the JSON file, get the desired data and append a calculated column. Here’s the result:


A4:Import the text file as a two-dimensional table. Note that esProc can import not only a local file, but a file stored on LANs orin the HDFS file system.

A5:A join operation. The result is as follows:


A6: Retrieve the desired fields to generate a table as follows