esProc, A Script Language for Data Analytics with Parallel Mechanism: esProc Improves Text-processing

Many databases support the MERGE statement and thus can merge two tables conveniently. Database syntax cannot be used directly, however, if the data to be merged are the text data. In this case esProc is able to perform what MERGE statements can do based on the text files.

The two files - table1.txt and table2.txt - have the same structure but contain different data. According to the logical primary key defined on column A and column B, table1 will be merged with table2. The operation goes like this: update table1 when the primary key values of the two files are same but values of other columns are different; insert data into table1 when the primary key values are different.

Assume that the original data have been sorted by the logical primary key and make the result set ordered based on the primary key, then you can perform the operation repeatedly and merge new data to the target file.

Below is the original data：

Merge Table 1 with Table 2 and the result is as follows:

esProc code is simple：

A1,B1：Read table1.txt and table2.txt as cursors.

A2：Merge A1 with B1 by the logical primary key. merge function unions data together, ensuring the result set is still ordered. @x represents cursor-handling and @u represents union operation.

A3：Write the result to a new a new file.

Cursor is used in the code to handle data files of any size, even those having big data.

menu

May 28, 2015

esProc Improves Text-processing – Merge Sorted Data

No comments:

Post a Comment