esProc, A Script Language for Data Analytics with Parallel Mechanism: Transforming Records with esProc

Problem source：http://bbs.csdn.net/topics/390991336 .

Table 1 (T1) stores single-value id field and name field. A selection of the source data is as follows:

id name

1 a

2 b

3 c

4 d

5 e

Table 2 (T2) stores multi-value ids field as follows:

ids

1,2,3

4,3

1,2,3

According to T1’s id field and name field, we want to transform T2’s ids field into a field with values being the corresponding names. The transformed T2 is expected to be like this:

a,b,c
d,c
a,b,c

esProc approach：

A1，A2：Execute SQL statements to retrieve data from T1 and T2.

A3：Convert ids values in A2’s records to sequences one by one, and match them with T1’s records to get sequences of names, which then will be converted to records separated by the comma. The result is like this:

Of course, the above approach is based on the assumption that values of id are consecutive integers. If they are not, then the last step could be rephrased as follows:

A3：>A1.primary(id)

A4：=A2.(ids.array().(A1.find(~).name ).string@d())

The SQL approach uses subquery to generate distinct group numbers, and then solves the problem with relatively complex JOIN. The following shows how it is coded in MySQL:

select group_concat(t1.name) names from(

select @row:=@row+1 as gid,ids from t2,(SELECT @row :=0 ) r

) t2 left join t1 on find_in_set(t1.id,t2.ids) group by t2.gid

But note that this SQL approach can only apply in the situation where there are no duplicate values for ids in T2. Otherwise only one of the duplicates is allowed to be kept. For instance, the transformed result of “2,1,2,3” is “a,b,c” instead of “b,a,b,c”. By comparison, esProc syntax is more intuitive, easier to understand and can be used more widely.

esProc can be called by reporting tools as well as the JAVA program. The way is much like the one a database is called. Through JDBC interface provided by esProc, result can be returned to the JAVA main program in the form of ResultSet. See related documents for detailed method.

menu

July 1, 2015

Transforming Records with esProc

No comments:

Post a Comment