July 5, 2015

esProc Divides Letters into Segments

The RBP_VALUE field in the rbd_dtl table stores the string “all” and some letters. We want to convert values in the field to strings of segments. The desired result is A-F,H,J-K,N-O,Q-S,U-V,X-Z.

Below is the original data

Rules to be followed are

If RBD_VALUE= “ALL”, then discard the value;
According to the alphabetical order if certain RBD_VALUE values are consecutive letters , then make them a segment with both the first and last letter included and “-” between them, like “A-F”;

If two values of RBD_VALUE are inconsecutive, like K and N between which L and M are absent, then unite the former with the letter or letters before it and join the latter with the letter or letters after it. Separate the two segments with a comma. This is like “J-K”, “N-O”;

If a value in RBD_VALUE has not consecutive letters both before and after it, then take it as a separate segment, like the letter H. G and I are absent respectively before and after it.

esProc approach


A1: Execute the SQL statement.

A2: Align RBD_VALUE field in A1 with the alphabet. Below is a selection of the result:


A3: Group rows in A2. Group rows where RBD_VALUE is null together and put the other rows where RBD_VALUE is non-null into another group. @o means performing merge-style data grouping, which only compares the neighboring values. The result is as follows:

A4: Select the group in which RBD_VALUE is non-null.

A5: Compose strings according to the number of members in each group. ~.m(-1) means getting the last record from the current group. The result is as follows:
A6With commas being delimiters, concatenate strings to get the final result A-F,H,J-K,N-O,Q-S,U-V,X-Z.

Actually we simply need to retrieve data from RBD_VALUE field. Thus the lines of code can be combined into one:

myDB1.query("select RBD_VALUE from rbd_dtl").(RBD_VALUE).align(26.(char(64+~))).group@o(!~).select(~(1)).(~(1)+if(~.len()>1,"-"+~.m(-1))).string@d()

July 3, 2015

esProc Performs Random UPDATE with Priorities


Target:Among the records that satisfy the specified condition in the TMP_SURVEY_TRAN_BZ_3_WORKING table, select 20 ones at random and change values of their Quota_Include_Ind field into “Y”. But there is a list of priorities for the update: If the number of records satisfying customer_type=’r’ is greater than 20, then the 20 random records for updating will be chosen from them; if the number of records satisfying the same condition is less than 20 (say 15), then we’ll update all these 15 records, plus another 5 ones chosen randomly from records that satisfy customer_type<>’r’.

Below is a selection from the original data:


esProc approach: 

A1,A2:Both execute SQL statement to retrieve primary key values of the eligible records according to the parameters. A1’s records meet condition customer_type=’r’ and A2’s record meet condition customer_type<>’r’.

A3: Get primary key values of the records to be updated according to the number of records in A1. A1.sort(rand()) means sorting A1 randomly; to(20) equals [1,2…20], which means getting 20 records in order. The operator | is used to concatenate two sets.

Results of A1, A2 and A3 are listed separately as follows:


A4: Update the table based on A3. @u indicates that only UPDATE statement will be generated.

July 2, 2015

esProc Codes Column Shuffling

In database table REF_VALUES, ID column acts as the primary key and ORIGINAL_VALUE is used as the original column, whose values are to be shuffled and updated to the empty SHUFFLED_VALUE column. Below is a selection from the table: 

SQL approach:

Databases differ in their solution to the problem. Here Oracle is used as the example:

create or replace procedure shuffle as
TYPE My_CurType IS REF CURSOR; 
CUR_1 My_CurType;
V_id NUMBER(10,0);
V_result varchar2(20);
v_sql varchar2(2000);
begin
  OPEN CUR_1 FOR  select t1.id,t2.result from (
        select rownum rn,id,ORIGINAL_VALUE from REF_VALUES order by ORIGINAL_VALUE) t1
    join(
        select rownum rn,result from(
            select ORIGINAL_VALUE result from REF_VALUES order by dbms_random.value()))t2
    on t2.rn=t1.rn;
  LOOP
    fetch CUR_1 INTO V_id,V_result ; 
       EXIT WHEN CUR_1%NOTFOUND; 
    v_sql:=’update REF_VALUES set SHUFFLED_VALUE=”’||V_result||”’ where id=’||V_id;
    EXECUTE immediate v_sql;  
  END LOOP;
  CLOSE  CUR_1;
end shuffle;

You need to write multilayered nested subqueries to get the shuffling result (as shown by the first half of the above code), then you need cursors (or temporary tables) and the stored procedure to insert it to the target table. The code will be rather tedious. 

The esProc approach allows us to dispense with the complex nested subqueries and applies to various types of databases. The code is as follows:

A1: Execute the SQL statement to get data from ID column and ORIGINAL_VALUE column.

A2: Shuffle the values in ORIGINAL_VALUE column.

A3: Join A1’s ID column with A2’s ORIGINAL_VALUE column to create a two-dimensional table, as shown below:


A4:Update REF_VALUES table with A3’s table. @u option means only generating the UPDATE statement. The updated REF_VALUES table is as follows:

July 1, 2015

Transforming Records with esProc


Table 1 (T1) stores single-value id field and name field. A selection of the source data is as follows:
id    name
1     a
2     b
3     c
4     d
5     e

Table 2 (T2) stores multi-value ids field as follows:
ids
1,2,3
4,3
1,2,3

According to T1’s id field and name field, we want to transform T2’s ids field into a field with values being the corresponding names. The transformed T2 is expected to be like this:
a,b,c
d,c
a,b,c

esProc approach

A1A2Execute SQL statements to retrieve data from T1 and T2.

A3Convert ids values in A2’s records to sequences one by one, and match them with T1’s records to get sequences of names, which then will be converted to records separated by the comma. The result is like this: 

Of course, the above approach is based on the assumption that values of id are consecutive integers. If they are not, then the last step could be rephrased as follows:
  A3>A1.primary(id)
A4=A2.(ids.array().(A1.find(~).name ).string@d())

The SQL approach uses subquery to generate distinct group numbers, and then solves the problem with relatively complex JOIN. The following shows how it is coded in MySQL:
  select group_concat(t1.name) names from(
    select @row:=@row+1 as gid,ids from t2,(SELECT @row :=0 ) r
) t2 left join t1 on find_in_set(t1.id,t2.ids) group by t2.gid

But note that this SQL approach can only apply in the situation where there are no duplicate values for ids in T2. Otherwise only one of the duplicates is allowed to be kept. For instance, the transformed result of “2,1,2,3” is “a,b,c” instead of “b,a,b,c”. By comparison, esProc syntax is more intuitive, easier to understand and can be used more widely.

esProc can be called by reporting tools as well as the JAVA program. The way is much like the one a database is called. Through JDBC interface provided by esProc, result can be returned to the JAVA main program in the form of ResultSet. See related documents for detailed method. 

June 30, 2015

Segmented Data Exports to Text Files of Specified Format in esProc



Below is a selection from a database table, TB1:

Based on this data, you need to generate text files of the specified format and with the following names (yyyyMMddHH) by the hour.
2011101106.txt
1|2011-10-11 6:24:21|aaa
2|2011-10-11 6:30:45|dddd

2011101107.txt
3|2011/10/11 7:10:12|dsf
4|2011/10/11 7:50:38|dffew
5|2011/10/11 7:59:59|dfae

2011101108.txt
6|2011/10/11 8:00:00|edfae

After grouping data by the hour, you export every group to a text file according to the specified format (|) and name it after the hour: 

A1: Use a SQL to create a database cursor sorted by CreateTime. db represents the connection to a database.
A2-B2: Group data by CreateTime (yyyyMMddHH), and fetch data by loop, one group each time, to export, through B2, to text files named after the create time. And specifiy format for the text data at the same time.

Text files are as follows: