esProc, A Script Language for Data Analytics with Parallel Mechanism: Alignment Grouping and Enumeration Gouping in esProc

Grouping records is often required during presenting and analyzing data in databases. Though records can group the designated fields by using Group By statement in SQL statements, this type of grouping serving the purpose of summaryis too simple to deal with some complex situations, like grouping according to designated order or grouping with identical records. But with esProc, we can use alignment grouping function P.align@a() or enumeration grouping function P.enum() to manage various complicated requirements for grouping.

We'll look at how to use alignment grouping and enumeration grouping in esProc through the following example.

1. Alignment grouping

Database table LIQUORS has information of some wines:

But how can we group these wines by varieties in the order of Vodka, Gin, Rum, Whisky, Brandy, Tequila and Cordial?

We can manage it easily in esProc with P.align@a() function:

We just need to execute alignment grouping on records in A1 by using the field TYPE according to the designated sequence A2. The computed results in A3 are as follows:

Among these results, each group is made up of records in A1. Double-click to see more.

When P.align@a() is at work, it will look up every member of the record sequence P and put it into the group that meets the condition. A member cannot appear more than once during this process. If we want to modify A2 and add another Gin as the grouping value as follows:

Seen from the computed results of A3, alignment grouping will only put records to the first eligible group and make the last group an empty one:

esProc's P.align() function provides many options that can manage various situations in alignment operations.

If @n is used by P.align@a() function, a new group for taking all the ungrouped records will appear except those designated ones. Because @n option is only used in alignment grouping operations, here @a can be omitted:

Computed results of A3 are as follows:

If @a option is not used by P.align() function, only the first value of each group will be retained:

Computed results of A3 are:

If @p option is used by P.align@a() function,groups won't store records except their sequence numbers:

Computed results of A3 are:

It can be seen that only sequence number of each record is stored in the group.

If @s option is used by P.align() function, result returned will be a record sequence, that is, the re-sorted records in the table according to designated sequences. This is similar to the result obtained by using order by clause in SQL statements, only with a different sorting principle:

Now the computed results of A3 are as follows:

Note that when @s option is at work, it is invalid to introduce @a option at the same time.

If @b option is used by P.align@a() function, binary search will be used for alignment grouping. Using binary research will bring higher efficiency, but meanwhile, the alignment sequence designated by grouping must be ordered:

Sort A2 in A3,and the computed results are as follows when @a option and @b option are used simultaneously:

While A4 doesn't execute sorting of A2, so the results of alignment grouping are incorrect when @a option and @b option are used simultaneously:

If grouping expressions directly compute each record's place in alignment grouping, then P.align@a(n,y)function can be used:

Because the places have been directly set, the specific value of each member in grouping value sequence of align function won't affect the result, and we can use the simplest sequence to(n) to complete alignment grouping according to the number of groups. Here to(n) can be abbreviated to n. Expressions in A3 and A4 have the same computed results:

If @r option is used by P.align@a(n,y) function, each record will correspond to a sequence of group numbers and can be put into more than one group. Such as:

Computed results of A3 are:

P.align() function can manage more complicated jobs through combinations of its options.

2.Enumeration grouping

Enumeration grouping is, in fact, a type of alignment grouping. Its alignment basisis the computed results of designated expressions.

For example, divide the wine information in table LIQUORS into three groups by names: ?<"D",?<"K", and ?>="K". First create a sequence according to the conditions, then execute enumeration grouping on all information of wines by using P.enum() function according to the condition sequences:

Computed results of A3 are:

By default, records of P only appear once when P.enum() function is used for grouping. It can be noticed that the grouping condition of the second group is ?<"K", and though the records in the first group satisfy this condition, they won't appear again in the second group.

Various options can be used by enum function to realize its different functions.

If @r option is used by P.enum() function, the same records are allowed to appear in more than one groups:

Computed results of A3 are as follows:

It can be seen that, for the time being, the second group contains all eligible records, including those in the first group.

If @p option is used by P.enum() function, the groups will store only the corresponding record numbers:

Now the computed results of A3 are as follows:

It can be seen that each group stores only the corresponding record numbers.

If @n is used by P.enum() function, a new group will be appended in the end to take the ungrouped records. If the third grouping condition is modified and @n option is used in enumeration grouping:

Results of A3 are as follows:

menu

August 18, 2014

Alignment Grouping and Enumeration Gouping in esProc

1. Alignment grouping

2.Enumeration grouping

No comments:

Post a Comment