esProc, A Script Language for Data Analytics with Parallel Mechanism: Concepts of esProc Sequence,Table Sequence and Record Sequence

Sequence, table sequence and record sequence are the most common data types in esProc. This article explains their respective characteristics and the relations between them.

A sequence is an ordered generic set
Collectivity

A sequence consists of multiple data items, which are the members of the sequence. Members can be of any data type, such as string, integer, decimal or date, or they can be null. A sequence has the general characteristics of a set and supports set operations. A1 through A5 below are all sequences.

A1=[] /empty sequence
A2=[5,6,7] /a sequence whose members are integers is also called an integer sequence
A3=["red","blue","yellow"] /a sequence whose members are strings
A4=["blue","yellow","white"] /a sequence whose members are strings
A5=A3^A4 /intersection operation of sequences. A5 is also a sequence whose value is ["blue","yellow"]

Genericity
A sequence is a generic set: it can contain members of different data types. A member can itself be a sequence, as in B1 and B2:

B1=[1,date("2014-05-01")] /members of the sequence include an integer and a date
B2=["blue",[],[5,6,7]] /a sequence whose members include sequences; it equals ["blue",A1,A2]

Orderliness
Generally, a set is unordered; that is, two sets with the same members in a different order are equal. A sequence is ordered: two sequences with the same members in a different order are not equal. For example, take two sequences A1=["Mike","Tom"] and B1=["Tom","Mike"]. Comparing them with the expression A1==B1 gives false.

Order is a common feature of business data. For instance, Mike coming before Tom may mean that Mike has performed better at school, and sorting sales amounts by month clearly shows how they change over time. A sequence makes such ordered computation convenient. For example:

A1(2) /Fetch the second member. This operation can also be expressed with A1.m(2).
A1.m(-1) /Fetch the last member.
A1.p("Tom") /Fetch the sequence number of member Tom.
A1.rvs() /Reverse the sequence.

In addition, there are operations such as insert, delete, modify, copy, compare, convert, sub-sequence, sort, rank, set computation, and conversion between strings and sequences.
An integer sequence is a sequence whose members are integers. It has some additional, more specific functions. For example:
to(2,5) /Create sequence [2,3,4,5]. A sequence starting from 1 can be abbreviated: to(5) creates [1,2,3,4,5].
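
For readers more familiar with general-purpose languages, the ordered behaviour described above can be mimicked with a Python list. This is only a rough analogy, not esProc code: Python indexes from 0 while the sequence numbers above start from 1, and the helper names `member` and `position` are hypothetical wrappers introduced for illustration.

```python
def member(seq, n):
    """Return the n-th member, counting from 1; a negative n counts from the end."""
    if n == 0:
        raise IndexError("sequence numbers start from 1")
    return seq[n - 1] if n > 0 else seq[n]


def position(seq, value):
    """Return the 1-based sequence number of the first member equal to value."""
    return seq.index(value) + 1


a1 = ["Mike", "Tom"]
b1 = ["Tom", "Mike"]

same_order = a1 == b1               # lists compare member by member, in order
same_members = set(a1) == set(b1)   # plain sets ignore order

second = member(a1, 2)              # second member
last = member(a1, -1)               # last member
tom_at = position(a1, "Tom")        # sequence number of "Tom"
reversed_a1 = a1[::-1]              # reversed copy
two_to_five = list(range(2, 6))     # counterpart of to(2,5)
```

The two comparisons show the point of the Orderliness section: the same members in a different order are unequal as ordered lists but equal as unordered sets.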

Basic Computations of esProc Sequences

A table sequence is a structured sequence
Structured two-dimensional data objects
Members of a sequence can be of any data type, such as an atomic type, another sequence or a record. If the members of a sequence are records of the same structure (the same number of fields and the same field names), the sequence is called a table sequence. The data objects in the following figure constitute a table sequence:

Because a table sequence is a structured two-dimensional data object, it is usually created from SQL, text files, binary files or Excel files, or built up from an empty table sequence. A1, B1 and C1 below are table sequences.

A1=file("e:/sales.txt").import@t() /table sequence created from a text file
B1=Oracle1.query("select * from sales") /table sequence created from SQL
C1=create(OrderID,Client,SellerId,Amount,OrderDate) /create an empty table sequence

A great many structured data operations can be performed on a table sequence, including query, sort, sum, average and merging duplicate records. For example:

A1.select(Amount>=2000 && year(OrderDate)==2009) /Select records whose Amount is greater than or equal to 2000 and whose OrderDate falls in 2009.
A1.sort(SellerId,OrderDate:-1) /Sort records in ascending order by SellerId; within the same SellerId, sort in descending order by OrderDate.
A1.groups(SellerId,year(OrderDate);sum(Amount),count(~)) /Group by SellerId and year, then sum Amount and count the orders in each group.

A table sequence is a special sequence
A table sequence is still a sequence, so the collectivity, orderliness and related functions of a sequence also apply to it. A table sequence itself is not generic, because its members must be records of the same structure. The field values of those records, however, can be generic data, which in this sense is another form of genericity. Thanks to these features, a table sequence handles complicated computations better than traditional programming languages do.

For instance, based on orderliness, we can compute each month's growth rate relative to the previous month in a table sequence. The statement is:
sales.derive(salesAmount/salesAmount[-1]-1:compValue)
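
As a rough illustration of what this statement computes, the same growth rate can be sketched in Python. This is an analogy under assumptions, not esProc code: the function name `month_over_month` and the sample figures are hypothetical, and the first month, which has no predecessor, is represented here by `None` (the text does not state what esProc returns for it).

```python
def month_over_month(amounts):
    """For each month, return amount / previous amount - 1.

    The first month has no previous value, so its rate is None.
    """
    if not amounts:
        return []
    rates = [None]
    for previous, current in zip(amounts, amounts[1:]):
        rates.append(current / previous - 1)
    return rates


# Hypothetical monthly sales amounts, already in calendar order.
monthly_sales = [100.0, 110.0, 99.0]
growth = month_over_month(monthly_sales)
```

The calculation only makes sense because the rows are ordered: "the previous record" is well defined in a sequence but not in an unordered set.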

Another example: assume that a big contract is one whose order quantity is greater than 40 and an important contract is one whose unit price is more than 2000. Using collectivity, find 1. this year's contracts that are both big and important; 2. all the other contracts.
thisYear=Contract.select(year(SellDate)==2014) /contracts of this year
big=Contract.select(Quantity>40) /big contracts
importance=Contract.select(AMOUNT>2000) /important contracts
answer=thisYear^big^importance /answer to question 1
others=Contract\answer /answer to question 2

Note that Contract in the code is a table sequence, while thisYear, big, importance, answer and others are record sequences originating from it. The difference and relation between a table sequence and a record sequence are explained below.
Genericity supports contextual computing. For example: among the subordinates of department managers who have won the President Award, who has been named Annual Outstanding Staff?
employee.select(empHonor:"EOY",empDept.manager.empHonor:"PA")

A record sequence is the reference of table sequence records
Obviously, if every computation on a table sequence produced a new table sequence, a great deal of memory would be occupied. For instance, suppose a table sequence, sales, has 5,000 records and a query selects 3,000 of them. If a new table sequence were created, there would be 8,000 records in memory. In fact, since the 3,000 records are part of the original table sequence, it is unnecessary to create them anew. We only need to store references to them in a suitable data object. This type of data object is called a record sequence.

Transparency of record sequences
Usually, programmers do not need to distinguish record sequences from table sequences, just as they rarely need to distinguish references from physical data. The operations in the preceding examples, such as query, sort and intersection, can be applied to both record sequences and table sequences with the same syntax:
A1.select(Amount>=2000 && year(OrderDate)==2009)
A1.sort(SellerId,OrderDate:-1)
Note: when A1 is a table sequence, the result is a record sequence; the same result is obtained when A1 is a record sequence. Set operations can be performed between record sequences, e.g.,
answer=thisYear^big^importance

Set operations between table sequences have no practical significance, because members of different table sequences are always different objects. For instance, the intersection of thisYear and big would always be empty if both were table sequences.

When the data structure changes, esProc automatically creates a new table sequence, as happens in grouping and summarizing:
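
The reference behaviour behind this can be sketched in Python, where a filtered list holds references to the same record objects rather than copies. This is an analogy under assumptions, not esProc's implementation: the helpers `select` and `intersect`, the function `load_contracts` and the sample rows are all hypothetical, and the intersection deliberately compares records by identity.

```python
def select(records, predicate):
    """Return a new list holding references to the matching records (no copies)."""
    return [r for r in records if predicate(r)]


def intersect(left, right):
    """Ordered intersection that compares records by identity, not by value."""
    right_ids = {id(r) for r in right}
    return [r for r in left if id(r) in right_ids]


def load_contracts():
    # Hypothetical rows standing in for the Contract table sequence.
    return [
        {"ID": 1, "Quantity": 50, "Amount": 2500},
        {"ID": 2, "Quantity": 10, "Amount": 3000},
        {"ID": 3, "Quantity": 60, "Amount": 1500},
    ]


contract = load_contracts()

# Both results reference records that live in `contract`.
big = select(contract, lambda r: r["Quantity"] > 40)
important = select(contract, lambda r: r["Amount"] > 2000)
both = intersect(big, important)

# A second load creates distinct objects, even though the values are equal.
other_copy = load_contracts()
across_tables = intersect(contract, other_copy)
```

Here `both` contains the very same object as `contract[0]`, while `across_tables` is empty: records loaded separately are different objects even when their field values match.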
A1.groups(SellerId,year(OrderDate);sum(Amount),count(~))

A table sequence has a one-way influence on a record sequence
Different table sequences represent different physical records, so modifying one table sequence does not affect the others. A record sequence, however, holds references to the records of a table sequence, and the two share the same physical data, so any change to the table sequence affects the record sequence. For example:

TSeq=file("e:/sales.txt").import@t() /TSeq is a table sequence; the Client field of the record with OrderID=5 is DSG
RSeq=TSeq.select(Amount>2000) /RSeq is a record sequence created from TSeq. Amount of the record with OrderID=5 is greater than 2000, so the record is a member of RSeq
TSeq.modify(5, "WVF":Client) /Change the Client field of the record with OrderID=5 to WVF

Now the Client field of the record with OrderID=5 in RSeq has also changed to WVF. If several record sequences originate from the same table sequence, their data all change accordingly. Often, this is not what a programmer wants. For instance, suppose the Amount field of the record with OrderID=5 is 3730 and the following computation is executed:

RSeq=TSeq.select(Amount>2000) /The record with OrderID=5 is in RSeq
TSeq.modify(5, 1000:Amount) /Change the Amount field of the record with OrderID=5 to 1000, i.e., less than 2000

Now the record with OrderID=5 is still in RSeq (because its reference has not changed), but its Amount value is 1000, which is inconsistent with the condition Amount>2000 and may be wrong from a business point of view.
To avoid this effect, users should finish modifying the table sequence before creating the record sequence. For example:

TSeq.modify(5, 1000:Amount) /Change the Amount field of the record with OrderID=5 to 1000, i.e., less than 2000
RSeq=TSeq.select(Amount>2000) /The record with OrderID=5 won't appear in RSeq

This result is what the business requires. The computing order above comes naturally once the reference relation between a table sequence and a record sequence is understood.

It is important to note that operations that modify the original table sequence, such as modify, are not accepted on a record sequence. The influence between the two is therefore one-way and safe, and users can use record sequences and table sequences with confidence.
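
The effect of the two orderings can be reproduced with a small Python sketch, in which a filtered list shares its record objects with the original list. It is an analogy only, not esProc code: `make_table`, `select` and `over_2000` are hypothetical helpers, and the second row is invented sample data (only the OrderID=5 record comes from the text).

```python
def make_table():
    # Hypothetical rows standing in for sales.txt.
    return [
        {"OrderID": 5, "Client": "DSG", "Amount": 3730},
        {"OrderID": 6, "Client": "ABC", "Amount": 900},
    ]


def select(records, predicate):
    """Return references to the matching records, without copying them."""
    return [r for r in records if predicate(r)]


def over_2000(record):
    return record["Amount"] > 2000


# Order 1: select first, modify afterwards.
t_seq = make_table()
r_seq = select(t_seq, over_2000)
t_seq[0]["Amount"] = 1000                          # change made through the table
stale_ids = [r["OrderID"] for r in r_seq]          # record 5 is still referenced
stale_amounts = [r["Amount"] for r in r_seq]       # and shows the new amount

# Order 2: modify first, select afterwards.
t_seq2 = make_table()
t_seq2[0]["Amount"] = 1000
fresh_ids = [r["OrderID"] for r in select(t_seq2, over_2000)]
```

In the first ordering the selection still holds record 5 with an amount that no longer satisfies the condition; in the second ordering the record is never selected.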

Basic Computation of esProc Table Sequence and Record Sequence

July 21, 2014