In an esProc table sequence, a field or certain fields can be designated
as the prime key(s). In this case, some special functions can be employed to
make query based on the prime key(s). This can both simplify the code and
increase computational performance effectively.
1. find and pfind
Prime keys are common in database tables. The field value of a prime key is solely
used to mark a record, so identical values of prime keys are forbidden, which
is a mandatory rule in many databases.
Under esProc, it is assumed that
each value of the prime keys is unique
for the records. But no mandatory check will be executed and there won't be error reporting even if there
are identical values. Both T.pfind(v) function and T.find(v) function can be used to query records in table
sequence T according to v, value(s) of the prime key(s). find returns the first found record and pfind returns the sequence number of this
record.
It is not necessary to set a primary key, or primary keys. When no
primary key is set in esProc, the first field will be regarded as the primary
key. If we want to set specified fields as primary keys, we use T.primary(Fi,…) function, meaning setting Fi,… as
the primary keys of table sequence T.
Usually, we use conventional position function T.select() and T.pselect() to query records in a table sequence. Now we'll compare usages of
this pair of function with pfind and find.
Here EMPLOYEE table in demo database is the table sequence in
which query will be executed. Add field FullName
to it and use the employees' full names to make query:
In order to manifest the performance advantages of using prime keys to query data, 10,000 full names will be generated arbitrarily and use pselect and pfind to make query according to these full names. Compute the time consumed by these two methods:
Based on the same data, A5 and A9 use pselect and pfind respectively to query positions of the records in the table sequence. In A9, primary function is used to set primary keys for the table sequence before pfind starts to work. A6 and A10 work out respectively the milliseconds consumed as follows:
Though the query results in A5 and A9 are same:
Using similar method, we may compare select function and find function. To keep in line with find function, @1 option is used in select function to get the first result and return it:
A6 and A10 compute respectively the milliseconds consumed as follows:
The query results in A5 and A9 are still same:
It can be seen easily from the comparison that the query functions
based on primary keys are much more efficient than conventional position
functions.
2. Primary keys and the index table
Why will the efficiency increase greatly when primary keys are
employed to make query in esProc? The reason is that the index table of the
primary keys is used in computing.
In calling a primary function, or making query based on primary keys
in a table sequence without an index table for the first time, an index table
will be generated according to the primary keys. While an index table is generated, a hash table will be generated according to all values
of the primary keys which will be divided into many groups by hash values.
These hash values are the corresponding group numbers.
Normally, when querying a certain record in a table sequence
according to field value, it needs to examine the records one by one until the
target is found. For a table sequence containing n records, an average of n/2 examinations are
needed.
Thanks to the index table, it's turning around in the case of
querying a certain record in a table sequence according to field value of the
primary key. First, we can compute the hash value according to the primary key
value. This enables us to find out directly the corresponding group in the index
table. Then we just need to examine records of the same group. In the same way,
for a table sequence containing n
records, if its primary keys are distributed in k groups according to hash values, we only have to make an average
of n/2k comparisons. Using this method, despite
the price paid due to computing hash values before generating an index table
and executing the query, the number of comparisons is reduced significantly,
and especially, it is enough to generate an index table once. Therefore, the
more the data in a table sequence and the times needed for query, the higher
the efficiency.
During computing, T.index(n) function can be used to create an index table for T's primary keys in advance. n represents the length of the index
table. Default length will be used if there is no particular setting.
This time 10,000 sequence numbers of employees are generated arbitrarily. A4 queries the corresponding records directly according to these sequence numbers, yet A7 still uses find function to query. A5 and A8 compute respectively the time consumed as follows:
We can see that it is faster to locate records directly using sequence numbers. Because this method doesn't have to compare field values, nor does it compute the hash values and create an index table. The query results in A4 and A7 are same:
Thus it can be seen that suitability should be taken into
consideration in esProc if we are to use index function of primary keys in a
table sequence to increase efficiency.
3. switch function
The role of switch function is to relate the value of a certain field F in table sequence T1
to the primary keys or specified fields of another table sequence T2, thus foreign keys can be created in T1 to reference corresponding records in T2. The operation of creating foreign references is
similar to that of using find to query
records according to values of primary keys in T2.This
process also involves creating an index table. However,T1.run(F=T2.select@1(...))can also be used to perform
this operation. We’ll first compare the efficiency of the two methods:
Both A2 and A3 contain personnel information loaded from binary text file PersonnelInfo:
A4 contains states information:
In both A6 and A9, field State of PersonnelInfo is switched into corresponding states information. Their difference is that A6 uses select@1 function while A9 uses switch function. A7 and A9 compute respectively the time consumed as follows:
After the code in the cellset is executed, values of A2 and A3 are same:
Field State has been
switched into corresponding records in states information table.
Before switch is executed, an index table will also be created for corresponding
fields in the table sequence. In the example, an index table is created for
field ABBR of states information
table in A4 to increase matching efficiency. Note that here field ABBR equals a primary key set
temporarily. The requirement for the field of a primary key that no identical
values are allowed applies to its data too.
So the computational efficiency of switch function is obviously higher
than that of cases using run+select@1. Meanwhile, since the relational computation of single foreign key
field is relatively common, esProc specially offers the switch function to simplify the code produced
by using functions as well as to increase the computational efficiency
effectively.
No comments:
Post a Comment