esProc, A Script Language for Data Analytics with Parallel Mechanism: Application of Index Sequences in esProc

In databases, creating appropriate indexes for some tables can greatly increase query efficiency. Similarly, index sequences can be created for record sequences or table sequences in esProc to increase efficiency in querying data repeatedly.

For example, we need to query food order Order_Foods.txt repeatedly.

Records of food orders queried by A1 are as follows (altogether 50,000 records):

A2 gets 1,000 records of food orders arbitrarily and records their product names and purchase quantities as query conditions for use in the later test query (here repetition is allowed). Data in A2 are as follows:

In the following, in order to test the role of index sequences, we'll query data of food orders among A1's data according to 1,000 names in A2 with and without an index sequence respectively.

First let’s look at the situation without an index sequence. Since records in Order_Foods.txt are sorted by Date, i.e. the order date, binary search cannot be used when searching by product names, otherwise errors will occur.

Expressions in B2 and B3 get the current time through now()function and roughly estimate query time (millisecond). Query results are stored in B3 as follows:

Estimated time for B3 is as follows:

Then let's move to situation where anindex sequence is used:

First create an index sequence corresponding to PName and Quantity, thus binary search can be used to make query by making use of indexes. In order to compare the efficiency of the two situations, time for creating an index sequence is also be included. The index sequence in A4 is as follows:

As binary search is used in A5 to query data, the query condition should be modified to mode x==0. Results are the same as those in A3:

Estimated time for B5 is as follows:

By comparing results in B3 and B5, it can be seen that the second method is much more efficient. That is to say, query speed can be significantly increased by using binary search on the condition that anindex sequence is created. Note that computations are involved to create index sequences. The more we use anindex sequence to query, the more efficient the query becomes. So it is unnecessary to create anindex sequence if query is not frequent.

For specific databases and query modes, it is not necessary to create an index sequence each time query is executed. The index sequence can be stored after it is created. For example:

Thus it is no need to recreate the index sequence for the next query. It will do by simply importing the index file.

In this way, the query speed is faster than creating a new one.

menu

August 19, 2014

Application of Index Sequences in esProc

No comments:

Post a Comment