1.Testing purposes
Testing
esProc and Oracle on the same hardware for single machine performance, to
compare the two for performance difference either in large data volume single
task computation or small volume multiple tasks concurrent computation use
cases.
2.Testing contents and methods
Data
volume:
Small data
volume: single fact table around 10G. To avoid the testing results being
affected by operating system cache, multiple concurrent request will be
accessing different fact tables.
Large data
volume: single data table is about 100G.
Algorithm
category:
Testing
the performance of several typical SQL algorithms, including data scan,
grouping, join and large grouping, etc.. Note that the use of these simple
algorithms is for better understanding and direct comparison of the
performance, not because esProc and SQL are identical to each other. In fact,
the two of them focus on different things. esProc is good at procedure
computation with more
complicated business logic, while SQL is for computation
of average complexity.
Complicated algorithm can be realized with different SQL
execution plan, which is out of manual control, and thus not good for
comparison. We'll not do such test.
Category
of Use Cases:
Each
algorithm are tested with multiple use cases according to the width of the
table, data types, etc..
Among
them, the purpose for large grouping use cases is to test the scenario when the
resulting data sets of the grouping is too large to be fit in the memory. So
this report will only test the situation of parallel computing with 100G large
data volume, rather than concurrent computing with small data volume.
Note: the article esProc Oracle Single Machine Performance Comparison Process is an
appendix of this report. Please refer to it for details such as data structures,
test code, test reproduction, etc..
3.Testing Environment
Testing
Machine:Dell Power Edge T610
CPU:Intel Xeon E5620*2
RAM:20G
HDD:Raid5 1T
Operating
System:CentOS 6.4
JDK:1.6
Oracle
Version:11g
esProc
Version:3.1
4.Data Description
The
volume of the data is decided by the amount allowed by exported text file.
During
the single machine test, esProc uses proprietary file format.
As
the purpose is mainly to test big data computation and whole table scan
performance, no primary key or index is build for any table in the database.
For purpose of big data computation test, the primary key field described in
data structure simply means that this field is the logic primary key. No data
repetition is allowed. Primary key is not physically built.
4.1.Data tables and the associated tables
Facts TableT1, T11, T12, T13
Wide
table T1, T11, T12, T13, is to simulate the fact table with large numbers of
data fields. Total number of designed fields is 100. The four tables have
identical structures. T11, T12, T13 are used for small data volume multi-task
concurrent access to different tables to avoid system cache.
Fact Table T2, T21, T22, T23
Narrow
tables T2, T21, T22, T23 are used to simulate fact tables with less fields.
Total number of designed fields is 11. The four tables have identical
structures. T21, T22, T23 are used for small data volume multi-task concurrent
access to different tables to avoid system cache.
Fact
tables are the main data source for this test, and are used in scan, group,
join computation. Tests are done for both large and small data volumes, with
are controlled by inserting different rows of records.
Dimensional table DL2, DL6, DD2, DD6,
DC2, DC6
Dimension
table is only used to test the join (and multiple joins) use cases. These
dimension tables and fact tables will be joined, so will the dimension tables.
These tables have fixed data volume.
4.2.Data volume
Numbers
in the table stands is the number of record rows, not the number of occupied
spaces in bytes.
Note:
the goal for large data volume single machine test is to test the performance
when the memory used by the data is well above physical memory. Our test
standard is the size of a single fact table to be approximately 5 times the
size of physical memory. The memory used by dimension table is far below the
physical memory. Actual number of record rows used could be adjusted according
to the configuration of the machine.
The
goal for small task concurrent single machine test is to show the performance
when memory occupied by the data is less than the physical memory, and the
computation could be done completely in the memory. Our test standard is to use
a single fact table of 50% the size of the physical memory. Memory used by
dimension table is far less than the physical memory. Actual number of record rows
used could be adjusted according to the configuration of the machine.
T11/T12/T13 and T21/T22/T23 are only used for small data volume multi-task
concurrent test.
To make sure
that esProc and Oracle is computing against exactly the same data, we will
export the data generated in Oracle to a binary file format defined by esProc, as the data source for esProc
during the test.
Related:
esProc/Oracle: Single Machine Performance Comparison Test(II)
esProc/Oracle: Single Machine Performance Comparison Test(III)
esProc/Oracle: Single Machine Performance Comparison Test(IV)
Please Click here to download the full version。
Related:
esProc/Oracle: Single Machine Performance Comparison Test(II)
esProc/Oracle: Single Machine Performance Comparison Test(III)
esProc/Oracle: Single Machine Performance Comparison Test(IV)
Please Click here to download the full version。
No comments:
Post a Comment