April 24, 2014

Several Methods for Structured Big Data Computation

All data can only have the existence value by getting involved in the computation to create value. The big data makes no exception. The computational capability on structural big data determines the range of practical applications of big data. In this article, I'd like to introduce several commonest computation methods: API, Script, SQL, and SQL-like languages.

API: The "API" here refers to a self-contained API access method without using JDBC or ODBC. Let's take MapReduce as an example. MapReduce is designed to handle the parallel computation cost-effectively from the very bottom layer. So, MapReduce offers superior scale-out, hot-swap, and cost efficiency. MapReduce is one of the Hadoop components with open-source code and abundant resources.
Sample code:
                public void reduce(Text key, Iterator<Text> value,
                OutputCollector<Text, Text> output, Reporter arg3)
                throws IOException {
            double avgX=0;
            double avgY=0;
            double sumX=0;
            double sumY=0;
            int count=0;
            String [] strValue = null;
            while(value.hasNext()){
                count++;
                strValue = value.next().toString().split("\t");
                sumX = sumX + Integer.parseInt(strValue[1]);
                sumY = sumY + Integer.parseInt(strValue[1]);
            }

            avgX = sumX/count;
            avgY = sumY/count;
            tKey.set("K"+key.toString().substring(1,2));
            tValue.set(avgX + "\t" + avgY);
            output.collect(tKey, tValue);
        }
Since the universal programming language adopted is unsuitable for the specialized data computing, MapReduce is less capable than SQL and other specialized computation languages in computing. Plus, it is inefficient in developing. No wonder that the programmers generally complain it is "painful". In addition, the rigid framework of MapReduce results in the relatively poorer performance.
There are several products using API, and MapReduce is the most typical one among them.

Script: The "Script" here refers to the specialized script for computing. Take esProc as an example. esProc is designed to improve the computational capability of Hadoop. So, in addition to the inexpensive scale-out, it also offers the high performance, great computational capability, and convenient computation between heterogeneous data sources, especially ideal for achieving the complex computational goal. In addition, it is the grid-stylescript characterized with the high development efficiency and complete debug functions.
Sample code:

A
B
1
=file(“hdfs://192.168.1.200/data/sales.txt”).size() 
//file size
2
=10     
//number of tasks
3
=to(A2)
//1 ~ 10, 10 tasks
4
=A3.(~*int(A1/A2))
//parameter list for start pos
5
=A3.((~-1)*int(A1/A2)+1)
//parameter list for end pos
6
=callx(“groupSub.dfx”,A5,A4;[“192.168.1.107:8281”, 
 “192.168.1.108:8281”]) 
//sub-program calling, 10 tasks to 2 parallel nodes
7
=A6.merge(empID)    
//mergingtaskresult
8
=A7.group@i(empID;~.sum(totalAmount):orderAmount
,~.max(max):maxAmount,~.min(min)
:minAmount,~.max(max)/~.sum(quantity):avgAmount)       
//summarizing is completed
Java users can invoke the result from esProc via JDBC, but they are only allowed to invoke the result in the form of stored procedure instead of any SQL statement. Plus,esProc is not open source. These are two disadvantages of esProc.

The Script is widespread used in MongoDB, Redis, and many other big data solutions, but they are not specialized enough in computing. For another example,the multi-table joining operation for MongoDB is not only inefficient, but also involves the coding of one order of magnitude more complex than that of SQL or esProc.

SQL: The "SQL" here refers to the complete and whole SQL/SP, i.e. ANSI 2000 and its superset. Take Greenplum as an example, the major advantages of Greenplum SQL are the powerful computing, highly efficient developing, and great performance. Other advantages include the widespread use of its language, low learning cost, simple maintenance, and migration possibility -not to mention its trump-card of offering support for stored procedure to handle the complex computation. By this way, business value can be exploited from the big data conveniently.
Sample code:
CREATE OR REPLACE function view.merge_emp()
returns voidas$$
BEGIN
                  truncate view.updated_record;
                  insert into view.updated_recordselect y.* from view.emp_edw x right outer join        emp_src y       on x.empid=y.empid where x.empid is not null;
                  update view.emp_edwset deptno=y.deptno,sal=y.salfrom view.updated_record y       where          view.emp_edw.empid=y.empid;
                  insert into emp_edwselect y.* from emp_edw x right outer join emp_src y on    x.empid=y.empid where  x.empid is null;
end;
$$ language 'plpgsql';

The other databases with the similar structure to MPP include Teradata, Vertical, Oracle, and IBM. Their syntax characteristics are mostly alike. The disadvantages are similar. Theacquisition cost and the ongoing maintenance expenses are extremely high. Charging its users by data scale, the so-called inexpensive Greenplum is actually not a bargain at all - it is way more like making big money under cover of big data. Other disadvantages include awkward debugging, incompatible syntax, lengthy down-time if expansion, and awkward multi-data-source computation.

SQL-likelanguage: It refers to the output interfaces like JDBC/ODBC and only limited to those scripting languages that are the subset of standard SQL. Take Hive QL as an example. The greatest advantage of Hive QL is its ability to scale out cost-effectively while still a convenient tool for users to develop. The SQL syntax feature is kept in Hive QL, so that the learning cost is low, development efficient, and maintenance simple. In addition, Hive is a component of Hadoop. The open-source is another advantage.
Sample code:

SELECT e.* FROM (
         SELECT name, salary, deductions["Federal Taxes"] as ded,
                  salary * (1 – deductions["Federal Taxes"]) as salary_minus_fed_taxes
         FROM employees
         ) e
WHERE round(e.salary_minus_fed_taxes) > 70000;
The weak point of Hive QL is its non-support for stored procedure. Due to this, it is difficult for HiveQL to undertake the complex computation, and thus difficult to provide the truly valuable result. The slightly more complex computation will rely on MapReduce. Needless to say, the development efficiency is low. The poor performance and the threshold time can be regarded as a bane, especially in task allocation, multi-table joining, inter-row computation, multi-level query, and ordered grouping, as well as implementing other algorithm alike. So, it is quite difficult for HiveQL to implement the real-time Hadoop application for big data.

There are also some other products with SQL-like languages - MongoDB as an example - they are still worse than Hive yet.


The big data computation methods currently available are no more than these 4 types of API, Script, SQL, and SQL-like languages. Wish they would develop steadfastly and there would be more and more cost-effective, powerful, practical and usable products for data computing.


Top 3 Salespersons Ranking by Monthly Sales Amount R language VS esProc

Both R language and esProc have the outstanding ability to perform the stepwise computations. However, in the particulars they differ from each other. A comparison between them will be done by the following example:
A company’s Sales department wants to select out the outstanding salespersons through statistics, that is, the salespersons whose sales amounts are always among the top 3 in each month from the January this year to the previous year. The data is mainly from the order table of MSSQL database: salesOrder, and the main fields include the ID of order: ordered, name of salesperson: name, sales amount: sales, and date of order: salesDate.
The solution is like this substantially:
  1. Compute the beginning dates of this year and this month, and filter the data by date.
  2. Group by month and salesperson, and compute the sales amount of each salesperson in each month.
  3. Group by month, and compute the rankings of sales amount in each group.
  4. Filter out the top 3 salespersons from each group.
  5. Compute the set of intersections of each group, that is, salespersons always among the top 3 in each month.
The solution of R language is as shown below:
01   library(RODBC)
02   odbcDataSources()
03   conn<-odbcConnect(“sqlsvr”)
04   originalData<-sqlQuery(conn,’select * from salesOrder’)
05   odbcClose(conn)
06   starTime<-as.POSIXlt(paste(format(Sys.Date(),’%Y’),’-01-01′,sep=”))
07   endTime<-as.POSIXlt(paste(format(Sys.Date(),’%Y’),format(Sys.Date(),’%m’),’01′,sep=’-’))
08   fTimeData<-subset(originalData,salesDate>=starTime&salesDate<endTime)
09   gNameMonth<-aggregate(fTimeData$sales,list(fTimeData$name,format(fTimeData$salesDate,’%m’)),sum)
10   names(gNameMonth)<-c(‘name’,’month’,’monthSales’)
11   gNameMonth$rank<- do.call(c, tapply(gNameMonth$monthSales, gNameMonth$month,function(x) rank(-x)) )
12   rData<-subset(gNameMonth,rank<=3)
13   nameList<- split(rData$name, rData$month)
14   Reduce(intersect, nameList)
The solution of esProc is as shown below:
esProc
esProc
Then, let’s compare the two solutions by checking the database access firstly:
R language solution implements the data access from Line01 to 05 through relatively a few more steps, and this is acceptable considering it as the normal operations.
esProc solution allows for directly inputting SQL statements in the cell A1, which is quite convenient.
In respect of database access, R language and esProc differ to each other slightly. Both solutions are convenient.
Secondly, compare the time function:
R language solution computes the beginning dates of this year and this month through line 06-07. Judging from this point, R language is abundant in the basic functions.
esProc solution completes the same computation in A2 and B2, in which pdate function can be used to compute the beginning date of this month directly, which is very convenient.
In respect of date function, it seems that esProc is slightly better, while R language has a huge amount of 3rd-party-function library, and maybe there is any date function that is easier to use.
The focal point is step computation:
Firstly, filter by date, group by month and sales person and then summarize by sales amount. The above functionalities are implemented respectively in line 8-9 for R language and cell A3-A4 for esProc. The difference is not great.
Proceed with the computation. According to the a bit straightforward thought of analysis, the steps followed should be: 1 Group by month; 2 Add the field of ranking in the group, and compute the rankings; 3 Filter by ranking, and only keep the salespersons that achieved the sales amounts ranking the top 3 in each group; 4. Finally, compute the set of intersection on the basis of the data in each group.
The corresponding codes of R language are from line 10 – 14 in the order of 2->3->1->4. In order words, rank the data in each group throughout the whole table, and then group. Have you noticed anything awkward? Although it is the ranking within the group, users of R language have to sort first and then group! This is because R language is weak in the ability to group first and then process. To barely compose the statements following the train of thought of 1->2->3->4, users of R language must have a strong technical background to handle the complex iteration statement expressions. The style of reverse thinking on this condition will greatly simplify the codes.
esProc solution completes the similar computation in the cell A5 – A8, not requiring any reverse thought. esProc users can simply follow their intuitive thinking of 1->2->3->4. This is because that esProc provides the ingenious representing style of ~. The ~ represents the current member that takes part in the computation. For this case, the ~ is each 2-dimension table in the group (corresponds to the data.frame of R language or the resutSet of SQL). In this way, ~.monthSales can be used to represent a certain column of the current 2-dimension table. By compassion, users of R language can only resort to some rather complicated means like loops to access the current member, which is more troublesome for sure.
With regard to this comparison, esProc is more intuitive with relatively more advantageous.
Next, let’s study on their abilities in computing the intermediate results.
R language allows users to view the result of each step by clicking the variable name at any time, with RStudio and other tools.
esProc provides only one official tool, that is, click the cell to view the result of this step.
Regarding this ability, esProc does not differ from R language much. Considering that R language supports for a great many of 3rd party tools, maybe there is any tool capable of providing the better observed results.
Then, let compare their abilities to reference the result.
R language users are only required to define a variable for the result of computation in each step to conveniently reference the result in the steps followed with regard to the R language solution.
esProc users can also define variables to reference, however, using the cell name as the variable name is more convenient and saves the trouble of finding a meaningful name.
Next, let’s compare their performances on set of intersection.
In the last step, the intersection set of data of every group are to be computed. R language provides the intersect function at the bottom layer, using together with Reducefunction, the intersection set of multiple groups of data can be computed.
esProc provides isect function to compute the set of intersection on multiple sets, which is quite convenient.
Comparatively, R language provides the Reduce function of greater imaginary space, and esProc is easier.
As it can be seen from the above case, R language boasts the abundant fundamental functions and a huge amount of library functions from the 3rd party.
In respect of data member access, esProc provides the excellent representing style, in particular the grouping at multi-levels. By comparison, R language relies more on the loop statements.
Both esProc and R language solutions have excellent performances in respect of interaction.

April 21, 2014

Solving Complex Computations in the Report

Reporting tool is good at chart & form design, style of landscaping, query interface, entry & report, and export & print. It is one of the tools that are applied most extensively. However, there are quite often complex computations in the report, which raises a very high requirement for technical capabilities of report designers, and is one of the biggest barriers in report design. esProc can cooperate with Java reporting tools and solve with ease the complex computations in the report.


Case and Comparison
A company has a High Growth SalesMan of the Year report, which analyzes, mainly through sales data, the salesmen whose sales amount exceeds 10% for three consecutive months, and demonstrates the indices such as their sales amount(Sales Amount), sales amount link relative ratio(Amount LRR), client count(Client Count), and client count link relative ratio(Client LRR). The report pattern is shown in following table:

 
The main data source of the report is the “monthly sales data”: sales table, which stores the monthly sales record of the salesmen, with salesman and month being the primary key. The structure is shown in the following table:


It can be seen that the calculation of the name-list of the salesmen whose sales amount exceeds 10% for three consecutive months is the most complex part of this report. As long as this name-list is calculated out, it is possible to use the reporting tool to easily present the remaining part. Let’s compare how SQL statement and esProc respectively calculate this name-list.

SQL Solution
01 WITH A AS
02       (SELECT salesMan,month, amount/lag(amount) 
03           OVER(PARTITION BY salesMan ORDER BY month)-1 rising_range 
04           FROM sales), 
05      B AS
06            (SELECT salesMan, 
07                CASE WHEN rising_range>=1.1 AND
08                     lag(rising_range) OVER(PARTITION BY salesMan
09                          ORDER BY month)>=1.1 AND
10                     lag(rising_range,2) OVER(PARTITION BY salesMan
11                          ORDER BY month)>=1.1 
12                THEN 1 ELSE 0 END is_three_consecutive_month 
13      FROM A) 
14 SELECT DISTINCT salesMan FROM B WHERE is_three_consecutive_month=1

1.        1-4 lines: Use SQL-2003 window function to obtain the ”rising_range” of the monthly sales amout LRR of each salesman, where, ”lag” seeks the sales amount relative to the preceding month. Here, ”WITH” statement is used to generate an independnet sub-query.

2.         5-13 lines: Continue to use window function to seek ”is_three_consecutive_month_gains”, the symbol of consecutive gains  of slaesmen in the each record, where, ”rising_ranges” of the recent three months are  biggern than 1.1 at the same time, and this symbol is 1. Otherwise it equals to 0, and here the technique ”case when” is used. Finally, ”WITH” statement is still used to generate independent sub-query B.

3.        Line 14: According to the result in the preceding two steps, seek the salesmen meeting the reporting condition, namely, the record whose “is_three_consecutive_month_gains equals 1. Here it is necessary to use “distinct” to filter duplicate salesmen.


esProc Solution



A1: Group the data according to salesman. Each group is all the data of a salesman, which is sorted by month in ascending order.

A2: Refer to the calcualtion result of the preceding step, and select the group that meets the condition from A1. The condition comes from the last cell of A1 operation area, namely, Cell B3. Both B2 and B3 belong to A1 operation area. By writing the condition step by step in many cells, it is possible to reduce the difficulty.

B3: Conditional judgment. If the LRR of three consecutive months within the group is bigger than 1.1, then this group of data meets the condition. Here “amount [-1]” is for the data of preceding record relative to the data of the current record, amount/amount [-1] represents a LRR comparsion. The pselect() is used to obtain the serial number within the group, and whenever meeting the first piece of data within the group that meets the condition, pselect() immediately returns the serial number and stops repeated calculations.

A4: Obtain the serial number of the salesmen in A2, and this result is returned through JDBC to the reporting tool for use.

Comparison
The method to calculate this case “stepwise” will be very clear, so it is relatively suitable for stored procedure. But report developers often cannot add stored procedure in the database at their discretions, so it is generally still necessary to use SQL statement to solve the problem. For general SQL-92 statement to solve this type of problem, it will be very troublesome. By using here the SQL-2003 standard that is not extensively used, it is possible to reduce the difficulty. Even so, it is still necessary to face large paragraphs of difficult-to-understand SQL. For common report developers, it is no doubt a huge challenge.

It is more agile and easy for esProc to solve this type of calculation. esProc provides an expression formula using grid style similar to Excel®, which naturally proceeds by steps. Cells can refer to calculation result one another, which saves the great efforts of complex nested queries as well as unnecessary and scrambled variables definition. esProc also provides functions on the calculations of mass data, such as relative position, serial number reference, and step-by-step calculation after grouping, which can greatly simplify calculation procedure.
From the above, it is obvious that esProc is better at solving the complex computation in the report.

Feature: a JDBC Interface
esProc is a product with pure JAVA® structure and provides JDBC interface for JAVA reporting tools to conveniently call it. The structure schematic is as follows:


In the system adopting JAVA reporting tool, it is possible for esProc to conduct complex computation, multiple-datasource operation, and dirty data source collation. Then, the reporting tool can obtain the result returned from esProc via JDBC in the form of an access to the database. Finally, the reporting tool can be used to present the data.

Feature: Computational Capabilities Over-perform SQL

esProc is a tool specially designed to calculate mass data, and has SQL statement and stored procedure the capability to. On the one hand, it conducts query, filter, grouping, and statistics just as SQL statement does; on the other hand, it can also conduct loop and branch judgment on analysis process just as stored procedure does.

In fact, SQL statement and stored procedure, which are also mass data calculation tools, have some obvious defects: Stepwise mechanism is incomplete, set-lization is incomplete, and there are lacks of serial number and reference. So in the report where is complex computation, designing a few lines of SQL statement tends to become very difficult, and also has very high requirement for technical capabilities of designers.

esProc overcomes the defects of SQL statement and can comfortably cope with the complex computation in the report.

About esProc: http://www.raqsoft.com/product-esproc

April 20, 2014

How to Facilitate Relational Reference: Generic, Sequence, and Table Sequence

Based on the generic data type, esProc provides the sequence and the Table Sequence for implementing the complete set-lizing and the much more convenient relational queries.

The relation between the department and the employee is one-to-many and that between the employee and the SSN (Social Security Number) is one-to-one. Everything is related to everything else in the world. The relational query is the access to relational dataset with the mathematical linguistics. Thanks to the associated query, the relational database (RDBMS) is extensively adopted.
I Case and Comparison
Case
There is a telecommunications enterprise that needs to perform this analysis: to find out the annual outstanding employees whose line manager having been awarded the president honor. The data are from two tables: the first is the department table mainly consisting of deptName and manager fields; and the second is the employee table mainly consisting of the empName, empHonor, and empDept fields;

For empHonor, three kinds of values can be obtained: First, null value; Second, ”president's award” and PA for short; Third, ”employee of the year” and EOY for short; The corresponding relations are usually belong to either of the two below groups: empDept & deptName, and Manager & empName.

SQL Solution
SELECT A.* 
FROM employee A,department B,employee C 
WHERE A.empDept=B.deptName AND B.manager=C.empName AND A.empHonor=‘EOY’ AND C.empHornor=‘PA’

Complex SQL JOIN query can be used to solve such problems. In this case, we choose the nested query that is brief and clear. The association statements after “where” have established one-to-many relation between deptName and empDept, and the one-to-one relation between manager and empName.

esProc Solution
employee.select(empHonor:"EOY",empDept.manager.empHornor:"PA")

The esProc solution is quite intuitive: select the employees with EOY on condition that the line respective managers of these employees have won the “PA”.

Comparison
Regarding the SQL solution, the SQL statements is lengthy and not intuitive. Actually, the complete associated query statement is “inner join…on…”. We have put it in a rather simplified way or the statements would be even harder to comprehend.

Regarding the esProc solution, the esProc fields are of generic type, which can point to any data and dataset. Therefore, you can simply use ”.” symbol to access the associated table directly. By representing in such intuitive and easy-to-understand way, esProc users can convert the complicated and lengthy SQL statement for multiple table association to the simple object access. This is unachievable if using SQL.

II Function Description:
Generic Data Type

The data in esProc are all of generic type, that is, the data types are not strictly distinguished. Therefore, a data can be a simple data like “1” or “PA” ,or a set like [1,” PA”], or a set composed of sets like the database records.
Sequence






A sequence is a data structure specially designed for the mass data analysis. It is similar to the concept of “array + set” in the senior language. That is to say, esProc users can assess members of any type according to its serial number, and perform the intersection, union, and complementary set operations on these members. The sequence is characterized with two outstanding features: generic type, and being ordered.

For example, let’s suppose that the sequence A is a set of line managers, and the sequence B is a set of award-winning employees. Then, the award-winning departments can be computed as a result of A^B. The top three departments can be obtained as a result of [1,2,3] (Please refer to other documents for the characteristics of being ordered).

esProc provides a great many of easy-to-use functions for sequence. The analysis will be greatly simplified if you grasped the use of sequence well.
Table Sequence
The Table Sequence is a sequence of database structure. As a sequence, it is characterized by being generic and ordered. In addition, Table Sequence also inherited the concept of database table that allows for the access to data with the field and the record.













The characteristics of generic type allow for the associated query in a quite convenient way in which the access to the record of associated table is just like the access to object. For example, to access the line manager of a certain employee, you can just compose “empDept.manager”. By comparison, the counterpart SQL syntax requires quite lots of wieldy association statements: “from…where…” or “left outer/right outer/inner join…on…”

Moreover, the characteristics of being ordered are quite useful and convenient for solving the tough computational problems relating to the Table Sequence and serial numbers, such as computing the top N, year-on-year statistics, and link relative ratio analysis.
III Advantages
The Access Syntax to Convert Complexity to Simplicity
esProc users can use ”.” to access the record in the associated table. Compared with the lengthy and complicated association syntax of SQL, such access method and style is much easier.

Intuitive Analysis is Ideal for Business Specialist
Analyzing from the business aspect, the business specialist can reach the result more correctly and rapidly. esProc users can access to the associated data in an intuitive way following the business descriptions and thus it is ideal for business specialist.

Easy to Analyze and Solve Problem
The sequence and table sequence of esProc is fit for processing the mass data. Even for the complicated multiple-table association, esProc users can solve the problems conveniently in the process of data analysis. 

About esProc: http://www.raqsoft.com/product-esproc

April 17, 2014

esProc Helps Database realize Real-time Big Data computing

The Big Data Real-time Application is a scenario to return the computation and analysis results in real time even if there are huge amount of data. This is an emerging demand on database applications in recent years.

In the past, because there are not so many data, the computation is simple, and few parallelisms, the pressure on the database is not great. A high-end or middle-range database server or cluster can allocate enough resource to meet the demand. Moreover, in order to rapidly and parallelly access to the current business data and the historic data, users also tend to arrange a same database server for both the query analysis system and the production system. By this way, the database cost can be lowered, the data management streamlined, and the parallelism ensured to some extent. We are in the prime time of database real time application development.

In recent years, due to the data explosion, and the more and more diversified and complex application, new changes occur to the database system. The obvious change is that the data is growing at an accelerated pace with ever higher and higher volume. Applications are increasingly complex, and the number of concurrent accesses makes no exception. In this time of big data, the database is under increasing pressure, posing a serious challenge to the real-time application.

The first challenge is the real-time. With the heavy workload on the database, the database performance drops dramatically, the response is sluggish, and user experience is going from bad to worse quickly. The normal operation of the critical business system has been affected seriously. The real-time application has actually become the half real-time.

The second challenge is the cost. In order to alleviate the performance pressure, users have to upgrade the database. The database server is expensive, so are the storage media and user license agreement. Most databases require additional charges on the number of CPUs, cluster nodes, and size of storage space. Due to the constant increase of data volume and pressure on database, such upgrade will be done at intervals.

The third challenge is the database application. The increasing pressure on database can seriously affect the core business application. Users would have to off-load the historic data from the database. Two groups of database servers thus come into being: one group for storing the historical data, and the other group for storing the core business data. As we know, the native cross-database query ability of databases are quite weak, and the performance is very low. To deliver the latest and promptest analysis result on time, applications must perform the cross-database query on the data from both groups of databases. The application programing would be getting ever more complex.

The forth challenge is the database management. In order to deliver the latest and promptest analysis result on time, and avoid the complex and inefficient cross-database programming, most users choose to accept the management cost and difficulty increase - timely update the historic library with the latest data from the business library. The advanced edition of database will usually provide the similar subscription & distribution or data duplicate functions.
The real-time big data application is hard to progress when beset with these four challenges.

How to guarantee the parallelism of the big data application? How to reduce the database cost while ensuring the real-time? How to implement the cross-database query easily? How to reduce the management cost and difficulty? This is the one of hottest topics being discussed among the CIOs or CTOs.

esProc is a good remedy to this stubborn headache. It is the database middleware with the complete computational capability, offering the support for the computing no matter in external storage, across databases, or parallelly. The combination of database and esProc can deliver enough capability to solve the four challenges to big data applications.














esProc supports for the computation over files from external storage and the HDFS. This is to say, you can store a great volume of historical data in several cheap hard disks of average PCs, and leave them to esProc to handle. By comparison, database alone can only store and manage the current core business data. The goal of cutting cost and diverting computational load is thus achieved.

esProc supports the parallel cluster computing, so that the computational pressure can be averted to several cheap node machines when there are heavy workload and a great many of parallel and sudden access requests. Its real-time is equal or even superior to that of the high-end database.

esProc offers the complete computational capability especially for the complex data computing. Even it alone can handle those applications involving the complex business logics. What's even better, esProc can do a better job when working with the database. It supports the computations over data from multiple data sources, including various structural data, non-structural data, database data, local files, the big data files in the HDFS, and the distributed databases. esProc can provide a unified JDBC interface to the application at upper level. Thus the coupling difficulty between big data and traditional databases is reduced, the limitation on the single-source report removed, and the difficulty of the big data application reduced.

With the seamless support for the combined computation over files stored in external storage and the database data, users no longer need the complex and expensive data synchronization technology. The database only focus on the current data and core business applications, while esProc enable users to access both the historic data in external storage and the current business data in database. By doing so, the latest and promptest analysis result can be delivered on time.

The cross-database computation and external storage computation capability of esProc can ensure the real-time query while alleviating the pressure on database. Under the assistance of esProc, the big data real-time application can be implemented efficiently at relatively low cost.

About esProc: http://www.raqsoft.com/product-esproc