June 14, 2013

Why Was esProc Created?

Data computing is widely used, and business users want to complete their data computations independently. SQL, R, Java, C and other current solutions have powerful computational ability, but coding a complex computation in them is rather cumbersome (R is better in this respect, but it is too difficult to understand).

Data computing demands are both common and complex
  • Data computing is widely used
  • Abundant data exists in the database but is difficult to compute on directly
Data analysis and query are at the core of data computing. Preparing report data sources, data management, and ETL also involve data computing. Most of these problems are complex and diverse, and business computing is usually time-sensitive and unpredictable: the objects of computation change constantly and may come up at any time. Users want a convenient way to handle such computations.

Current Solution: SQL (or MDX)
  • Advantage: Enough computational ability to handle structured data
  • Disadvantage: Difficult to program and understand
SQL provides comprehensive computational ability for massive structured data. However, SQL does not support step-by-step computation, and it has no explicit support for sets, for sequence and order, or for object references. SQL therefore expresses a computation in a way that runs against natural human thinking, which makes it harder to write and to understand.
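To make the point about order and step-by-step computation concrete, consider a hypothetical task: finding the longest run of consecutive days on which a closing price rose. Written procedurally, it is a single pass that relies on the order of the records; in SQL the same question typically needs window functions or nested subqueries. The following is only a minimal sketch in Java; the class name `RisingRun`, the method name, and the sample prices are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class RisingRun {
    // Longest run of consecutive days on which the close rose over the previous day.
    // The input is assumed to be sorted by date already.
    static int longestRisingRun(List<Double> closes) {
        int best = 0;     // longest run seen so far
        int current = 0;  // length of the run ending at the current day
        for (int i = 1; i < closes.size(); i++) {
            if (closes.get(i) > closes.get(i - 1)) {
                current++;                      // the run continues
                best = Math.max(best, current);
            } else {
                current = 0;                    // the run is broken; start over
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical closing prices, oldest first.
        List<Double> closes = Arrays.asList(10.0, 10.5, 11.0, 10.8, 11.2, 11.9, 12.4, 12.1);
        System.out.println(longestRisingRun(closes)); // prints 3
    }
}
```

Each step refers to "the previous record" and to a running intermediate result, which are exactly the notions of order and stepwise computation discussed above.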

Current Solution: High-level Programming Languages
  • Advantage: Powerful enough to control the procedure
  • Disadvantage: Complex application environment
  • Disadvantage: No direct support for structured data, so coding complexity is very high
Java, C#, C++, and other high-level programming languages have complete branching and looping mechanisms, so they are very flexible in terms of data computation. However, their application environments are too complex. In addition, they do not support massive structured data well: it is inconvenient to operate directly on records, sets, datasets, and similar data types.
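As an illustration of that inconvenience, here is a rough sketch of what a one-line SQL aggregation (`SELECT region, SUM(amount) FROM sales GROUP BY region`) can look like when written by hand in Java. The `Sale` class, its fields, and the sample rows are hypothetical and exist only for this example; real code would also need to load the data from somewhere.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupSum {
    // Hypothetical record type standing in for one row of a "sales" table.
    static final class Sale {
        final String region;
        final double amount;

        Sale(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    // Hand-written equivalent of grouping by region and summing the amounts.
    static Map<String, Double> totalByRegion(List<Sale> sales) {
        // LinkedHashMap keeps regions in first-seen order.
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Sale s : sales) {
            // Add to the running total, or start one if the region is new.
            totals.merge(s.region, s.amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
                new Sale("East", 120.0),
                new Sale("West", 80.0),
                new Sale("East", 30.0));
        System.out.println(totalByRegion(sales)); // prints {East=150.0, West=80.0}
    }
}
```

Even for this simple case, the row type and the grouping logic both have to be declared explicitly, and the boilerplate grows quickly once joins, sorting, or multi-level grouping are involved.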

Current Solution: R Language
  • Advantage: Open source, with a massive number of library functions
  • Disadvantage: Difficult to understand, with higher technical requirements
R boasts an elegant, agile syntax and an open interface for secondary development, so a great number of third-party packages are available. But R lacks a good user interface, and a strong technical background and expertise are required to master it. R is also not specialized for structured data computing, and its support for such computing is not elaborate enough.