June 8, 2014

Features: Parallel computing

esProc supports multi-threaded computing on a single node and parallel computing across multiple nodes with no central node. Big data can be divided into smaller blocks, processed in parallel on multiple node machines, and then merged for the final computation.
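The divide, compute, and merge pattern described above can be illustrated in a few lines. The sketch below is plain Python rather than esProc script, and the function names split_into_blocks and parallel_sum are hypothetical; it only assumes that the per-block results can be merged by simple addition.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data, block_count):
    """Split a list into at most block_count contiguous blocks of near-equal size."""
    block_count = max(1, min(block_count, len(data) or 1))
    size, extra = divmod(len(data), block_count)
    blocks, start = [], 0
    for i in range(block_count):
        # The first `extra` blocks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def parallel_sum(data, block_count=4):
    """Sum each block in its own worker thread, then merge the partial sums."""
    blocks = split_into_blocks(data, block_count)
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        partial_sums = list(pool.map(sum, blocks))
    return sum(partial_sums)  # merge step
```

Threads stand in for node machines here; on a real cluster each block would be sent to a different node, but the split and merge steps keep the same shape.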

Scale-up and Scale-out

esProc offers multi-threaded computing on a standalone machine, so capacity can grow with the hardware. Users can set the number of threads according to the number of CPU cores and the computing workload, and can add memory and hard disk capacity at any time. esProc scales up well and supports large memory, local disk files, and redundant data.

esProc adopts a multi-node structure with no central node to support scale-out parallel computing. Its requirements for computing nodes are relatively low: any kind of server can act as a computing node, whether it is a low-end PC or a midrange or high-end server. esProc runs on both Windows and Linux.

Nodes can be added to or removed from the cluster freely as needed. Programmers can also replace or reassign the node that launches a task. A node machine can act either as a sub-node or as a main node that distributes tasks to the nodes below it, so calls can be nested across multiple levels.
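To make the idea of multi-level nesting concrete, here is a minimal Python sketch, not esProc's actual mechanism. The Node class and its run method are hypothetical, the calls are local and sequential rather than remote, and summing stands in for the real computation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node that is a sub-node (no children) or a main node (has children)."""
    name: str
    children: list = field(default_factory=list)

    def run(self, data):
        if not self.children:
            # Sub-node: compute the assigned part directly.
            return sum(data)
        # Main node: split the data among the nodes below, then merge their results.
        n = len(self.children)
        parts = [data[i::n] for i in range(n)]
        return sum(child.run(part) for child, part in zip(self.children, parts))


# Two levels of nesting: "main" delegates to "mid" and "c"; "mid" delegates to "a" and "b".
cluster = Node("main", [Node("mid", [Node("a"), Node("b")]), Node("c")])
total = cluster.run(list(range(10)))
```

Because every node exposes the same run interface, any node can launch the task, which mirrors the absence of a fixed central node.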

Intelligent job distribution

With a controllable task distribution mechanism, parallel computing in esProc is much more flexible. Programmers can set the scope of nodes involved in a computation, control the scope of each subtask, and distribute the workload based on the characteristics of the tasks and nodes. In addition, external parameters can be used in parallel computing, so users can adjust the scope of nodes, the number of tasks, and the task size themselves.
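One way to picture distributing workload according to node characteristics is a weighted split. The Python sketch below is an illustration only, not esProc's distribution algorithm: allocate_tasks and the integer weights (for example, relative CPU capacity) are assumptions, and leftover tasks are handed out by the largest-remainder rule.

```python
def allocate_tasks(task_count, node_weights):
    """Assign task counts to nodes in proportion to positive integer weights."""
    total_weight = sum(node_weights.values())
    if total_weight <= 0:
        raise ValueError("node weights must sum to a positive number")
    allocation, remainders = {}, {}
    for node, weight in node_weights.items():
        # Whole tasks for this node, plus the remainder used to rank leftovers.
        allocation[node], remainders[node] = divmod(task_count * weight, total_weight)
    leftover = task_count - sum(allocation.values())
    # Give the remaining tasks to the nodes with the largest remainders.
    for node in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[node] += 1
    return allocation
```

Passing the task count and the weights in as external parameters lets the same script run unchanged when the set of nodes or the size of the job changes.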

esProc is designed especially for small and medium-sized clusters. Such clusters have relatively few nodes, are less likely to encounter errors, and can usually run normally for a long time. Their users are more eager to have a controllable job distribution mechanism.

esProc supports automatic node selection and fault tolerance. It searches for a free node within the user-specified scope and, if an error occurs, automatically switches to another node to continue. esProc supports local files and LAN files, which further lowers the hardware requirements for scale-out. It also supports HDFS and other redundant data storage to ensure the reliability and stability of big data computing. Users can trade off cost efficiency against reliability according to their computing task.
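The node selection and failover behavior described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not esProc's implementation: the Node record, the NodeError exception, and run_with_failover are all hypothetical names, and a node's work is modeled as a local function call.

```python
from dataclasses import dataclass
from typing import Callable


class NodeError(Exception):
    """Raised by a hypothetical node that cannot finish its subtask."""


@dataclass
class Node:
    name: str
    busy: bool
    execute: Callable[[list], int]


def run_with_failover(subtask, candidates):
    """Run the subtask on the first free node in scope; move on if a node fails."""
    failed = []
    for node in candidates:
        if node.busy:
            continue  # only free nodes are considered
        try:
            return node.name, node.execute(subtask)
        except NodeError:
            failed.append(node.name)  # remember the failure and try the next node
    raise RuntimeError(f"no node in scope could run the subtask; failed: {failed}")


def broken(_subtask):
    raise NodeError("disk unavailable")


# n1 is busy, n2 fails, so the subtask ends up on n3.
scope = [Node("n1", True, sum), Node("n2", False, broken), Node("n3", False, sum)]
used_node, result = run_with_failover([1, 2, 3], scope)
```

Restricting the search to a caller-supplied list is what keeps the distribution controllable: the programmer decides which nodes are eligible, and the failover logic only chooses among them.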

Data sharing and exchange

esProc supports data sharing within a node. If the same data is used by every thread of a task, esProc allows you to set it as a global variable shared by the multiple threads on the node, which improves performance and memory efficiency. For example, when a node machine performs a big data computation that first joins and then summarizes, the table being joined can in most cases act as a global constant. To avoid conflicts between concurrent tasks, esProc allocates a private task space for each task, so global variables with the same name in different tasks do not conflict with each other. By providing both global variables and private task spaces, esProc improves performance while keeping tasks stable.
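The join-then-summarize example can be sketched in Python to show the split between shared and private data. This is an analogy rather than esProc code: the PRICES table, the order data layout, and the function names are made up for illustration, and a read-only mapping plays the role of the global constant.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType

# Shared, read-only lookup table: held once in memory, visible to every thread.
PRICES = MappingProxyType({"apple": 3, "pear": 2, "plum": 5})


def summarize(orders):
    """Join each (product, quantity) order against the shared table, then total it."""
    private = {"total": 0}  # task-private state, never shared between tasks
    for product, quantity in orders:
        private["total"] += PRICES[product] * quantity
    return private["total"]


def run_tasks(order_blocks):
    """Run one summarize task per block of orders, all reading the same table."""
    with ThreadPoolExecutor(max_workers=len(order_blocks)) as pool:
        return list(pool.map(summarize, order_blocks))
```

Since the shared table is never written to, no locking is needed, and each task's running total lives in its own private dictionary so concurrent tasks cannot overwrite one another.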

esProc supports two methods of data exchange between nodes: direct in-memory exchange and an external storage file buffer. Small result sets can be exchanged between nodes directly in memory, while big data sets can be exchanged through files in external storage. Users can choose the exchange method based on the characteristics of the task, balancing fault tolerance against performance. For small tasks running concurrently, in-memory exchange achieves higher performance. For an individual big task, exchange through external storage ensures reliability.
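The choice between the two exchange methods can be pictured as a size check on the result set. The Python sketch below is only an illustration under assumptions: send_result, receive_result, the row-count threshold, and the JSON temp file are hypothetical stand-ins for whatever format and storage a real deployment would use.

```python
import json
import os
import tempfile


def send_result(rows, max_in_memory_rows=1000):
    """Return ("memory", rows) for small results, or ("file", path) for large ones."""
    if len(rows) <= max_in_memory_rows:
        return ("memory", rows)
    # Large result: buffer it in a file so it is not held in memory during transfer.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as buffer_file:
        json.dump(rows, buffer_file)
    return ("file", path)


def receive_result(message):
    """Read a result produced by send_result, cleaning up any file buffer."""
    kind, payload = message
    if kind == "memory":
        return payload
    with open(payload) as buffer_file:
        rows = json.load(buffer_file)
    os.remove(payload)
    return rows
```

The threshold is the tuning knob: raising it favors speed for many small concurrent tasks, while lowering it pushes more results through the file buffer.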

esProc is designed especially for small and medium-sized clusters, where the fault rate is low and reliability is relatively high. esProc therefore provides both methods for users to choose between freely, and allows users to make performance the top priority.

All in all, esProc features a scripting language designed specifically for distributed computing, together with a complete computing architecture. Users can translate business requirements intuitively into a (semi-)structured language to implement the details of the basic algorithm, which effectively reduces the difficulty of developing parallel programs.