Showing posts with label SQL data computing.

July 24, 2013

How to Leverage Big Data like Google?

Recently, I read "Why Big Data Projects Fail" by Stephen Brobst (http://data-informed.com/why-big-data-projects-fail). I couldn't agree more with his opinions, which expose a problem I have long been worried about. In this article, I discuss the topic further to help enterprises avoid this pitfall.

Let's look at a positive example. Google has succeeded in leveraging big data. How does it make use of that data?

1. Collect the raw data: capture the contents of every website, e-mail, and cookie, and extract the key information.

2. Build complex, cross-referenced (syndetic) indexes over this information. Needless to say, advertisement-related indexes must also be built.

3. Store these indexes and the corresponding contents on distributed servers.

4. When users browse websites, run searches, or read e-mail, Google passes their requests through a complex translation procedure that locates the relevant index entries.

5. Retrieve the data from the servers according to those index entries, and return the search results or advertisements.
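To make the five steps concrete, here is a toy sketch in Python. It is not how Google works internally: the in-memory dictionaries standing in for distributed servers, the CRC32-based sharding, the sample pages, and every function name are assumptions made for illustration. Note how little logic steps 3 and 5 (`shard`, `lookup`) contain compared with steps 2 and 4 (`build_index`, `search`), even in this tiny version.

```python
import zlib
from collections import defaultdict

# Step 1 (collect): a hypothetical toy corpus standing in for crawled pages.
DOCUMENTS = {
    "page1": "cheap flights to Paris",
    "page2": "Paris hotel deals",
    "page3": "cheap hotel in Rome",
}


def build_index(docs):
    """Step 2 (index): map each lower-cased term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def shard(index, num_servers):
    """Step 3 (store): assign each index entry to a 'server' by hashing its term."""
    servers = [{} for _ in range(num_servers)]
    for term, postings in index.items():
        servers[zlib.crc32(term.encode()) % num_servers][term] = postings
    return servers


def lookup(servers, term):
    """Step 5 helper (retrieve): fetch one term's postings from the server that owns it."""
    owner = servers[zlib.crc32(term.encode()) % len(servers)]
    return owner.get(term, set())


def search(servers, query):
    """Step 4 (translate) plus step 5: split the query into terms and intersect their postings."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(lookup(servers, terms[0]))  # copy so the stored postings are not mutated
    for term in terms[1:]:
        results &= lookup(servers, term)
    return results


servers = shard(build_index(DOCUMENTS), num_servers=3)
print(search(servers, "cheap hotel"))  # {'page3'}
```

A real search engine would replace the naive split-and-intersect translation with ranking, synonyms, and advertising relevance, which is exactly the part the article argues cannot be bought off the shelf.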

Which of these steps relate to the Hadoop architecture? Only steps 3 and 5, that is, data storage and data retrieval.

Can steps 3 and 5 be implemented easily? Yes. Hadoop and similar solutions scale well and cost little to purchase.

Once steps 3 and 5 are in place, can I operate like Google? No, because the key steps, 2 and 4, are still missing.

What are steps 2 and 4? They are business analysis algorithms, designed meticulously by business experts on the basis of data, business knowledge, and market trends. For many enterprises, these algorithms are a core competency and the basis of business decision making. They are the “Value” component of the 4V theory.

Why do big data projects fall into the pitfall of failure? Because current big data technology only solves data storage and query. It lacks a good solution for the most crucial part, business analysis that enhances competitiveness, and there is a large gap between the two. In fact, current big data technology is a tool for IT experts. They can implement MapReduce functions in C++ or Java, but they cannot reach the ultimate goal: providing valuable business algorithms.
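For readers unfamiliar with MapReduce, the following single-process Python sketch shows the shape of the programming model. Real frameworks such as Hadoop distribute the map, shuffle, and reduce phases across machines; the order data and function names here are hypothetical. The job only computes sales per client, which is the mechanical grouping step of a business question.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical order records: (client, amount).
ORDERS = [("Acme", 120.0), ("Bolt", 80.0), ("Acme", 30.0), ("Core", 50.0)]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    client, amount = record
    yield client, amount


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)


def run_mapreduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(run_mapreduce(ORDERS))  # {'Acme': 150.0, 'Bolt': 80.0, 'Core': 50.0}
```

The framework takes care of distribution; deciding which totals matter and what to do with them is the business algorithm that, in the author's view, current big data tools leave to the user.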

To avoid this pitfall, enterprises must use advanced analysis tools that are oriented toward business experts, usable regardless of technical background, and capable of converting business logic into business algorithms rapidly, intuitively, and conveniently. What about NoSQL or SQL? Neither is ideal. Both are suited to IT personnel only, because they demand a strong technical background, involve complex operations, and offer comparatively weak computation capability.

What are the ideal tools for business experts? From a total cost of ownership (TCO) perspective, I would rather choose lightweight tools such as the R language and esProc Desktop than pin my hopes on heavyweight products such as Teradata Aster and SAP Visual Intelligence. esProc in particular is a desktop business-computation tool designed for business experts: its syntax is easy to learn and understand, and its technical requirements are low. Scripts are laid out in automatically aligned cells, so users can observe the result of each step clearly and visually, and results can be referenced directly by cell name, letting users compute freely according to their business logic.

February 28, 2013

SQL Visualization in the Spreadsheet

SQL is a database query and programming language for retrieving, updating, and managing data in relational databases. SQL was adopted as an ANSI standard in 1986 and as an ISO international standard in 1987. Nowadays, SQL is a basic requirement for every programmer. However, its advantages cannot hide its disadvantage: SQL is designed for technical personnel. Its syntax is highly abstract and its logic is hard to follow, so only people with a strong technical background can master it. Yet in business offices, non-technical users usually need to query and process data by themselves. They want a reporting tool whose technical requirements are almost zero but whose computing capability is as strong as SQL's.

For example, a secretary needs to prepare an up-to-date list of big clients for a regularly scheduled meeting. The big clients are those who together account for 50% of the company's total sales. Asking the IT team to handle it would take a great deal of time to coordinate, so he decides to do the calculation himself. He can describe the calculation process in business language:

  • Filter by time: select the orders of the past half year from the order table.
  • Group and summarize: group the orders by client; the total sales of the orders in each group are that client's sales.
  • Calculate the standard of comparison: the total sales value is the sum of all clients' sales. Multiply it by 0.5; the result is the standard of comparison.
  • Sort by sales value: sort the clients by sales in descending order.
  • Calculate the cumulative value: add up the sales one by one from highest to lowest. For example, the cumulative value of the client in 3rd place is the sum of the top three sales values.
  • Filter out the big clients: the goal is to find the clients whose cumulative value is less than the standard of comparison. For example, if the standard of comparison is greater than the cumulative value of the 5th client but less than that of the 6th, then the top 5 clients are the big clients.
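The six business-language steps above translate almost one-to-one into code. The following Python sketch is only an illustration, not how esCalc works: the `(client, order_date, amount)` record layout, the `big_clients` function name, the sample data, and the 183-day approximation of "half a year" are all assumptions. It applies the stated rule literally, keeping only clients whose cumulative value stays below the standard.

```python
from datetime import date, timedelta
from itertools import accumulate

# Hypothetical order records: (client, order_date, amount).
ORDERS = [
    ("Acme", date(2013, 1, 10), 200.0),
    ("Acme", date(2012, 12, 1), 100.0),
    ("Acme", date(2012, 6, 1), 900.0),  # older than half a year: filtered out
    ("Bolt", date(2013, 2, 1), 190.0),
    ("Core", date(2012, 12, 5), 150.0),
    ("Dyna", date(2012, 11, 20), 150.0),
    ("Echo", date(2013, 2, 15), 100.0),
    ("Fox", date(2012, 10, 3), 100.0),
]


def big_clients(orders, today, days=183, share=0.5):
    # 1. Filter by time: keep orders from roughly the past half year.
    start = today - timedelta(days=days)
    recent = [(client, amount) for client, day, amount in orders if start <= day <= today]

    # 2. Group and summarize: total sales per client.
    sales = {}
    for client, amount in recent:
        sales[client] = sales.get(client, 0.0) + amount

    # 3. Standard of comparison: half of the total sales.
    standard = sum(sales.values()) * share

    # 4. Sort clients by sales, highest first.
    ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)

    # 5. Cumulative value, accumulated from the top client downward.
    cumulative = accumulate(amount for _, amount in ranked)

    # 6. Big clients: those whose cumulative value is still below the standard.
    return [client for (client, _), total in zip(ranked, cumulative) if total < standard]


print(big_clients(ORDERS, today=date(2013, 2, 28)))  # ['Acme', 'Bolt']
```

With the sample data, total sales in the window are 990, the standard is 495, and the cumulative values run 300, 490, 640, and so on, so Acme and Bolt are the big clients. The old 900 Acme order falls outside the window and is ignored.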

Previously, business users unfamiliar with SQL could not implement the above calculation. Now esCalc lets them operate visually and calculate intuitively, using their own business language and way of thinking. Here are more examples:
  • Find the clients whose annual sales are among the top 10 in every year.
  • Collect statistics on newly opened retail stores: How many retail stores opened this year? How many of them have made a profit of over one million dollars? Of those, how many have opened overseas?
  • How much have sales increased compared with the previous month?
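The first example reduces to grouping, ranking within each year, and a set intersection. Here is a hedged Python sketch, assuming orders arrive as `(client, year, amount)` tuples; the function name and sample data are hypothetical, and ties in sales are broken arbitrarily by the sort.

```python
from collections import defaultdict

# Hypothetical order records: (client, year, amount).
ORDERS = [
    ("A", 2011, 100.0), ("B", 2011, 90.0), ("C", 2011, 80.0),
    ("C", 2012, 120.0), ("A", 2012, 95.0), ("B", 2012, 10.0),
]


def top_clients_every_year(orders, n=10):
    """Clients whose annual sales rank in the top n in every year present in the data."""
    # Group and summarize: sales per client within each year.
    yearly = defaultdict(lambda: defaultdict(float))
    for client, year, amount in orders:
        yearly[year][client] += amount
    # Rank within each year and keep that year's top n clients.
    top_sets = []
    for year in sorted(yearly):
        ranked = sorted(yearly[year].items(), key=lambda item: item[1], reverse=True)
        top_sets.append({client for client, _ in ranked[:n]})
    # Intersect the yearly top-n sets.
    return set.intersection(*top_sets) if top_sets else set()


print(top_clients_every_year(ORDERS, n=2))  # {'A'}
```

A small `n` is used so the sample stays short; the question in the text corresponds to the default `n=10`.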

Such calculations are quite common in modern business offices. In fact, each of them is just a combination of SQL operations such as filtering, grouping, summarizing, distinct, horizontal joins, and vertical unions.
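As an illustration of such a combination, the retail-store question can be answered with one filter and three conditional summaries. The sketch below runs standard SQL through Python's built-in `sqlite3` module; the `stores` table, its columns, the 0/1 `overseas` flag, and the data are hypothetical, and "of those" is read as "of the stores that made over one million dollars".

```python
import sqlite3

# Hypothetical store table; "overseas" is a 0/1 flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (name TEXT, opened_year INTEGER, profit REAL, overseas INTEGER)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?, ?, ?)",
    [
        ("S1", 2013, 1_500_000.0, 0),
        ("S2", 2013, 2_000_000.0, 1),
        ("S3", 2013, 300_000.0, 1),
        ("S4", 2012, 5_000_000.0, 1),
    ],
)

# Filtering (WHERE) plus conditional summarizing (SUM over CASE) in one pass.
STATS_SQL = """
SELECT COUNT(*),
       SUM(CASE WHEN profit > 1000000 THEN 1 ELSE 0 END),
       SUM(CASE WHEN profit > 1000000 AND overseas = 1 THEN 1 ELSE 0 END)
FROM stores
WHERE opened_year = ?
"""
opened, profitable, profitable_overseas = conn.execute(STATS_SQL, (2013,)).fetchone()
print(opened, profitable, profitable_overseas)  # 3 2 1
```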

Raqsoft has made a great contribution to SQL visualization. esCalc, for example, offers SQL-like computing without requiring any technical background from its users. It embodies and visualizes calculation methods innovatively, enabling business users to be as competent and capable as IT technicians in some business computing tasks.

September 6, 2012

Interactive Analytics and OLAP - Part III


At the end of Part II of Interactive Analytics and OLAP, we left a question: can OLAP in the narrow sense complete computations like the following (from marketing and sales data analysis)?

  • The top n customers whose purchases together account for half of the company's sales in the current year;
  • Stocks that hit the daily limit-up for three consecutive days within one month;
  • Supermarket commodities that were sold out at 5 P.M. three times within one month;
  • Commodities whose sales this month decreased by more than 20% from the preceding month;
  …
Of course NOT!
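The second question shows why a fixed cube is not enough: it needs an ordered, row-by-row computation over each stock's price series. Below is a hedged Python sketch, assuming each stock's closing prices for the month are available as a list in trading-day order and that "limit-up" means a daily rise of at least 10% (the usual limit for Chinese A-shares); the data, threshold default, and function names are hypothetical.

```python
# Hypothetical daily closing prices for one month, in trading-day order.
PRICES = {
    "AAA": [10.0, 11.0, 12.1, 13.31, 13.0],
    "BBB": [10.0, 11.0, 12.1, 12.0, 13.2, 14.52],
    "CCC": [10.0, 10.5, 10.2],
}


def has_limit_up_streak(closes, limit=0.10, run=3):
    """True if the series has `run` consecutive days closing at least `limit` above the prior close."""
    streak = 0
    for prev, close in zip(closes, closes[1:]):
        # Small tolerance so float rounding does not hide an exact 10% rise.
        if close >= prev * (1 + limit) - 1e-9:
            streak += 1
            if streak >= run:
                return True
        else:
            streak = 0
    return False


def limit_up_stocks(prices, run=3):
    return sorted(code for code, closes in prices.items() if has_limit_up_streak(closes, run=run))


print(limit_up_stocks(PRICES))  # ['AAA']
```

BBB rises at the limit four times, but never three days in a row, which is the kind of ordered, stateful condition that slicing and aggregating alone cannot express.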
Current OLAP systems have two key disadvantages:
  • The multidimensional cube is prepared in advance by the application system, and users cannot design or restructure it on the fly. Whenever a new analysis need arises, the cube has to be re-created.
  • The analysis actions a cube supports are rather monotonous. Only a few actions are defined, such as drilling, aggregating, slicing, and pivoting, so complicated analyses that require multiple steps are hard to implement.
Although current OLAP tools look splendid, few of them actually provide online analysis capabilities that are powerful enough.
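The predefined cube actions themselves are easy to express. The following minimal Python sketch implements slicing and rolling up (aggregating) over a handful of fact rows; the dimension names, data, and functions are hypothetical and not taken from any OLAP product. Each action maps a set of rows to a smaller or coarser one, which answers "sales by product in the East region" but offers no notion of row order or running state.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, sales).
DIMS = ("region", "product", "month")
FACTS = [
    ("East", "Tea", "2012-07", 100.0),
    ("East", "Coffee", "2012-07", 80.0),
    ("West", "Tea", "2012-07", 60.0),
    ("East", "Tea", "2012-08", 120.0),
]


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]


def roll_up(rows, keep):
    """Aggregate: sum sales over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


print(roll_up(slice_cube(FACTS, "region", "East"), ["product"]))
# {('Tea',): 220.0, ('Coffee',): 80.0}
```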

So what kind of OLAP, and what kind of OLAP tools, do we need?
The answer is simple: we need an online analytical system that supports an evaluation process, the kind of step-by-step computation that SQL data computing or Excel can handle.

Technically speaking, each step of an evaluation process can be regarded as a computation on data (a query can be understood as a filter computation). Users should be able to define these computations freely and decide the next action based on the intermediate result at hand, without building a model beforehand. Additionally, because the data source is generally a database system, such computation must support mass structured data well (for example, with tools like esProc) rather than only simple numeric computation. This kind of evaluation process is exactly what businesses need, especially in marketing and sales data analysis.
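As an illustration of such an evaluation process, the fourth question from the list above (commodities whose sales fell by more than 20% from the preceding month) can be computed in small steps, each producing an intermediate result a user could inspect before choosing the next step. This is a hedged Python sketch: the data, the `monthly_decliners` name, and the "YYYY-MM" month strings are assumptions, and commodities with no sales in the preceding month are skipped.

```python
from collections import defaultdict

# Hypothetical sales rows: (commodity, month, amount), months as "YYYY-MM".
ROWS = [
    ("Milk", "2012-07", 200.0), ("Milk", "2012-08", 150.0),
    ("Bread", "2012-07", 100.0), ("Bread", "2012-08", 90.0),
    ("Eggs", "2012-07", 50.0), ("Eggs", "2012-08", 30.0), ("Eggs", "2012-08", 5.0),
    ("Jam", "2012-08", 40.0),  # no sales in the preceding month
]


def monthly_decliners(rows, prev_month, this_month, drop=0.20):
    # Step 1 (intermediate result): total sales per (commodity, month).
    totals = defaultdict(float)
    for item, month, amount in rows:
        totals[(item, month)] += amount

    # Step 2 (intermediate result): month-over-month change ratio,
    # only for commodities that sold something in the preceding month.
    changes = {}
    for item in sorted({item for item, _ in totals}):
        before = totals.get((item, prev_month), 0.0)
        if before > 0:
            after = totals.get((item, this_month), 0.0)
            changes[item] = (after - before) / before

    # Step 3: keep commodities whose sales fell by more than `drop`.
    return [item for item, change in changes.items() if change < -drop]


print(monthly_decliners(ROWS, "2012-07", "2012-08"))  # ['Eggs', 'Milk']
```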

Then, can SQL (or MDX) play this role?
SQL was indeed invented for this purpose: it has complete computation capability and a writing style similar to natural language.
But because SQL's computation system is too basic, achieving complex computations in SQL, such as the problems listed above, is very difficult and over-elaborate. It is not easy even for professionally trained programmers, so ordinary users can only use SQL for the simplest queries and aggregations (filtering and summarizing a single table). As a result, SQL has drifted far from its original intention and has become almost exclusively the expertise of programmers.
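To illustrate, here is the first question from the list above (the top customers who together account for half of the year's sales) written as a single SQL query and run through Python's built-in `sqlite3` module. It is a sketch under stated assumptions: the `orders(client, amount)` table and its data are hypothetical, the table is assumed to hold only current-year orders, and the window function requires SQLite 3.25 or later. It keeps clients whose cumulative sales are below half the total, the same cut-off rule the earlier post describes for big clients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (client TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Acme", 200.0), ("Acme", 100.0), ("Bolt", 190.0), ("Core", 150.0),
     ("Dyna", 150.0), ("Echo", 100.0), ("Fox", 100.0)],
)

BIG_CLIENTS_SQL = """
WITH client_sales AS (              -- group and summarize
    SELECT client, SUM(amount) AS sales
    FROM orders
    GROUP BY client
),
ranked AS (                         -- sort and accumulate
    SELECT client, sales,
           SUM(sales) OVER (ORDER BY sales DESC, client
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative
    FROM client_sales
)
SELECT client                       -- compare with half of the total
FROM ranked
WHERE cumulative < (SELECT SUM(sales) * 0.5 FROM client_sales)
ORDER BY sales DESC, client
"""
big = [row[0] for row in conn.execute(BIG_CLIENTS_SQL)]
print(big)  # ['Acme', 'Bolt']
```

Even with window functions, the query needs two common table expressions and a scalar subquery to express what a business user states in a few sentences, which is the kind of over-elaboration the paragraph above describes.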
We should follow SQL's line of thinking, study its specific disadvantages carefully, and find ways to overcome them in developing a new generation of computation system, thereby implementing the evaluation process: real OLAP and instant data analytics.
 Sponsored by Raqsoft.