August 31, 2012

Interactive Analytics and OLAP - Part I


Many BI practitioners have heard of OLAP, an important constituent of business intelligence. Today we will talk about what OLAP really means in terms of actual needs: what is real OLAP, and what is instant OLAP for instant data analytics?




Understood literally, OLAP is online analytical processing; that is, users conduct analytical operations on real-time business data.

But the concept of OLAP is now seriously narrowed: it usually refers only to operations such as drilling, aggregating, pivoting, and slicing on multi-dimensional data, namely multi-dimensional interactive analysis.
To apply this kind of OLAP, a group of topic-specific data cubes must be created in advance in the OLAP tool. Users can then display the data as crosstabs or graphs and perform various real-time transformations (pivoting and drilling) on them, hoping to find in the process some pattern in the data or evidence to support a conclusion, thereby achieving the aim of data analytics.
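To make these operations concrete, here is a minimal SQL sketch of a slice and a drill-down against a hypothetical fact table; the sales_cube table and its year, quarter, region, product, and amount columns are assumptions for illustration, not the output of any particular OLAP tool:

-- Slice: fix one dimension (year = 2012) and aggregate over the rest
SELECT region, SUM(amount) AS total_amount
FROM sales_cube
WHERE year = 2012
GROUP BY region;

-- Drill-down: descend from the region level to region + quarter
SELECT region, quarter, SUM(amount) AS total_amount
FROM sales_cube
WHERE year = 2012
GROUP BY region, quarter;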

Do we need this kind of OLAP?
To answer this question, we need to look carefully at how OLAP is actually used, and work out exactly what technical problem OLAP tools are supposed to solve.

Employees with years of working experience in any industry generally have educated guesses about the business they are engaged in, for example:
       A stock analyst may guess that stocks meeting a certain condition are likely to go up.
       An airline employee may guess which kinds of people tend to buy which kinds of flights.
       A supermarket operator may guess at what price a commodity sells best to the people living nearby.
...
Evidently, this type of computation demand is ubiquitous in business analysis, and all of it can be computed from the historical database. Then what about instant data analytics that does not come from the historical database?

Related Articles:
Interactive Analysis and Related Tools - Part II
Interactive Analysis and Related Tools - Part III

August 27, 2012

Interactive Analysis and Related Tools Part III - Data Analytics Tools


Following my last two articles, today I will go on with interactive analysis and related tools. What about the data analytics tools themselves? Most business intelligence vendors are busy developing ever more efficient analytics tools, and there are various data analytics tools targeting different purposes.
What are the requirements on analysis tools?
The characteristics of interactive analysis determine its requirements on a computation tool:
Abundant library functions and fixed algorithms
A convenient interactive procedure
Usable by business experts
Support for massive structured data
Common tools for interactive analysis
Based on these requirements, we can list some common tools, to name a few:
Excel. Please refer to: http://office.microsoft.com/en-us/excel/
R language. Please refer to: http://www.r-project.org/
esProc. Please refer to: http://www.raqsoft.com/products
SQL. Please refer to: http://en.wikipedia.org/wiki/SQL
SAS. Please refer to: http://www.sas.com/
SPSS. Please refer to: http://www-01.ibm.com/software/analytics/spss/
Matlab. Please refer to: www.mathworks.com/products/matlab/
There are many other excellent tools for interactive analysis. However, one point to note is that none of them is perfect in every respect. For example, an analyst may find Excel easy to grasp but SQL statements hard to compose; esProc lacks non-linear models but provides a convenient interactive process; SPSS boasts abundant fixed algorithms but is not as convenient as SQL for relational queries.
The correct tool is the one that suits your needs. Please refer to the Comparison between Interactive Analysis Tools to help choose the analysis tool that suits yours.
As you may have found, interactive analysis is similar to OLAP in some respects. Next, I plan to write an article on interactive analysis and OLAP. I hope you like it!
Related Articles:
Interactive Analysis and Related Tools - Part I
Interactive Analysis and Related Tools - Part II

August 22, 2012

Interactive Analysis and Related Tools - Part II


In my last article, I talked about the definition of interactive analysis and explained it with an example; today we will discuss the characteristics of interactive analytics.

As we can see from the above examples, real-world business data analysis is far more complex than the theory. Commercial opportunities change unpredictably and can come and go in a moment. In fact, computation on business activities is usually fuzzy, and few textbook model algorithms can be applied directly to real business situations. Interactive analysis computation exists to solve problems in the real world. Business intelligence tools should be simpler, and most importantly, interactive analysis should be simplified. Let's look at the characteristics of interactive analysis.

Fixed algorithm as bottom layer
Interactive analysis can always be broken down into fixed algorithms. For example, a ranking algorithm is usually used to check for the "appearance of large orders", and a grouping algorithm is usually used to find "which sector sees intensive procurement by clients".
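Using SQL as a rough stand-in for those fixed algorithms (the orders table and its order_id, client, sector, and amount columns are assumed for illustration), the two checks might look like this:

-- Ranking: look for the appearance of large orders
SELECT order_id, client, amount,
       RANK() OVER (ORDER BY amount DESC) AS amount_rank
FROM orders;

-- Grouping: see which sector shows intensive procurement
SELECT sector, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM orders
GROUP BY sector
ORDER BY total_amount DESC;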

Focus on the interactive procedure
Although the bottom layer of interactive analysis is fixed algorithms, human intervention is necessary. How should the target be broken down? How should the branches be prioritized? Should the mining continue? Is the existing result enough to support decision-making, or is further computation necessary? Theoretically, a sufficiently powerful computer program could implement this network of branches and turn the whole process into a fixed algorithm. However, until something like The Matrix and Neo comes along, analysts will have to put in that effort themselves.

Focus on the business expert
Interactive analysis exists to solve real-world problems. Assumptions have to be made on the basis of the business situation, and the next computation step is decided from the current data and business experience. Doing this requires abundant business knowledge, so the qualified analyst is usually a business expert. Database administrators and programmers are better suited to implementing the fixed algorithms; they can assist with the computation but can hardly make the most important business decisions.

Take massive structured data as the primary goal
Massive structured data is data that can be represented with a two-dimensional structure. Typical examples are data from databases, spreadsheets, and text files. In real-world business activities, such data is the most common and fundamental, acting as the base of business calculation.

This is the end of Part II on interactive analysis. In the next part, I will talk about the related tools.
To be continued…

Related Reading
Interactive Analysis and Related Tools - Part I

August 20, 2012

Interactive Analysis and Related Tools - Part I



Starting today, I will talk about interactive analytics and a series of related tools in the business intelligence industry. For those who don't know much about interactive analysis and its tools, I hope these articles are helpful. Today I will explain interactive analytics through its definition and a case example.

Definition
Interactive analysis is a cyclic analysis procedure of assumption, validation, and adjustment used to achieve a fuzzy computation goal.
Interactive analysis is the real online analysis for solving complex computation problems in the real world, and it is one of the key points of business computation.

Example of Case
Let us explain interactive analysis with a common example from business activities.

Step 1 Set the goal
Why does this month's sales volume greatly exceed that of the previous month?
Obviously, this is a fuzzy computation goal with several possible answers; you cannot get the result directly with any single analysis model.

Step 2 Guess the possible branch
Since there are several possible causes of the sales volume increase, the analyst has to check each one, such as:
- The number of orders increased
- Large orders appeared
- Intensive consumption by a specific customer base, for example children's movies being screened intensively during the summer holiday
- Improvement of the sales process
- Launch of a marketing campaign
…
Obviously, making these assumptions requires a certain level of business knowledge and a keen sense for circumstances inside and outside the enterprise. This is a relatively personal effort.

Step 3 Branch validation
Based on the likelihood of each possibility and the characteristics of the data, the analyst chooses a branch to start from, such as "increase of orders". If the validation computation shows that the number of orders did not increase, the assumption is not correct, and the next assumption must be validated to carry on the cyclic analysis.
For example, after validating the "appearance of large orders" branch, the analyst finds that it holds, so this branch can be pursued further (a rough validation query is sketched below).
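As a rough illustration of such a validation step (assuming an orders table with order_date and amount columns; the 100,000 threshold for a "large order" is an arbitrary choice), the analyst might compare this month with the previous one:

-- Did the number of orders, or the number of large orders, increase?
SELECT TO_CHAR(order_date, 'YYYY-MM') AS month,
       COUNT(*) AS order_count,
       COUNT(CASE WHEN amount > 100000 THEN 1 END) AS large_order_count
FROM orders
WHERE order_date >= DATE '2012-07-01'
  AND order_date <  DATE '2012-09-01'
GROUP BY TO_CHAR(order_date, 'YYYY-MM')
ORDER BY month;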
Step 4 In-depth exploration and mining
These possibilities are usually apparent causes rather than the root cause. To really settle the problem, you have to drill down step by step to reach the core. For example, the appearance of large orders may result from:
- A new salesman is highly capable
- The company's new sales policy encourages large orders
- Intensive procurement by clients from a certain sector
…
Obviously the drill-down process is cyclic. At each point the analyst must judge from the characteristics of the data which branch is most likely, and progress level by level until the problem is solved.

Step 5 Solve problem
Exploration and mining do not require unlimited drilling down; the procedure can end once an answer clear enough to support a decision is found. For example, once validation determines that centralized procurement in a certain sector is the root cause, this is enough for the analyst to make a decision: since the recent sales rise results from centralized procurement by clients in this sector, sales volume can keep rising simply by strengthening the sales force and efforts in this sector.

Step 6  More computations
At this step, the computation goal has been achieved. However, we can realize more business value through further computation on the basis of the existing results, such as:
- Find the list of customers in this sector
- Find the list of salesmen who are good at this sector
- Find the reason why clients in this sector increased their procurement quantities abruptly
- Find abnormal actions in sectors related to this sector and in its upstream/downstream sectors
And this is the end of the first part of Interactive Analysis and Related Tools.
Thanks, everyone, for reading and commenting. If you have any questions, please let me know. Your feedback is valued and appreciated by us at Raqsoft!
To be continued...

Related Readings:
Interactive Analysis and Related Tools - Part II
Interactive Analysis and Related Tools - Part III




August 17, 2012

The Big Data Meme: 5 Scenarios for IT


Note: This is a post by Gil Press on whatsthebigdata.
In today’s New York Times, Steve Lohr surveys the rise of big data as a term and as a marketing tool, from “the confines of technology” to the mainstream. “The Big Data story is the making of a meme,” says Lohr, “and two vital ingredients seem to be at work here. The first is that the term itself is not too technical, yet is catchy and vaguely evocative. The second is that behind the term is an evolving set of technologies with great promise, and some pitfalls.”  
Over at Barron’s, Tiernan Ray writes: “One of my responsibilities as a Barron’s tech columnist is to be the keeper of buzzwords. Buzzwords are the fuel for elaborate stock theses on Wall Street, and they can drive the interest in stocks sometimes more than, say, revenue and earnings. The buzzword du jour is ‘big data,’ the information that piles up in the databases of companies everywhere and that those companies would like to manage and understand better.”
Big data has certainly become a mainstream meme in 2012. Does this mean an opportunity for IT professionals, the people that manage “the information that piles up”? Or will it become a threat, just as the other recent technology-related meme, “cloud computing,” promised to drive IT into obsolescence as companies would rely (so the argument went) more and more on outside (“Public Cloud”) IT services?
It seems to me that big data represents the ultimate manifestation of a new facet of yet another meme, albeit one limited to IT circles and the “trade press,” the “consumerization of IT.” The term is widely used to describe “the growing tendency for new information technology to emerge first in the consumer market,” and only later to be adopted, sometimes reluctantly, by enterprises and their IT functions.
But we see now, with the mainstream press discussing cloud computing and big data, that it can also work in the opposite direction–what used to be the preoccupation of a select group of people managing “data centers,” has become a topic of discussion for everybody. The work of IT, if not IT itself and IT professionals, has been in the limelight like never before in recent years.  Words like “gigabyte” or even “petabyte,” previously uttered only by a few “information handlers,” are now understood, more or less, by anyone with a computer and/or access to the Internet (note that Lohr didn’t bother to explain “exabytes, zettabytes and yottabytes,” a common practice in the mainstream press not so long ago). “Backup” used to indicate “move backwards” or “traffic congestion ahead.” These are not anymore the first possible meanings which come to mind for people not working in IT or in the IT industry.
IT is in the limelight now, and it could use this big opportunity to demonstrate leadership that is based on deep experience with and understanding of what data, big or small, is all about—its management, its analysis, its productive use. Or maybe not.
I would like to offer a few scenarios for the impact of big data on IT or what could be IT’s impact on big data. These scenarios focus not on the technologies of big data, but on the people at the vanguard of using them for the benefit of individuals, enterprises, and society—data scientists. This new breed of data handlers is more important than the exciting big data technologies they use because their success and usefulness will determine whether the big data meme will outlast the typical 2-year half-life of technology-related memes. (I would argue that it’s highly probable that even if they succeed in proving the usefulness of big data, the big data meme will eventually give way in a few years to new memes. But I would also argue that their new discipline, data science, buttressed by professional certification and university-based training, will survive for a very long time).
The question, then, boils down to the relationships between data scientists and IT. Here are five scenarios for how this relationship could evolve:
  1. IT will continue to play a supporting, infrastructure-related role and will not get involved with data science. Data scientists will work in their own, advanced R&D-type function, reporting to a chief strategy officer, chief technology or research officer, or even the CEO. Tom Davenport has argued in support of this scenario here and here. Or see my interview with Mok Oh for how PayPal’s data scientists work in an organization that is not part of IT.
  2. IT will hire and train data scientists that will work in collaboration with data scientists in the enterprise’s business units. See my interview with EMC’s CIO for a description of how it’s done there.
  3. IT will move beyond being just the custodian of the data to becoming the key function responsible for leading the data-driven transformation of the enterprise, with its data scientists leading the charge. See Beth Schultz’s spirited defense of IT. I called this scenario “IT is the new Intel Inside.”
  4. The question is irrelevant, because IT will be absorbed by marketing. With so much of the data coming from outside of the enterprise and used to manage both customer-related and product-related activities, this will be the ultimate incarnation of the “consumerization of IT.”  See here for a prediction that CMOs will spend on IT more than CIOs by 2017.
  5. The question is irrelevant, because data science will evolve to become a service-only business, provided to enterprises by companies with large teams of data scientists and access to vast stores of public data and/or proprietary data collections. This scenario may become the new “IT doesn’t matter.”
“Rising piles of data have long been a challenge,” says Steve Lohr. Indeed, my own surveys of the history of big data and the history of data science found that the term “data science” has been first used in the 1960s (and in its current meaning at least a decade ago). The term “big data” has been used in computer science circles already in the late 1990s, in the context of the large amounts of data generated by computer visualization.
What we see today is the continuation of what began six decades ago and was summarized eloquently in a 2008 paper cited by Lohr, written by three prominent computer scientists who called big data “the biggest innovation in computing in the last decade. We have only begun to see its potential to collect, organize, and process data in all walks of life.”
The same could be said about any other previous stage in the evolution of IT, a constant progress that could not have happened without IT professionals, developing and maintaining the foundation for innovative and productive uses of data. So what will IT do with big data? Lead, follow, or get out of the way?

August 15, 2012

Big Data, Good and Evil


Note: This is a post by Steve Sarsfield on smartdatacollective.
As I get involved more and more in the world of Big Data, I find myself reflecting upon where it all will go. Big Data could help us live better lives by solving crimes, predicting scientific outcomes, detecting fraud and, of course, optimizing our marketing so that we don't bother people who don't want our products and target them when we think they do. While the "goodness" of some of those items is decidedly debatable, that's the bright side. Big Data does represent a paradigm shift for our society, but since it's still young, we're just not sure exactly how big Big Data is yet.

When I write about Big Data, I'm talking about leveraging new sources of data like social media, transaction data, sensor data, networked devices and more. These data sources tend to be… well, big. Mashing them up with your traditional CRM data or supply chain data can tell you some fascinating things. They even tell you some interesting things all by themselves. It can give you information that wasn't possible to attain until recently, when we achieved the technology and ability to handle Big Data in a meaningful way. We are already starting to see amazing case studies from Big Data.

On the other hand, there is potential folly. Despite the absolute evolutionary power that Big Data can bring to us, it's also human nature for some to abuse it. When technological evolution brought us snail mail, many abused it with junk mail. When technology brought us e-mail, a few abused it by spamming us. Abuse is my biggest concern. The potential abuse with Big Data is that corporations completely figure out what makes us tick, thereby giving them unprecedented power over our buying decisions. It could lead to social issues, too. For example, if Big Data says that people who eat cheeseburgers after 9 PM are more likely to get a heart attack, do we justify outlawing cheeseburgers after 9? I'd rather make my own decisions.

The movie "Minority Report", starring Tom Cruise, comes to mind. As truth imitates fiction, I can't help but think of the mall scene from the movie, which painted a fairly grim picture of marketing in the future. Now I see it as prophetic.

This type of marketing already exists within some free online e-mail systems. For example, if I'm e-mailing my friends about a trip to Vegas or gambling, or even when I post this blog that mentions Vegas, it's no mistake when ads for Caesars Palace appear. It's cool, but yet I am uneasy. Will future employers use big data to help decide if I am worthy of work? Will my e-mail conversations about Las Vegas lead them to believe I am a compulsive gambler, thus giving the edge to someone else? If so, what is my recourse to set the record straight?

Government has reportedly been getting in on big data, too. A recent Wired magazine story talked about a huge government facility in Utah. While there is clearly a "good" aspect to this big data, namely the catching of bad guys, the most troubling aspect might be that citizens have no control over their own data. Oversight on what can and cannot be done with the wealth of information at this facility is unclear.

That said, I generally have an overall positive view of the good that Big Data will bring to society, and the positive influence it will have on data management professionals. We have a society today that is more open and more willing to post private information to the public. Society is therefore more tolerant today and will be even more so in the future.

Ultimately, when and if Big Data becomes abusive to privacy, overzealous capitalism, social issues, et al, expect capitalism to also solve it. Look for companies who set up online e-mail and promote the fact that they don’t track conversations. Look for utilities to overwhelm any negative information about you in the Big Data universe with positive information. We could be looking at a cottage industry of managing  and protecting your Big Data image.

August 13, 2012

Free Record Access: Serial Numbers, Locating and Sorting in esProc


In esProc, data is ordered. You can access it freely by locating, ranking, sorting, and other methods.


"Being ordered" means the data are stored in a specific order. Every piece of data of each record has its own relative or absolute serial number. You can access the data according to its serial number and feel free to perform the order-relating actions on data, such as locating, ranking, and sorting.
The typical algorithms involving "being ordered" include the top N records, year-on-year, and link relative ratio statistics. The ordered data with esProc can help you solve a great number of challenges easily regarding the data analysis.
Check the below examples for further explanation.

I Case and Comparison

Case

Assume a telecommunications product manufacturer needs to analyze its best products: find the products whose sales values are among the top 10 in every state. The sales data are stored in the sales table, whose main fields are amount, product, and state.

SQL Solution

SELECT product
FROM (
    SELECT product
    FROM (
        SELECT product,
               RANK() OVER (PARTITION BY state ORDER BY amount DESC) AS rankorder
        FROM sales
    )
    WHERE rankorder <= 10
)
GROUP BY product
HAVING COUNT(*) = (SELECT COUNT(DISTINCT state) FROM sales)

This type of problem requires statistics on rankings within data groups. The still-popular SQL-92 syntax cannot implement it, so we use the SQL-2003 standard, which is gradually being supported by more vendors. Despite the currently incomplete support for SQL-2003, the problem can still be solved. However, a hard-to-read HAVING clause with a sub-query has to be used to compute the intersection of the sets, and average developers may find it difficult to understand and maintain.

esProc Solution


Cell A1: Group the data by state; each group consists of all products and their sales amounts for one state.
Cell A2: Compute the top 10 records with the highest sales values in each group (state) from A1.
Cell A3: Based on A2, use isect to compute the intersection of the product sets of the groups, that is, the products in the top 10 by sales in every state.

Comparison

As the example shows, SQL lacks the concept of "being ordered" and is thus ill-suited to this type of serial-number-related analysis; the SQL statement cobbled together here is also difficult to wade through. esProc statements are based on serial numbers and follow the natural pattern of human thought, giving the user free access to records.

II Function Description:

Access with Basic Serial Number

Get the first three records of the sales table: sales([1,2,3]), which is the same as sales.m([1,2,3]).
Get the amount field value of the last record: sales.m(-1).(amount)
Get the serial numbers of the records whose amount is above 1000: sales.pselect@a(amount>1000)
Get the serial number of the record with the highest amount: sales.pmax@a(amount)
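For contrast, here is roughly what a couple of these accesses require in SQL, which has no notion of record position; an explicit id column to order by is assumed, and the top-N idiom shown is Oracle-style:

-- First three records: needs an explicit ordering plus a row limit
SELECT *
FROM (SELECT * FROM sales ORDER BY id)
WHERE ROWNUM <= 3;

-- Amount of the last record: order descending and take one row
SELECT amount
FROM (SELECT amount FROM sales ORDER BY id DESC)
WHERE ROWNUM = 1;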

Serial Numbers of Mass Data

The serial number of a record is used so often that a specific sign, #, is assigned to represent it. For example, as the number of clients grows over time, we may need to allocate the clients evenly to two departments: sort the records by client contribution and then group them, one group for clients with odd serial numbers and the other for clients with even serial numbers: customer.group(#%2==1).
A relative serial number is represented with [n]. For example, the link relative increase of this month's sales over last month's:
sales.(amount-amount[-1]).
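For comparison, the same month-on-month difference written with a SQL-2003 window function (assuming the sales table also carries a month column to order by):

SELECT month,
       amount,
       amount - LAG(amount) OVER (ORDER BY month) AS increase
FROM sales;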

Rank and Sort

The rank function computes rankings. For example, to compute the sales ranking of each record, the esProc user can write: sales.rank(amount).
The sort function arranges data according to a specific rule for review and further processing. For example, to sort the data by state name in ascending alphabetical order, putting data from the same state together, write sales.sort(state); to review the sales values in descending order within each state, write sales.sort(state,amount: -1).
Sorting disturbs the serial numbers of the existing data, but sometimes we need to know the original serial numbers the records would have after sorting. For example, for the three months with the highest sales value this year, compute the month-on-month link relative ratio. Here the esProc user can use psort for a much more convenient computation.

In cell A1, records are sorted by month in ascending order.
In cell A2, sort the data by sales value, compute the serial numbers, and take the first three. Here psort is a "mimic" sort: the original sequence of records in A1 is not changed. It amounts to "suppose the sort had been done, and find the original serial numbers the sorted records would have in the old sequence".
In cell A3, retrieve the records from the original data according to the serial numbers from A2, that is, the top 3 months with the highest sales volumes. Finally, execute the "this month minus previous month" computation to obtain the final results.
The same computation is a great challenge for SQL; for the intuitive and convenient esProc, it is a piece of cake.
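For comparison, here is one possible SQL-2003 reading of the same task (assuming one sales row per month with month and amount columns); the nesting needed to keep the ranking and the month-on-month difference together is what makes it unwieldy:

-- Month-on-month difference for the 3 months with the highest sales
SELECT month, amount, increase
FROM (
    SELECT month,
           amount,
           amount - LAG(amount) OVER (ORDER BY month) AS increase,
           RANK() OVER (ORDER BY amount DESC) AS amount_rank
    FROM sales
)
WHERE amount_rank <= 3
ORDER BY month;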

III Advantages

Convenient to Locate, Rank and Sort

Data in esProc is always ordered, offering natural underlying support for locating, ranking, sorting, and other computations of this type.

Easy to Handle the Complex Computation

In practical data analysis, a great number of complex computations are related to data order. SQL lacks the concept of order, so order-related computations are comparatively harder for it to handle.

Fit for Mass Data Computation

Both esProc and high-level languages can access massive data by serial number, which is impossible in SQL. Although high-level languages allow record access by serial number, they are far less convenient than esProc: to name a few examples, the esProc user can represent serial numbers easily and access multiple records with a set of serial numbers in a very simple way. esProc provides many other conveniences for massive data access as well.

Leave a Reply

If you have any questions, please feel free to leave your ideas in the comments below.

August 10, 2012

A better way than SQL for record set storage: what is it?

How do you store a record set for reuse in SQL? Is there a better way than SQL? Yes: esProc. Consider the following task:
      First, count the employees in the R&D department;
      Second, count how many more employees the Sales department has than the R&D department;
      Third, find the difference in average employee age between the R&D and Marketing departments.

SQL solution:
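A minimal Oracle-style sketch of the three steps, assuming an employee table with dept and age columns (note how the R&D records have to be re-queried at every step, since plain SQL offers no handy place to keep that record set for reuse):

-- 1. Head count of the R&D department
SELECT COUNT(*) AS rd_count
FROM employee
WHERE dept = 'R&D';

-- 2. How many more employees Sales has than R&D
SELECT (SELECT COUNT(*) FROM employee WHERE dept = 'Sales')
     - (SELECT COUNT(*) FROM employee WHERE dept = 'R&D') AS head_count_diff
FROM dual;

-- 3. Difference in average age between R&D and Marketing
SELECT (SELECT AVG(age) FROM employee WHERE dept = 'R&D')
     - (SELECT AVG(age) FROM employee WHERE dept = 'Marketing') AS avg_age_diff
FROM dual;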



esProc Solution:




August 8, 2012

esProc vs SQL: Dissociative Record

Here we will show how SQL and esProc express the same case involving dissociative records.

Example:
First, find the age of Tom;
Then, find how much older Tom is than David;
Finally, find how many days earlier Tom's onboarding date is than Harry's.


   SQL Solution:
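A rough Oracle-style sketch, assuming an employee table with name, birthday, and hiredate columns (MONTHS_BETWEEN and date subtraction are Oracle behaviors):

-- 1. Age of Tom
SELECT TRUNC(MONTHS_BETWEEN(SYSDATE, birthday) / 12) AS tom_age
FROM employee
WHERE name = 'Tom';

-- 2. How many years older Tom is than David
SELECT TRUNC(MONTHS_BETWEEN(d.birthday, t.birthday) / 12) AS age_diff
FROM employee t, employee d
WHERE t.name = 'Tom' AND d.name = 'David';

-- 3. How many days earlier Tom came on board than Harry
SELECT h.hiredate - t.hiredate AS days_earlier
FROM employee t, employee h
WHERE t.name = 'Tom' AND h.name = 'Harry';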



   esProc Solution

To keep the SQL statements in the comparison examples as simple as possible, the window functions of the SQL-2003 standard are used widely, and accordingly the Oracle database syntax, which has the best support for SQL-2003, is adopted in this essay.

August 6, 2012

esProc Invoked by Reporting Tools via JDBC


Robust, platform-independent, and easy to cluster, load-balance, maintain, and extend, Java reporting tools are widely used in various applications. esProc therefore provides a JDBC interface through which it can be called by Java reporting tools, as shown in the structure figure below:

For a system built on a Java reporting tool, esProc is an ideal choice for performing complex computations, computing across multiple data sources, and cleaning dirty data sources. The reporting tool treats esProc as a database and receives the results it returns via JDBC. The final data can then be rendered by the reporting tool, which naturally excels at form and chart design, polished styles, query interfaces, data entry and commit, export for printing, and other capabilities.
Below are some scenarios for invoking esProc from reporting tools:

Data is easy to render and algorithm is hard to implement


esProc is well suited to handling the complex computations on massive data behind reports.
Different people have different areas of expertise, and the same applies to tools. A reporting tool is good at data rendering; its key advantage is rendering data in a way compatible with business styles. However, reporting tools offer only limited support for complex computations on massive data. By comparison, esProc, working together with the reporting tool, can take on the most complex computations of business reports.

Computation with Multi-Datasources


Most reporting tools allow only one data source. To handle computations involving multiple data sources, the data has to be pre-processed in the background through complicated SQL or stored procedures that merge the sources into one before submitting it for report use. esProc is ideal for such situations.

Multiple data sources may mean not only multiple query results from one database, but also query results from different databases, or even a mix of database data and TXT/Excel files. Many computations on such data are not supported, or only weakly supported, by SQL and stored procedures. esProc is good at interactive computation across multiple data sources.

Dirty Data Source Need Cleaning


If the data source is dirty and not fit for the reporting tool to use directly, you can use esProc to clean it up and output the result to the reporting tool.

Dirty data is quite common: duplicate employee records, null employee names, a document with several UIDs, a date mistyped as "2011-13-01", or the digits "10" misread as "IO". Such data cannot be used directly and must be examined and cleaned according to certain rules. esProc can clean this dirty data well, so the reporting tool is guaranteed to receive clean data.
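As a small illustration of this kind of rule-based cleaning (a SQL sketch over an assumed employee_staging table, not esProc syntax), duplicate records and null names could be filtered like this:

-- Keep one row per employee id and drop rows with a missing name
SELECT emp_id, name, hiredate
FROM (
    SELECT emp_id, name, hiredate,
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY hiredate DESC) AS rn
    FROM employee_staging
    WHERE name IS NOT NULL
)
WHERE rn = 1;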
In this last example, a communications enterprise needs a report displaying the 10 states with the highest turnover and the rankings of its communications products within those states. The turnover of each city and the administrative divisions are stored in different physical tables.
The computation behind this report is too complicated for the traditional band-style reporting model to handle well. The obvious difficulty is converting the three-level relation "state – county, county – city" of the administrative divisions into the two-level relation "state – city", so that the turnover table can be summarized. A reporting tool alone can hardly implement such complex computation; esProc is better suited to it.
  1. Reduce the three-level relations to two-level relations with the loop statements of esProc (a rough SQL approximation of steps 1 and 2 follows this list).
  2. Use the align function to align and summarize the city-level turnover data against the two-level relations, rolling the figures up to the state level; the result, named detail, is the turnover of each product category in each state.
  3. Use the group function to group detail by state and obtain the turnover of each state; retrieve the top 10 states into a result named top.
  4. Building on the previous computation, use the align function again to align detail by top. The result, named detailOfTop, consists of the top 10 states listed in order together with the turnover data of each product in each of these states.
  5. detailOfTop is a result set that is easy to render; simply return it to the Java reporting tool via JDBC.
  6. The reporting tool outputs the result in the required format.
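As a rough point of comparison, the first two steps can be approximated in SQL, assuming state_county(state, county) and county_city(county, city) tables for the administrative divisions and a turnover(city, product, amount) table:

-- Step 1: reduce "state - county, county - city" to "state - city"
WITH state_city AS (
    SELECT sc.state, cc.city
    FROM state_county sc
    JOIN county_city cc ON cc.county = sc.county
)
-- Step 2: summarize the city-level turnover up to the state level
SELECT s.state, t.product, SUM(t.amount) AS turnover
FROM turnover t
JOIN state_city s ON s.city = t.city
GROUP BY s.state, t.product;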
Leave your comments below!