September 10, 2015

Preparing Test Data with esProc

Test data preparation is critical work in software testing. High-quality test data simulates real business scenarios more closely. This helps testers evaluate software performance in a timely and effective way, and uncover potential issues in software builds. The volume of test data is usually large, and the data often has to be generated randomly according to specific rules. Sometimes data items are related to each other, or some of them have to be retrieved from an existing database. As a result, preparing test data is often complex and labor-intensive.

esProc is a handy tool for test data preparation.

Suppose we need to prepare test data for employee information in text format. The data includes employee number, name, gender, date of birth, and city and state of residence, among other fields. This example shows how test data can be prepared with esProc.

The test data must meet the following requirements. Employee numbers are generated sequentially. Names and genders are generated randomly. Birthdays are generated randomly, but every employee's current age must be between 18 and 55. Cities and states are picked randomly from a table in the database.
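The age requirement is the constraint that is easiest to get subtly wrong. Below is a minimal Python sketch of one way to satisfy it exactly. Instead of choosing a birth year, it samples the birthday uniformly from the range of dates that give an age between 18 and 55 on a given day. The helpers `years_before`, `random_birthday` and `age_on` are hypothetical names for illustration and are not part of esProc.

```python
import random
from datetime import date, timedelta


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def age_on(birthday: date, today: date) -> int:
    """Completed years of age on `today`."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


def random_birthday(today: date, min_age: int = 18, max_age: int = 55, rng=random) -> date:
    """Uniformly random birthday whose age on `today` lies in [min_age, max_age]."""
    # Born on or before `latest` -> at least min_age years old today.
    latest = years_before(today, min_age)
    # Born after the day exactly (max_age + 1) years ago -> at most max_age years old.
    earliest = years_before(today, max_age + 1) + timedelta(days=1)
    span = (latest - earliest).days
    return earliest + timedelta(days=rng.randint(0, span))
```

Sampling from one date range avoids off-by-one ages at the boundaries. Such errors can occur when a random day is chosen within a birth year, because whether the employee has already had this year's birthday depends on the day that was picked.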


The three text files Top100MaleNames.txt, Top100FemaleNames.txt and Top100Surnames.txt store the 100 most common male names, female names and surnames, respectively.


The employees' cities are retrieved randomly from the CITIES table in the database.

Using the STATEID field of the CITIES table, we can retrieve each employee's state abbreviation from the STATES table.

The code for preparing test data is as follows:

The following explains the code in the cellset.


The first two lines generate the raw name data. Note that an employee's name is related to his or her gender. Therefore, we first read the text data, combine the most common male and female names, and add a gender field to them:
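For readers outside esProc, the same preparation step can be sketched in Python. The sketch assumes each name file holds one name per line and uses "M"/"F" as gender codes. Both are assumptions, since the file layout is not shown here. Because every first name is tagged with its gender, a single random pick later yields a name and a matching gender.

```python
def load_names(path):
    """Read one name per line (assumed file layout), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def names_with_gender(male_names, female_names):
    """Combine male and female first names into one list of (name, gender) rows."""
    return [(name, "M") for name in male_names] + [(name, "F") for name in female_names]


# Usage with the article's files (one name per line assumed):
# people = names_with_gender(load_names("Top100MaleNames.txt"),
#                            load_names("Top100FemaleNames.txt"))
# surnames = load_names("Top100Surnames.txt")
```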


After this arrangement, C2 holds the following table sequence of names and genders:

Similarly, cities and state abbreviations are related. After the data is retrieved from the database in line 3, the state abbreviations are added to the city information:
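A rough Python equivalent of this lookup is sketched below. It assumes a DB-API connection, and the standard-library `sqlite3` module keeps the sketch runnable. Only STATEID comes from the article; the column names NAME and ABBR are assumptions. The esProc code adds the abbreviations after reading the cities. This sketch instead lets the database perform the lookup with a SQL join.

```python
import sqlite3


def load_cities_with_states(conn: sqlite3.Connection):
    """Return (city, state_abbr) rows by joining CITIES to STATES on STATEID.

    NAME and ABBR are assumed column names; STATEID is the key named in the article.
    """
    sql = """
        SELECT c.NAME, s.ABBR
        FROM CITIES AS c
        JOIN STATES AS s ON s.STATEID = c.STATEID
    """
    return conn.execute(sql).fetchall()
```

Given a connection `conn` to the test database, `cities = load_cities_with_states(conn)` returns a list of (city, abbreviation) tuples. A random row can then be picked from this list for each employee.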

After this arrangement, the table sequence of city information in C3 is as follows:

Line 4 then prepares the basic information for data generation, including the data structure of the employee information table and the number of test records to generate:

Of these, the number in C4 defines the buffer size: every time 1,500 records have been generated, they are written to the text file. This keeps memory use under control. B5 writes the data structure of the employee information table to the text file.
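The buffering idea in C4 and B5 can be sketched in Python as a small writer class. It writes a header once, holds rows in memory, and writes them out in batches. The class name `BufferedTableWriter` and the tab delimiter are assumptions for illustration; the batch size of 1,500 comes from the article.

```python
import csv

BUFFER_SIZE = 1500  # the batch size the article sets in C4


class BufferedTableWriter:
    """Hold generated rows in memory and write them to a text stream in batches."""

    def __init__(self, out, header, buffer_size=BUFFER_SIZE):
        # `out` is any object with a write() method, e.g. an open text file.
        self.writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        self.buffer_size = buffer_size
        self.rows = []
        self.writer.writerow(header)  # table structure first, as B5 does

    def add(self, row):
        """Buffer one row; write the whole batch once the buffer is full."""
        self.rows.append(row)
        if len(self.rows) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Write all buffered rows, then empty the buffer to release memory."""
        self.writer.writerows(self.rows)
        self.rows.clear()
```

After the generation loop ends, call `flush()` once more so that the last partial batch, which holds fewer than 1,500 rows, is also written. When `out` is a real file, open it with `newline=""`, as the `csv` module documentation recommends.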

Next, a loop from line 6 to line 15 generates the test data for each employee:

B6 generates a random sequence number that refers to a name, and C6 generates one that refers to a surname. Together they determine the employee's name and gender. As required, B9 generates a random age, and based on that age, line 10 picks a random date in the corresponding year as the employee's birthday. Lines 11 and 12 randomly select a city and retrieve the employee's city and state. Once the required data is generated, B13 appends it to the table sequence of employee information created in A4. C14 controls data output, writing the data to the text file every time 1,500 records have been generated. After each output, A4 is emptied to avoid excessive memory use.
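The loop can be sketched in Python as a generator that produces one employee row per iteration. It follows the steps described for the cellset literally: a random name row (which carries the gender), a random surname, a random age from 18 to 55, a random day in the corresponding birth year, and a random city with its state. The function names and the column order of each row are hypothetical.

```python
import random
from datetime import date, timedelta


def random_day_in_year(year, rng):
    """Uniformly pick one day in the given calendar year."""
    start = date(year, 1, 1)
    days = (date(year + 1, 1, 1) - start).days  # 365 or 366
    return start + timedelta(days=rng.randrange(days))


def generate_employees(count, people, surnames, cities, today, rng=random):
    """Yield (eid, first, surname, gender, birthday, city, state) rows.

    people   -- list of (first_name, gender) rows
    surnames -- list of surnames
    cities   -- list of (city, state_abbr) rows
    """
    for eid in range(1, count + 1):                  # sequential employee numbers
        first, gender = rng.choice(people)           # name and gender stay consistent
        surname = rng.choice(surnames)
        age = rng.randint(18, 55)                    # random age, as in B9
        birthday = random_day_in_year(today.year - age, rng)  # day in that year, as in line 10
        city, state = rng.choice(cities)             # random city and its state
        yield (eid, first, surname, gender, birthday.isoformat(), city, state)
```

Because the generator yields rows lazily, its output can go to a writer that writes every 1,500 rows. That writer then plays the same role as C14 and the clearing of A4.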


After all the data has been written, the text file looks like this:

When preparing test data with esProc, we can run a loop to generate a large amount of random data. Within the loop, we can easily use existing database or text data to generate data that meets business needs, without writing complex programs.