June 15, 2015

Combine Text Files Conditionally with esProc


A single directory contains multiple text files that need to be combined according to a specified condition. The files are named, for example, 12345.txt, 12346.txt, 12347.txt, 2013070312345.txt, 2013070312346.txt, 2013070312347.txt and 2013070412347.txt. The combined result is shown in the following figure:


That is, files whose names end in the same five digits need to be combined into one file.
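To make the grouping rule concrete, here is a minimal Python sketch (not esProc) that groups the example file names by the last five characters of the name without its extension. The function name `group_by_last_five` is hypothetical, and taking the key from the name without ".txt" is an assumption based on the example names above.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_last_five(names):
    """Group file names by the last five characters of the name without extension."""
    groups = defaultdict(list)
    for name in names:
        stem = PurePath(name).stem  # "2013070312345.txt" -> "2013070312345"
        groups[stem[-5:]].append(name)
    return dict(groups)


# The example file names from the text above.
names = [
    "12345.txt", "12346.txt", "12347.txt",
    "2013070312345.txt", "2013070312346.txt",
    "2013070312347.txt", "2013070412347.txt",
]
groups = group_by_last_five(names)
for key, members in groups.items():
    print(key, members)
```

Each key corresponds to one combined output file, and the list under it holds the files whose content ends up there.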

File handling in most high-level languages is quite low-level, so this computation tends to produce bloated code with a series of loops and if statements. In contrast, esProc makes the computation easy through deep encapsulation and effective support for set operations. The esProc script is as follows:

In the script:
A1: Lists all files in the E:\\test directory and selects those whose names are longer than 5 characters.

A2: Loops over the files selected in A1.

B2: Imports a file, which B3 then appends to the text file named after the last five digits of the file’s name.
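For readers who do not use esProc, the steps A1 to B3 can be sketched in Python as follows. This is an illustration of the same logic under stated assumptions, not a translation of the esProc script: the function name `merge_by_suffix` is hypothetical, the name length is measured without the ".txt" extension, and the files are read with the platform's default text encoding.

```python
from pathlib import Path


def merge_by_suffix(directory):
    """Append each long-named .txt file to the file named after its last five digits."""
    folder = Path(directory)
    # A1: list the files once, keeping only names longer than five characters
    # (assumption: the length is measured without the extension).
    sources = sorted(p for p in folder.glob("*.txt") if len(p.stem) > 5)
    # A2: loop over the selected files.
    for src in sources:
        target = folder / (src.stem[-5:] + ".txt")
        text = src.read_text()         # B2: load the whole file into memory
        with target.open("a") as out:  # B3: append; creates the target if absent
            out.write(text)
```

Calling `merge_by_suffix` on the example directory would append 2013070312345.txt to 12345.txt, and so on. The source files are left in place, and no newline is added between appended files, so each source is assumed to end with one.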

In the above script, each file is loaded into memory in one go because it is small enough to fit in memory. For a file too big to be loaded entirely, esProc provides stream-style processing through a file cursor, which imports and exports the file data in batches. The above script can be modified to handle a big file:


B2 creates a file cursor; the @s option imports the file as a table sequence of single-field strings. The content is then exported, and the cursor is closed in B4.
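The batch idea behind the file cursor can be sketched in Python as well. This is not esProc's cursor API, only an illustration of reading and writing a bounded number of lines at a time; the function name `append_in_batches` and the default batch size of 1000 lines are hypothetical choices.

```python
from itertools import islice


def append_in_batches(src, target, batch_size=1000):
    """Append src to target while holding at most batch_size lines in memory."""
    with open(src, "r") as reader, open(target, "a") as writer:
        while True:
            batch = list(islice(reader, batch_size))  # fetch the next batch of lines
            if not batch:
                break  # source exhausted
            writer.writelines(batch)  # export the batch before fetching more
    # Leaving the with-block closes both files, much as the cursor is closed in B4.
```

Memory use is bounded by the batch size rather than the file size, so the same loop works for files of any length.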
