Sometimes we need to fetch certain data
from multiple files of a multi-level directory during text processing. The
operation is too complicated to be well performed at the command line. Though
it can be realized in high-level languages, the code is difficult to write; and
the involvement of big files will increase the difficulty. esProc, however, can
import big files with cursors and call the script recursively and thus can
process the data fetching in batch. The following example will show its way of
doing it.
esProc code for doing this:
First define a parameter, path, and set its initial value as “D:\files” so as to get data from this directory, as shown below:
A1=directory@p(path)
directory function is used to get the file list in the root directory of the parameter, path. @p option means file names should be presented with full path. The following shows some of the results:
A2=A1.(file(~).cursor@s()) . This line of code opens A1’s files respectively
in the form of cursors. A1.(…) means processing A1’s members in proper order; “~”
represents the current member; file
function is used to create a file object and cursor function will return a cursor object according to the file
object.
A3=A2.((~.skip(1),~.fetch@x(1)))This line of code fetches
the second row from A2’s each file cursor. A2.(…) means
computing A2’s cursors one by one. (~.skip(1),~.fetch@x(1))
means computing the expression in the parentheses in order and returning the
last computed result. ~.skip(1) means skipping a row. ~.fetch@x(1) means
fetching the row at the current position (i.e. the second row) and closing the
cursor. @x means closing the cursor automatically after the data are fetched. ~.fetch@x(1)
represents the result which the parentheses operator will return.
skip function skips multiple rows. You can determine how many
rows need to be skipped through a parameter. fetch function fetches multiple rows. Fetch two rows starting from
the 10th row, for example, the code is ~.skip(10),fetch@x(2).
A4=A3.union()This line of code unions the results in A4 together. union function is used to realize the union operation, removing the duplicate data at the same time. For example, the code for computing the union of two sets: [1,2] and [2,3] is [1,2],[2,3]].union() and the result is [1,2,3]. If duplicate data are wanted, conj function (for concatenation) should be used. Some of the results of A4 are as follows:
A5=file("d:\\result.txt").export@a(A4)This line of code
exports the results of A4 to result.txt.
export function is used to write data
to a file. @a option means appending.
At this point, all data have been fetched
as required from the current directory. The rest of the work is to fetch the
subdirectories of the current directory and to call this script recursively.
A6=directory@dp(path)directory function is used to fetch all the
subdirectories from the current directory. One of the options, d, means fetching the subdirectory names
and the other one, p, means fetching
the full paths. Thus A6 gets the subdirectories from D:\files:
A7=A6.(call("c:\\readfile.dfx",~))This
line of code deals with A6’s members (the subdirectories). The operation is to
call the esProc script - c:\\readfile.dfx,
and makes the current member (one of the subdirectories) as the input
parameter. Note that readfile.dfx is
the name of this script.
No comments:
Post a Comment