Problem
Source: http://unix.stackexchange.com/questions/161885/using-awk-to-identify-the-number-identical-columns
There are several text files under the /data directory, each containing a
number of columns. We want to know how many distinct columns each file has.
For instance, f1.txt, whose content is shown here, has 3 distinct columns:
1 0 0 0 0 0
0 1 1 1 0 0
Suppose there is only one file. Then the code could be:
(f=file("/data/f1.txt").import(),f.fno().((c=#,f.(~.field(c)))).id().len())
The fno function returns the number of columns in a two-dimensional table;
~ represents the loop variable of a loop function; # represents the loop
number; and the id function returns the distinct members, which here are the
distinct columns.
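For readers unfamiliar with esProc, the same single-file logic can be sketched in Python. This is an illustrative equivalent, not the esProc implementation: the function name count_distinct_columns is hypothetical, and the sketch assumes whitespace-separated fields and the same number of fields in every row.

```python
def count_distinct_columns(lines):
    """Return the number of distinct columns in whitespace-separated rows."""
    # Split each non-blank line into its fields (assumption: whitespace-delimited).
    rows = [line.split() for line in lines if line.strip()]
    # zip(*rows) transposes the rows, so each column becomes one tuple.
    columns = zip(*rows)
    # A set keeps one copy of each identical column.
    return len(set(columns))


sample = ["1 0 0 0 0 0", "0 1 1 1 0 0"]
print(count_distinct_columns(sample))  # prints 3
```

For the f1.txt sample, the six columns are (1,0), (0,1), (0,1), (0,1), (0,0) and (0,0), of which three are distinct.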
If there are many files under the /data directory, the code becomes more
complicated:
pjoin((d=directory@p("/data")),d.((f=file(~).import(),f.fno().((c=#,f.(~.field(c)))).id().count())))
This line of code computes the number of distinct columns in each file in
turn and joins the results with the corresponding file names. The result is a
table that pairs each file name with its count.
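The per-directory computation can be sketched in Python along the same lines. Again this is only an illustrative equivalent under stated assumptions: the function names are hypothetical, fields are assumed to be whitespace-separated, and only regular files directly inside the directory are examined.

```python
from pathlib import Path


def distinct_columns_in_file(path):
    """Count the distinct columns of one whitespace-separated text file."""
    with open(path) as fh:
        rows = [line.split() for line in fh if line.strip()]
    # Transpose rows into column tuples and keep one copy of each.
    return len(set(zip(*rows)))


def distinct_columns_per_file(directory):
    """Map each regular file's path to its distinct-column count."""
    return {
        str(p): distinct_columns_in_file(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()  # skip subdirectories
    }
```

Calling distinct_columns_per_file("/data") would return a dictionary keyed by file path, which plays the role of the joined file-name/count table described above.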
To make the computational logic easier to follow, the code above can also be
written across multiple cells as a long statement.
== marks the beginning of the long statement, whose scope is the indented
block B2-C5. B5 is the last executable cell, and its result is returned to A2.