June 5, 2015

esProc Joins Text Files and Generates a Computed Column

There are two tab-separated structured text files. The chr column in AssociatedMarkers.txt is a logical foreign key pointing to the Chr column in DiseaseMarkers.txt. We want to create a new structured text file with two columns: one taken from the snps_BCG24 column of AssociatedMarkers.txt, the other a computed column whose values follow this rule: if a value in the hg19pos column of AssociatedMarkers.txt falls between startLoc and endLoc of the matching record in DiseaseMarkers.txt, output inLocus; otherwise output an empty string. Selections from the two files are as follows:

AssociatedMarkers.txt

DiseaseMarkers.txt

esProc approach

A1, A2: Import the two files into memory. The @t option reads the first line of each file as column names.
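
For readers who want to compare the import step with a general-purpose language, it can be sketched in Python. This is only an illustration under assumptions: the helper name read_tsv and the sample row are made up here and do not come from the real files. csv.DictReader plays roughly the role of the @t option by treating the first line as column names.

```python
import csv
import io


def read_tsv(source):
    """Read tab-separated text whose first line holds the column names."""
    return list(csv.DictReader(source, delimiter="\t"))


# Made-up sample content for illustration only (not from AssociatedMarkers.txt).
sample = io.StringIO("snps_BCG24\tchr\thg19pos\nrsA\t1\t150\n")
rows = read_tsv(sample)
print(rows[0]["hg19pos"])
```

With a real file, an opened file object such as open("AssociatedMarkers.txt", newline="") would be passed instead of the in-memory sample. Note that every value is read as a string, so numeric columns need an explicit conversion before comparison.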

A3: Perform the join operation. The result is as follows:

A4: Retrieve the desired columns from A3. The _1.hg19pos column corresponds to the hg19pos column of AssociatedMarkers.txt. The final result is as follows:
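
The join and the computed column described above can also be sketched in Python, as a rough equivalent rather than a translation of the esProc script. Several things here are assumptions: the join is treated as an inner join on chr = Chr, the range check is inclusive at both ends, the output column name locus and the function names are hypothetical, and all sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def join_and_flag(markers, loci):
    """Join markers to loci on chr = Chr and compute the inLocus flag."""
    # Index the loci by chromosome so each marker finds its matches quickly.
    by_chr = defaultdict(list)
    for locus in loci:
        by_chr[locus["Chr"]].append(locus)

    result = []
    for m in markers:
        pos = int(m["hg19pos"])
        # Inner join: markers without a matching Chr produce no output row.
        for locus in by_chr.get(m["chr"], []):
            # Assumption: startLoc and endLoc are both inclusive bounds.
            inside = int(locus["startLoc"]) <= pos <= int(locus["endLoc"])
            result.append({
                "snps_BCG24": m["snps_BCG24"],
                "locus": "inLocus" if inside else "",
            })
    return result


def write_tsv(rows, target):
    """Write the two output columns as tab-separated text with a header."""
    writer = csv.DictWriter(
        target,
        fieldnames=["snps_BCG24", "locus"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows for illustration only; the real files are not reproduced here.
markers = [
    {"snps_BCG24": "rsA", "chr": "1", "hg19pos": "150"},
    {"snps_BCG24": "rsB", "chr": "1", "hg19pos": "900"},
    {"snps_BCG24": "rsC", "chr": "2", "hg19pos": "50"},
]
loci = [
    {"Chr": "1", "startLoc": "100", "endLoc": "200"},
    {"Chr": "2", "startLoc": "10", "endLoc": "20"},
]

joined = join_and_flag(markers, loci)
out = io.StringIO()
write_tsv(joined, out)
print(out.getvalue())
```

In this sketch only rsA falls inside its locus, so it is the only row flagged inLocus; the other two rows carry an empty string in the computed column. If Chr is not unique in DiseaseMarkers.txt, a marker yields one output row per matching locus, which may or may not be the behavior wanted.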