Problem
source:http://stackoverflow.com/questions/28976554/removing-duplicates-from-csv-and-rearranging-the-data-into-columns-with-java .
JAVA doesn’t
have the class library for grouping data from a text file or getting distinct
values, which results rather complicated coding. In dealing with the duplicated
records in a CSV file, esProc can work with JAVA to make it easier. The
following example will tell you how esProc works.
dup.csv
contains 8 columns, in which there are some duplicated values, as shown below:
21458952,a1234,Doe,John,technology,support
staff,work phone,555-555-5555
21458952,a1234,Doe,John,technology,support
staff,work email,johndoe@whatever.net
21458952,a1234,Doe,John,technology,support
staff,work pager,555-555-5555
99946133,b9854,Paul,Jane,technology,administration,work
phone,444-444-4444
99946133,b9854,Paul,Jane,technology,administration,work
email,janepaul@whatever.net
99946133,b9854,Paul,Jane,technology,administration,work
pager,444-444-4444
99946133,b9854,Paul,Jane,technology,administration,cell
phone,444-444-4444
|
We need
to filter away the duplicated records, get the first 6 columns and rearrange
the 7th and the 8th column, according to the rule that
work phone will be made the 7th column and work email the 8th
column for the new file, and if there is more than one work phone or work email
for the same person, the first one will be used.
A1: Import
the file separated by commas.
A2: Filter records to remove the duplicated ones and rearrange them. group function is used to group them. ~ represents each group of records; _1,_2…_8 are default column names; @1 indicates getting the first record from the query result. The result is as follows:
A3: Export
the result to a new CSV file. Or we can use exportxls
function to export data as the Excel format.
Having done all data processing,
the esProc script will then be integrated with JAVA program via JDBC using the
following JAVA code:
// establish a connection via esProc
JDBC
Class.forName("com.esproc.jdbc.InternalDriver");
con=
DriverManager.getConnection("jdbc:esproc:local://");
// call esProc script, whose name is
test and which receives parameters
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call
test()");
st.execute();//execute esProc stored
procedure
ResultSet set = st.getResultSet();//get
the result
No comments:
Post a Comment