September 25, 2014

A Handy Method of Accessing Excel Files in Java

Processing data from Excel files is common in Java projects. Here is an example to illustrate how to accomplish this. Read information of sales orders from an Excel file orders.xls and select orders that are placed on and after January 1, 2010 and whose SELLERID field is 18. Then write the selected data to order_result.xls.

The content of Excel file orders.xls is as follows:

ORDERID
CLIENT
SELLERID
AMOUNT
ORDERDATE
1
UJRNP
17
392
2008/11/2 15:28
2
SJCH
6
4802
2008/11/9 15:28
3
UJRNP
16
13500
2008/11/5 15:28
tha4
PWQ
9
26100
2008/11/8 15:28
5
PWQ
11
4410
2008/11/12 15:28
6
HANAR
18
6174
2008/11/7 15:28
7
EGU
2
17800
2008/11/6 15:28
8
VILJX
7
2156
2008/11/9 15:28
9
JAYB
14
17400
2008/11/12 15:28
10
JAXE
19
19200
2008/11/12 15:28
11
SJCH
7
13700
2008/11/10 15:28
12
QUICK
11
21200
2008/11/13 15:28
13
HL
12
21400
2008/11/21 15:28
14
JAYB
1
7644
2008/11/16 15:28
15
MIP
16
3234
2008/11/19 15:28
16
AYWYN
4
6566
2008/11/21 15:28

Steps of coding in Java are like this:
1.       Import data from the Excel file by rows and store them in sourceList, the List object.

2.       Traverse sourceList, the List object, and store the eligible data in resultList, the List object.

3.       Traverse resultList and store the data in the output Excel file by rows.
The code contains some statements handling different data types. See the following for detail:

         public static void myExcel() throws Exception {
                   //import data from the excel file
                   HSSFWorkbook workbook=
newHSSFWorkbook(new FileInputStream(new File("d:/file/orders.xls")));
        Sheet sheet=workbook.getSheetAt(0);//  import the data from the first sheet
        List<Map<String,Object>>sourceList= new ArrayList<Map<String,Object>>();
                   List<Map<String,Object>>resultList= new ArrayList<Map<String,Object>>();
                   SimpleDateFormat format = new SimpleDateFormat("yyyy-M-ddHH:mm:ss");
                   for (inti = 1; i<sheet.getPhysicalNumberOfRows(); i++) {//process excel data by rows
                       Row row=sheet.getRow(i);
                       Map<String,Object> order=new HashMap<String,Object>();
                       Cell cell0 = row.getCell(0);
                            cell0.setCellType(Cell.CELL_TYPE_STRING);
//integer data should be converted into the type of txt, otherwise, they would become the floating point
                            Cell cell1 = row.getCell(1);
                            cell1.setCellType(Cell.CELL_TYPE_STRING);
                            Cell cell2 = row.getCell(2);
                            cell2.setCellType(Cell.CELL_TYPE_STRING);
                            order.put("ORDERID",cell0.toString());
                            order.put("CLIENT",cell1.toString());
                            order.put("SELLERID",cell2.toString());
                            order.put("AMOUNT",row.getCell(3).toString());
                            //process data of date type
                            order.put("ORDERDATE",
HSSFDateUtil.getJavaDate(row.getCell(4).getNumericCellValue()));
                            sourceList.add(order);
                   }
                   for (inti = 0, len = sourceList.size(); i<len; i++) {//filter according to the specified condition
                            Map<String,Object> order =(Map<String,Object>) sourceList.get(i); 
                            System.out.println("1order.get(\"SELLERID\")="+order.get("SELLERID"));
                            if ( Integer.parseInt(order.get("SELLERID").toString())==18 &&
                            ((Date)order.get("ORDERDATE")).after(format.parse("2009-12-31 23:59:59")) )
                            {//evaluatethe data’s eligibility
                                     resultList.add(order); //add the eligible orders to resultList, the List object
                            }
                   }
                   //write excel file
                   HSSFWorkbook workbook1 = new HSSFWorkbook();//create an excel file object
                   Sheet sheet1 = workbook1.createSheet();//create a sheet object
                   Row row1;
                   row1 = sheet1.createRow(0);//the first row: headline
                   row1.createCell(0).setCellValue("ORDERID");
                   row1.createCell(1).setCellValue("CLIENT");
                   row1.createCell(2).setCellValue("SELLERID");
                   row1.createCell(3).setCellValue("AMOUNT");
                   row1.createCell(4).setCellValue("ORDERDATE");
                   for (inti = 1, len = resultList.size(); i<len; i++) {//create data rows by loop
                            row1 = sheet1.createRow(i);
                            row1.createCell(0).setCellValue(resultList.get(i).get("ORDERID").toString());
                            row1.createCell(1).setCellValue(resultList.get(i).get("CLIENT").toString());
                            row1.createCell(2).setCellValue(resultList.get(i).get("SELLERID").toString());
                            row1.createCell(3).setCellValue(resultList.get(i).get("AMOUNT").toString());
                            row1.createCell(4).setCellValue(format.format((Date) resultList.get(i).get("ORDERDATE")));
                   }
FileOutputStreamfos = new FileOutputStream("d:/file/orders_result.xls"); 
workbook1.write(fos);// write a file
fos.close(); 
         }

The resulting Excel data after the code is executed are as follows:
ORDERID
CLIENT
SELLERID
AMOUNT
ORDERDATE
432
ERNSH
18
6272.0
2010-1-13 15:28:05
444
SJCH
18
4312.0
2010-1-25 15:28:05
452
HP
18
4312.0
2010-2-01 15:28:05
492
HP
18
27900.0
2010-3-07 15:28:05
512
BTMMU
18
18000.0
2010-3-27 15:28:05
524
PJIPE
18
15600.0
2010-4-09 15:28:05
637
HP
18
9114.0
2010-8-01 15:28:05
638
JOPO
18
294.0
2010-8-01 15:28:05

Most of the code written with myExcel function is responsible for accessing and processing Excel files and data types. Yet it is rather complicated. There is just a small amount of code used for filtering based on conditions,and it is not universal enough. As filtering conditions are fixed, if there is any change about them, the code should be modified accordingly. We have to write code for analyzing dynamic expressions if we want to set dynamic filtering conditions.By doing so, we can get dynamic filtering conditions similar as the SQL theoretically. But programming would be difficult and the adjustment would be comparatively complicated.

If esProc, a programming language specially designed for processing structured (semi-structured) data, is used to help to process Excel data, programming will become much easier. With esProc, we can produce comparatively simple code to process Excel files. Besides, it is also simple for Java to call esProc script.


Programmers may pass the condition that "orders that are placed on and after January 1, 2010 and whose SELLERID field is 18" on to esProc program as the parameter where. See the chart below: 

The value of where is ORDERDATE>=date(2010,1,1) && SELLERID==18. The code ofesProc program is as follows: 



A1Define a file object, import data to it and set the first row as the headline. esProc's IDE can show the imported data visually, as shown in the right part of the above chart. The code for accessing file xlsx is =file("D:/file/orders.xlsx").importxls@xt().

A2Filter according to the condition. Here macro is used to analyze the expression dynamically. where is the input parameter. esProc will first compute the expression enclosed by ${…}, replace ${…} with the computing result acting as the macro string value, and then interpret and execute the result. The code finally executed is =A1.select(ORDERDATE>=date(2010,1,1) && SELLERID==18).

A3Write the eligible result set to the Excel file.

The code needn't to be rewritten when the filtering condition is changed, we just need to modify the parameter where. For example, if the filtering condition becomes "orders that are placed on and after January 1, 2010 and whose SELLERID field is 18, or orders whose CLIENT is PWQ", the value of parameter where can be expressed like this:CLIENT=="PWQ"||ORDERDATE>=date(2010,1,1) && SELLERID==18. After the code is executed, the result set of A2 is as follows: 

This piece of code can be called in Java application by using JDBC provided by esProc. The code called in Java for saving the above esProc program as test.dfx is as follows:
          // create a connection between esProc and JDBC
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
//call the esProc program (the stored procedure) in which test is the file name of dfx
com.esproc.jdbc.InternalCStatementst=(com. esproc.jdbc.InternalCStatement)con.createStatement();
    //set parameters
st.setObject(1,"ORDERDATE>=date(2010,1,1) && SELLERID==18 || CLIENT==\"PWQ\"");//the parameter is the dynamic filtering condition
//execute esProc stored procedure
st.execute();

For the script with relatively simple code, it is not necessary to write specially the esProc script file (test.dfx). esProc code can be written directly to the Java application that calls esProc JDBC.

String where="CLIENT==\"PWQ\"||ORDERDATE>=date(2010,1,1) && SELLERID==18 ";
String resultpath=" D:/file/orders_result.xls ";
String sourcepath=" D:/file/orders.xls ";
ResultSet set = st.executeQuery("=file(\""+resultpath+"\").exportxls@t(file(\""+sourcepath+"\").importxls@t().select("+where+"))");


This piece of Java code calls a line of code from esProc script directly: get data from the Excel file, filter them according to specified conditions and write the result set to the Excel file.