Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - Microsoft.Office.Interop.Excel or EPPlus for reading a huge (or not) Excel file

I wrote code to read a column from an Excel file using Microsoft.Office.Interop.Excel. It first reads the entire Range into a System.Array, then does some operations on the System.Array values, and finally converts them to a List because I use it to fill a ListBox. This is the code (only the relevant parts):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;
using Excel = Microsoft.Office.Interop.Excel;

// Fields of the Form class
private List<string> bd = new List<string>();
private static System.Array objRowAValues;

private List<string> bl = new List<string>();
private static System.Array objRowBValues;

private List<string> cm = new List<string>();
private static System.Array objRowCValues;

private List<string> pl = new List<string>();
private List<string> bdCleanList;
private static Microsoft.Office.Interop.Excel.Application appExcel;

Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range rngARowLast, rngBRowLast, rngCRowLast;

long lastACell, lastBCell, lastCCell, fullRow;

private void btnCargarExcel_Click(object sender, EventArgs e)
{
    if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        if (System.IO.File.Exists(openFileDialog1.FileName))
        {
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();
            Thread.Sleep(10000); // blocks the UI thread for 10 s and is included in the measured time

            filePath.Text = openFileDialog1.FileName;

            // Open the workbook read-only (third argument)
            xlApp = new Microsoft.Office.Interop.Excel.Application();
            xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true,
                                              Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "", false,
                                              false, 0, true, 1, 0);
            xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

            // Last used row in column A: jump up from the last row of the sheet
            fullRow = xlWorkSheet.Rows.Count;
            lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
            rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
            // One COM call returns the whole column as a 1-based object[,]
            // (a single-cell range returns a scalar instead, and this cast fails)
            objRowAValues = (System.Array)rngARowLast.Cells.Value;

            foreach (object elem in objRowAValues)
            {
                // Empty cells come back as null, so check before calling ToString()
                if (elem != null && elem.ToString() != "")
                {
                    bd.Add(cleanString(elem.ToString(), 10)); // cleanString: helper not shown
                }
            }

            nrosProcesados.Text = bd.Count.ToString();
            listBox1.DataSource = bd;

            // The workbook was only read, so close it without saving
            xlWorkBook.Close(false, null, null);
            xlApp.Quit();

            releaseObject(xlWorkSheet); // releaseObject: helper not shown
            releaseObject(xlWorkBook);
            releaseObject(xlApp);

            stopWatch.Stop();

            TimeSpan ts = stopWatch.Elapsed;
            executiontime.Text =
                String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds,
                              ts.Milliseconds / 10);
        }
        else
        {
            MessageBox.Show("No se pudo abrir el fichero!"); // "Could not open the file!"
            // appExcel is never assigned in the code shown, and
            // Marshal.ReleaseComObject(null) throws ArgumentNullException
            if (appExcel != null)
            {
                System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                appExcel = null;
            }
            System.Windows.Forms.Application.Exit();
        }
    }
}
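When a multi-cell Range is read through Value, Interop returns a 1-based object[,], and empty cells come back as null, so any loop over it needs a null check before calling ToString(). A single-cell range returns the value itself rather than an array. The helper below is a hypothetical sketch (the name RangeValues.ToNonBlankStrings is made up) that handles both cases without any Excel dependency, so the flattening logic can be tested on its own:

```csharp
using System;
using System.Collections.Generic;

public static class RangeValues
{
    // Converts the result of Range.Value into a list of non-blank strings.
    // A multi-cell range yields an object[,] (1-based when it comes from Excel);
    // a single-cell range yields a scalar, or null if that cell is empty.
    public static List<string> ToNonBlankStrings(object rangeValue)
    {
        var result = new List<string>();
        if (rangeValue is object[,] values)
        {
            // Use the array's own bounds instead of assuming 0- or 1-based indexes
            for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
            {
                for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
                {
                    AddIfNotBlank(result, values[r, c]);
                }
            }
        }
        else
        {
            AddIfNotBlank(result, rangeValue);
        }
        return result;
    }

    // Empty cells arrive as null, so test before calling ToString()
    private static void AddIfNotBlank(List<string> result, object cell)
    {
        string text = cell?.ToString();
        if (!string.IsNullOrWhiteSpace(text))
        {
            result.Add(text);
        }
    }
}
```

In the question's handler this could replace the foreach loop, e.g. `foreach (string text in RangeValues.ToNonBlankStrings(rngARowLast.Cells.Value)) bd.Add(cleanString(text, 10));`, where cleanString is the question's own helper.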

I tested with an Excel file of ~800,000 cells and it took less than 2 minutes. Then I tried the EPPlus samples, which are faster than my approach, so I'm thinking of using EPPlus instead of Microsoft.Office.Interop.Excel. I also considered the OpenXML SDK, but couldn't find any example that suits my goals, so I've set it aside for now. In the sample, they use this code to read from an Excel file:

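// package: an open OfficeOpenXml.ExcelPackage. Worksheets[1] is the first sheet
// in EPPlus 4; EPPlus 5 and later index worksheets from 0 by default.
ExcelWorksheet sheet = package.Workbook.Worksheets[1];

var query1 = from cell in sheet.Cells["d:d"]
             where cell.Value is double && (double)cell.Value >= 9990 && (double)cell.Value <= 10000
             select cell;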

Of course they use LINQ here, but my questions on this topic are:

  • Which approach do you use?
  • What are your recommendations on this?
  • Any help writing the same using EPPlus or the OpenXML SDK?

I'm a newbie in the C# world, coming from the PHP world, and this is my first project.

1 Reply

Which approach do you use? EPPlus.

What are your recommendations on this? I've found EPPlus to be much faster. In my opinion it is also an easier API to work with, for several reasons, one being that it does not use COM interop (which helps both speed and ease of use). It also has fewer requirements, especially when deploying to a server environment: there is no need to install Excel.

Any help writing the same using EPPlus or the OpenXML SDK? The EPPlus API is fairly straightforward. Make an attempt, and post more specific questions showing what you've tried so far.
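For comparison with the Interop code in the question, here is a minimal EPPlus sketch that reads the non-blank values of one column of the first worksheet into a List<string>. It assumes the EPPlus NuGet package in version 4.5.x (LGPL); EPPlus 5 and later require a license to be configured before the first ExcelPackage is created. EPPlus only handles the Open XML formats (.xlsx, .xlsm), not legacy .xls files. The names EpplusColumnReader and ReadColumn are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OfficeOpenXml; // EPPlus NuGet package; this sketch assumes 4.5.x

public static class EpplusColumnReader
{
    // Opens the file and reads one column (1-based column number) of the first sheet.
    public static List<string> ReadColumn(string path, int column)
    {
        using (var package = new ExcelPackage(new FileInfo(path)))
        {
            return ReadColumn(package, column);
        }
    }

    public static List<string> ReadColumn(ExcelPackage package, int column)
    {
        var result = new List<string>();
        // First() avoids depending on whether worksheet indexes are 0- or 1-based
        ExcelWorksheet sheet = package.Workbook.Worksheets.First();

        // Dimension is null when the sheet contains no cells
        if (sheet.Dimension == null)
        {
            return result;
        }

        int lastRow = sheet.Dimension.End.Row;
        for (int row = 1; row <= lastRow; row++)
        {
            // Value is null for empty cells, as with Interop
            string text = sheet.Cells[row, column].Value?.ToString();
            if (!string.IsNullOrWhiteSpace(text))
            {
                result.Add(text);
            }
        }
        return result;
    }
}
```

In the question's handler, something like `bd = EpplusColumnReader.ReadColumn(openFileDialog1.FileName, 1);` before setting `listBox1.DataSource` would stand in for the whole Interop block, including the COM cleanup; cleanString can still be applied to each value afterwards.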

Another way to loop through cells:

var firstColumnRows = sheet.Cells["A2:A"];

// Loop through rows in the first column, get values based on offset
foreach (var cell in firstColumnRows)
{
    var column1CellValue = cell.GetValue<string>();
    var neighborCellValue = cell.Offset(0, 1).GetValue<string>();
}
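The LINQ sample in the question tests `cell.Value is double` before each cast because a cell value is a boxed object that may also be text, a date, or null. Below is a minimal sketch of the same range filter without the repeated casts, written over a plain sequence of boxed values so it runs without EPPlus; CellValueFilter and NumbersInRange are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CellValueFilter
{
    // Returns the numeric values in [min, max], skipping anything that is not a
    // boxed double (strings, nulls, dates, boxed ints, ...). OfType<double>()
    // performs the type test and the unboxing in one step.
    public static List<double> NumbersInRange(IEnumerable<object> values, double min, double max)
    {
        return values
            .OfType<double>()
            .Where(d => d >= min && d <= max)
            .ToList();
    }
}
```

With EPPlus the input could be `sheet.Cells["D:D"].Select(c => c.Value)`. If you need the cells themselves rather than their values, pattern matching (C# 7.3 or later) keeps the query form of the sample: `from cell in sheet.Cells["D:D"] where cell.Value is double v && v >= 9990 && v <= 10000 select cell`.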

...