Scraping html tables in excel part 2: first blood

Posted in hacks and kludges on Monday, April 27 2015

Last time I dealt with VBA I came up with a horrible kludge for getting html into my excel spreadsheet. After noodling around I have found a better way.

My first attempt at copying the contents of an HTMLTable into an excel spreadsheet was slow as molasses. It took literal minutes to populate the table and I abandoned that ship pretty quick for kludgy workaround. For reference my first attempt looked like this:

Dim row as Object
Dim col as Object
Dim some_row as Range
Dim some_cell as Range
Set some_row = Sheet("Sheet1").Range("A1")      'where I want to put the data
Set some_cell = some_row
For Each row in tbl.Rows
    For Each col in row.Cells
        some_cell.Value = col.innerText         ' set the cell value
        Set some_cell = some_cell.Offset(0,1)   ' move the cell right 1
    Next col
    Set some_row = some_row.Offset(1,0)         ' move the cell down 1
    Set some_cell = some_row
Next row

There were two problems with this: one I wanted to copy the dates as text and not have excel parse them, and two excel was super slow. I had a thought on how to solve the first one, and make the code somewhat easier to read, by copying the contents of the HTMLTable into an array, casting the content as strings, then putting the array into the worksheet.

Dim destination As Range
Set destination = Sheet("Sheet1").Range("A1:E1")

Dim row As Object
Dim col As Object
Dim arr() As String

For Each row In tbl.Rows
    i = 0
    ReDim arr(0 To row.Cells.Length)
    For Each col In row.Cells
        arr(i) = col.innerText                  ' copy the row contents into an array
        i = i + 1
    Next col
    destination.Value = arr                     ' put the array into the worksheet
    Set destination = destination.Offset(1, 0)  ' move down a row
Next row

That solves the problem of the date formatting, but it is still painfully slow. But wait, every time a row in Sheet1 populates the whole workbook recalculates. This code is just a small part of a larger dashboard for data, so having it recalculate 1500 times (as there are 1500 lines of data) might have something to do with why this is so slow.

Dim destination As Range
Set destination = Sheet("Sheet1").Range("A1:E1")

Dim row As Object
Dim col As Object
Dim arr() As String

Dim orig_calc_mode as XlCalculation
orig_calc_mode = Application.Calculation
Application.Calculation = xlCalculationManual    ' with this set, excel does not recalculate

For Each row In tbl.Rows
    i = 0
    ReDim arr(0 To row.Cells.Length)
    For Each col In row.Cells
        arr(i) = col.innerText
        i = i + 1
    Next col
    destination.Value = arr
    Set destination = destination.Offset(1, 0)
Next row

Application.Calculation = orig_calc_mode         ' put it back the way we found it

All I do is turn off automatic calculation while populating the table, then return it to whatever the previous state was at the end. After making this change the code runs in 10s or so. It isn't as fast as the query table kludge (for whatever reason) but at least now I'm not generating unnecessary cache files and other garbage.

tags: excel, vba,