Posted in hacks and kludges on Monday, October 26 2015

One unfortunate thing about using excel for coding at work is the difficulty in version controlling spreadsheets. Excel files are big binary blobs and, unlike the source files for most programming languages which are text files, cannot be easily diffed. This raises all sorts of issues with the maintainability of the spreadsheets and the macros that are embedded in them. I hacked together a simple solution to this problem in powershell, for newer versions of Excel (I have Excel 2013 on my work computer and it works there!)

Newer Excel files (post 2007 I think?) are actually zipped containers of xml files and other junk. So step one is to simply unpack that container into a folder and most of the contents of your spreadsheet are now in plain text (xml). Don't believe me? Rename a random .xlsx file to .zip and see for yourself. There is a convenient tool to do this for you on bitbucket. By unpacking the spreadsheet you can track the changes to the individual xml files that make up the spreadsheet, and only those changes are commited in version control. If instead you put the final spreadsheet into git, a new copy would be uploaded everytime a change was made to the spreadsheet. Not only is this not helpful if you are trying to track down when a change was made, it also makes your repo much bigger than it needs to be.

Unfortunately all the VBA code is stored in a vbaproject.bin file inside that zip file. This is a binary blob, an OLE file, and has all the same maintainability issues as the spreadsheet as a whole. There are various tools online for viewing and extracting code from this file. I looked at what it would take to read/write directly into the file and ugh no. Too much work given that Excel can import and export the contents of a VBA project easily and with a few keystrokes.

Putting all this together, I took my spreadsheet for doing compressible pipeflow calculations unpacked it and put it on github. Then I wrote a short powershell script to do the following:

1. Take the unpacked directory of xml files and zip it into temp.zip
2. Rename temp.zip to temp.xlsm, it is now a real spreadsheet (minus the vba code)
3. Open Excel in background and open the temp.xlsm file
4. Walk through the directory of basic modules and import each into the Excel file
5. Walk through the directory of class modules and import each into the Excel file
6. Save and close temp.xlsm, quit Excel
7. Rename temp.xlsm to Pipeflow.xlsm

This short script takes that directory and re-makes a spreadsheet. By unpacking and re-packing the spreadsheet I can maintain it with a normal version control system, tracking only the small changes to the individual text files.

So far it works pretty well, the only giant caveat is that Excel, by default, does not allow outside programs to muck about in the VBA side (probably for good security reasons). So you will have to open your own copy of Excel, migrate through the options and turn on "Trust access to the VBA project object model". (Also you need windows powershell, and have its permissions set so you can run scripts (duh!))

$wb_src = Join-Path (pwd) Pipeflow.xlsm.extracted\$wb_dst = Join-Path (pwd) temp.zip
$wb_temp = Join-Path (pwd) temp.xlsm$wb_final = Join-Path (pwd) Pipeflow.xlsm

[System.IO.Compression.ZipFile]::CreateFromDirectory($wb_src,$wb_dst)

Rename-Item -Path $wb_dst -NewName$wb_temp

$xl = New-Object -comobject Excel.Application$xl.Visible = $False$wb = $xl.Workbooks.Open($wb_temp)

foreach ($i in get-childitem Modules\*.bas) {$wb.VBProject.VBComponents.Import($i.fullname) | Out-Null } foreach ($i in get-childitem Classes\*.cls) {
$wb.VBProject.VBComponents.Import($i.fullname) | Out-Null
}

$wb.Save()$xl.Quit()

Rename-Item -Path $wb_temp -NewName$wb_final


tags: excel, vba, powershell,