Design choices and trying to roll my own process simulator

Posted in musings on Saturday, September 19 2015

A while back, when I was still in school doing my chemical engineering thang, I slapped together some code in MATLAB for doing flash calculations -- i.e. determining how a mixture will partition into phases at a given temperature and pressure. I mostly did this because the process simulator I was using wouldn't do all the things I wanted it to, at least not easily, and I got annoyed by how obtuse it was. I wanted to do really exploratory pre-design stuff: just try large combinations of possible (naive) configurations of a process to inform where I would go with more detailed designs.

This was the seed for a perpetually unfinished project that I poke at every couple of months, trying to generalize my code (now in Python) to make it easier to do what I did: write scripts to explore the space of possible things you could do with a process simulator, without getting bogged down in the constraints of actually specifying everything.

Recently I went back and re-jiggered a bunch of the under-the-hood math. As I get more comfortable with numpy I periodically realize there is a better way of doing whatever I was doing, and I go and fix it; this got me thinking about why my project is perpetually unfinished. Partly it is because it is a huge task to build libraries for all the thermo you might want to do, but mostly it is because all those tasks have large data requirements: every correlation has correlation constants. It becomes both a big database problem, which I am not super interested in, and a problem of picking winners. For any given property there are often several possible correlations, e.g. for the ideal gas heat capacity there are three different correlations across my thermo textbooks; if I pick any particular one then I greatly diminish the usefulness of my code to anyone who doesn't have all the data I have.
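To make the picking-winners problem concrete, here are two of the common functional forms for the ideal gas heat capacity. These are sketches -- the constants are deliberate placeholders, not real data for any component -- but the point is that a table of constants fitted to one form is useless to someone whose textbook tabulates the other.

```python
R = 8.314  # J/(mol K)

# Same property, incompatible data: constants fitted for one functional
# form can't be dropped into the other. All values are placeholders.

def cp_ig_cubic(T, a, b, c, d):
    """Ideal gas heat capacity as Cp = a + b*T + c*T**2 + d*T**3."""
    return a + b * T + c * T**2 + d * T**3

def cp_ig_inverse_square(T, A, B, C, D):
    """Ideal gas heat capacity as Cp/R = A + B*T + C*T**2 + D/T**2."""
    return R * (A + B * T + C * T**2 + D / T**2)
```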

One approach is to just embrace this and bundle the data with the code, making it so the user is never exposed to what is going on under the hood -- basically continue on and build a full-fledged simulator.

I don't really want to do this, as I would end up locking down the feature set to just what I imagine doing, since the code would be married to the data I provide -- and fighting against exactly that is what got me started on all this 2.5 years ago. Also, I would have to supply all that data, which I don't have access to now that I am no longer a student (industry doesn't buy licenses to all the things the same way university libraries do), and supplying it may not be entirely legal given the licensing agreements around databases and the like.

What I settled on instead is trying to provide as fully featured a set as I can of the crap nobody wants to write, bundled in classes that others can extend to do what they actually want. For example, there is a Mixture class that has the necessary functions to do the flash calculations using any arbitrary model that returns the right thermodynamic properties.
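To give a flavour of the split, here is a stripped-down sketch (illustrative names, not my actual API): the base class owns the flash machinery, and all a model has to supply is fugacity coefficients.

```python
import numpy as np
from scipy.optimize import brentq

class Mixture:
    """Holds a feed composition; subclasses supply the thermo model."""

    def __init__(self, z):
        self.z = np.asarray(z, dtype=float)  # overall mole fractions

    def fugacity_coeffs(self, x, T, P, phase):
        """Fugacity coefficient of each component in a phase of
        composition x -- the hook a model subclass fills in."""
        raise NotImplementedError

    def flash(self, T, P, K0, tol=1e-8, maxiter=200):
        """Isothermal two-phase flash by successive substitution.
        K0 is an initial K-value guess (Wilson's correlation is the
        usual choice); assumes the feed really is two-phase at T, P."""
        z, K = self.z, np.asarray(K0, dtype=float)
        for _ in range(maxiter):
            # Rachford-Rice: solve for the vapour fraction beta
            rr = lambda beta: np.sum(z * (K - 1.0) / (1.0 + beta * (K - 1.0)))
            beta = brentq(rr, 1e-12, 1.0 - 1e-12)
            x = z / (1.0 + beta * (K - 1.0))  # liquid composition
            y = K * x                          # vapour composition
            # Update K-values from the model's fugacity coefficients
            K_new = (self.fugacity_coeffs(x, T, P, 'liquid')
                     / self.fugacity_coeffs(y, T, P, 'vapour'))
            if np.max(np.abs(K_new / K - 1.0)) < tol:
                return beta, x, y
            K = K_new
        raise RuntimeError("flash did not converge")
```

The flash here is plain successive substitution with Rachford-Rice in the inner loop; a real implementation needs stability tests and smarter initialization, but the division of labour -- boring machinery in the base class, thermodynamics in the subclass -- is the point.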

When I want to use the Peng-Robinson equation of state (an old favourite, as it was developed at my alma mater), I merely extend the Mixture class and fill in the details of the model: define how the constants of the EOS are calculated, define the mixing rule, define the mixture departure functions. All the code needed to instantiate the mixture objects and make them sufficiently list-ish, along with the code needed to do the flash calculations and determine whether phases are stable, just comes along for the ride.
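Continuing the sketch from above, a Peng-Robinson subclass might look something like this -- the usual PR constants, van der Waals one-fluid mixing rules, and the standard PR fugacity coefficient expression, with no guarding against edge cases (e.g. a liquid root below B):

```python
import numpy as np

R = 8.314  # J/(mol K)

class PengRobinsonMixture(Mixture):
    """Peng-Robinson EOS: only the model-specific pieces live here."""

    def __init__(self, z, Tc, Pc, omega, kij=None):
        super().__init__(z)
        self.Tc = np.asarray(Tc, dtype=float)
        self.Pc = np.asarray(Pc, dtype=float)
        self.omega = np.asarray(omega, dtype=float)
        n = len(self.z)
        self.kij = np.zeros((n, n)) if kij is None else np.asarray(kij, dtype=float)

    def pure_constants(self, T):
        # Standard PR a(T) and b for each pure component
        kappa = 0.37464 + 1.54226 * self.omega - 0.26992 * self.omega**2
        alpha = (1.0 + kappa * (1.0 - np.sqrt(T / self.Tc)))**2
        a = 0.45724 * R**2 * self.Tc**2 / self.Pc * alpha
        b = 0.07780 * R * self.Tc / self.Pc
        return a, b

    def fugacity_coeffs(self, x, T, P, phase):
        x = np.asarray(x, dtype=float)
        a, b = self.pure_constants(T)
        # van der Waals one-fluid mixing rules
        aij = np.sqrt(np.outer(a, a)) * (1.0 - self.kij)
        a_mix, b_mix = x @ aij @ x, x @ b
        A = a_mix * P / (R * T)**2
        B = b_mix * P / (R * T)
        # Solve the PR cubic in Z; smallest real root is the liquid,
        # largest the vapour
        roots = np.roots([1.0, B - 1.0, A - 3*B**2 - 2*B, B**3 + B**2 - A*B])
        Z = roots[np.abs(roots.imag) < 1e-9].real
        Z = Z.min() if phase == 'liquid' else Z.max()
        # Standard PR fugacity coefficient expression
        s = 2.0 * (aij @ x) / a_mix - b / b_mix
        sq2 = np.sqrt(2.0)
        ln_phi = (b / b_mix * (Z - 1.0) - np.log(Z - B)
                  - A / (2.0 * sq2 * B) * s
                  * np.log((Z + (1 + sq2) * B) / (Z + (1 - sq2) * B)))
        return np.exp(ln_phi)
```

And then using it is just a matter of feeding in critical constants -- here a 50/50 methane/ethane feed at conditions that should sit safely inside the two-phase envelope, with Wilson's correlation seeding the K-values:

```python
Tc = np.array([190.6, 305.3])       # K
Pc = np.array([45.99e5, 48.72e5])   # Pa
omega = np.array([0.012, 0.100])
mix = PengRobinsonMixture([0.5, 0.5], Tc, Pc, omega)
T, P = 200.0, 10e5
K0 = Pc / P * np.exp(5.373 * (1.0 + omega) * (1.0 - Tc / T))  # Wilson
beta, x, y = mix.flash(T, P, K0)
```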

It isn't as fully featured, but it is a lot more flexible -- which I like, even if I will likely be the only person who ever uses it.