Playing around with Edmonton's Open Data Portal

Posted in civics on Sunday, June 05 2016

Recently I sat down and made some maps of Edmonton with overlays for various and sundry bits of the census. This got me interested in looking into the API for the open data portal and seeing what I could do with that.

One possibility that jumped out at me was to map out the property values dataset in a way that is easier for individuals to engage with. The whole dataset is quite large, it clocks in around 35Mb for the CSV, and just plunking all of that on a map would be both a heinous waste of the users bandwidth and not very informative. I figured the easiest thing would be to filter down and just show the properties around a given point of interest.

This lets users zoom in on an area of the city and examine what the city assesses the properties at. Potentially useful if you are in the market for a house. I figure I can throw this together in Angular2 using leaflet to serve the map, making simple http requests to populate the map with data.

This sounds great but my one problem has spawned two sub-problems:

  1. How do I get a set of coordinates for an address, supplied by the user with all the vagueries and inconsistencies that come with that?
  2. How do I filter the dataset down, on the server end, so I am not loading such a large request to the client?

Problem one has largely been solved by google, and they have made their API available for this purpose. I can simply supply the user with a text box and have google try and figure out the corresponding location based on their mad scribblings.

This is already implemented by Haoliang Yu in his angular2-leaflet-starter project which I just forked and modified. The original service simply passed the user's input to google without any additional parameters, however I want this to be preferential to addresses within Edmonton so I need to add some bounds. I got these by just eyeballing where I think the bounds of Edmonton are more-or-less.

@Injectable()
export class GeocodingService {
    http: Http;
    edmontonbounds: string;

    constructor(http: Http) {
        this.http = http;
        this.edmontonbounds = '53.4357,-113.6021|53.6560,-113.3638';
    }

    geocode(address: string) {
      let params: URLSearchParams = new URLSearchParams();
      params.set('address', address)
      params.set('bounds', this.edmontonbounds)

      return this.http
          .get('http://maps.googleapis.com/maps/api/geocode/json', { search: params})
          .map(res => res.json())
          .map(result => {
            // some more code to process the google response
          });
    }
}

Problem two is slightly irritating to me. The property values dataset has the latitude and longitude for each property in the dataset, however, these are stored as plain numbers and not a geojson Point. If they were stored as a geojson point I could make an API call like so: https://data.edmonton.com/property.json?$where=within_circle(location, mylat, mylong, myradius) where mylat, mylong and myradius are parameters of my choosing. That one API call would solve the whole problem.

Instead, I have to filter each column based on a bounding box that I have to figure out through the raw application of geometry. What I want to know is how many degrees of latitude and longitude equals a given distance from a point. There are fairly straight forward formulae to give you the number of meters per degree of latitude and longitude at a given latitude $\phi$ (taking into account that the Earth is not a sphere):

$$ \Delta_{LAT} = {{\pi a (1-e^2)} \over {180 \left(1- e^{2} \sin^{2} \phi \right)^{1.5}}} $$ $$ \Delta_{LNG} = {{\pi a \cos \phi} \over {180 \left(1- e^{2} \sin^{2} \phi \right)^{0.5}}} $$

Therefore a box with width $d$ in meters, centered on the line of latitude $\phi$ has an angular width (in degrees): $$ deg_{LAT} = { {180 d \left(1- e^{2} \sin^{2} \phi \right)^{1.5}} \over {\pi a (1-e^2)} } $$ $$ deg_{LNG} = { {180 d \left(1- e^{2} \sin^{2} \phi \right)^{0.5}} \over {\pi a \cos \phi} } $$

With those in the bag I can make a bounding box, which I intentionally make too large by 10% because I am suspicious of the precision of the above formulae. When it comes time to map the points I use the distance finding methods avaiable in Leaflet to filter it down to just the circle of radius $d$. The dataset has over 300,000 properties and this rough filtering brings that down to about 1,000 for the area around my apartment, which is in Oliver -- the densest neighbourhood in Edmonton. I also add a parameter to only retrieve at most 2000 properties, since few parts of the city would even have that many within a small radius, so I'm willing to make the sacrifice to ensure the data returned is not too big.

@Injectable()
export class AdjPropertyService {

    constructor(private http: Http) {}

    degLat(latitude: number, distance: number){
      const phirad = latitude*pi/180; // latitude in radians
      const a = 6378137;
      const ee = 0.00669437999014;
      return (180*distance*(1-ee*(Math.sin(phirad))^2)^1.5)/(Math.PI*a*(1-ee));
    }

    degLng(latitude: number, distance: number){
      const phirad = latitude*pi/180; //latitude in radians
      cont a = 6378137;
      const ee = 0.00669437999014;
      return (180*distance*(1-ee*(Math.sin(phirad))^2)^0.5)/(Math.PI*a*Math.cos(phirad));
    }

    getAdjProperties(location: ILatLng, distance: number){
        // Do some geometry to get the bounds box of properties we want to return
        const deltalat = this.degLat(location.latitude, 1.1*distance);
        const deltalng = this.degLng(location.latitude, 1.1*distance);

        const latlower = location.latitude - 0.5*deltalat;
        const latupper = location.latitude + 0.5*deltalat;
        const lnglower = location.longitude - 0.5*deltalng;
        const lngupper = location.longitude + 0.5*deltalng;

        let params: URLSearchParams = new URLSearchParams();
        params.set('$where', "latitude between " + latlower.toString() + " and " + latupper.toString() +
                             " AND longitude between " + lnglower.toString() + " and " + lngupper.toString() )
        params.set('$limit', '2000')

        return this.http.get('https://data.edmonton.ca/resource/3pdp-qp95.json', { search: params })
         .map(res => res.json())
         .map( res => {
           // do some more processing of the json
         });
    }
}

That solves the problem of finding the individual property value points to add to the map. This creates a rather boring map, though, so I would also like to add some data from the most recent city census. This creates a minor bother as each of those datasets is indexed by neighbourhood, but how do I know what neighbourhood the user is interested in?

This is where it is wonderful that the neighbourhood dataset is a geojson, with the neighbourhood boundaries represented as MultiPolygon objects. I can simply query which neighbourhoods intersect with the coordinates given by the google API. Since no neighbourhoods overlap, this is reasonably fool-proof. Worst case scenario the user picks a point exactly on the boundary between two neighbourhoods, in which case there is no reason to prefer one over the other and simply taking the first neighbourhood returned works just as well.

@Injectable()
export class NeighbourhoodService {

    constructor(private http: Http) {}

  getNeighbourhood(location: Location){
    let params: URLSearchParams = new URLSearchParams();
    params.set('$where', "intersects(the_geom,'POINT( " + location.longitude.toString() + " " + location.latitude.toString() +")')")

    return this.http.get("https://data.edmonton.ca/resource/nckr-nnqj.json", { search: params } )
                    .map((res:Response) =>  res.json())
                    .map(res => {
                      if (Object.keys(res).length === 0){
                        throw new Error('Cannot find neighbourhood for that location');
                      } else {
                        return new Neighbourhood(res[0])
                    }});
  }
}

With this I can go query other data sets to my hearts content and present that data to the user as a panel of "interesting stats" for the area they have chosen. I would like to do all the queries concurrently instead of consecutively, which is the perfect use case for forkjoin (available somewhere in rxjs/Rx, I just import the whole thing)

@Injectable()
export class NeighbourhoodService {

  // etc.

  getTransitStats(neighbourhood: number){
    let params: URLSearchParams = new URLSearchParams();
    params.set('neighbourhood_number', neighbourhood.toString())

    return this.http.get('https://data.edmonton.ca/resource/khwt-yn74.json', { search: params })
                    .map((res:Response) => res.json())
                    .map( // some more processing );
  }

  // and a bunch more methods calling other apis in the same way as getTransitStats()

  getNeighbourhoodStats(location: Location){
    return this.getNeighbourhood(location)
               .flatMap( (neighbourhood) => {
                  return Observable.forkJoin( Observable.from([neighbourhood]),
                                              this.getTransitStats(neighbourhood.number),
                                              this.getCitizenshipStats(neighbourhood.number),
                                              this.getDwellingStats(neighbourhood.number),
                                              this.getEmploymentStats(neighbourhood.number));
    });
  }
}

That is basically the hard part. The rest of the code is basically just glue between the values returned here, the leaflet map, and some d3 barcharts I threw together.