The Effectiveness of Edmonton Transit

Posted in civics on Monday, August 08 2016

The other day I was having a conversation about how much of a pain it is to take ETS to and from the 'burbs. I am a big fan of driving as little as possible, and I resisted owning a car for years in Edmonton (notably when I lived downtown and worked either downtown or at the University). No longer: I have a car. My breaking point was working in a business park that wasn't really transit accessible. By bus, train, bus, and then walking, my trip to work took over an hour each way; with a car that dropped to 20 minutes max. I figure this experience generalizes well and explains why transit ridership is really low in Edmonton. Transit takes forever and it sucks, whereas everywhere is a 20-30min drive from everywhere else in this town.

So that's a supposition, but how do I test it? People live in lots of places and go to lots of places, so I can't exactly check all possible trips (or even a reasonable sub-sample). However, I can drill down into one particular and common use-case for transit: going to work. Since a lot of people work downtown (I think it is the single most popular work location), I am going to check transit time from anywhere in the city to downtown.

I am taking Churchill Square as a proxy for downtown. It has bus stops within half a block and an LRT station underneath. It is also pretty much the heart of downtown. Basically I am picking what is probably the easiest place to get to in Edmonton by ETS, and I am going to match that head-to-head with driving.

So that's the problem definition: figure out how long it takes to get from anywhere in the city to Churchill Square by transit or by car.

Where is anywhere?

First off I need to think about how I'm going to aggregate my data. I want to aggregate by neighbourhood, so do I pick the geometric center of each neighbourhood or what?

That seems problematic: the geometric center of a neighbourhood could be a park or something. What I want are actual street addresses, say 10 randomly sampled locations from each neighbourhood. That's a big enough number to meaningfully average over while also being small enough to be manageable.

First I load up the City of Edmonton neighbourhood shapefile and sort out an index based on neighbourhood number (I have previously downloaded the shapefile). This is going to be helpful for mapping and aggregating data by neighbourhood.

import glob
import pandas as pd
import numpy as np

from geopandas import GeoDataFrame
from collections import defaultdict
from itertools import repeat

edmonton_hoods_shapefile = glob.glob('Neighbourhood_shapefiles/*.shp')[0]

edmonton_hoods = GeoDataFrame.from_file(edmonton_hoods_shapefile)
edmonton_hoods['Neighbourhood'] = edmonton_hoods['name'].apply(lambda x: x.upper())
edmonton_hoods['number'] = pd.to_numeric(edmonton_hoods['number'])
edmonton_hoods.index = edmonton_hoods['number']

neighbourhood_table = defaultdict(repeat(np.nan).__next__)
neighbourhood_table.update({ key:value for key,value in zip(edmonton_hoods['Neighbourhood'], edmonton_hoods['number'])})

Next I need some addresses grouped by neighbourhood. I can get these from the City of Edmonton property assessment database (they might have another data source, but I already downloaded this one for another project). Before I do that I need to make some corrections for spelling differences between the two data sources. Then I slice up the dataset to only include residential properties (people's homes) and remove neighbourhoods with 10 or fewer residential properties. This should give me a good dataset of where people live in the city.

#Fixing some erratic spelling in the two datasets
neighbourhood_table['ANTHONY HENDAY SOUTHEAST'] = neighbourhood_table['ANTHONY HENDAY SOUTH EAST']
neighbourhood_table['PLACE LA RUE'] = neighbourhood_table['PLACE LARUE']
neighbourhood_table['RAPPERSWIL'] = neighbourhood_table['RAPPERSWILL']
neighbourhood_table['SOUTHEAST (ANNEXED) INDUSTRIAL'] = neighbourhood_table['SOUTHEAST INDUSTRIAL']
neighbourhood_table['WESTBROOK ESTATE'] = neighbourhood_table['WESTBROOK ESTATES']
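
property_values_csv = 'datasets/Property_Assessment_Data.csv'
# Keep residential rows, swap neighbourhood names for numbers, drop small
# neighbourhoods, then group so each neighbourhood can be sampled later
residential = (pd.read_csv(property_values_csv)
                 .where(lambda df: df['Assessment Class'] == 'Residential')
                 .assign(name = lambda df: df['Neighbourhood'])
                 .replace({'Neighbourhood': neighbourhood_table})
                 .groupby('Neighbourhood')
                 .filter(lambda group: len(group) > 10)
                 .groupby('Neighbourhood'))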

How long does it take to get from anywhere to downtown?

Now that I have a dataframe full of addresses to sample from, I need to find a route from each address to downtown, either by ETS or by car. Thankfully I can just use the Google Maps API, which should give me the best route for each mode of travel. I have specified arriving by 8:00am on a Wednesday to keep everything on an even footing, which also lines up with my "transit for commuters" model. Given that the bus schedules are really built around rush hour, this probably gives ETS the best possible score.

I am using the requests library; it makes goofing off with strange APIs super easy and convenient. Before you get started, though, you will need an API key (strictly speaking this is only needed for transit directions).

import requests

API_KEY = 'your api key here'
API_URL = ''

I'm going to request directions for a bunch of addresses at a time, so I need a way to unpack the JSON that comes back into a DataFrame-like object. First I take all the travel times (durations) and put them in a list, checking that the API actually returned something (not every place in Edmonton is transit accessible).

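def unpack_times(request):
    distance = []

    for row in request['rows']:
        if len(row['elements']) > 0 and 'duration' in row['elements'][0]:
            # The duration value is the travel time in seconds
            distance.append(row['elements'][0]['duration']['value'])
        else:
            # No route came back, so keep a placeholder to preserve the length
            distance.append(np.nan)

    return distance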
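
For reference, here is a rough sketch of the response shape that this kind of unpacking expects. It assumes the layout of the Google Distance Matrix response (one row per origin, one element per destination). The sample payload is made up for illustration and is not real ETS data, and `durations_in_seconds` is a hypothetical stand-in name rather than something from the notebook.

```python
import math

# Made-up response for two origins and one destination (illustrative only).
sample_response = {
    'status': 'OK',
    'rows': [
        {'elements': [{'status': 'OK',
                       'duration': {'text': '21 mins', 'value': 1260}}]},
        {'elements': [{'status': 'ZERO_RESULTS'}]},  # no route found
    ],
}

def durations_in_seconds(response):
    """Return one travel time per origin, NaN where no route came back."""
    times = []
    for row in response.get('rows', []):
        elements = row.get('elements', [])
        if elements and 'duration' in elements[0]:
            times.append(elements[0]['duration']['value'])
        else:
            times.append(math.nan)
    return times

times = durations_in_seconds(sample_response)
print(times)
```

Keeping a NaN for every origin without a route means the list always has one entry per sampled address, which matters when the lists are later lined up as DataFrame columns.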

Next I'm going to write the larger function that makes the request (this is how I usually code, start with the smaller problem and build out to the larger one). So I need to do the following:

  1. Randomly sample 10 addresses from the property database
  2. Convert those addresses into a location string that the google maps API can handle
  3. Set the API options for destination, mode, and arrival time
  4. Make a request for 'driving'
  5. Make a request for 'transit'
  6. Unpack those requests into a dict

I chose a dict as the return object because it converts into a DataFrame really easily: if I collect a list of these dicts, collapsing it all into one DataFrame is a single line of Python.

def transit_calc(name, neighbourhood):
    smpl = neighbourhood.sample(10)
    strs = ['{},{}'.format(row['Latitude'], row['Longitude'])
            for index, row in smpl.iterrows()]
    payload = {'origins': '|'.join(strs),
               'destinations': 'Sir Winston Churchill Square, Edmonton, AB',
               'mode': 'driving',
               'arrival_time': 1470837600, #8am this coming Wednesday
               'key': API_KEY}

    driving = requests.get(API_URL, params=payload).json()

    payload['mode'] = 'transit'
    transit = requests.get(API_URL, params=payload).json()

    results = {'Neighbourhood': [name]*10,
               'Driving': unpack_times(driving),
               'Transit': unpack_times(transit)}

    return results
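
The hard-coded arrival time is a Unix timestamp (seconds since 1970-01-01 UTC). Here is a minimal sketch of how a value like that can be derived. It assumes a fixed UTC-6 offset, which is what Edmonton observes in August, rather than a full timezone database, and the `MDT` name is just a label I picked for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Mountain Daylight Time is UTC-6, which holds for August.
MDT = timezone(timedelta(hours=-6), 'MDT')

# Wednesday, August 10 2016 at 8:00am local time
arrival = datetime(2016, 8, 10, 8, 0, tzinfo=MDT)
arrival_time = int(arrival.timestamp())

print(arrival_time)  # 1470837600
```

For dates outside the summer the offset would be different, so a proper timezone lookup would be the safer choice there.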

Now comes the bulk of the work. The easiest way is to iterate through the property database, making a request neighbourhood by neighbourhood. This works fine but it is slow, making one request at a time. Alternatively you can do this asynchronously, which is much faster but a little more tricky. Either way works.
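
One possible way to make the requests concurrently is a thread pool from the standard library. This is only a sketch and not necessarily how the notebook does it. `fake_transit_calc` and the sample groups are hypothetical stand-ins so the example runs without an API key or any network access.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_transit_calc(name, group):
    # Hypothetical stand-in for the real request function: no network calls.
    return {'Neighbourhood': [name] * len(group), 'Driving': list(group)}

# Made-up groups: neighbourhood name -> placeholder travel times in seconds.
groups = {'DOWNTOWN': [600, 660], 'OLIVER': [720, 780]}

# Threads suit this job because each call mostly waits on the network.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_transit_calc, name, group)
               for name, group in groups.items()]
    # Collect in submission order so results line up with the groups.
    results = [future.result() for future in futures]

print(len(results))
```

A small `max_workers` value is deliberate here, since firing off too many requests at once is an easy way to run into rate limits.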

This is also where you may want to break up the work into chunks so you don't exceed your API limit. The Google Maps API is limited both by an hourly rate and by a daily cap, at least for freebie access.
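
A minimal sketch of the chunking idea, using only the standard library. The helper names `chunked` and `run_in_chunks` are my own, and the chunk size and pause length are placeholders rather than real API limits, so check your own quota before picking values.

```python
import time
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

def run_in_chunks(items, worker, size, pause_seconds):
    """Call worker on every item, sleeping between chunks."""
    results = []
    for index, chunk in enumerate(chunked(items, size)):
        if index > 0:
            time.sleep(pause_seconds)  # back off before the next batch
        results.extend(worker(item) for item in chunk)
    return results

batches = list(chunked(['A', 'B', 'C', 'D', 'E'], 2))
print(batches)  # [['A', 'B'], ['C', 'D'], ['E']]
```

In practice `worker` would be whatever function makes the request for one neighbourhood, and saving the partial results after each chunk makes it easy to resume the next day.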

results = []
for name, group in residential:
    res = transit_calc(name, group)
    results.append(res)  # collect each neighbourhood's dict for the concat below

df = pd.concat([pd.DataFrame(row) for row in results])

I saved my dataset immediately after generating it so I can play around all I want later without having to re-request thousands of transit times.

Now I can aggregate the driving and transit times by neighbourhood. I calculated the mean driving time and mean transit time and converted them to minutes (Google returns seconds). Then I merged this aggregated dataset with the neighbourhood geometry data so I can map it easily.

transit = (pd.read_csv('datasets/transit_data.csv')
             .assign(Diff = lambda df: df['Transit'] - df['Driving'])
             .groupby('Neighbourhood'))  # group so the aggregates below are per neighbourhood

agg_transit = pd.DataFrame()
agg_transit.loc[:, 'Mean Driving'] = transit['Driving'].aggregate(np.nanmean)/60
agg_transit.loc[:, 'Mean Transit'] = transit['Transit'].aggregate(np.nanmean)/60
agg_transit.loc[:, 'Mean Diff'] = transit['Diff'].aggregate(np.nanmean)/60
agg_transit.loc[:, 'Std Diff'] = transit['Diff'].aggregate(np.nanstd)/60

aggregated = GeoDataFrame(pd.concat([agg_transit, edmonton_hoods], axis=1))

Mapping the Results

I want to map the results on top of a map of Edmonton, so you get a spatial sense of where each neighbourhood is, and I also want to map both the driving data and the transit data with the same colour bar, so the difference between the two is very apparent.

Mean Driving Time

Mean Transit Time

As you can see, my gut instincts for driving were pretty accurate. Most neighbourhoods are within a 30min drive of downtown, even in the morning.

Transit, however, is a much worse option. Few neighbourhoods are less than 30 minutes from downtown by transit, and most of those are in the core of the city. The exceptions are, I'm assuming, places near major transit centres and LRT stations, which is also why you see neighbourhoods where the mean transit time is ~30min right next to ones where it is almost an hour. In general though, for most of Edmonton, if you want to take transit for your morning commute you are looking at >40min.

It's easy to see why people don't really take transit.

As usual, the IPython notebook is on GitHub.