The AEC has provided booth-level data in digital form for every federal election since 1993. By interpolating these booth results and putting them on a colour scale, we can make some fun pictures, showing which parts of an electorate or city favour a particular party or candidate, and I've animated them so that you can watch any changes over the last twenty years. These maps don't always make it obvious which party is dominant in seats where the population density varies widely. The booth locations are plotted lightly, with larger booths appearing larger, so in principle it would be possible to count the booths in the Labor red region and the booths in the Coalition blue region. But an electorate that includes both a Labor-favourable town and a large, thinly-populated area of Coalition voters will look mostly blue on the two-party-preferred map, while the actual TPP result may be close to 50-50. If you want representations of booth results which show both the geography and number of voters, I suggest looking at Ben Raue's maps at The Tally Room.
The first map on each electorate or city page is the two-party-preferred (TPP) map, on a blue-white-red colour scale. The numbers are presented as Labor's share of the TPP vote, with red being high and white at 50%. The second map is the TPP with a moving colour scale – white is at (close to) the average Labor TPP vote across the region being estimated. The moving-colour-scale plots I find quite interesting. Often not much seems to happen apart from redistributions of the electorate, which suggests that the swings across an electorate can be quite uniform: a relatively good area for one major party tends to stay that way compared to the surrounding region, even as that whole region swings one way or the other.
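As a rough illustration of the fixed and moving colour scales, here is a minimal Python sketch (the actual maps were made in R with ggplot). It assumes a linear blend in RGB space and an arbitrary ±25-point span beyond which colours saturate to full red or blue; `tpp_to_rgb` and `moving_midpoint` are hypothetical names, and a plain mean of booth TPPs stands in for "the average Labor TPP vote across the region".

```python
def tpp_to_rgb(tpp, midpoint=50.0, half_range=25.0):
    """Map a Labor TPP share (per cent) to an (r, g, b) tuple with 0-255 channels.

    White sits at `midpoint`. Values `half_range` points or more above it are
    full red, and values `half_range` points or more below it are full blue.
    The linear RGB blend and the 25-point span are illustrative assumptions.
    """
    t = (tpp - midpoint) / half_range
    t = max(-1.0, min(1.0, t))  # clamp to [-1, 1] so extremes saturate
    if t >= 0:
        fade = round(255 * (1 - t))  # blend white towards red (255, 0, 0)
        return (255, fade, fade)
    fade = round(255 * (1 + t))      # blend white towards blue (0, 0, 255)
    return (fade, fade, 255)


def moving_midpoint(booth_tpps):
    """Centre for the moving colour scale: here, a plain mean of booth TPPs."""
    return sum(booth_tpps) / len(booth_tpps)


booths = [42.0, 48.5, 55.0, 61.5]
mid = moving_midpoint(booths)          # 51.75
print(tpp_to_rgb(50.0))                # fixed scale: (255, 255, 255)
print(tpp_to_rgb(50.0, midpoint=mid))  # moving scale: a pale blue
```

On the fixed scale a 50% booth is white; on the moving scale the same booth turns pale blue, because it sits slightly below the regional average.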
If at any time over 1993-2013 an electorate was "non-classic" (i.e., the last two candidates weren't Coalition v Labor), then two-candidate-preferred maps are also shown, with the colour scales matching the top two parties (or grey for an independent). Apologies to anyone with red-green colourblindness if the Melbourne TCP maps aren't clear!
The remaining maps are of primary vote percentages of various parties I find of interest. On any animated map, you can click on the image to bring up a page showing the individual frames as stills.
Maps are available for every seat (Prospect is included under "McMahon") and selected urban areas. See the list of maps here.
Data sources and quality
The election results from 1993 to 2001 are available for download from the AEC at this page, while for 2004 onwards the results are in their currently-standard spreadsheet form at the relevant year's page (links here).
For 1993, two-party or two-candidate preferred results are available at the division level, but only the primary votes are given at the booth level. To estimate TPP or TCP at booth-level in 1993, I applied the average preference flow for each division to all of its booths. For all subsequent elections both TPP and TCP results are given at booth-level by the AEC.
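One plausible way to apply a division-wide flow is sketched below in Python. It assumes the division's figures let you work out what fraction of non-major-party primary votes ended up with Labor, and that this one fraction is applied at every booth. `Booth`, `division_flow_to_alp` and `booth_alp_tpp` are hypothetical names, and the vote counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Booth:
    name: str
    alp: int    # Labor primary votes
    coal: int   # Coalition primary votes
    other: int  # all other formal primary votes


def division_flow_to_alp(booths, division_alp_tpp_votes):
    """Fraction of 'other' primary votes that flowed to Labor division-wide."""
    alp_primary = sum(b.alp for b in booths)
    other_primary = sum(b.other for b in booths)
    return (division_alp_tpp_votes - alp_primary) / other_primary


def booth_alp_tpp(booth, flow):
    """Estimated Labor TPP (per cent) at one booth, using the division's flow."""
    formal = booth.alp + booth.coal + booth.other
    return 100 * (booth.alp + flow * booth.other) / formal


# Made-up division with two booths and a division-level Labor TPP of 1150 votes.
booths = [Booth("Hypothetical North", 400, 450, 150),
          Booth("Hypothetical South", 600, 300, 100)]
flow = division_flow_to_alp(booths, division_alp_tpp_votes=1150)  # 0.6
for b in booths:
    print(b.name, round(booth_alp_tpp(b, flow), 1))
```

A convenient property of this approach is that the booth estimates add back up to the division's TPP total (here 490 + 660 = 1150 Labor votes).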
More problematic are the locations of the booths. From 2007, the AEC provides latitude and longitude coordinates for each regular polling place (though the 2007 spreadsheet contains several errors, corrected by 2010). For 2004, no coordinates are given, but the spreadsheet uses the same set of unique PollingPlaceIDs as later years, so the positions can be found by matching up the IDs. But about 150 regular booths used in 2004 were abolished in 2007, and for these booths I used the (often partial) addresses listed by the AEC and spent time on Google Maps to fill in the coordinates.
For 1998 and 2001, the AEC just has a list of booths with (partial) addresses. Since there are usually around 8000 booths per election, I matched up as many from 2001 as I could by name to the 2004 locations, and then matched as many as possible from 1998 to the 2001 locations. This will almost certainly introduce some errors: a booth named after a particular suburb might move from a church hall to a primary school, say, but my code won't have noticed and will report the more recent location. For the unmatched booths (180-odd for 2001, 260-odd for 1998), I spent more time on Google Maps looking up coordinates.
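The chained matching can be sketched like this. Keying on state plus a crudely normalised booth name is an assumption for illustration (the exact comparison used isn't described), the helper names are hypothetical, and the coordinates are illustrative rather than real booth locations. Like the approach described above, it silently carries a newer location back to an older booth of the same name.

```python
def normalise(name):
    """Crude name normalisation: lower case and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_by_name(older_booths, newer_coords):
    """Copy coordinates from a later election onto an earlier one's booths.

    older_booths: list of (state, booth_name) tuples
    newer_coords: dict mapping (state, normalised_name) -> (lat, lon)
    Returns (coords for matched booths, list of unmatched booths).
    """
    matched, unmatched = {}, []
    for state, name in older_booths:
        key = (state, normalise(name))
        if key in newer_coords:
            matched[key] = newer_coords[key]
        else:
            unmatched.append((state, name))
    return matched, unmatched


# Illustrative coordinates only.
coords_2004 = {("QLD", "ashgrove"): (-27.44, 152.99),
               ("QLD", "bardon"): (-27.46, 152.98)}
booths_2001 = [("QLD", "Ashgrove"), ("QLD", "Bardon  "), ("QLD", "Red Hill")]

coords_2001, todo_2001 = match_by_name(booths_2001, coords_2004)
# Unmatched booths are located by hand, then 2001 becomes the reference for 1998.
coords_2001[("QLD", "red hill")] = (-27.455, 153.00)  # hypothetical manual entry
booths_1998 = [("QLD", "Red Hill"), ("QLD", "Paddington")]
coords_1998, todo_1998 = match_by_name(booths_1998, coords_2001)
print(todo_2001, todo_1998)
```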
For 1993 and 1996 the AEC's downloads only provide the names of the booths. Once again I tried to match by name; to get the (partial) addresses of the booths left over, I spent some time in the Fryer Library at UQ and the State Library of Queensland, scanning from newspaper microfilm the AEC's election-eve advertisements listing all the candidates and the (partial) addresses of polling places. Unfortunately the advertisements don't give the names of the booths as they appear in the AEC downloads – they're just grouped by suburb or town. When there were many booths in a suburb or town, that meant painstakingly working through the addresses to work out which was [suburb] North, which was [suburb] East, and so on. Complicating matters were the many booths named after one suburb but actually located in a neighbouring suburb. Referring to the 1998 addresses made this easier, but there were still plenty of cases where I just had to guess the address of a particular booth. And once I had a partial address, it often wasn't obvious where on the street or road the booth was – a lot of halls, schools, and hospitals have been demolished over the past couple of decades, replaced by houses or commercial buildings. Sometimes I was patient enough to sift through old documents until I could nail down the location, but often I wasn't.
In the zip file of my data and code (see below), I have included lists of the booths that I couldn't match by name, along with the coordinates I used and flags for whether I had to guess the address and, given the address, whether I had to guess the coordinates. (The latter was also quite common for booths in localities which look from the satellite like they have a population of a few dozen. In those cases I often just wrote down the coordinates of where Google Maps landed me.)
And in addition to any errors listed above, it is almost certain that I made some transcription errors in copying the coordinates from Google Maps.
The maps for 2013 incorporate data that is copyright © Commonwealth of Australia (Australian Electoral Commission) 2013. The shapefiles for current electoral boundaries are available for download from the GIS data page on the AEC's website.
For earlier elections, I used the shapefiles available for download from the ABS. The 2010 boundaries are in 1270.0.55.003 – despite the date of "2011" in the filename, the boundaries do not agree with the redistributions from that year.
For 2004 and 2007 I used the shapefiles from ABS 2923.0.30.001.
Digital boundaries for earlier elections are apparently collecting dust on an old CD-ROM somewhere, but I didn't get a reply from the geography division of the ABS to my email. These gaps are partially filled by Antony Green, who has old boundaries for 39 seats on his 2013 election guide pages (see, for example, his guide to the upcoming Griffith by-election). I asked Antony about these maps, and he said that he drew them manually! What an effort that must be, along with the rest of Antony's work – how lucky we are to have him.
Antony also said that the ABS's boundaries don't always match the true boundaries, but they look close enough for my purposes.
So, for many electorates from 1993 to 2001, I had to guess the boundaries. My interpolations are defined on grids; a grid cell was assigned to an electorate if the nearest booth to it belonged to the electorate. (If the nearest booth was shared between two or more electorates, then I went to the next-nearest booth.) Often this works pretty well – an electorate bounded by a river might have a few grid cells from the other side of the river assigned to it, but it's not too serious. Occasionally I end up with an electorate broken up into disjoint subsets, with Petrie perhaps being the most egregious example – a relatively narrow strip of the electorate covers the wetlands of Hays Inlet and the Pine River, and the nearest booth to many grid cells in this region is on dry land in a neighbouring electorate. I suspect that errors in my booth locations make this general problem worse.
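A brute-force sketch of that nearest-booth rule follows, assuming projected coordinates in metres and each booth location tagged with the set of divisions it served (a shared booth has more than one). `assign_cells` and the division names are hypothetical; for a fine grid you would want a spatial index such as a k-d tree rather than sorting every booth for every cell.

```python
import math


def assign_cells(cells, booths):
    """Assign each grid cell to the division of its nearest unshared booth.

    cells:  list of (x, y) cell centres
    booths: list of (x, y, divisions), where divisions is a set of names;
            booths serving two or more divisions are skipped.
    Returns a list with one division name (or None) per cell.
    """
    owners = []
    for cx, cy in cells:
        ranked = sorted(booths, key=lambda b: math.hypot(b[0] - cx, b[1] - cy))
        owner = next((next(iter(divs)) for _, _, divs in ranked
                      if len(divs) == 1), None)
        owners.append(owner)
    return owners


booths = [(0, 0, {"Alpha"}), (10, 0, {"Beta"}), (4, 0, {"Alpha", "Beta"})]
print(assign_cells([(3, 0), (6, 0)], booths))  # ['Alpha', 'Beta']
```

Both cells are nearest to the shared booth at (4, 0), so each falls through to its next-nearest booth.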
For the interpolations over urban areas, I used the Significant Urban Area shapefiles from ABS 1270.0.55.004, simplifying the geometry in QGIS, to define the boundaries of the region to interpolate over.
The place names (towns or suburbs) that appear on the maps are for the most part randomly sampled. The positions are the centroids of the suburb polygons from ABS 1270.0.55.003.
The shapefile used to define the land area of Australia and the state and territory borders was from ABS 1259.0.30.001, with the geometry simplified in QGIS.
The simplification of the geometries for two of the above files was necessary to stop the enormous polygons from eating up too much of my computer's RAM. These simplifications cause at least two issues: Stradbroke Island becomes part of the mainland, and often the (unsimplified) electorate boundaries don't perfectly match up with a (simplified) coastline or river, meaning that a handful of isolated cells are not assigned to a seat when they obviously should be.
Interpolation method and quality
I converted the positions from latitude and longitude to easting and northing before creating the grid and performing the interpolation. This conversion is usually harmless, and it's easier to work in metres rather than latitude/longitude. But it diverges rather a long way from best practice for very wide electorates such as Durack or Kalgoorlie: the UTM zones are only 6 degrees of longitude wide, and Western Australia covers about 15 degrees of longitude! I convert to the UTM zone of the median-by-longitude booth for each electorate and live with any distortions that creates; interpolations across the very wide, sparsely populated electorates shouldn't be taken too seriously anyway.
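The zone choice can be sketched in a few lines of Python. This uses the standard UTM zone formula (zones are 6° wide, numbered from 1 starting at 180°W) and `statistics.median_low` so that the chosen longitude belongs to an actual booth. EPSG codes 32701 to 32760 are the WGS 84 / UTM south zones, though Australian data are often projected to the equivalent MGA zones instead. The function names and sample longitudes are hypothetical.

```python
import math
from statistics import median_low


def utm_zone(lon):
    """UTM zone number (1-60) for a longitude in degrees east."""
    return math.floor((lon + 180) / 6) % 60 + 1


def electorate_zone(booth_lons):
    """Zone of the median-by-longitude booth, plus its WGS 84 / UTM south EPSG code."""
    zone = utm_zone(median_low(booth_lons))
    return zone, 32700 + zone


print(utm_zone(153.03))  # Brisbane: zone 56
print(electorate_zone([114.0, 118.6, 121.5, 125.6, 128.7]))  # (51, 32751)
```

With pyproj installed, the resulting code could then be passed to something like `Transformer.from_crs("EPSG:4326", "EPSG:32751", always_xy=True)` to do the actual projection; booths far from the chosen zone's central meridian will carry the distortions mentioned above.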
I used inverse-distances (power 2) with declustering weights to perform the interpolation. The declustering weights aren't particularly important, mostly being useful in the rural electorates – most booths will be spread out, but a large town might have several booths close together. By using the declustering weights, the apparent influence of the town doesn't extend so far into the countryside. The declustering weight attached to a booth is the reciprocal of the number of booths within a specified radius of it (including the booth itself). The radius chosen is half the median nearest-neighbour distance between booths in the electorate. So, for example, if the median nearest-neighbour distance is 25km, and there are 5 other booths within 12.5km of the Regional Town Central booth, then Regional Town Central gets a declustering weight of 1/(5+1) ≈ 0.167.
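Here is a minimal Python version of that declustering rule, assuming booth positions already projected to metres. "Within" is taken as less than or equal to the radius, `declustering_weights` is a hypothetical name, and the toy layout is a tight cluster of three booths plus four widely spaced ones.

```python
import math
from statistics import median


def nearest_neighbour_distances(points):
    """Distance from each point to its nearest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def declustering_weights(points):
    """1 / (number of booths within half the median nearest-neighbour distance)."""
    radius = median(nearest_neighbour_distances(points)) / 2
    weights = []
    for p in points:
        n_close = sum(1 for q in points if math.dist(p, q) <= radius)  # counts p itself
        weights.append(1 / n_close)
    return weights


# A town with three booths 100 m apart, plus four rural booths about 10 km apart.
booths = [(0, 0), (100, 0), (0, 100),
          (10_000, 0), (20_000, 0), (30_000, 0), (40_000, 0)]
print([round(w, 3) for w in declustering_weights(booths)])
# [0.333, 0.333, 0.333, 1.0, 1.0, 1.0, 1.0]
```

The median nearest-neighbour distance here is 9.9 km, so the radius is 4.95 km: each town booth sees three booths (itself included) and is down-weighted to 1/3, while the rural booths keep full weight.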
For each grid cell estimated, each booth's declustering weight is multiplied by the inverse-square of the distance from that booth to the grid cell. The nearest 10 booths are used for the estimation, with the result being the weighted average of the votes at the 10 booths. The choice of the power and the number of samples to use doesn't make a huge amount of difference, but the parameters I ended up using gave the best or near-best results from leave-one-out cross-validation for the Sydney and Brisbane urban areas.
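The estimation step might look like this in Python, assuming each booth carries its projected position, its vote share and the declustering weight described above. `idw_estimate` is a hypothetical name; returning a booth's own value when a cell centre lands exactly on it is an added guard against dividing by zero.

```python
import math


def idw_estimate(cell, booths, power=2, k=10):
    """Inverse-distance estimate at `cell` from the k nearest booths.

    booths: list of (x, y, value, declustering_weight), positions in metres.
    Each booth's weight is declustering_weight / distance**power, and the
    estimate is the weighted average of the booth values.
    """
    nearest = sorted(booths, key=lambda b: math.dist(cell, b[:2]))[:k]
    num = den = 0.0
    for x, y, value, decl in nearest:
        d = math.dist(cell, (x, y))
        if d == 0:
            return value  # cell centre sits exactly on a booth
        w = decl / d ** power
        num += w * value
        den += w
    return num / den


booths = [(0, 0, 40.0, 1.0), (1024, 0, 60.0, 1.0)]
print(idw_estimate((512, 0), booths))  # equidistant, so 50.0
print(idw_estimate((256, 0), booths))  # pulled towards the nearer 40% booth
```

Halving a booth's declustering weight halves its influence at every distance, which is how a cluster of town booths is stopped from dominating the surrounding countryside.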
The leave-one-out cross-validation results were a little on the disappointing side: estimating an urban booth's TPP from the surrounding booths usually gave a mean absolute error of around 4.5 percentage points, with a standard deviation of the errors around 5.5 or 6 percentage points. (The errors are, not surprisingly, substantially larger in predominantly rural electorates.) I was expecting a standard deviation closer to 3 or 4 percentage points, but I suppose that with the TPP across an electorate varying by 30 or more points, a standard deviation of 6 means that at least all's not hopelessly lost.
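A leave-one-out check of this kind can be sketched as follows: each booth is dropped in turn, estimated from the remaining booths, and the errors are summarised. For brevity this version omits the declustering weights and assumes no two booths share exactly the same position; the function names and data are made up for illustration.

```python
import math
from statistics import mean, stdev


def idw(cell, booths, power=2, k=10):
    """Plain inverse-distance estimate from the k nearest (x, y, value) booths."""
    near = sorted(booths, key=lambda b: math.dist(cell, b[:2]))[:k]
    weights = [1 / math.dist(cell, b[:2]) ** power for b in near]
    return sum(w * b[2] for w, b in zip(weights, near)) / sum(weights)


def loocv(booths, power=2, k=10):
    """Mean absolute error and standard deviation of leave-one-out errors."""
    errors = []
    for i, (x, y, value) in enumerate(booths):
        others = booths[:i] + booths[i + 1:]
        errors.append(idw((x, y), others, power, k) - value)
    return mean(abs(e) for e in errors), stdev(errors)


# Made-up booths (metres, Labor TPP %); compare a few parameter choices.
booths = [(0, 0, 41.0), (800, 0, 47.5), (0, 900, 52.0), (1200, 1100, 58.0),
          (2500, 300, 55.0), (1800, 2400, 63.5), (3000, 2000, 60.0)]
for power in (1, 2, 3):
    for k in (3, 5):
        mae, sd = loocv(booths, power, k)
        print(f"power={power} k={k}: MAE={mae:.2f} SD={sd:.2f}")
```

Looping over candidate powers and neighbourhood sizes like this is one way to pick the "best or near-best" parameters mentioned above.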
I interpolated over a grid that is quite fine – probably quite a bit finer than is statistically justified, especially given the relatively large LOOCV errors. Still, I think the fine grid looks nice, and that's good enough for me when I'm doing something on my own time.
Some readers, especially if they're colleagues or former students of mine, might suggest fitting a variogram model and Kriging. Kriging should generate better results than inverse-distances on the average, but I'm reluctant to use it for these datasets because of problems with negative weights. The variogram for the TPP has a low nugget (relative to the booth TPP variance) and long range (relative to the typical booth spacing in metro areas). A neighbourhood of just four samples can give average negative weights of more than 10%, and I'd prefer to use 10 or more samples, where negative weight problems are even greater. The risk is that one rogue booth could have a large negative weight attached to it, pushing the resulting TPP estimate outside the range of the individual booths' TPPs. (A highly exaggerated conceptual example: suppose you work with a neighbourhood of 2 samples. The nearest has TPP 20% and is given weight 1.4; the furthest has TPP 80% and is given weight -0.4. The estimated TPP at the grid cell would be -4%. I doubt you'd see anything that bad, but I didn't want to have to perform lots of sanity checks on the estimates.)
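The exaggerated two-sample example can be checked directly. The helper below (a hypothetical name) forms a weighted average whose weights sum to one and reports whether the result stays within the range of the sample values; with non-negative weights it always does, but a negative weight lets it escape.

```python
def weighted_estimate(values, weights):
    """Weighted average with weights that must sum to one (as in Kriging)."""
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))


def within_range(estimate, values):
    """Sanity check: is the estimate inside [min(values), max(values)]?"""
    return min(values) <= estimate <= max(values)


tpps = [20.0, 80.0]  # nearest booth first
bad = weighted_estimate(tpps, [1.4, -0.4])  # 1.4*20 - 0.4*80, about -4
good = weighted_estimate(tpps, [0.8, 0.2])  # 0.8*20 + 0.2*80, about 32
print(round(bad, 6), within_range(bad, tpps))    # -4.0 False
print(round(good, 6), within_range(good, tpps))  # 32.0 True
```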
Some booths attract more voters than others. I can see an argument for weighting large booths more heavily than small booths. On the other hand, a booth with a large number of voters may simply be in a high-density area. I'm happy enough without weighting the results by booth size. I did remove booths with fewer than 40 formal votes, and also any booth with fewer than 100 formal votes unless it was more than 3 kilometres from the nearest other booth. (Values chosen arbitrarily.)
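Those filtering rules translate into a short Python function. It assumes projected coordinates in metres and measures "nearest booth" against the full, unfiltered list (which list was used isn't stated); `filter_booths` and the dictionary keys are hypothetical.

```python
import math


def filter_booths(booths, min_votes=40, small_votes=100, isolation_m=3000):
    """Drop booths with fewer than `min_votes` formal votes, and booths with
    fewer than `small_votes` unless they are more than `isolation_m` metres
    from the nearest other booth.

    booths: list of dicts with keys 'name', 'x', 'y' (metres) and 'formal'.
    """
    kept = []
    for i, b in enumerate(booths):
        if b["formal"] < min_votes:
            continue  # too few votes to be worth keeping
        if b["formal"] < small_votes:
            nearest = min((math.dist((b["x"], b["y"]), (o["x"], o["y"]))
                           for j, o in enumerate(booths) if j != i),
                          default=math.inf)
            if nearest <= isolation_m:
                continue  # small and not isolated
        kept.append(b)
    return kept


booths = [
    {"name": "A", "x": 0, "y": 0, "formal": 30},      # under 40: dropped
    {"name": "B", "x": 1000, "y": 0, "formal": 80},   # small, 1 km from C: dropped
    {"name": "C", "x": 2000, "y": 0, "formal": 500},  # large: kept
    {"name": "D", "x": 10000, "y": 0, "formal": 80},  # small but 8 km away: kept
]
print([b["name"] for b in filter_booths(booths)])  # ['C', 'D']
```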
Minor point: there was no Coalition candidate in the seat of Newcastle in 1998, and so there is no TPP interpolation for the seat of Newcastle that year. Nevertheless, I couldn't be bothered handling this special case for the interpolation over the Newcastle urban area, and so the urban area does have a TPP interpolated across all of Newcastle in 1998, using booths in the surrounding electorates where necessary. Similarly, when a party doesn't contest a seat, I use booths from surrounding seats when interpolating the primary vote across an urban area (this applies to the Liberal Party in Newcastle in 1998, and to various minor parties over the years).
One final point is that a lot of people don't vote at their nearest booth. I'm putting this one in the too-hard basket; perhaps you could make some improvements by incorporating census data, but I'll leave that to others to play with. There's clearly enough continuity in the way people vote across cities that the pictures can tell decent stories, even if you wouldn't want to rely on them too heavily to precisely estimate the votes of the people in a particular block of one suburb.
Data and code for download
For those who want to play along at home, I've made a zip file with data and code which can be downloaded here (10 MB). Included are:
- Booth data with locations in CSV format; one set of files lists each shared booth separately, and one adds up all the votes at a single location. Only the parties that you see in the maps here are in the dataset, though it's easy for me to generate the CSV files and maps for other minor parties.
- CSV files of booths which I couldn't match by name and had to find or guess locations for.
- The shapefiles with simplified geometries as described above, and Antony Green's KML files (sometimes these got a slight edit by me, to ensure that all polygons were closed).
- R code to generate the maps. This includes some shell calls to make the animated gifs; I'm aware that there's an R package that creates gifs, but I decided to trust ImageMagick's optimisation over the R package. The shell calls are in DOS format and would need to be adapted to Linux, but I think that everything else will work regardless of OS.
This was the first time I'd written anything substantial in R, so it's probably not pretty reading. I'm also not overly happy with the feel of the code more generally – I wrote the main routine to do TPP and primary votes for electorates, and then grafted on extra features: TCP, moving colour scales, urban areas. As a result there are lots and lots of if statements, as I swap back and forth between the requirements for seats and the requirements for urban areas, etc. There are also quite a few aesthetically displeasing workarounds with ggplot's colourbars.
Not included in the download are:
- The multiple stages of matching booths by name or PollingPlaceID, and generating the final CSV files of the booth data. I can make this available if anyone's actually interested in what it looks like.
- The shapefiles that I didn't simplify and which are available for download at the links given above. (I think the code will run without them, and it will guess the electoral boundaries, but I haven't tested it.)
- Possibly other things that I've forgotten that the code relies on, but hopefully not.
If you have questions, you can send them by email to email@example.com, or if the question fits inside a small multiple of 140 characters, then you can try tweeting me at @pappubahry.