R code for electorate maps

I have a few sets of maps of Australian electorates, most of which I made with the R code below. The basic idea is that I'll want to draw the same regions every time I make a set of maps, just changing the variables used to make the colour scale. There are two slightly tricky things: drawing the labels so that they don't overlap one another, and making sure that the zoom on each region is the same each time I make a set of maps.
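One way to make sure the zoom is identical on every run is to keep each view's bounds in one table instead of in separate assignments. The sketch below is a hypothetical restructuring, not what the script further down actually does: `views` and `view_bounds()` are made-up names, and the numbers are copied from the script's `x_min`/`x_max`/`y_min`/`y_max` assignments.

```r
# Hypothetical table of view bounds (longitude/latitude in degrees), copied
# from the per-city assignments in the script below.
views = data.frame(
  name  = c("national", "brisbane", "sydney", "melbourne",
            "adelaide", "perth", "tasmania", "canberra"),
  x_min = c(107, 151.8, 150.55, 144.5, 138.1, 115.4, 144.3, 148),
  x_max = c(162, 153.8, 151.6, 145.6, 139.3, 116.5, 148.6, 149.6),
  y_min = c(-44, -28, -34.24, -38.2, -35.3, -32.5, -43.8, -36),
  y_max = c(-9, -26.8, -33.6, -37.5, -34.5, -31.6, -40, -35),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one view's bounds as a named vector,
# stopping if the name is not in the table (or appears twice).
view_bounds = function(views, view_name) {
  row = views[views$name == view_name, ]
  if (nrow(row) != 1) stop("unknown view: ", view_name)
  c(x_min = row$x_min, x_max = row$x_max, y_min = row$y_min, y_max = row$y_max)
}

# Possible use with the script's return_plot() and png_print():
# for (v in views$name) {
#   b = view_bounds(views, v)
#   map_plot = return_plot(electorates.df, electorates_poly, labels.df,
#                          b["x_min"], b["x_max"], b["y_min"], b["y_max"])
#   png_print(paste0(v, ".png"), print_width, print_height, map_plot)
# }
```

The per-view label adjustments would still have to be applied inside such a loop, which is why the script keeps each city as its own section.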

QGIS does automatic labelling with little effort, but I couldn't find a way to get repeatable zooms (it looks like it might be possible with Print Composer, but I found it very hard to use). R with ggplot2 makes it easy to set the bounds on the plot axes, but you have to tinker with the label positions manually. That is annoying, but it only needs to be done once (unless the length of the label text changes).
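Since the label tinkering only needs doing once, the tweaks could also be stored as data and re-applied in one call. This is a minimal sketch of that idea, not the approach the script uses: `overrides` and `apply_overrides()` are hypothetical names, the column names match `labels.df` in the script, and the values are the Adelaide adjustments from the script.

```r
# Hypothetical table of manual label tweaks: one row per (electorate, column).
overrides = data.frame(
  name   = c("Sturt", "Wakefield", "Wakefield", "Barker", "Barker"),
  column = c("vjust", "x_label", "y_label", "x_label", "y_label"),
  value  = c(-0.4, 138.6, -34.6, 139.25, -34.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply every override to a copy of the label table.
# `names` gives the electorate name for each row of `labels` (the script uses
# electorates_poly$CED_NAME, since label_text may have been blanked).
apply_overrides = function(labels, names, overrides) {
  for (i in seq_len(nrow(overrides))) {
    col = overrides$column[i]
    if (!(col %in% names(labels))) stop("no column named ", col)
    rows = which(names == overrides$name[i])
    if (length(rows) == 0) {
      # Report a misspelt or missing electorate instead of silently skipping it.
      warning("no label named ", overrides$name[i])
      next
    }
    labels[rows, col] = overrides$value[i]
  }
  labels
}
```

In the script this would replace a block of individual assignments with something like `labels.df = apply_overrides(labels.df_orig, electorates_poly$CED_NAME, overrides)`.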

I have a zip file here containing a shapefile of the electorates – it's the ABS's CED_2011_AUST shapefile, with simplified geometries to reduce the filesize and a few pathological features (no-usual-address-type things) removed. It also contains an example CSV file, which has the percentage of people who had changed address in the 12 months prior to the 2011 Census. The ABS's shapefile is only an approximation to the boundaries defined by the AEC, but since I use ABS data to calculate things about electorates, I follow the ABS boundaries.

The ABS spells, e.g., 'McMillan' as 'Mcmillan'. This is the spelling in the shapefile, and the spelling in CSV files generated by TableBuilder. The code below joins a CSV file to the imported shapefile by electorate name, and the spellings have to be consistent. If your CSV file has proper Mc spellings, then I would suggest moving the re-naming of the Mc electorates to just before the join.
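Rather than renaming each Mc electorate by hand, one regular expression can fix them all, and it is worth listing any names that won't match before doing the join. This sketch assumes that every irregular name is "Mc" followed by a lower-case letter (true of McEwen, McMillan, McMahon and McPherson); `fix_mc()` and `unmatched_names()` are hypothetical helpers, not part of the script.

```r
# Upper-case the letter after a leading "Mc": "Mcmillan" -> "McMillan".
# With perl = TRUE, "\\U" in the replacement upper-cases the captured letter.
# Names that are already correct (e.g. "McEwen") or don't start with "Mc"
# are left unchanged.
fix_mc = function(x) {
  gsub("^Mc([a-z])", "Mc\\U\\1", x, perl = TRUE)
}

# List the names that appear on only one side of the join, in both directions.
unmatched_names = function(shape_names, csv_names) {
  list(only_in_shapefile = setdiff(shape_names, csv_names),
       only_in_csv       = setdiff(csv_names, shape_names))
}
```

For example, `unmatched_names(electorates_poly$CED_NAME, electorates_csv.df$CED)` just before the merge should return two empty vectors if the spellings agree; otherwise the unmatched electorates would likely end up with missing values after the join.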

ggplot is slow at drawing complicated polygons compared with GIS software: a full set of eight plots takes about four to five minutes on my computer.

# This is a file that uses the ABS's electorate shapefile, joins a CSV file of data to it, and
# plots maps for the country as a whole, then zoomed in on the capital cities and Tasmania. 
# Shading is on a continuous scale.

# The ggplot objects are created by the return_plot function, which takes a very specific set of
# arguments.  Shading and variable names should be changed in that function.
#
# The label text is defined in the main section of the code, just after importing the polygons and
# tidying up some of the capital letters.

# I never try to remember which of these libraries I actually need:
library(rgdal)
library(maptools)
library(plyr)
library(ggplot2)
library(grid)

return_plot = function(poly_df, poly_shp, label_df, x_min, x_max, y_min, y_max) {
  # Creates a ggplot object to be printed.
  # Input data is assumed to be of precisely the correct format.
  
  # Rainbow colour scales are bad:
  # colour_vec = c("#FF00FF", "#0000FF", "#00FF00", "#FFFF00", "#FF8800", "#FF0000")
  colour_vec = c("#00FFFF", "#FF8000")
  
  # Change the fill in the aesthetic ('Perc_moved') to the variable in the CSV file as needed.
  # For a categorical variable, use fill=factor(variable) and delete the scale_fill_gradientn().
  map_plot = ggplot() + geom_polygon(data=poly_df,
                                     aes(x=long, y=lat, group=group, fill=Perc_moved)) +
    geom_path(data=poly_shp, aes(x=long, y=lat, group=group), color="#888888") +
    # coord_map() both fixes the aspect ratio and sets the zoom; an extra coord_equal()
    # would just be replaced by it, so it is not needed.
    coord_map(xlim=c(x_min, x_max), ylim=c(y_min, y_max)) + 
    geom_text(data=label_df,
              aes(x=x_label, y=y_label, label=label_text, hjust=hjust, vjust=vjust), colour="#000000") + 
    theme(legend.position="top", legend.title=element_blank()) +
    # The 'space' argument is deprecated in ggplot2 (only "Lab" is supported), so it is omitted.
    scale_fill_gradientn(colours=colour_vec)
  return(map_plot)
}


png_print = function(out_file, img_width, img_height, ggplot_img) {
  # Function to print a ggplot object to file
  png(filename=out_file, width=img_width, height=img_height, units="px")
  print(ggplot_img)
  dev.off()
  return(TRUE)
}

# In pixels:
print_width = 700
print_height = 500


# The CED_2011_AUST shapefile is created by the ABS and is only an approximation to the true
# boundaries that are set by (and available for download from) the AEC.  It is nevertheless 
# appropriate to use the ABS version for mapping Census data, since that is what the ABS
# uses (as can be verified, for example, from the Community Profiles on their website).
electorates_poly = readOGR(".", "CED_2011_AUST_nomigratory_simpl", stringsAsFactors=FALSE)

electorates_csv.df = read.csv("CED_moved_in_last_year.csv", sep=",", header=TRUE, stringsAsFactors=FALSE)

# Join the CSV data to the shapefile:
electorates_poly = merge(x=electorates_poly, y=electorates_csv.df, by.x="CED_NAME", by.y="CED")

# The ABS has some irregular capitalisation on Mc names:
electorates_poly$CED_NAME[which(electorates_poly$CED_NAME == "Mcewen")] = "McEwen"
electorates_poly$CED_NAME[which(electorates_poly$CED_NAME == "Mcmillan")] = "McMillan"
electorates_poly$CED_NAME[which(electorates_poly$CED_NAME == "Mcmahon")] = "McMahon"
electorates_poly$CED_NAME[which(electorates_poly$CED_NAME == "Mcpherson")] = "McPherson"

# Extracting point locations for each polygon via some black-box process that someone
# probably posted to StackOverflow once.  region="id" tags every point with the value of
# the id column, so that the join on "id" lines up with the attribute data:
electorates_poly@data$id = rownames(electorates_poly@data)
electorates.points = fortify(electorates_poly, region="id")
electorates.df = join(electorates.points, electorates_poly@data, by="id")

label_points = coordinates(electorates_poly)
x_label = label_points[,1]
y_label = label_points[,2]
label_text = sprintf("%s", electorates_poly$CED_NAME)


# ****************************************************************************
# * Hopefully everything that follows will run automatically and not require *
# * changes.  But it is likely that at some point the labels will overlap,   *
# * and in those cases they will have to be adjusted, either by hjusts or    *
# * vjusts, or by setting the coordinates directly.                          *
# ****************************************************************************

labels.df = data.frame(x_label, y_label, label_text, stringsAsFactors=FALSE)

# Basic idea: hjust = 0 on east coast, 1 on west coast, 0.5 otherwise
labels.df$hjust = 0.5
labels.df$vjust = 0.5

remove_labels = c("Mallee", "Murray")
labels.df$label_text[which(electorates_poly$CED_NAME %in% remove_labels)] = ""

right_align = c("Pearce", "Forrest", "Barker", "Braddon", "Riverina", "Wannon", "Wills", "Higgins",
                "Gellibrand")
left_align = c("Capricornia", "Flynn", "Wide Bay", "Page", "Lyne", "Hunter", "Eden-Monaro", "Lyons",
               "New England", "Wentworth", "Kingsford Smith", "Warringah", "North Sydney",
               "Jagajaga", "Chisholm")

labels.df$hjust[which(electorates_poly$CED_NAME %in% right_align)] = 1
labels.df$hjust[which(electorates_poly$CED_NAME %in% left_align)] = 0

# Vertical adjustments (set individually)
labels.df$vjust[which(electorates_poly$CED_NAME=="New England")] = 0.3
labels.df$vjust[which(electorates_poly$CED_NAME=="Page")] = 0.3
labels.df$vjust[which(electorates_poly$CED_NAME=="Lyne")] = 0.2
labels.df$vjust[which(electorates_poly$CED_NAME=="Calare")] = 0.3
labels.df$vjust[which(electorates_poly$CED_NAME=="Indi")] = 0.2

# Keep a copy of the labels for re-use later:
labels.df_orig = labels.df

x_min = 107
x_max = 162

y_min = -44
y_max = -9

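plot_area = (x_max - x_min)*(y_max - y_min)

# Hide the labels of electorates too small to label legibly on the national map.
# SQKM is the area in square kilometres while plot_area is in square degrees, so the
# threshold of 6 is empirical rather than a true ratio.
keep_electorates = c("Solomon")
remove_electorates = which((electorates_poly$SQKM / plot_area < 6) &
                           !(electorates_poly$CED_NAME %in% keep_electorates))
labels.df$label_text[remove_electorates] = ""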

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("national.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)


# ***** Brisbane *****
x_min = 151.8
x_max = 153.8
y_min = -28
y_max = -26.8

labels.df = labels.df_orig

labels.df$y_label[which(electorates_poly$CED_NAME == "Bonner")] = 
  labels.df$y_label[which(electorates_poly$CED_NAME == "Bonner")] - 0.03

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("brisbane.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)


# ***** Sydney *****
x_min = 150.55
x_max = 151.6
y_min = -34.24
y_max = -33.6

labels.df = labels.df_orig

labels.df$vjust[which(electorates_poly$CED_NAME == "Wentworth")] = -0.6
labels.df$vjust[which(electorates_poly$CED_NAME == "Sydney")] = -0.6
labels.df$vjust[which(electorates_poly$CED_NAME == "Grayndler")] = -0.1
labels.df$vjust[which(electorates_poly$CED_NAME == "Greenway")] = 1
labels.df$vjust[which(electorates_poly$CED_NAME == "Barton")] = 0.1
labels.df$vjust[which(electorates_poly$CED_NAME == "Bennelong")] = -0.2
labels.df$vjust[which(electorates_poly$CED_NAME == "Kingsford Smith")] = -0.6
labels.df$vjust[which(electorates_poly$CED_NAME == "Chifley")] = -0.6

labels.df$x_label[which(electorates_poly$CED_NAME == "Cunningham")] = 151
labels.df$y_label[which(electorates_poly$CED_NAME == "Cunningham")] = -34.15

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("sydney.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)

# ***** Melbourne *****
x_min = 144.5
x_max = 145.6
y_min = -38.2
y_max = -37.5

labels.df = labels.df_orig

labels.df$vjust[which(electorates_poly$CED_NAME == "Melbourne")] = -0.4
labels.df$vjust[which(electorates_poly$CED_NAME == "Chisholm")] = -1
labels.df$vjust[which(electorates_poly$CED_NAME == "Higgins")] = -0.4
labels.df$vjust[which(electorates_poly$CED_NAME == "Melbourne Ports")] = 1
labels.df$vjust[which(electorates_poly$CED_NAME == "Deakin")] = -0.2
labels.df$vjust[which(electorates_poly$CED_NAME == "Goldstein")] = -0.2

labels.df$x_label[which(electorates_poly$CED_NAME == "McEwen")] = 145.3
labels.df$y_label[which(electorates_poly$CED_NAME == "McEwen")] = -37.57

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("melbourne.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)


# ***** Adelaide *****
x_min = 138.1
x_max = 139.3
y_min = -35.3
y_max = -34.5

labels.df = labels.df_orig

labels.df$vjust[which(electorates_poly$CED_NAME == "Sturt")] = -0.4

labels.df$x_label[which(electorates_poly$CED_NAME == "Wakefield")] = 138.6
labels.df$y_label[which(electorates_poly$CED_NAME == "Wakefield")] = -34.6

labels.df$x_label[which(electorates_poly$CED_NAME == "Barker")] = 139.25
labels.df$y_label[which(electorates_poly$CED_NAME == "Barker")] = -34.9

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("adelaide.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)


# ***** Perth *****
x_min = 115.4
x_max = 116.5
y_min = -32.5
y_max = -31.6

labels.df = labels.df_orig

labels.df$vjust[which(electorates_poly$CED_NAME == "Hasluck")] = 1

labels.df$x_label[which(electorates_poly$CED_NAME == "Canning")] = 116
labels.df$y_label[which(electorates_poly$CED_NAME == "Canning")] = -32.25

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("perth.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)


# ***** Tasmania *****
x_min = 144.3
x_max = 148.6
y_min = -43.8
y_max = -40

labels.df = labels.df_orig

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("tasmania.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)


# ***** Canberra *****
x_min = 148
x_max = 149.6
y_min = -36
y_max = -35

labels.df = labels.df_orig

labels.df$x_label[which(electorates_poly$CED_NAME == "Riverina")] = 148.3
labels.df$y_label[which(electorates_poly$CED_NAME == "Riverina")] = -35.4

labels.df$x_label[which(electorates_poly$CED_NAME == "Hume")] = 149.1
labels.df$y_label[which(electorates_poly$CED_NAME == "Hume")] = -35.07

labels.df$x_label[which(electorates_poly$CED_NAME == "Eden-Monaro")] = 148.8
labels.df$y_label[which(electorates_poly$CED_NAME == "Eden-Monaro")] = -35.96

map_plot = return_plot(electorates.df, electorates_poly, labels.df, x_min, x_max, y_min, y_max)

png_print("canberra.png", img_width=print_width, img_height=print_height, ggplot_img=map_plot)
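The national map hides the labels of small electorates. With the national bounds, plot_area = (162 - 107) * (-9 - (-44)) = 55 * 35 = 1925 square degrees, so the ratio of 6 hides the label of every electorate under 6 * 1925 = 11,550 sq km, apart from those in keep_electorates. The same rule could be wrapped in a function and reused for other views; this is a sketch, and `hide_small_labels()` is a hypothetical name rather than part of the script.

```r
# Blank the labels of electorates whose area (sqkm, square km) divided by the
# plot area (square degrees) is below `ratio`, except for names listed in `keep`.
# The units deliberately don't match: `ratio` is an empirical threshold.
hide_small_labels = function(label_text, names, sqkm, x_min, x_max, y_min, y_max,
                             ratio = 6, keep = character(0)) {
  plot_area = (x_max - x_min) * (y_max - y_min)
  hide = (sqkm / plot_area < ratio) & !(names %in% keep)
  # which() drops NAs, so electorates with a missing area keep their labels.
  label_text[which(hide)] = ""
  label_text
}
```

In the script, the national-map step would become `labels.df$label_text = hide_small_labels(labels.df$label_text, electorates_poly$CED_NAME, electorates_poly$SQKM, x_min, x_max, y_min, y_max, keep = "Solomon")`.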

Posted 2014-11-11.

