
Counting Vote Compass pixels

You can download a zip file of the screenshots and code here (3.4MB).

In my post on the Vote Compass questions, I mentioned automating the counting of pixels to work out the loadings of each question on the economic and social ideology scales. In this post I work through the process from start to finish. There is no fancy maths involved – just a series of sometimes tedious steps. It took me about three hours in total; I've done similar sorts of basic image processing in the past, so none of the steps posed a conceptual barrier. (On the other hand, perhaps there's a clever convolution-based method that a more proficient image-processing person could use, which would speed things up substantially in both running and coding time.)

The goal is to isolate the effect of each question in the Vote Compass survey on the ideology scores. I started with question 1, choosing the answer 'strongly agree', and chose 'neutral' for all the other questions. I then clicked through to the results page and took a screenshot. I then clicked back to the questions, reset my answer to q1 to 'neutral', chose 'strongly agree' for q2, clicked over to results, screenshot. Set q2 back to 'neutral', .... This isn't fun, but I got into a rhythm, and the promise of data at the end kept me going. (Vote Compass records timestamps and uses cookies, so their quality-control processes should filter out my super-fast answers before they do their post-election analysis.)

Here's a screenshot. My laptop screen is 1366 × 768 pixels.

Note the scrollbar on the right of the screenshot – I'd scrolled down a little to get the whole graph in view. If I'd thought about it ahead of time, I probably would have made sure to scroll down a specific amount (I don't know, one page-down then four up-arrows, something like that). Instead I just scrollwheeled and added an extra step that I'll get to shortly.

After working through each of the thirty questions, I have a series of screenshots q1.png, q2.png, ..., q30.png. I also have a q0.png (all answers 'neutral') to verify that an all-neutral set of answers gives an ideology at the centre of the plot.

(Optional step.) The compass only takes up a minority of the width of the screen, and it's convenient to crop to it. In Pinta (sort of like MS Paint for Ubuntu; I'd use Paint if I was doing this in Windows), I just read some pixel coordinates from the mouse location – I wanted to crop to a 480-pixel wide image, starting about 222 pixels from the left (the details here aren't important). After saving a backup copy of my folder of screenshots (important!!), I used ImageMagick's mogrify:

mogrify -crop 480x768+222+0 *.png

(The -crop geometry argument is widthxheight+x_offset+y_offset, with the offsets measured from the top-left corner. I highly recommend ImageMagick's convert and mogrify tools for image manipulation from the command line. Be careful with mogrify, though – it alters the images in place, so if you enter some wrong coordinates, you can wreck every image in a folder in a few seconds. Hence the backing-up step!)
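
For the backup itself, something like the following is enough (a minimal sketch in R, assuming the screenshots q0.png, ..., q30.png sit in the working directory; a plain cp -r of the folder works just as well):

# Copy the original screenshots to a backup folder before
# letting mogrify loose on them:
dir.create("screenshot_backup")
file.copy(sprintf("q%d.png", 0:30), "screenshot_backup")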

Now I have images narrow enough that if I zoom in in Pinta, I'm immediately on the y-axis of the compass instead of something meaningless to the right.

R time. The next step is to crop to the compass itself. The text just above it, "How you fit in the political landscape", is distinctive and safely out of the way of the compass itself (it may be possible for the 'YOU' circle and caption to go over the top of the square, and maybe in the future I'll want to run my code on such an image). In Pinta, that text is roughly in rows 215 to 240 of my q0.png. The algorithm is then:

- extract the known title pixels (rows 215 to 240 of q0.png);
- for each screenshot, slide a window of the same height down the image one row at a time, summing the absolute differences between the window's pixels and the known title pixels;
- take the window position with the smallest total difference as the start of the title, and crop from that row down 435 rows.

(The 435 is roughly the height of the compass.) The hope is that at some point there'll be an exact match of the pixels in the title. Code (the EBImage package is not on CRAN – it's on Bioconductor – but the installation is still easy):

library(EBImage)

# Load a picture whose title pixel rows are known:
test_img = readImage("q0.png")

base_title_rows = 215:240

# Pixels of an image loaded with EBImage are accessed
# in the form [x, y, channel], where channel is 1 for red,
# 2 for green, and 3 for blue, and y = 1 is the top row.
# The ordering convention of [x, y] instead of [y, x] is
# infuriating if you're used to thinking of an image as a
# matrix of rows and columns.

title_img = test_img[ , base_title_rows, ]
num_title_rows = length(base_title_rows)

image_numbers = 0:30

for (i in image_numbers) {
  # Progress update:
  print(i)
  
  infile = sprintf("q%d.png", i)
  outfile = sprintf("new_q%d.png", i)
  
  this_img = readImage(infile)
  
  img_height = dim(this_img)[2]
  
  # min_deviation will store the current smallest difference between
  # the possible title pixels and the reference title pixels.  
  # Initialise to a very high number to ensure that there'll be an
  # update of the value.
  min_deviation = 1e100
  
  # min_j will hold the current guess at the start of the title pixels:
  min_j = -1
  
  for (j in 1:(img_height - num_title_rows + 1)) {
    this_test_rows = j:(j + num_title_rows - 1)
    candidate_img = this_img[ , this_test_rows, ]
    
    this_deviation = sum(abs(title_img - candidate_img))
    
    if (this_deviation < min_deviation) {
      # Woohoo, new guess for when the title starts.
      min_deviation = this_deviation
      min_j = j
    }
  }
  
  # Crop and write the cropped image to file:
  cropped_img = this_img[ , min_j:(min_j + 435), ]
  writeImage(cropped_img, outfile)
}
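
A quick sanity check on the result – a minimal sketch, assuming the same filenames as above – is to confirm that every cropped image came out with the same height:

# min_j:(min_j + 435) selects 436 rows, so every cropped
# image should be exactly 436 pixels tall:
heights = sapply(image_numbers, function(i)
  dim(readImage(sprintf("new_q%d.png", i)))[2])
stopifnot(all(heights == 436))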

Cropped image (in the code, I've called these – creatively – new_q0.png, new_q1.png, ...):

I now have a set of images in which the origin of the graph axes is at the same pixel. The goal now is to locate the grey dot, representing 'YOU', in each image. Note that the axis lines do not appear at all where they pass behind the circle immediately surrounding the dot, whereas a faded axis line is visible where it passes through the larger bubble around the 'YOU'. The dot inside the circle is therefore the easiest thing to search for, since its pixel values won't ever be polluted by axis lines.

The procedure is conceptually similar to the cropping described above. In Pinta, I zoom in on new_q0.png and find that a bounding box for the dot is (244, 251)-(250, 258). The next step is to load new_q0.png in R, extract that little box of pixels, then loop through the other 30 images, searching for the 7 × 8 rectangle of pixels that best matches it. Because of anti-aliasing, there will not necessarily be exact matches, so (as in the cropping) I just calculate sums of absolute differences and take the location of the smallest such sum.

To check the accuracy of the code, I also replace what it thinks is the dot with black pixels, for manual inspection afterwards.

library(EBImage)
library(readr)

# Offsets based on manual inspection of the results:
offsets.df = read_csv("location_offsets.csv")

# Image with known location of grey dot:
neutral_img = readImage("compasses/new_q0.png")

# Extract the grey dot:
grey_dot_img = neutral_img[244:250, 251:258, ]

# For checking, make a black version of the dot: any pixel
# that isn't pure white is set to zero (black).
overwrite_dot_img = grey_dot_img
overwrite_dot_img[overwrite_dot_img < 1] = 0

# These will become vectors that store the locations of the dots:
locate_j = integer()
locate_k = integer()

image_numbers = 0:30

for (i in image_numbers) {
  print(i)
  infile = sprintf("compasses/new_q%d.png", i)
  outfile = sprintf("compasses/located_q%d.png", i)
  
  this_img = readImage(infile)
  
  # min_deviation will store the current smallest
  # difference between a 7*8 sub-image and the known
  # grey dot.
  min_deviation = 1e10
  
  # min_j and min_k will store the location of that
  # smallest difference:
  min_j = -1
  min_k = -1
  
  # For current purposes, the grey dot will be pretty close
  # to the centre of the picture, so the search need only
  # be in a fairly small window around it:
  for (j in 211:276) {
    for (k in 222:277) {
      this_block = this_img[j:(j+6), k:(k+7), ]
      this_deviation = sum(abs(this_block - grey_dot_img))
      
      if (this_deviation < min_deviation) {
        # New best guess for the dot!
        min_deviation = this_deviation
        min_j = j
        min_k = k
      }
    }
  }
  
  # Create a new image, over-writing the guessed grey dot
  # with a black dot, and write to file.
  new_img = this_img
  new_img[min_j:(min_j + 6), min_k:(min_k + 7), ] = overwrite_dot_img
  writeImage(new_img, outfile)
  
  # Append the found coordinates to the relevant vectors:
  locate_j = c(locate_j, min_j)
  locate_k = c(locate_k, min_k)
}

# Add the manual offsets:
offset_j = locate_j + offsets.df$offset_x
offset_k = locate_k + offsets.df$offset_y

# The first entry in the vectors is for all-neutral answers:
base_j = offset_j[1]
base_k = offset_k[1]

# Subtract the all-neutral location from all values
# (and flip the y-axis for convention):
final_j = offset_j - base_j
final_k = -(offset_k - base_k)

# Write loadings to file:
final.df = data.frame(econ = final_j, soc = final_k)
write.csv(final.df, file="question_loadings.csv", row.names=FALSE)

I haven't yet explained the manual offsets read in near the top of the code above; the first time through, I ran the code without those lines. As well as writing the derived economic and social weightings of each question to file, the code writes files located_q0.png, located_q1.png, .... Zoomed out, the grey dot in new_q1.png appears to have been well-located:

But zoomed in 16x, it's clear that it's not quite right – some grey pixels of the dot are visible to the left of the black replacement:

Instead of being 7 pixels wide like the reference dot from new_q0.png, the grey dot in new_q1.png is 8 pixels wide; presumably the true locations of the dots are fractional, and they were anti-aliased when my web browser plotted them. I therefore introduce a manual offset of -0.5 pixels in the economic direction. And then I zoom in on all the other located_q images, typing offsets into a CSV file as needed:

image_num,offset_x,offset_y
0,0,0
1,-0.5,0
2,0,0
3,-0.5,0
...

Re-run with the offsets defined, and I have the desired CSV of question loadings.
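
As a final check that the output makes sense – again just a sketch of how I'd eyeball it – the all-neutral row should sit exactly at the origin, and a quick plot of the loadings should look like a cloud of points around the centre of the compass:

loadings.df = read.csv("question_loadings.csv")

# The first row comes from the all-neutral q0, so both
# loadings are zero there by construction:
stopifnot(loadings.df$econ[1] == 0, loadings.df$soc[1] == 0)

# Eyeball the loadings (in pixels) of the thirty questions:
plot(loadings.df$econ, loadings.df$soc,
     xlab = "economic loading (pixels)",
     ylab = "social loading (pixels)")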

Posted 2016-05-12.

