Home > Misc > Piano diaries

2014-12-07: Hooktheory's API

Hooktheory is a website with several features. Its core is a database of transcribed pop songs, and built on top of this is a "Trends" tool. You click on a chord that you want to start with, and the graphic updates to show how likely each possible next chord is, according to the database. The idea is that if someone who doesn't understand chord progressions wants to write a pop song, they (I) can click on some chords and come up with a "safe" progression that won't sound weird. (Alternatively, you could click on a rarely used progression and listen to the songs that use it.) All progressions are stored in relative (Roman-numeral) notation, but can be displayed in any key.
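Because the chords are stored relative to the key, displaying them in a particular key is just an offset from the tonic. Here is a minimal R sketch, assuming a major key and treating upper-case numerals as major triads and lower-case ones as minor (sevenths, inversions and diminished chords are not handled). The function name to_key is made up for illustration; it is not part of Hooktheory.

```r
# Illustrative sketch: turn Roman-numeral (relative) chords into chord
# names in a chosen major key. Upper case = major triad, lower case = minor.
note_names <- c("C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B")
major_scale <- c(0, 2, 4, 5, 7, 9, 11)   # semitones above the tonic
numerals <- c("I", "II", "III", "IV", "V", "VI", "VII")

to_key <- function(progression, key = "C") {
  tonic <- match(key, note_names) - 1
  sapply(progression, function(chord) {
    degree <- match(toupper(chord), numerals)              # scale degree 1-7
    root <- note_names[(tonic + major_scale[degree]) %% 12 + 1]
    if (chord == toupper(chord)) root else paste0(root, "m")
  }, USE.NAMES = FALSE)
}

to_key(c("I", "V", "vi", "IV"), "C")   # "C"  "G"  "Am" "F"
to_key(c("I", "V", "vi", "IV"), "G")   # "G"  "D"  "Em" "C"
```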

What's missing is the ability to work backwards, and I plug this gap below. Say I don't want a four-chord song, and instead want to go out of key somehow. I read the other day about secondary dominants: a subprogression might go V/vi --> vi. For example, in C major that's an E major chord followed by an A minor chord. But how do we get to that E major when we're playing in C?
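The name V/vi can be unpacked mechanically: find the root of the target degree (vi), then go up a perfect fifth (seven semitones) to get the root of its dominant. A small R sketch of that arithmetic, assuming a major key; secondary_dominant() is a hypothetical helper written for this example, not something from Hooktheory.

```r
# Illustrative sketch: root of the secondary dominant V/x in a major key,
# i.e. the major chord a perfect fifth (7 semitones) above degree x.
note_names <- c("C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B")
major_scale <- c(0, 2, 4, 5, 7, 9, 11)   # semitones above the tonic

secondary_dominant <- function(degree, key = "C") {
  tonic <- match(key, note_names) - 1
  target <- (tonic + major_scale[degree]) %% 12   # root of the target chord
  note_names[(target + 7) %% 12 + 1]              # a fifth above it
}

secondary_dominant(6, "C")   # V/vi in C: "E" (E major, resolving to Am)
secondary_dominant(2, "C")   # V/ii in C: "A" (A major, resolving to Dm)
```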

To answer this, I scraped next-chord probabilities from Hooktheory's API, along with its "overall probability" numbers, and used them to calculate approximate previous-chord probabilities. The API currently returns overall probabilities only for the 28 most common chords (sevenths and inversions each count as separate chords), so to keep things simple I only considered these 28 chords as possible previous chords. This shouldn't matter much for commonly used chords, but the database contains some pretty obscure chords, so I wouldn't trust the numbers in every case: the probabilities always sum to 100%, but they may leave out some chord changes.
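In symbols, this is Bayes' rule. Treating the API's overall probability of chord $a$ as $P(a)$, writing $P(b \mid a)$ for the probability that chord $b$ follows chord $a$, and letting $\mathcal{C}$ be the set of 28 chords that have overall probabilities, the approximate probability that $a$ is the chord before $b$ is

$$
P(\text{prev} = a \mid \text{next} = b) \approx \frac{P(a)\,P(b \mid a)}{\sum_{a' \in \mathcal{C}} P(a')\,P(b \mid a')}, \qquad a \in \mathcal{C}.
$$

Because the denominator sums only over $\mathcal{C}$, the previous-chord probabilities for each $b$ always add up to 100%, even though changes from chords outside $\mathcal{C}$ are dropped.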

There are some '??'s in the list of chords in the dropdown menu below. My guess is that these were meant to be degree symbols (as used for diminished chords), but the API usually did return a degree symbol where appropriate, so I'm not sure. There are also some chords with identical text that are distinguished only by the internal chord_ID field, and I have no idea what these mean. In some cases I have marked them with the letter used in the chord_ID (b, C, D, L, M).
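Both oddities can be found mechanically once the chord IDs and their display text are in one data frame (html_ids.df in the script below). A sketch, where the IDs and labels are made-up placeholders rather than real Hooktheory chord_ID values:

```r
# Illustrative sketch: flag chords whose display text is shared with
# another chord ID, or contains "??". IDs and labels are invented.
chord_table <- data.frame(
  chord_id   = c("1", "4", "b4", "57", "x7"),
  chord_html = c("I", "IV", "IV", "V7", "vii??"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose text appears more than once
shared_text <- duplicated(chord_table$chord_html) |
  duplicated(chord_table$chord_html, fromLast = TRUE)
# TRUE for rows containing a literal "??"
has_qq <- grepl("??", chord_table$chord_html, fixed = TRUE)

chord_table[shared_text | has_qq, ]
```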

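[Interactive table: pick a chord from the dropdown menu to list its possible previous chords, with columns "Previous chord" and "Probability".]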

If I scroll down to the V/vi entry, I see that I have quite a bit of freedom in getting to my E major chord. Most common (29%) is F --> E, then 20% for C --> E, 15% for Dm --> E, 15% for Am --> E, 9% for G --> E, and small probabilities for other chords (some of these numbers would be a bit higher if I put all the inversions and sevenths in together).
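Pooling the variants amounts to mapping each label to its base chord and summing. A sketch, assuming labels made of a numeral plus a figured-bass suffix (IV6, ii7, and so on); the labels and probabilities are invented toy values, not Hooktheory data, and the real chord IDs may follow a different scheme.

```r
# Illustrative sketch: pool previous-chord probabilities over inversions
# and sevenths by stripping a figure suffix. Toy values, not real data.
prev_probs <- c("IV" = 0.30, "IV6" = 0.05, "IV7" = 0.05,
                "ii" = 0.25, "ii7" = 0.10, "I" = 0.25)

# Remove a trailing inversion/seventh figure: IV6 -> IV, ii7 -> ii
base_chord <- sub("(6|64|7|65|43|42)$", "", names(prev_probs))
pooled <- tapply(prev_probs, base_chord, sum)
sort(pooled, decreasing = TRUE)
```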

I'm not getting too ambitious yet – here I am playing a six-second piece with the chord progression C G E Am F (I wrote this before discovering the Hooktheory API, but it's too short to warrant a separate page and it fits in here):

R code follows.

# Grabs the list of most common chords from Hooktheory, then,
# for each of those, the list of most common following chords.
#
# Then calculates, for each chord, the probability of each possible
# previous chord, and outputs some JavaScript and HTML for insertion
# into an appropriate webpage.

library(httr)

base_url = "http://www.hooktheory.com/api/trends/stats"

chord_list = GET(url=base_url)
chord_content = content(chord_list)

chord_id = sapply(chord_content, function(x) x$chord_ID)
probability = sapply(chord_content, function(x) x$probability)
chord_html = sapply(chord_content, function(x) x$chord_HTML)

# Data frame that contains the 28 chords with overall probabilities:
base_chords.df = data.frame(chord_id, probability, chord_html, stringsAsFactors=FALSE)

# Data frame to contain next-chord probabilities:
chords.df = data.frame(first_chord=chord_id, stringsAsFactors=FALSE)

num_first_chords = length(chord_id)

# The first set of chords downloaded isn't the complete set;
# the following vectors will contain all the chord IDs and
# HTML in the same order as the columns of the main data frame
# that will be created in the loop below.
master_chord_ids = character()
master_chord_html = character()

# Main loop:
for (ct1 in 1:num_first_chords) {
  print(sprintf("Starting chord %d", ct1))
  new_url = sprintf("%s?cp=%s", base_url, chord_id[ct1])
  this_chord_content = content(GET(url=new_url))
  
  this_chord_id = sapply(this_chord_content, function(x) x$chord_ID)
  this_probability = sapply(this_chord_content, function(x) x$probability)
  this_chord_html = sapply(this_chord_content, function(x) x$chord_HTML)
  
  num_second_chords = length(this_chord_id)
  
  for (ct2 in 1:num_second_chords) {
    if (!(this_chord_id[ct2] %in% names(chords.df))) {
      # If we don't have this chord yet, add it to the data frame
      # and the master list vectors of ID's and HTML.
      chords.df[[this_chord_id[ct2]]] = 0
      master_chord_ids = c(master_chord_ids, this_chord_id[ct2])
      master_chord_html = c(master_chord_html, this_chord_html[ct2])
    }
    
    chords.df[[this_chord_id[ct2]]][ct1] = this_probability[ct2]
  }
}

html_ids.df = data.frame(chord_id=master_chord_ids, chord_html=master_chord_html, stringsAsFactors=FALSE)

# Now work out previous chord probabilities.
num_chords = length(chords.df[1, ]) - 1
prev_chords.df = chords.df[, 2:(num_chords+1)]
prev_chords.df = sweep(prev_chords.df, 1, base_chords.df$probability, "*")
prev_chords.df = sweep(prev_chords.df, 2, colSums(prev_chords.df), "/")
prev_chords.df$prev_chord = chords.df$first_chord
prev_chords.df = prev_chords.df[ , c(num_chords+1, 1:num_chords)]

# Sort and output some JavaScript and HTML
# (not a complete HTML file).

js_function = "function update_table() {\n var start_table = \"<table><th>Previous chord</th><th>Probability</th>\";\n var end_table = \"</table>\";\n var x = document.getElementById(\"chord_select\").value;\n switch(x) {"

dropdown_menu = "<select id=\"chord_select\" onchange=\"update_table()\">\n <option value=\"0\">Choose a chord"

for (ct in 1:num_chords) {
  temp.df = prev_chords.df[ , c(1, ct+1)]
  temp.df = temp.df[order(-temp.df[, 2]), ]
  
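  # The column name is the chord chosen in the dropdown (the next
  # chord); the rows of temp.df are its candidate previous chords.
  prev_chord_id = names(temp.df)[2]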
  prev_chord_html = html_ids.df$chord_html[which(html_ids.df$chord_id == prev_chord_id)]
  
  js_function = sprintf("%s\n  case \"%s\":", js_function, prev_chord_id)
  dropdown_menu = sprintf("%s\n <option value=\"%s\">%s", dropdown_menu, prev_chord_id, prev_chord_html)
  
  chords_html_line = "\n   var chords_list = ["
  probs_html_line = "\n   var probs = ["
  
  for (ct_prevchord in 1:num_first_chords) {
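    # Look up the display text in base_chords.df, which has all 28
    # possible previous chords (html_ids.df only has chords that were
    # seen as next chords, so a lookup there could come back empty).
    this_prev_id = temp.df$prev_chord[ct_prevchord]
    this_chord_html = base_chords.df$chord_html[match(this_prev_id, base_chords.df$chord_id)]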
    this_prob = sprintf("%.1f%%", 100*temp.df[ct_prevchord, 2])
    
    if (ct_prevchord == num_first_chords) {
      comma_or_end = "];"
    } else {
      comma_or_end = ", "
    }
    
    chords_html_line = sprintf("%s\"%s\"%s", chords_html_line, this_chord_html, comma_or_end)
    probs_html_line = sprintf("%s\"%s\"%s", probs_html_line, this_prob, comma_or_end)
  }
  
  js_function = sprintf("%s%s%s\n   break;", js_function, chords_html_line, probs_html_line)
  
}

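js_function = sprintf("%s\n  default:\n   var chords_list = [\" \"];\n   var probs = [\" \"];\n }", js_function)
# Build the table rows and write them into the page. The element id
# "prev_table" is an assumed name; the <div> is added after the menu.
js_function = sprintf("%s\n var rows = \"\";\n for (var i = 0; i < chords_list.length; i++) {\n  rows += \"<tr><td>\" + chords_list[i] + \"</td><td>\" + probs[i] + \"</td></tr>\";\n }\n document.getElementById(\"prev_table\").innerHTML = start_table + rows + end_table;\n}", js_function)

dropdown_menu = sprintf("%s\n</select>\n<div id=\"prev_table\"></div>\n", dropdown_menu)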

out_file = file("temp.html")
writeLines(c(js_function, dropdown_menu), out_file)
close(out_file)
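The two sweep() calls in the middle of the script are Bayes' rule in matrix form: multiply each row (previous chord) by its overall probability, then divide each column (next chord) by its total. A self-contained check on three chords, where the transition and overall probabilities are invented for illustration:

```r
# Toy check of the inversion step. next_given_prev[a, b] is
# P(next = b | prev = a); overall is P(a). All numbers are made up.
next_given_prev <- matrix(c(0.0, 0.6, 0.4,
                            0.5, 0.0, 0.5,
                            0.7, 0.3, 0.0),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(prev = c("I", "IV", "V"),
                                          nxt  = c("I", "IV", "V")))
overall <- c(I = 0.5, IV = 0.2, V = 0.3)

# Weight each row by P(prev), then normalise each column to sum to 1.
joint <- sweep(next_given_prev, 1, overall, "*")
prev_given_next <- sweep(joint, 2, colSums(joint), "/")

round(prev_given_next, 3)
colSums(prev_given_next)   # every column sums to 1
```

Here the column for I says that, in this toy setup, about 68% of changes into I come from V (0.21 / 0.31) and the rest from IV.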

Recorded 2014-11-30; written and posted 2014-12-07.

