This post was written on 2014-01-03, and is based on version 1.1 of the Twitter API.
(Note: It is likely that some of my code below relies on libraries that I haven't named. Here is the full list of libraries I've used, possibly more than what is needed for the code below: httr, rjson, digest, base64enc, stringr, ggplot2, scales, plyr.)
The easy way to study Twitter with R is to use the twitteR package. But I didn't know of the existence of that package when I decided I wanted to download tweets from R, so I built things from something closer to scratch. I relied on the httr package, which is essentially a wrapper for RCurl, which requires libcurl to be installed. Perhaps seeing some of the nuts and bolts of the Twitter authentication process will be interesting for some people; insofar as this post has a target audience, it's people who can write a bit of code but probably haven't done much with web API's before (i.e., people like me).
The Twitter API is divided into several parts. The Streaming API's require a connection to remain open, which I think is a little above my ability and possibly above my Internet bandwidth for the sorts of things I'd want to do with it. I'll walk through two examples: Search, using app-only authentication; and posting a tweet, building the OAuth procedure from scratch. (Twitter strongly recommends using an already-existing OAuth library for reasons that became obvious during a long day of ironing out bugs.)
"Creating" a Twitter app only takes a click or two. Log in to dev.twitter.com, go to "My Applications", and click "Create a new application". Fill in the name, etc., and you have an app! By default it is set to Read Only; if you want to post tweets (or access direct messages), then you'll need to make the appropriate change on the application's Settings page.
On the application's Details page, there is a "Consumer key" and a "Consumer secret"; these will be used (only once) in the app-only authentication procedure. Posting tweets (or doing lots of other things which I haven't coded) requires an OAuth access token and access token secret; these are generated by clicking the relevant button at the end of the Details page.
Accessing search (and hence generating a corpus of thousands of tweets) only requires app-only authentication, which I found quite straightforward. In schematic form, you send your app's consumer key and secret to Twitter, and Twitter replies with a "bearer token". For all subsequent calls to the API, you only use the bearer token to authenticate. The bearer token doesn't expire and you only need to get a new one if for some reason it gets compromised.
Twitter is pretty strict about the format of calls to its API, but I think the cURL syntax is straightforward and my major difficulty was getting the syntax of httr's POST function correct. All calls to Twitter's API are https, but I had no problems with SSL certificates.
The consumer key and secret are concatenated with a colon separating them, and the resulting string is base-64 encoded.
# This script should only be run once, to get the app-only authentication token
# from Twitter.
library(base64enc)
library(httr)
consumer_key = "[redacted]"
consumer_secret = "[redacted]"
key_secret = sprintf("%s:%s", consumer_key, consumer_secret)
key_secret_enc = base64encode(charToRaw(key_secret))
auth_str = sprintf("Authorization: Basic %s", key_secret_enc)
twtr_dtls = POST(url="https://api.twitter.com/oauth2/token",
config=add_headers(c("Host: api.twitter.com",
"User-Agent: [app_name]",
auth_str,
"Content-Type: application/x-www-form-urlencoded;charset=UTF-8",
"Content-Length: 29",
"Accept-Encoding: gzip")),
body="grant_type=client_credentials")
bearer_token = content(twtr_dtls)$access_token
Once you have the bearer token, you can close the above script from your RStudio window, and simply store the bearer token. I have a file full of various functions used in calling the Twitter API, and I store the bearer token in a function that returns what I need to send in the header of a GET:
app_only_auth = function() {
# Returns the values that need to be sent in a GET
# using application-only authentication.
bearer_token = "[redacted]"
GET_headers = c("Host: api.twitter.com",
"User-Agent: [app_name]",
sprintf("Authorization: Bearer %s", bearer_token),
"Accept-Encoding: gzip")
return(GET_headers)
}
Each application can only make so many calls to an API in a given 15-minute window. In particular, you can call Search 450 times per window. You can only get a maximum of 100 tweets per Search call, and while I'm not on a particularly fast Internet connection, it's still quick enough so that I can download 40000 tweets in about 7 or 8 minutes. Building up larger datasets therefore requires regular checks of how close we are to being rate-limited (and then sleeping when appropriate). Since Search is the only place I'm ever in danger of being rate-limited, checking the Search rate limit is the only thing I've coded, but it wouldn't be too hard to generalise by looking at the API documentation. Note that this function calls the app_only_auth() function from above.
search_rate_limit = function() {
# Returns the number of Search calls remaining in the current
# window, and the time left in the current window.
GET_headers = app_only_auth()
base_url = "https://api.twitter.com/1.1/application/rate_limit_status.json"
full_url = URLencode(sprintf("%s?resources=search", base_url))
rate_limit_results = GET(url=full_url,
config=add_headers(GET_headers))
results_content = content(rate_limit_results)$resources$search$"/search/tweets"
remaining = results_content$remaining
time_reset = results_content$reset
time_left = time_reset - as.numeric(Sys.time())
return(list(remaining=remaining, reset=time_left))
}
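As a sanity check, calling it right after setting things up should report something close to the full allowance (the numbers in the comments are just what I'd expect to see, not guaranteed):
limits = search_rate_limit()
limits$remaining  # close to 450 at the start of a fresh window
limits$reset      # seconds until the window resets; at most about 900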
We've now got enough building blocks to call Search and receive some tweets, but I'll first present a couple of functions that I use to process the data returned by the API. The amount of data that you get with the tweets is surprisingly large, given Twitter's famously small character limit. The data is in JSON format; an example of five tweets converted to an R list is here. There are 884 non-blank lines for just 5 tweets! And it'd be even longer if some of the tweets contained links or pictures (an image itself is not part of the data downloaded, but there are URL's given for the image at various sizes).
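If you want to poke around in that structure yourself, something like the following works (here search_results stands for the object returned by a GET to the Search endpoint, as in the tweet_search function further down):
search_content = content(search_results)        # httr parses the JSON into an R list
length(search_content$statuses)                  # number of tweets in this batch
str(search_content$statuses[[1]], max.level=1)   # top-level fields of the first tweet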
The first little processing function I'll show replaces any emoji characters with the letter "U" (for Unicode). You might reasonably wonder why I'd pick on emoji characters rather than any of the other character sets at large Unicode values. For reasons totally unclear to me, Twitter sends an emoji character as a six-byte sequence, with the first and fourth bytes both 0xED. Reading the resulting text following the UTF-8 standard will turn it into a pair of three-byte characters, and it may look like the tweet goes over the 140-character limit. I also found that doing a gsub on the string, even one that had nothing to do with the emoji characters, transformed the emoji into extraordinarily long strings of Chinese characters (containing many more bytes than the original string had).
remove_emoji = function(text_str) {
# Function to take a character string and replace emoji with "U".
# Won't work on a vector of character strings.
#
# Emoji end up as bytes in the form
# 0xED 0xnn 0xnn 0xED 0xnn 0xnn
byte_vec = charToRaw(text_str)
ED_bytes = which(byte_vec == 0xED)
num_ED = length(ED_bytes)
if (num_ED > 0) {
bytes_to_remove = numeric()
replacement_bytes = numeric()
for (i in 1:num_ED) {
# We'll be altering some bytes as we go, so re-check that we're at 0xED
if ((byte_vec[ED_bytes[i]] == 0xED) & (byte_vec[ED_bytes[i]+3] == 0xED)) {
replacement_bytes = c(replacement_bytes, ED_bytes[i])
bytes_to_remove = c(bytes_to_remove, (ED_bytes[i]+1):(ED_bytes[i]+5))
# Replace the second 0xED so that we don't then think it's the
# first 0xED in a pair next time through the loop:
byte_vec[ED_bytes[i]+3] = as.raw(0x00)
}
}
byte_vec[replacement_bytes] = as.raw(0x55)
byte_vec = byte_vec[-bytes_to_remove]
new_str = rawToChar(byte_vec)
Encoding(new_str) = "UTF-8"
return(new_str)
} else {
return(text_str)
}
}
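To see it in action on a made-up example: the six bytes below follow the pattern described above (0xED in the first and fourth positions), and are, I believe, what a grinning-face emoji looks like when encoded as a pair of surrogates.
fake_tweet = rawToChar(c(charToRaw("Bonjour "),
as.raw(c(0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x80)),
charToRaw(" tout le monde")))
remove_emoji(fake_tweet)
# Should return "Bonjour U tout le monde"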
The next function takes the raw data (in JSON format, before conversion to an R list) and returns a data frame with the values that I think I'll care about. The main tricky thing here is that many values can be null, and these have to be treated carefully or R will pretend they don't exist and then complain that the vectors are of different lengths. Note that the tweet ID's are given in both 64-bit integer and string form; I use only the strings because R's integers only run to 32 bits.
wanted_tweet_data = function(tweet_results) {
# Function to extract the data I want from the Twitter search.
# Input is the result of the GET to the search API.
search_content = content(tweet_results)
tweets = search_content$statuses
timestamp = sapply(tweets, function(x) x$created_at)
timestamp = gsub(" \\+[^ ]*", "", timestamp)
timestamp = as.POSIXct(timestamp, format="%a %b %d %H:%M:%S %Y", tz="GMT")
rt_id = sapply(tweets, function(x) x$retweeted_status$id_str)
rt_id = sapply(rt_id, function(x) ifelse(is.null(x), "0", x))
username = sapply(tweets, function(x) x$user$screen_name)
id = sapply(tweets, function(x) x$id_str)
status = sapply(tweets, function(x) x$text)
status = unname(sapply(status, remove_emoji))
html_entity_codes = c("&amp;", "&gt;", "&lt;")
html_entity_chars = c("&", ">", "<")
length_entities = length(html_entity_codes)
if (length(html_entity_chars) != length_entities) {
print("Don't have matching HTML entities and characters.")
stop()
}
for (i in 1:length_entities) {
status = gsub(html_entity_codes[i], html_entity_chars[i], status)
}
iso_lang_code = sapply(tweets, function(x) x$metadata$iso_language_code)
tw_source = sapply(tweets, function(x) x$source)
tw_source = gsub("<a href[^>]*>", "", tw_source)
tw_source = gsub("</a>", "", tw_source)
in_reply_to_status_id = sapply(tweets, function(x) x$in_reply_to_status_id_str)
in_reply_to_status_id = sapply(in_reply_to_status_id, function(x) ifelse(is.null(x), "0", x))
in_reply_to_username = sapply(tweets, function(x) x$in_reply_to_screen_name)
in_reply_to_username = sapply(in_reply_to_username, function(x) ifelse(is.null(x), "", x))
user_followers = sapply(tweets, function(x) x$user$followers_count)
user_listed = sapply(tweets, function(x) x$user$listed_count)
user_created_at = sapply(tweets, function(x) x$user$created_at)
user_created_at = gsub(" \\+[^ ]*", "", user_created_at)
user_created_at = as.POSIXct(user_created_at, format="%a %b %d %H:%M:%S %Y", tz="GMT")
user_statuses_count = sapply(tweets, function(x) x$user$statuses_count)
user_lang = sapply(tweets, function(x) x$user$lang)
rt_count = sapply(tweets, function(x) x$retweet_count)
rt_count = sapply(rt_count, function(x) ifelse(is.null(x), 0, x))
fav_count = sapply(tweets, function(x) x$favorite_count)
fav_count = sapply(fav_count, function(x) ifelse(is.null(x), 0, x))
lang = sapply(tweets, function(x) x$lang)
lang = sapply(lang, function(x) ifelse(is.null(x), 0, x))
# The coordinates field is a funny thing - it's either NULL or
# a structured list. When you naively try to replace the NULL's
# like is done for the other fields above, you get weird behaviour.
# The workaround is to create a list of the right size and fields
# with the default values set, and then copy across the non-NULL's
# into it.
coords_temp = sapply(tweets, function(x) x$coordinates)
coords = rep(list(structure(list(type="Point", coordinates=c(-999,-999)))), length(coords_temp))
got_coords = which(!sapply(coords_temp, is.null))
coords[got_coords] = coords_temp[got_coords]
latitude = sapply(coords, function(x) x$coordinates[2])
longitude = sapply(coords, function(x) x$coordinates[1])
# I think you can only have one media_url, but you can definitely have
# multiple link_url's, so keep them as lists:
media_url = lapply(tweets, function(x) sapply(x$entities$media, function(y) y$expanded_url))
media_tco = lapply(tweets, function(x) sapply(x$entities$media, function(y) y$url))
link_url = lapply(tweets, function(x) sapply(x$entities$urls, function(y) y$expanded_url))
link_tco = lapply(tweets, function(x) sapply(x$entities$urls, function(y) y$url))
results.df = data.frame(timestamp,
tw_source,
id,
rt_id,
username,
in_reply_to_username,
in_reply_to_status_id,
user_followers,
user_listed,
user_created_at,
user_statuses_count,
status,
I(media_tco),
I(media_url),
I(link_tco),
I(link_url),
rt_count,
fav_count,
iso_lang_code,
user_lang,
lang,
latitude,
longitude,
stringsAsFactors=F)
return(results.df)
}
We're now ready to call Search and ask for some tweets. The URL to GET is https://api.twitter.com/1.1/search/tweets.json with the various search inputs appended, along with the GET_headers for the app-only authentication. The details of the search parameters are given on this page.
Most of the parameters are self-explanatory, but the max_id is important. You can only retrieve 100 tweets at a time from Search. These are returned in reverse-chronological order. To retrieve tweets from further back in time, you set the max_id of the second call equal to one less than the last id from the first set of tweets. (Since the id's are stored as strings, I messed around with regex a bit to get a decrement function.) If you just want the n most recent tweets for a particular search, then you don't have to specify the max_id, and you can just let the code automatically set it as it pages through the search results. But if you are interested in tweets at a particular time of day, then you'll want to find an appropriate tweet id and set that as the max_id. (I usually use @big_ben_clock for this; the tweet id is in the URL of any tweet's status page.)
Note that Search will only retrieve tweets up to about 9 days old, so you won't be able to build an enormous corpus of tweets about Martian geology.
The tweet_search function below also has an argument return_df, by default set to TRUE, so that it makes a data frame using the wanted_tweet_data() function above. If you just want to see the raw results of a search, then call tweet_search with return_df=FALSE.
decrement_id = function(id_str) {
# Reduces an id by 1.
id_str = as.character(id_str)
num_chars = nchar(id_str)
last_digits = substr(id_str, num_chars-1, num_chars)
# The tweet ID is too long for R's integer format, so mess
# around with strings to decrease min_id by 1:
if (last_digits == "00") {
last_nonzero = regexpr("0(?=0*$)", id_str, perl=T)[1] - 1
next_id = gsub("0(?=0*$)", "9", id_str, perl=T)
next_id = sprintf("%s%s%s",
substr(next_id, 1, last_nonzero-1),
as.character(as.numeric(substr(next_id, last_nonzero, last_nonzero)) - 1),
substr(next_id, last_nonzero+1, num_chars))
} else {
new_digits = sprintf("%02d", as.numeric(last_digits) - 1)
next_id = sprintf("%s%s",
substr(id_str, 1, num_chars-2),
new_digits)
}
return(next_id)
}
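A couple of made-up IDs, to check both the easy case and the trailing-zeros case:
decrement_id("438123456789012480")  # "438123456789012479"
decrement_id("438123456789012500")  # "438123456789012499"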
tweet_search = function(search_terms, result_type="recent", geocode="", max_id="", lang="", n=100, return_df=TRUE) {
# Function to perform Twitter searches and return either a data
# frame of results or the JSON data.
# First check that we won't get rate-limited.
ratelimit = search_rate_limit()
if (ratelimit$remaining < (n/100 + 1)) {
print("May hit rate limit.")
print(sprintf("Try again in %1.0f seconds.", ratelimit$reset))
return(F)
} else {
GET_headers = app_only_auth()
base_url = "https://api.twitter.com/1.1/search/tweets.json?"
search_str = sprintf("q=%s", search_terms)
if (lang != "") {
search_str = sprintf("%s&lang=%s", search_str, lang)
}
if (result_type != "") {
search_str = sprintf("%s&result_type=%s", search_str, result_type)
}
if (geocode != "") {
search_str = sprintf("%s&geocode=%s", search_str, geocode)
}
nmax = min(n, 100)
count_str = sprintf("&count=%d", nmax)
max_id = as.character(max_id)
if (max_id != "") {
max_id_str = sprintf("&max_id=%s", max_id)
} else {
max_id_str = ""
}
full_url = URLencode(sprintf("%s%s%s%s",
base_url,
search_str,
max_id_str,
count_str))
search_results = GET(url=full_url,
config=add_headers(GET_headers))
if (return_df) {
tweets.df = wanted_tweet_data(search_results)
total_results = length(tweets.df$id)
print(sprintf("Found %d tweets", total_results))
} else {
total_results = n
}
if (total_results == 0) {
# Skip the next loop if there are no search results.
total_results = n
}
while (total_results < n) {
# We only want tweets up to and not including the earliest
# tweet so far returned. So we find the id of the earliest
# tweet and set max_id to its value minus 1.
min_id = tweets.df$id[total_results]
max_id = decrement_id(min_id)
max_id_str = sprintf("&max_id=%s", max_id)
nmax = min(100, n - total_results)
count_str = sprintf("&count=%d", nmax)
full_url = URLencode(sprintf("%s%s%s%s",
base_url,
search_str,
count_str,
max_id_str))
search_results = GET(url=full_url,
config=add_headers(GET_headers))
newtweets.df = wanted_tweet_data(search_results)
num_results = length(newtweets.df$id)
if (num_results == 0) {
# Originally had this as: if (num_results < nmax)
# But for some reason sometimes the GET returns
# 99 or 98 results instead of 100.
total_results = n
print("End of results.")
} else {
total_results = total_results + num_results
print(sprintf("Found %d tweets", total_results))
}
tweets.df = rbind(tweets.df, newtweets.df)
}
if (return_df) {
return(tweets.df)
} else {
return(content(search_results))
}
}
}
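A hypothetical call, just to show the shape of the output (the search term is arbitrary):
recent.df = tweet_search("rstats", lang="en", n=300)
table(recent.df$tw_source)  # which clients the tweets were posted from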
The tweet_search function should work for searches of up to around n=44900 tweets. Anything much larger will hit the rate limit. So to generate larger datasets, we have to search for somewhere close to that upper bound, sleep for a while, then search again. The following code sources a "twitter_functions.R" file, which just contains all of the above functions. After downloading 400,000 tweets in French, it then plots the distribution of tweets by character length, separating out tweets from the official iPhone or Android apps, and tweets from the web.
library(ggplot2)
library(scales)
library(rjson)
library(plyr)
source("twitter_functions.R")
searches = c("le OR la OR les OR de OR des OR du OR je OR t.co -RT")
search_names = "fr"
language = "fr"
coords = ""
tweets_per_loop = 40000
num_loops = 10
max_id = ""  # start from the most recent tweets; set this to a specific tweet ID to start further back in time
for (loop_ct in 1:num_loops) {
for (i in 1:length(searches)) {
temp_tweet.df = tweet_search(searches[i], lang=language, max_id=max_id, geocode=coords, n=tweets_per_loop)
temp_tweet.df$search = search_names[i]
if (i*loop_ct == 1) {
tweet.df = temp_tweet.df
} else {
tweet.df = rbind(tweet.df, temp_tweet.df)
}
}
N = length(temp_tweet.df$id)
min_id = temp_tweet.df$id[N]
max_id = decrement_id(min_id)
rm(temp_tweet.df)
if (loop_ct < num_loops) {
ratelimit = search_rate_limit()
if (ratelimit$remaining < 450) {
sleep_time = ratelimit$reset + 60
print(sprintf("Sleeping for %d seconds; end of loop %d.", floor(sleep_time), loop_ct))
Sys.sleep(sleep_time)
}
}
}
tweet.df$charlength = nchar(tweet.df$status)
source.df = ddply(tweet.df, "tw_source", summarise, N=length(charlength))
source.df = source.df[order(-source.df$N), ]
sources_to_plot = 3
keep_sources = source.df$tw_source[1:sources_to_plot]
tweet2.df = tweet.df[tweet.df$tw_source %in% keep_sources, ]
# Following assumes Top three sources are "Twitter for (iPhone|Android)" and web.
tweet2.df$tw_sourcetype = "Mobile"
tweet2.df$tw_sourcetype[which(tweet2.df$tw_source == "web")] = "web"
base_plot = ggplot() + theme(axis.title.x = element_text(size=18),
axis.text.x = element_text(size=16),
axis.title.y = element_text(size=18),
axis.text.y = element_text(size=16),
plot.title=element_text(size=18))
chars_freqpoly_plot = base_plot + geom_freqpoly(data=tweet2.df,
aes(x=charlength,
y=..density..,
colour=factor(tw_sourcetype)),
binwidth=1) +
xlim(0, 140.5) + xlab("tweet length (characters)") +
ggtitle("Tweets by character length (French)") +
theme(legend.title = element_blank(),
legend.text = element_text(size=14))
png(filename="charlength_bysource.png", width=600, height=600, units="px")
print(chars_freqpoly_plot)
dev.off()
It works! Tweets from phones are more likely to be short and less likely to be very long. I haven't studied whether this is because more photos are tweeted from mobiles, or how the difference in distribution varies through the day, but it's clear that this procedure basically works, and that any bugs are at least not obvious to me.
Searching Twitter via the API is easy: put your app's bearer token in a header and GET your search URL. Posting a tweet is not easy, at least not if you don't use an existing library. I didn't think it'd be as frustrating as it was, and so while I am now able to post tweets from RStudio, I didn't bother generalising the code so that it can perform functions other than tweeting text. What follows is unlikely to be useful except as a warning to follow Twitter's advice and use an OAuth library.
To post a tweet, you need your app's OAuth token and secret in addition to the consumer key and secret (all of these are available on the application's Details page). But it is not simply a case of putting them all in a header and sending the tweet. Instead there is a very exacting procedure described in these two pages which you have to follow to build the header correctly.
In addition to the consumer key, OAuth token, and timestamp, there's also a signature that you have to generate and include in the header. Very roughly speaking, you concatenate various parameters, percent-encode the string, append it to a concatenation of more parameters, percent-encode it again, then hash it, then base-64 encode it.
Since there is a hashing step, it is vitally important that the string getting hashed is identical to the string that Twitter wants you to be hashing. The URLencode function from R's utils package does not percent-encode the way Twitter wants. For example, URLencode("é") = "%e9", but Twitter expects that "é" will be percent-encoded to "%C3%A9". So, here's a percent-encoding function (which only works for one- or two-byte UTF-8 characters, but I don't care enough to add the couple of lines necessary to make it work more generally):
percent_encode = function(text_str, reserved=TRUE) {
# The URLencode() function from utils encodes, e.g.,
# "é" as "%e9", instead of the UTF-8 "%C3%A9".
# So we need a better percent-encode function.
# This function will encode reserved characters;
# perhaps one day I will use the 'reserved' variable.
unreserved = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~"
# I originally used grepl on the string above, but you'd have to escape
# the various regex characters (question mark, colon, brackets, ...) to
# have it work properly. So instead I used charmatch.
unreserved = strsplit(unreserved, "")[[1]]
encoded_str = ""
num_chars = nchar(text_str)
for (i in 1:num_chars) {
this_char = substr(text_str, i, i)
if (is.na(charmatch(this_char, unreserved))) {
posn = utf8ToInt(iconv(this_char, localeToCharset(), "UTF-8"))
if (posn < 128) {
enc_str = toupper(sprintf("%%%s", as.hexmode(posn)))
encoded_str = sprintf("%s%s", encoded_str, enc_str)
} else {
# Should really handle larger posn's sometime....
byte2 = toupper(as.hexmode(bitwAnd(posn, 0x3F) + 0x80))
byte1 = toupper(as.hexmode(bitwShiftR(posn, 6) + 0xC0))
enc_str = sprintf("%%%s%%%s", byte1, byte2)
encoded_str = sprintf("%s%s", encoded_str, enc_str)
}
} else {
encoded_str = sprintf("%s%s", encoded_str, this_char)
}
}
return(encoded_str)
}
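A couple of quick checks that it behaves as intended (the second line is the "é" example from above):
percent_encode("Hello, world!")  # "Hello%2C%20world%21"
percent_encode("é")              # "%C3%A9"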
The base-64 encoding function from the base64enc package takes as input a raw vector; the hmac (hashing) function from the digest package outputs a string of hexadecimal characters. We therefore need a function to convert the hex string into a raw vector.
hmac_to_raw = function(hmac_str) {
# The output from hmac is a character string of hexadecimal digits
if (nchar(hmac_str) %% 2 == 1) {
hmac_str = sprintf("0%s", hmac_str)
}
characters = strsplit(hmac_str, "")[[1]]
byte_str = paste0(characters[c(TRUE, FALSE)], characters[c(FALSE, TRUE)])
byte_str = gsub("^(.)", "0x\\1", byte_str, perl=T)
raw_vec = as.raw(byte_str)
return(raw_vec)
}
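For example:
hmac_to_raw("4a5b6c")  # should give the raw vector 4a 5b 6c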
We also have to generate a nonce. I wrote this before I learned that most people just use the current time:
generate_nonce = function() {
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
nonce_length = 40
nonce = ""
for (i in 1:nonce_length) {
j = sample(1:62, size=1)
random_char = substr(alphabet, j, j)
nonce = sprintf("%s%s", nonce, random_char)
}
return(nonce)
}
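For completeness, here's a sketch of the "just use the current time" approach; Twitter only requires that the nonce be unique per request, and the exact construction below is just my guess at a reasonable form:
timestamp_nonce = function() {
# A unique-ish alphanumeric nonce: current time in milliseconds plus a little randomness.
sprintf("%.0f%04d", as.numeric(Sys.time()) * 1000, sample(0:9999, 1))
}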
With those preliminaries out of the way, here's the code to post a tweet (see the earlier links explaining the protocol for a guide); it doesn't check how many characters the tweet has.
post_tweet = function(tweet_text) {
http_method = "POST"
base_url = "https://api.twitter.com/1.1/statuses/update.json"
consumer_secret = "[redacted]"
token_secret = "[redacted]"
status = percent_encode(tweet_text)
oauth_consumer_key = percent_encode("[redacted]")
oauth_nonce = generate_nonce()
oauth_signature_method = "HMAC-SHA1"
oauth_timestamp = as.character(floor(as.numeric(Sys.time())))
oauth_token = percent_encode("[redacted]")
oauth_version = "1.0"
# Generate the signature:
keys = c("status",
"oauth_consumer_key",
"oauth_nonce",
"oauth_signature_method",
"oauth_timestamp",
"oauth_token",
"oauth_version")
values = c(status,
oauth_consumer_key,
oauth_nonce,
oauth_signature_method,
oauth_timestamp,
oauth_token,
oauth_version)
keys.df = data.frame(keys, values)
keys.df = keys.df[order(keys), ]
num_keys = length(keys)
param_str = ""
for (i in 1:num_keys) {
if (i < num_keys) {
amp_str = "&"
} else {
amp_str = ""
}
param_str = sprintf("%s%s=%s%s",
param_str,
keys.df$keys[i],
keys.df$values[i],
amp_str)
}
sig_base_str = sprintf("%s&%s&%s",
http_method,
percent_encode(base_url),
percent_encode(param_str))
signing_key = sprintf("%s&%s",
percent_encode(consumer_secret),
percent_encode(token_secret))
signature_hex_str = hmac(signing_key, sig_base_str, "sha1")
oauth_signature = percent_encode(base64encode(hmac_to_raw(signature_hex_str)))
oauth_str = sprintf("OAuth %s=\"%s\", %s=\"%s\", %s=\"%s\", %s=\"%s\", %s=\"%s\", %s=\"%s\", %s=\"%s\"",
"oauth_consumer_key", oauth_consumer_key,
"oauth_nonce", oauth_nonce,
"oauth_signature", oauth_signature,
"oauth_signature_method", oauth_signature_method,
"oauth_timestamp", oauth_timestamp,
"oauth_token", oauth_token,
"oauth_version", oauth_version)
tweet_headers = sprintf("Authorization: %s", oauth_str)
posted_tweet = POST(url=sprintf("%s?status=%s", base_url, status),
config=add_headers(tweet_headers),
body=sprintf("status=%s", status))
return(posted_tweet)
}
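Hypothetical usage, with the redacted credentials inside post_tweet filled in:
result = post_tweet("Posted from R, the hard way.")
http_status(result)     # should report success if the OAuth header was built correctly
content(result)$id_str  # the id of the new tweet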
And that works. It would be better to write a more general OAuth tool (i.e., one which could handle the arguments used in various other Twitter API calls), but I don't think it would be worth my time.