
## Data

This long page describes various aspects of the dataset.

• Results dataset (5.9 MB), including the base set of data, the estimated data, and the R scripts to reproduce the latter and the JavaScript data file. (Updated 2015-11-27: After a reader pointed out an anomaly, I modified the TPP estimates to handle partial preference distributions, resulting in mostly minor changes.)
• Shapefiles containing division boundaries (usually the two most recent redistributions are Ben Raue's, with me converting to shapefile and tidying up the geometries so that various spatial functions wouldn't throw errors when trying to calculate on them):

### Results

The AEC has digital results online from the 1993 election onwards. The files for 1993-1998 and for 2001 are available for download here, and for 2004 till the present they are available from the Results archive page. For results prior to 1993, I primarily used Adam Carr's Psephos archive. I did various checks on this data (totals versus sum of primary votes; vote tallies versus votes transferred during preference distributions, etc.); some preference flow data from the AEC datasets is missing (!), and there are some miscellaneous typos in the Psephos files. I corrected as much as I could, referring to whichever of Hughes and Graham (Voting for the Australian House of Representatives, 1901-1964 or 1965-1984) or the AEC's Election Statistics was most convenient for me. After having worked carefully through many of these (mostly minor) corrections, I learned that the Parliamentary Library has also been working on a digital dataset, with the AEC cross-checking against its own archives and ironing out errors in published results. When this data is made available, I will try to update mine.

The available data has changed over time, with much more detailed vote counting available today than in the past. The short history is as follows:

• 1901*-1917: First-past-the-post voting.
• 1919-1980: Preferential voting, but preferences were only distributed until a candidate had more than 50% of the total vote; where a candidate won more than 50% of the primary vote, we have no preference data for the seat.
• 1983**-1993: Preferences were distributed to completion, and two-party-preferred counts were undertaken for all seats, even in non-classic divisions (for Newcastle 1987, the two-candidate-preferred count was not done but the two-party-preferred count was done, despite an independent finishing second!).
• 1996-present: As above, but also with preference flow data. That is, for each candidate, we have the percentage of votes going to each of the two final candidates in the count (instead of having to estimate this based on the preference distribution, which mixes up the votes from the various excluded candidates).

*In 1901, South Australia used bloc voting, and Tasmania used Hare-Clark.

**The 1983 election was first counted in the same way as 1919-1980; in 1984, before the ballot papers were destroyed, the AEC conducted a full distribution of preferences, with the results published in General election of members of the House of Representatives, 5 March 1983: result of full distribution of preferences. I thank Mumble (Peter Brent) for sharing his spreadsheet of the 1983 TPP results, which motivated me to track down the full results booklet.

For the years where preferences were not distributed to completion, it would still be nice to have approximate figures for the preference flows and two-party-preferred. There are several different sets of estimates of the TPP; in the spirit of xkcd.com/927, and not necessarily in the spirit of Colin Hughes* (Australian Two-Party Preferred Votes, 1949-82, downloadable after a fashion from ADA), I have added my own, detailed below.

*In the introduction to his book of TPP tables (counted and estimated), Hughes writes, entirely reasonably, "it may matter less whether we say 47.2 percent or 47.5 percent than we all agree to say the same thing and then get on to saying something of greater substance." I figured that with all the power of modern computing behind me, it would be a shame if I didn't make some effort at writing a script to generate such estimates. As I describe below, I don't think I've necessarily improved the results, but it was worth a try.

For candidate names I rely on the Wikipedia candidate pages, and with only occasional exceptions, I have not checked the spellings.

### Preference flow estimates

My two-party-preferred estimates are based first on estimating the preference flows for each party. In three-candidate contests where the third candidate's preferences were distributed, the preference flow is known exactly. For full preference distributions involving more than three candidates, I estimate the preference flows in the simplest logical way possible: if some portion of minor candidate A's votes is distributed to minor candidate B, then that portion of A's preferences is assumed to flow in the same proportions as the rest of the votes in B's pile. This gives reasonable results, but of course won't always be correct: to take a modern example, PUP voters who preference the Greens ahead of either major party are unlikely to preference Labor ahead of the Coalition at the same rate as people who vote 1 Green.
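As a rough illustration of the pile-tracking idea described above, here is a minimal Python sketch. It is not the R script shipped with the dataset; the function name `estimate_flows`, the input layout and the candidate labels are all assumptions made for the example. Each pile remembers how many of its votes started with each candidate, and when a candidate is excluded, every origin in that pile is assumed to split in the same proportions as the pile as a whole.

```python
def estimate_flows(primaries, exclusions):
    """Estimate each excluded candidate's flow to the final two candidates.

    primaries:  dict candidate -> primary votes (hypothetical layout).
    exclusions: list of (excluded_candidate, {recipient: votes}) in count order.
    Returns dict origin -> {finalist: share of origin's primary vote}.
    """
    # Each pile tracks how many of its votes originated with each candidate.
    piles = {c: {c: float(v)} for c, v in primaries.items()}
    for excluded, transfers in exclusions:
        pile = piles.pop(excluded)
        total = sum(pile.values())
        for recipient, votes in transfers.items():
            share = votes / total if total else 0.0
            # Assumption: every origin in the pile splits in the same proportions.
            for origin, n in pile.items():
                piles[recipient][origin] = piles[recipient].get(origin, 0.0) + n * share
    # Only the finalists' piles remain at this point.
    flows = {}
    for origin, votes in primaries.items():
        if origin in piles or votes == 0:
            continue
        flows[origin] = {f: pile.get(origin, 0.0) / votes for f, pile in piles.items()}
    return flows


# Made-up four-candidate count: A is excluded first, then B.
example = estimate_flows(
    {"ALP": 40, "LIB": 42, "A": 8, "B": 10},
    [("A", {"B": 4, "ALP": 2, "LIB": 2}),
     ("B", {"ALP": 10.5, "LIB": 3.5})],
)
```

In this made-up count, half of A's votes pass through B's pile, so A's estimated flow to Labor inherits B's 75% split for that half.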

For 1996-2013, we can compare the preference flows estimated by this method to the counted flows. The scatter plot of estimated versus true preference flows for each non-major-party candidate in 2013 is typical:

The correlations are good enough for me not to give up on the exercise: for the elections 1996-2013, they are respectively 0.88, 0.91, 0.93, 0.93, 0.92, 0.95, 0.92. It looks like there is a slight bias in the results, with particularly strong preference flows being under-estimated – a smooth curve through the scatters would be slightly S-shaped rather than straight.

For comparison, a simpler method to estimate TPP flows would be to use only the preferences that went straight to Labor or the Coalition. For example, if 20% of Candidate A's preferences go to Candidate B, 60% go to the Coalition, and 20% go to Labor, then we could estimate the TPP flow to Labor as 20% / (20% + 60%) = 25%. This method gives slightly poorer correlations: 0.86, 0.90, 0.91, 0.90, 0.89, 0.94, 0.92.
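The simpler "direct preferences only" calculation can be written in a couple of lines. This is only a sketch of the arithmetic in the paragraph above; the function name is made up for the example.

```python
def direct_flow_to_labor(to_labor, to_coalition):
    """Share of directly transferred major-party preferences that went to Labor.

    Preferences that went to other minor candidates are ignored entirely.
    Returns None when no preferences went straight to either major party.
    """
    direct = to_labor + to_coalition
    if direct == 0:
        return None
    return to_labor / direct


# The example from the text: 20% to Labor, 60% to the Coalition, 20% elsewhere.
flow = direct_flow_to_labor(20, 60)
```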

### Two-party-preferred estimates

The above procedure for the preference flow estimates covers all the cases in the graphs or maps where the preference flows are plotted. But for two-party-preferred estimates in seats where preferences were not distributed, we need to guess the preference flow from all of the non-major-party candidates contesting the seat. (Here I'm assuming that there are candidates from both major parties.) I guess these preference flows with some "rough-and-ready" stats that might betray my lack of serious statistical training, and which shouldn't be treated as magic, but which I think do OK anyway.

First, I create a time series – one number for each election – starting with mean known or estimated preference flows for each party. (The mean is calculated as the simple average of the flows for an election, i.e., it is not weighted by number of votes for the party in each seat. This is probably the wrong thing to do.) There may not be many observations in these figures, since preferences were only occasionally distributed, so I regress each flow towards 50% according to some eyeballed/fudged parameters: assume a Bayesian prior of mean 50% and standard deviation 15 percentage points, and a measurement of flow $$f$$ from $$N$$ observations with uncertainty $$12 / \sqrt{N}$$. Then,

\begin{equation*} f_{\text{regressed}} = \frac{50/15^2 + fN/12^2}{1/15^2 + N/12^2}. \end{equation*}
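The regression towards 50% is a precision-weighted average of the prior and the measurement, which a few lines of Python make concrete. This is a sketch of the formula above, not the dataset's R code; the function and parameter names are invented for the example.

```python
def regress_flow(f, n, prior_mean=50.0, prior_sd=15.0, obs_sd=12.0):
    """Shrink a mean flow f (in percent, from n observations) towards the prior mean."""
    prior_precision = 1.0 / prior_sd ** 2   # 1 / 15^2
    data_precision = n / obs_sd ** 2        # N / 12^2
    return (prior_mean * prior_precision + f * data_precision) / (
        prior_precision + data_precision
    )


one_seat = regress_flow(70.0, 1)     # a single observation is pulled well towards 50
many_seats = regress_flow(70.0, 50)  # many observations stay close to 70
```

With a single observation of 70%, the regressed flow is 22950/369, or about 62.2%; with no observations at all it is just the prior mean of 50%.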

Through this time series I then calculate a smoothed loess curve: the idea is that the true preference flow for the party may vary over time, but should do so fairly gradually. And with so little hard data to draw on for each election, I think it's useful to aggregate across multiple elections somehow. The loess smoothing parameter $$\alpha$$ is set to 0.5, and each data point is weighted by the number of seats that went into the measurement.
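To show roughly what a weighted loess-style smoother does, here is a pure-Python stand-in. It makes several assumptions that the text does not confirm: it fits a local straight line (loess implementations often default to a local quadratic), it uses tricube kernel weights multiplied by the per-election seat counts, and it takes the bandwidth as the distance to the k-th nearest point, with k equal to α times the number of elections, rounded up. The function name is hypothetical.

```python
import math


def weighted_local_linear(xs, ys, weights, alpha=0.5):
    """Return one smoothed value per point; weights are assumed positive."""
    n = len(xs)
    k = min(n, max(2, math.ceil(alpha * n)))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (x0 itself counts).
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        s0 = s1 = s2 = t0 = t1 = 0.0
        for x, y, w in zip(xs, ys, weights):
            d = x - x0
            if h > 0:
                u = abs(d) / h
                kernel = (1 - u ** 3) ** 3 if u < 1 else 0.0
            else:
                kernel = 1.0 if d == 0 else 0.0
            wk = kernel * w  # tricube weight times observation weight
            s0 += wk
            s1 += wk * d
            s2 += wk * d * d
            t0 += wk * y
            t1 += wk * d * y
        det = s0 * s2 - s1 * s1
        if s2 > 0 and det > 1e-9 * s0 * s2:
            # Intercept of the weighted least-squares line centred on x0.
            fitted.append((s2 * t0 - s1 * t1) / det)
        else:
            # Too few distinct points nearby: fall back to a weighted mean.
            fitted.append(t0 / s0)
    return fitted


# Sanity check on made-up data: points on a straight line are reproduced.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
seats = [1, 5, 2, 8, 3, 3, 1, 4, 6, 2]
smoothed = weighted_local_linear(xs, ys, seats, alpha=0.5)
```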

With a preference flow figure now decided for each party at each election, these percentages are applied to candidates whose preferences contributed to the hypothetical TPP. Independents are assumed to split 50-50 (it is tempting to treat independents as any other party and hence allow their guessed flows to differ from 50%, but I worry that the sample of independents whose preferences got distributed would be biased somehow), except for some special cases where I gave a candidate a temporary party affiliation for this purpose (e.g., if a candidate had recently represented the Liberal Party, and I knew about this – probably after seeing some anomalies in Hughes's TPP estimates – then the candidate was coded as Ind Lib for preference purposes).

(Edit 2015-11-27: In the original version, I applied these estimated preference flows to the primary votes, even if there had been a partial preference distribution. I now use the latest count in which both a Labor and a Coalition candidate were present. This gives mostly minor changes, but fixes at least one anomaly where Labor won the seat but I had estimated them at less than 50% of the TPP – this anomaly was pointed out to me by a reader. The benchmarking numbers below have been updated, with the original page here.)
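A small sketch of how guessed flows might be applied to the latest count in which both a Labor and a Coalition candidate were present. The party codes, the dictionary layout and the function name are assumptions for illustration, and the sketch assumes a single Coalition candidate in the seat.

```python
def estimate_labor_tpp(count, flow_to_labor, labor="ALP", coalition="LIB",
                       default_flow=50.0):
    """Estimate Labor's TPP percentage from a (possibly partial) count.

    count:         dict party -> votes in the latest count containing both majors.
    flow_to_labor: dict party -> guessed percentage of preferences going to Labor.
    Parties missing from flow_to_labor (e.g. independents) split 50-50.
    """
    labor_votes = float(count[labor])
    coalition_votes = float(count[coalition])
    for party, votes in count.items():
        if party in (labor, coalition):
            continue
        share = flow_to_labor.get(party, default_flow) / 100.0
        labor_votes += votes * share
        coalition_votes += votes * (1.0 - share)
    return 100.0 * labor_votes / (labor_votes + coalition_votes)


# Made-up seat: DLP preferences guessed at 20% to Labor, independent at 50%.
tpp = estimate_labor_tpp(
    {"ALP": 45000, "LIB": 43000, "DLP": 8000, "IND": 4000},
    {"DLP": 20.0},
)
```

Here Labor is credited with 45000 + 1600 + 2000 = 48600 of 100000 votes, an estimated TPP of 48.6%.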

We have exact TPP results from 1983 onwards to benchmark this procedure. It's not obvious whether it should work better or worse on recent elections. On the one hand, there are many more candidates and parties, so the preference flow estimates are based on many "mixed" piles of votes which started with different parties. On the other hand, there are a lot more candidates who need preferences to get to 50% of the vote.

Since most of the TPP vote comes from the primary votes for Coalition and Labor, even a silly TPP estimation technique will correlate well with the true values. In the following table I compare the true-versus-estimated correlations (ρ), mean error by seat, and mean absolute error by seat, both for the estimates that I used and for a benchmark of assuming all seats have preferences splitting 50-50.

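| Year | Estimates: ρ | Estimates: error | Estimates: \|error\| | Benchmark: ρ | Benchmark: error | Benchmark: \|error\| |
|------|--------|-------|------|--------|-------|------|
| 1983 | 0.9962 | 0.03 | 0.76 | 0.9889 | 0.28 | 1.15 |
| 1984 | 0.9966 | 0.19 | 0.68 | 0.9797 | 0.83 | 1.43 |
| 1987 | 0.9947 | 0.11 | 0.80 | 0.9696 | 1.04 | 1.99 |
| 1990 | 0.9935 | -0.33 | 1.10 | 0.9761 | -0.62 | 2.03 |
| 1993 | 0.9977 | -0.24 | 0.70 | 0.9867 | -0.24 | 1.45 |
| 1996 | 0.9973 | 0.57 | 1.00 | 0.9956 | -0.47 | 1.03 |
| 1998 | 0.9961 | -0.21 | 1.05 | 0.9895 | -0.93 | 1.86 |
| 2001 | 0.9868 | 0.03 | 1.13 | 0.9765 | -1.13 | 2.07 |
| 2004 | 0.9949 | -0.26 | 0.97 | 0.9824 | -1.44 | 2.09 |
| 2007 | 0.9942 | -0.28 | 0.86 | 0.9828 | -1.82 | 2.19 |
| 2010 | 0.9951 | 0.18 | 0.82 | 0.9838 | -2.20 | 2.73 |
| 2013 | 0.9958 | 0.52 | 0.84 | 0.9864 | -1.88 | 2.06 |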

So, where I estimate a seat's TPP, it doesn't look like there's much bias to either side, and the typical error is about a percentage point.
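For anyone wanting to reproduce this kind of benchmark on their own estimates, the three statistics are straightforward to compute. The sketch below assumes the error is defined as estimated minus true (the sign convention is not stated above), and the function name is made up.

```python
import math


def benchmark(estimated, true):
    """Return (Pearson correlation, mean error, mean absolute error) across seats."""
    n = len(true)
    errors = [e - t for e, t in zip(estimated, true)]  # assumed sign convention
    mean_error = sum(errors) / n
    mean_abs_error = sum(abs(x) for x in errors) / n
    mean_e = sum(estimated) / n
    mean_t = sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    rho = cov / math.sqrt(var_e * var_t)
    return rho, mean_error, mean_abs_error


# Three made-up seats: the errors cancel on average but not in absolute terms.
rho, err, abs_err = benchmark([41.0, 50.0, 59.0], [40.0, 50.0, 60.0])
```

The toy example shows why both error columns are worth reporting: a mean error near zero says the estimates are unbiased, not that they are accurate seat by seat.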

Another question of interest is how the other existing TPP estimates fare against true values. For this, we have only the 1983 election to use as a test*: both Malcolm Mackerras and Adam Carr have their own estimates made after the election but before the full distribution of preferences in 1984. Of course, I have the advantage of making my estimates with the true values already known; I can only assure you that I didn't try to tweak the parameters in my code to get it to closely match the 1983 results, and in any case Mackerras's estimate errors had a smaller spread than mine.

*Joan Rydon compared Mackerras's state- and national-level TPP estimates with the subsequent full results in 'Two-party preferred': The analysis of voting figures under preferential voting, Politics, 21:2, 68-74 (1986).

| Estimate | ρ | error | \|error\| |
|-----------|--------|------|------|
| DB | 0.9962 | 0.03 | 0.76 |
| Psephos | 0.9970 | 0.52 | 0.77 |
| Mackerras | 0.9978 | 0.33 | 0.62 |

I have the consolation of having more unbiased TPP estimates than Psephos or Mackerras, but I wouldn't read too much into that – in three out of the next eleven elections, my estimates were biased at least as much as Mackerras's were in 1983.

(Mackerras's estimates were published in Double Dissolution Election, March 5, 1983: Statistical Analysis.)

Still, it was a fun exercise to try, and there's scope for some improvement, should anyone want yet another set of TPP estimates. In particular, I think that the strong preference flows could be modelled better (following the S curve of the scatter plots mentioned earlier), and I also think that, at least in the case of the DLP, it would be better to separate preference flows by state (or rather, separate by Victoria and rest-of-Australia). DLP preferences in Victoria flowed more strongly to the Coalition than they did in other states, and by using the national averages, I think I've under-estimated the Coalition's share of the national TPP by a smidgen (and the Victorian TPP by quite a bigger smidgen) throughout the DLP's strongest years. Finally, I ignored donkey votes, which severely distort the preference flows of small parties, and which should be accounted for both when estimating average preference flows from a party, and when applying them to a given seat.

There remains the question of what to do with seats not contested by both major parties – an occurrence that was once quite frequent (Labor didn't contest Wimmera for any election between 1914 and 1937 inclusive). In the case of one non-TPP election with TPP elections immediately before and after, I apply the state TPP swings forwards and backwards and average the two figures to generate my guessed TPP. For longer sequences of non-TPP elections, I just chain the swings together. The results might be somewhat fanciful, and they are excluded from scatter plots, but it is useful to have these seats' TPPs guessed so that state and national totals are more reflective of what they would have been if all voters had had the choice between Labor and Coalition.
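One way to read the swing-chaining procedure is sketched below. The text only spells out the averaging for a single missing election, so averaging the forward and backward chains for longer gaps (and using whichever chain exists at the ends of the series) is an assumption here, as are the function name and the input layout. State swings are taken as given inputs.

```python
def fill_missing_tpp(seat_tpp, state_swing):
    """Guess a seat's TPP at elections where it had no TPP contest.

    seat_tpp:    list of TPP percentages, with None for non-TPP elections.
    state_swing: state_swing[i] is the state TPP swing from election i-1 to i
                 (state_swing[0] is unused).
    """
    n = len(seat_tpp)
    forward = [None] * n
    backward = [None] * n
    last = None
    for i in range(n):  # chain swings forwards from the last known TPP
        if seat_tpp[i] is not None:
            last = seat_tpp[i]
        elif last is not None:
            last = last + state_swing[i]
        forward[i] = last
    nxt = None
    for i in range(n - 1, -1, -1):  # chain swings backwards from the next known TPP
        if seat_tpp[i] is not None:
            nxt = seat_tpp[i]
        elif nxt is not None:
            nxt = nxt - state_swing[i + 1]
        backward[i] = nxt
    filled = []
    for known, f, b in zip(seat_tpp, forward, backward):
        if known is not None:
            filled.append(known)
        elif f is not None and b is not None:
            filled.append((f + b) / 2.0)  # average the two guesses
        else:
            filled.append(f if f is not None else b)
    return filled


# Made-up seat with one uncontested election between two TPP contests.
single_gap = fill_missing_tpp([55.0, None, 53.0], [0.0, -3.0, 2.0])
```

In the example, the forward guess is 55 - 3 = 52 and the backward guess is 53 - 2 = 51, giving 51.5 for the missing election.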

A page of TPP tables is here.

### Party affiliations

I spent a while checking the party affiliations of candidates. Many of these are obscure and of little interest. I have one section on issues with the Victorian Country Party, followed by some miscellaneous notes. Party affiliations in the early years are sometimes unclear. References to "Hawker" are to Politicians All: The Candidates for the Australian Commonwealth Election 1901: A Collective Biography.

#### Victorian Country Party, 1934-43

I'm writing this section not because I think it's an original contribution to Australian political history, but rather in the hope that it helps provoke a better organisation of the Wikipedia entries and tables on this subject. See also this talk page comment by Frickeg, which gives an overview in the form of a chronological set of Trove references mostly about Alex Wilson, and which I borrow from below. I don't claim that these notes are a properly complete summary – in particular, it will be worth looking up the Labor-UCP relations in the state parliament – but hopefully they'll help.

I'll quote from Ulrich Ellis, A History of the Australian Country Party [1]. From p204, on the 1934 election:

Any hope of an electoral agreement in Victoria was destroyed by a fresh conflict in the state Country Party organization. The central council required all candidates, state as well as federal—to sign a pledge. This committed signatories to stand down from contests if not endorsed; to refrain from voting in parliament against majority decisions of the party caucus even on matters outside official party policy; and to refuse to support a composite government without the approval of the Victorian organization. All sitting federal members (T. Paterson, Q. C. Hill, W. G. Gibson, H. McClelland and Senator R. D. Elliott) refused to conform. Hill announced his retirement and the Echuca seat became a battle-ground. Two candidates supporting the federal faction were nominated. A Labor candidate entered the contest. The Victorian organization entered the lists with a young candidate named John McEwen who had signed the pledge.

The newspapers at the time distinguished Australian Country Party (ACP; Coalitionist) candidates from United Country Party (UCP; the Victorian anti-Coalitionist) candidates, and the Wikipedia tables replicate the affiliations given in the Argus's results tables [2]. But the line between the two camps is often blurry to me as I read the old newspapers (which is perhaps not surprising, for what was essentially an internal party struggle). As an example of this, I present below some news report excerpts concerning the campaign in Echuca, which was contested by three Country Party candidates (and one ALP candidate) – McEwen (UCP), Stewart (ACP in the Argus's results tables) and Moss (also ACP in the tables). This designation of ACP is consistent with the Ellis excerpt above and also with this report on 7 August [3]:

Both Mr Moss and Mr Stewart stated that they would not sign the pledge, but that they were quite prepared to sign the party platform and loyally support its programme.

But (13 August) [4]:

The Echuca branch of the United Country party has agreed to support the following candidates for the Echuca seat:—Messrs. J. McEwen, W. Moss, and Galloway Stewart. It was decided to abide by the decision of the Shepparton conference that members should exercise their preferences according to their discretion.

On the other hand (30 August) [5]:

He (Mr Stewart) was not endorsed by the Country Party because he would not subscribe to its new nomination form. He was not prepared, in the event of being elected, to do what he was told to do by the majority of Country Party members in the House. He must remain free and unshackled to carry out the wishes of the electors.

Very clear ACP/UCP lines appear to be drawn when Earle Page turned up (1 September) [6]:

Dr. Page on his arrival was disturbed to learn that he had been advertised as speaking on behalf of Mr. Galloway Stewart. He made it clear that he spoke on behalf of the two Australian Country Party candidates, and that he urged voters to give first preference to either Mr. Moss or Mr. Stewart, their third preference to Mr. McEwen, Victorian Country party candidate, with Labor last.

But just to throw a spanner in the works, a letter from some Stewart supporters on 14 September says that McEwen and Moss swapped preference recommendations [7]:

We notice by the official "tickets" issued by the other two country party candidates that Mr Galloway Stewart has been relegated to third preference.

And indeed, Moss's preferences split 2:1 in favour of McEwen over Stewart, with McEwen then easily defeating Stewart on Labor preferences.

From p207 of [1]:

When the central council devised a pledge to be signed by all candidates, federal and state, the federal members revolted and challenged its legality and soundness. The party's federal rules merely provided that the party might not 'form an alliance with any other political organization which does not preserve intact the entity of the Australian Country Party Association'. This was the situation existing when nominations were called for the federal elections. The Victorian party nominated its own candidates in a number of seats but only one, John McEwen, succeeded. Upon his election he immediately associated himself with the federal party and incurred the hostility of his Victorian colleagues for urging that the breach be healed.

Still, overall I'm happy enough with the ACP and UCP designations as given in the Argus's results tables, and as currently in the Wikipedia tables. Generally in my dataset, I've tried to designate party affiliations based on their campaigns, and not by their actions in the subsequent parliament (where relevant).

At least superficially, tensions in the party in the leadup to the 1937 election seem (in my reading) lower than in 1934. Only in Wimmera was there more than one CP candidate, with the sitting member (and Coalitionist) Hugh McClelland losing the UCP pre-selection to Alex Wilson [8]:

Despite the result of the ballot, Mr. McClelland announced last night that he would contest the seat as an unendorsed Country party candidate.

...

Mr. Wilson is a member of the central council of the party, and he is claimed as a strong opponent of composite Ministries. Mr. McClelland is a supporter of the Federal Composite Ministry.

Later from the same article:

The Minister for the Interior (Mr. Paterson) and Mr. McEwen, the other retiring Country party candidates in Victoria, have been endorsed by the Victorian central council for Gippsland and Indi respectively. Mr. Paterson and Mr. McEwen are supporters of the Federal composite Ministry.

The Wimmera contest was certainly split along Coalitionist v Anti-coalitionist lines [9]:

Mr. McClelland has received strong assistance from the leader of the Federal County party (Dr. Page), and Mr. Wilson has been assisted by two State Ministers—Mr. Bussau and Mr. Old.

The Argus results tables [10] refer to all of the endorsed Country candidates as UCP. Despite the Wimmera contest, the split doesn't appear as official or official-ish as in 1934 (and 1940, below), and so I have left all candidates in my dataset as "CP" with McClelland "Ind CP".

Of the four elected CP members from Victoria, two were from the anti-Coalitionist side of the party. Ellis writes (p220 of [1]):

A representative of the Victorian Country Party, Alexander Wilson, unseated the sitting member (Hugh McClelland) in Wimmera. As the loss of the Indi seat in 1928 sealed the fate of the Bruce-Page government, so the loss of Wimmera assisted a few years later to defeat a government. Wilson remained aloof from the federal party but G. H. Rankin, the Chief President of the Victorian organization, who won the Bendigo seat from the United Australia Party, incurred the wrath of his colleagues by joining the federal parliamentary party immediately.

(Rankin's subsequent backdown, mentioned in his ADB entry [11], occurred in May 1939 [12], as he ceased meeting with the federal parliamentary Country Party.)

Any superficial truce between the two Victorian factions certainly ended soon after the election. John McEwen accepted a position as Minister for the Interior, and the UCP expelled him from the party [13]:

"In view of Mr. McEwen's failure to observe the rules of the Victorian United Country party in the acceptance of a portfolio in the Lyons composite Government and his lack of loyalty to endorsed Parliamentary candidates of the party at the recent Federal election, this central council decides to cancel his membership of the Victorian United Country party."

At the 1938 party conference, Thomas Paterson (Coalitionist) resigned from the UCP in protest, with a hundred others leaving the conference with him [14]. He formed the Liberal Country Party [15], which ended up standing two candidates in the 1940 federal election (Paterson and McEwen themselves).

Meanwhile, Alex Wilson followed the anti-Coalitionist principles of his faction of the UCP. Paterson said of him [16]:

...the attitude of Mr Wilson, M.H.R. for Wimmera, sitting in isolation, refusing to associate himself with those who should be his colleagues, generally voting with the Labour Party against his colleagues and weakening the effectiveness of the Party in that way.

June 1939 [17]:

The secretary of the party (Mr. D. R. Downey) announced yesterday that Mr. A. Wilson, the sitting member, was the only applicant for endorsement for the Wimmera electorate.

Wilson did nevertheless face Country Party opposition in 1940, in the form of Hugh McClelland, whom Wilson had defeated in 1937 and who ran as Ind CP. With a Labour candidate and an independent also nominating, a flavour of the allegiances can be gleaned from this Argus report [18] in the week before the election:

[T]he closest observers admit that it is impossible at this stage to predict whether Mr. Alex Wilson (U.C.P.) will be re-elected, or Mr. McClelland (Ind. C.P.), or Mr. M. M. Nolan (Lab.) will displace him.

...

Nomination of a Labour candidate at this election must take many votes from Mr. Wilson, who, however, probably commands more U.C.P. support than when he displaced Mr. McClelland three years ago.

The election probably will be decided by third position in the primary count. If Labour fills that position Mr. Wilson is almost certain of re-election. If Mr. McClelland is third his votes probably will carry Mr. Wilson in, but if Mr. Wilson is placed third the Labour candidate may draw enough support to win. Because of the splitting of the U.C.P. vote the Labour candidate may lead in primaries.

In the event, Wilson won a fairly commanding 44% of the primary vote, and with over 80% of Labour's preferences, he defeated McClelland 66-34 on two-candidate-preferred. The result of the election was a hung parliament. Ellis writes (p257 of [1]):

Two independents held the balance of power if they chose to use it—A. Wilson (a member of the previous parliament) and A. W. Coles who, as an independent expressing sympathy with the United Australia Party, had captured the seat that had been Sir Henry Gullett's.

The designation of "independent" certainly describes Wilson's actions in parliament, which most notably included crossing the floor to bring down the Fadden government. And in an article about the coming merger of the LCP and UCP, there is mention that [19]

Later, Mr. Wilson, who has consistently supported the Federal Labor Government and whose attitude encouraged Mr Curtin to make his successful bid for office was also 'carpeted' by a capricious central council for daring to label himself an independent.

In April 1943, a union of the Victorian Country parties was close [20]:

In his speech to delegates, Mr McEwen emphasised that the proposal was to form a new party representing country interests. He expressed the view that it would not now be difficult to bridge the gulf between the two parties, particularly as the UCP had reversed its previous policy preventing its Federal parliamentary representatives from other States....

...

"We have seen Major General Rankin and Mr Wilson instructed not to attend meetings of the Australian Country party. We have seen that instruction revoked and these two members authorised to attend ACP meetings. Mr Rankin has continuously attended for at least two years."

Nevertheless, Wilson remained committed to independence in the parliament and also committed to the UCP, where he still commanded some support. ("He can rightly be termed a modern Abraham Lincoln," said one member of the UCP Central Council [21]). He retained Wimmera with over 60% of the primary vote, not opposed by any endorsed CP candidates. The Argus results tables [22] designate him "CP" along with all other endorsed Country candidates; given his unusual position, I have called him "UCP" for the 1943 election in my dataset.

(He quit parliament in 1945, mercifully ending my struggles in deciding how to label Victorian Country Party candidates.)

[1] Ulrich Ellis, A History of the Australian Country Party, MUP (1963).

#### 1901

Lang

Mitchell, Ind. I could find little about this candidate; usually no party affiliation given by SMH [1]; the Muswellbrook Chronicle in their results [2] call him Protectionist. H&G say Ind Prot; Hawker says Prot; I have called him Ind.

New England

Simpson, Ind. Hawker describes him as the "second" freetrader, but I have called him Ind following [1], where he says that "he would give, if elected, the Barton Government a fair trial".

Wannon

Cussen, Ind Prot. "If elected he would give his support to Mr Barton" [1]

#### 1903

Capricornia

Ryan, Ind Prot.

Northern Melbourne

Painter, Ind Prot. 'Protectionist "up to the hilt"'

#### 1906

Batman

Painter, Ind. Perhaps should be Ind Prot again?

Batman

Vernon, Ind Prot.

#### 1910

Oxley

Dent, Ind. The Courier said that "Mr. Dent is standing in the democratic interest", correcting an article in which they called him Labour.

Bass

Storrer, Ind Lib. Apparently Storrer was opposed to the Liberal Fusion, but he had plenty of official Liberal support, and I've called him Ind Lib rather than Ind Prot.

#### 1913

Henty

Hewison, Ind Lib. A Liberal who withdrew after arbitration, leaving Boyd as the endorsed Liberal

#### 1914

Gippsland

Wise, Ind Lib. Called himself a Liberal; was opposed to the Fusion. Apparently he often voted with Labor, but I've called him Ind Lib.

#### 1919

Brisbane

Boland, Ind. "[H]e had emerged... as an independent in search of some congenial and honest party. He regretted that sincerity could not be found in any of the organisations with which he had been associated." [1] Another Boland ran as a state candidate, apparently as a Nationalist [2].

#### 1922

West Sydney

Bryde, Prot Lab. Protestant Labour.

Henty

Francis, Nat. Sometimes called Ind Nat [1]; sometimes just Nat [2]. I've followed H&G and called him Nat, with hesitation.

Kooyong

Best, Nat. Usually referred to as Ind Nat by the papers, but I've left him as Nat, on the grounds that no other Nationalist candidate ran against him.

Northern Territory

Love, NTRL. Northern Territory Representation League

Northern Territory

Nelson, ALP. According to ADB, he ran as an independent with union support, and joined the Labor Party after the election. Following Psephos I've called him ALP.

#### 1925

Calare

Southwick, Ind. H&G say Ind Nat, but I prefer Ind [1]. "The people must get rid of both parties, and get down to solid work."

#### 1928

Gippsland

Wise, Ind Lib. Independent Liberal.

Flinders

Robertson, Ind. I found one reference to him as Ind Nat [1], but generally in what little there was of him in the papers, he was just described as an independent (e.g., [2])

#### 1940

East Sydney

Phillips, Atok. Phillips called himself an "Atokist" [1] and this was good enough for the SMH in its results tables [2], albeit with scare quotes.

Wannon

Crawford, Ind CP.

Yarra

Gibson, Soc. Gibson was a communist; usually referred to as an independent during the campaign [1], but was designated Soc in the results tables [2]

#### 1943

Northern Territory

Murray, Ind Lab. Murray did not have endorsement of the federal party, and called himself an independent Labor candidate

#### 1946

Newcastle

Ellis, Service. The Service Party was a distinct entity from the Services Party, though I think its only candidate was Ellis in Newcastle.

Northern Territory

Wallman, Ind Lab. Endorsed by (some branches of?) NT Labor contra federal party

#### 1949

Wide Bay

McDowell, Ind Lab. McDowell called himself "Democratic Labour"; I've coded this as Ind Lab.

#### 1954

Warringah

White, Ind Lib. White was an unendorsed Liberal.

McPherson

Green, Ind CP.

#### 1975

Werriwa

Keep, Ind. Canberra Times and SMH tables say "HOPP", but don't say what that might stand for; official Election Statistics has blank.

Kingston

Oakley, Ind. Canberra Times and SMH tables say Workers Party, but the official Election Statistics and the Parliamentary Handbook say independent. Oakley later stood as a Progress candidate (the re-named Workers Party); I edited Wikipedia to say WP, but ended up leaving her as an independent in my dataset.

### Maps

I digitised the maps myself, up to the recent redistributions covered by the Tally Room, working either from Commonwealth of Australia, 1901-1988, electoral redistributions ("the AEC book") or the official redistribution maps. The maps have been heavily simplified for fast loading on the web; the shapefiles available for download above are not so heavily simplified but are riddled with various hopefully minor errors. I had never tried digitising a map before this project, and I expect frequent errors of 10+ km in regional areas when working off the AEC book (on one occasion, when joining up a map of Sydney surrounds to the rest of New South Wales, there was a difference of 0.2 degrees between my two georeferencing attempts; I'd like to think that I fixed that isolated mishap, but I can't claim too much confidence either in fixing it well or in calling it isolated). Georeferencing in many outer metro areas is also likely poor, as the printed maps run out of control points for me to use. (The pages in that book are enormous, so I just took photos of them rather than scanning, which probably didn't help.)

I had particular trouble with the Victorian maps, in some cases because I didn't know what I was doing and in some cases because the AEC's maps were drawn incorrectly. For the 1949 redistribution, I used the Argus election supplement of 7 December to help interpret a supposed division labelled Fitzroy and to locate the unlabelled (and very subtly drawn) La Trobe. The AEC book's 1989 redistribution contains at least one clear error of several hundred metres: part of the boundary between Aston and La Trobe runs along the railway in the image below, but the map has it drawn separately.

Antony Green has hand-drawn quite a few maps of various electorates (working, I presume, off the actual descriptions), and put these on his electorate profile pages (e.g., Deakin). I could have lifted Antony's shapefiles to get most of the metro areas accurate to the street in the years that Antony covers, but instead I used them only as an occasional guide where I was totally lost, and as a benchmark to check the quality of my georeferencing. Here is a picture of Deakin 1989 (me: red; Antony: sky blue):

Where I used the (large!) official redistribution maps, I generally expect better accuracy, with metro areas often being accurate to the street. Here is a comparison against Antony for the division of Banks for the redistributions of 1984 (AEC book) and 2000 (large map):

I hope that errors in rural areas are almost always less than 5km for the redistributions from 1992 onwards, and usually no more than 2-3km. I did encounter some anomalies, either in my understanding, in the maps, or in the roads shapefile from Geoscience Australia that I used to define many control points. In the NSW redistribution of 1992, the Sydney surrounds map has co-ordinate lines, which look to me like easting and northing lines. But when I use them as control points, I get systematic errors of several hundred metres relative to the road and railway intersections. My recollection is that I semi-randomly compromised between my systematically differing control points, so treat the boundaries with appropriate caution (any such errors will propagate to the 2000 redistribution, wherever the boundary was not changed).

#### Circle maps

Geographic Australian electoral maps are dominated by the large electorates in rural areas, and it is useful to show the results instead with equal-area shapes. In the UK, there are some nice hexagon maps (example) which expand the size of London and other major cities, and which still make it easy to see roughly where each constituency is geographically. For Australia's electorates of wildly unequal size, the best equal-area representation I've seen is that of Nick Evershed and Gabriel Dance at The Guardian. The idea is to draw a circle of constant size at the centroid of each electorate, and then to allow overlapping circles to move apart. Evershed and Dance's JavaScript implements a lightning-fast algorithm to move the circles apart from each other on the fly; I decided instead to slowly pre-calculate the equal-area circle centre locations. (There are a handful of electorate circles in the Guardian's map which I think are in the wrong place. That might be an excessive nitpick, and others might not like my circle locations.)

I did my calculations in R with Rcpp. It was the first time I'd used Rcpp, so I'm quite happy with it, as I always am when I get a new programming thing working. The idea is: in R, load the (geographic) shapefile and compute the co-ordinates of the centroids; then pass the centroids to an Rcpp function. The centroid locations are then evolved as though they are mutually-repulsive point particles with movement being heavily damped so as to move the centroids as little as possible. To further keep movement to a minimum, the centroids don't move when they are sufficiently distant from all other centroids.
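As a toy illustration of the idea in the paragraph above, here is a standalone C++ sketch with just two points on a line. It borrows the force law from the Rcpp listing further down, but steps with semi-implicit Euler rather than the RK4 of the full code. The names `Pair1D`, `repulsion` and `separate_pair` are hypothetical, chosen for this example only.

```cpp
#include <cmath>

struct Pair1D { double x1, x2, v1, v2; };

// Repulsive force magnitude with the same shape as in the Rcpp listing:
// quadratic inside the range r, zero once the points are r apart.
double repulsion(double d, double r) {
    const double k = 5.0;
    return (d < r) ? k * (d - 1.05 * r) * (d - 1.05 * r) : 0.0;
}

// Evolve two points on a line, with damping coefficient mu and time-step h,
// until they are at least r apart or max_steps is reached.
// Semi-implicit Euler: update velocities first, then positions.
Pair1D separate_pair(double x1, double x2, double r, double mu,
                     double h, long max_steps) {
    Pair1D s = {x1, x2, 0.0, 0.0};
    for (long i = 0; i < max_steps; i++) {
        double d = std::fabs(s.x2 - s.x1);
        if (d >= r) break;                         // far enough apart: stop
        double f = repulsion(d, r);
        double dir = (s.x2 > s.x1) ? 1.0 : -1.0;   // push x2 away from x1
        s.v1 += h * (-dir * f - mu * s.v1);        // force plus damping
        s.v2 += h * ( dir * f - mu * s.v2);
        s.x1 += h * s.v1;
        s.x2 += h * s.v2;
    }
    return s;
}
```

Because the two forces are equal and opposite, the midpoint of the pair stays put, and a pair that starts at least r apart is never moved at all. Those are the two properties that keep the displacement small.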

I present both the R code, which uses various spatial libraries, and the Rcpp code. The R code won't run as-is unless the various shapefiles are already in the same relative folders as on my computer, but any interested readers should be able to pick out what they need.

# Script to make equal-area shapefiles for the electorates, following the idea here:
# http://www.theguardian.com/world/datablog/2013/sep/06/better-election-results-map

library(sp)
library(rgeos)
library(maptools)
library(rgdal)
library(Rcpp)

sourceCpp("equal_area.cpp")

tau = 2*pi

# In pixels:
polygon_height = 10

# Number of sides:
polygon_sides = 20

# Alphabetical order!
states = c("act", "nsw", "nt", "qld", "sa", "tas", "vic", "wa")
start_year = 1901
end_year = 2013

divisions_count = numeric()

lon_lat_to_web_mercator = function(lon, lat, zoom) {
# Not quite the Google version, which flips the y-component
lon = lon * tau/360
lat = lat * tau/360

x = (lon + tau/2) * 2^zoom * 256 / tau
y = (log(tan(tau/8 + lat/2)) - tau/2) * 2^zoom * 256 / tau
return (data.frame(x, y, row.names=NULL))
}

web_mercator_to_lon_lat = function(x, y, zoom) {
# Not quite the Google version, which flips the y-component
lon = (x*tau/(256 * 2^zoom) - tau/2)*360/tau
lat = (2*atan(exp(tau/2 + y*tau/(256*2^zoom))) - tau/4)*360/tau
return (data.frame(lon, lat, row.names=NULL))
}

make_regular_polygon = function(x, y, n, r)  {
theta = seq(0, -tau, length.out=n+1)
theta[n+1] = 0
x_out = x + r*cos(theta)
y_out = y + r*sin(theta)

return(cbind(x_out, y_out))
}

phi = tau/4 * (1 - 2/polygon_sides)
polygon_side_length = polygon_height / tan(phi)

# Get every redistribution date: states' redistribution dates as
# vectors in a list, and a vector for any change across the country.
redist_dates = list()
all_redists = numeric()

for (i in 1:length(states)) {
state = states[i]
redists = list.files(path=state, pattern="[0-9][0-9]\\.shp")

redists = as.numeric(gsub("[^0-9]", "", redists))
redist_dates[[i]] = redists
all_redists = c(all_redists, redists)
}

all_redists = unique(all_redists)
all_redists = all_redists[order(all_redists)]

first_redist = all_redists[max(which(all_redists <= start_year))]
redists_to_process = c(first_redist, all_redists[which((all_redists > start_year) & (all_redists <= end_year))])

for (year in redists_to_process) {
print(year)
divisions = character()
x = numeric()
y = numeric()
centroids = data.frame(x, y)

national_dir = sprintf("national/%d", year)
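# Create the output directory (and its parent) if needed; don't warn if it already exists.
dir.create(national_dir, recursive=TRUE, showWarnings=FALSE)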

for (i in 1:length(states)) {
state = states[i]
skip_state = 0
state_redists = redist_dates[[i]]
possible_redists = state_redists[state_redists <= year]
if (length(possible_redists) == 0) {
skip_state = 1
} else {
this_year = max(possible_redists)
}

if (skip_state == 0) {
in_shp_file = sprintf("%s/%s_%d.shp", state, state, this_year)

# Copy files to the national directory:
shp_name = sprintf("%s_%d\\.", state, this_year)
shp_files = list.files(path=state, pattern=shp_name)

for (j in 1:length(shp_files)) {
in_file = sprintf("%s/%s", state, shp_files[j])
out_file = sprintf("%s/%s", national_dir, shp_files[j])
file.copy(in_file, out_file, overwrite=TRUE)
}

# NOTE: this listing never assigns this_shp. Reading in_shp_file with readOGR
# here is an assumption about the missing line.
this_shp = readOGR(dsn=in_shp_file, layer=sprintf("%s_%d", state, this_year))

this_divisions = as.character(this_shp@data$Division)
this_centroids = as.data.frame(gCentroid(this_shp, byid=TRUE))
centroids = rbind(centroids, this_centroids)
divisions = c(divisions, this_divisions)
}
}

out_shp_file = sprintf("equal_area/aus_%d_equal", year)
out_json_file = sprintf("%s.geojson", out_shp_file)
out_pts_json_file = sprintf("%s_pts.geojson", out_shp_file)
out_pts_json_layer = sprintf("%s_pts", out_shp_file)

# writeOGR doesn't like overwriting geojson files, so delete it instead:
if (file.exists(out_json_file)) {
file.remove(out_json_file)
}

if (file.exists(out_pts_json_file)) {
file.remove(out_pts_json_file)
}

centroids_merc = lon_lat_to_web_mercator(centroids$x, centroids$y, 4)

# uncluster(x, y, damping_coeff, radius, time-step, max_time)
# I've got the max_time set pretty high here, but it should only take t ~ 30.
# NOTE: `radius` is not defined anywhere in this listing; it must be set
# (in zoom-4 Web Mercator pixels) before this call.
new_points = uncluster(centroids_merc$x, centroids_merc$y, 15, radius, 1e-2, 1000)

# The last row of new_points holds diagnostics, so keep only the first length(divisions) rows.
new_points_lonlat = web_mercator_to_lon_lat(new_points[1:length(divisions), 1], new_points[1:length(divisions), 2], 4)
new_points_xy = data.frame(x=new_points_lonlat$lon, y=new_points_lonlat$lat)

poly_list = list()

for (ct in 1:length(divisions)) {
this_hexagon_merc = make_regular_polygon(new_points[ct, 1], new_points[ct, 2], polygon_sides, radius/2)
this_hexagon_lonlat = as.matrix(web_mercator_to_lon_lat(this_hexagon_merc[, 1], this_hexagon_merc[, 2], 4))
poly_list[[ct]] = this_hexagon_lonlat
}

poly_sp = SpatialPolygons(mapply(function(poly, id) {
Polygons(list(Polygon(poly)), ID=id)
}, poly_list, divisions))

proj4string(poly_sp) = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")

poly.df = SpatialPolygonsDataFrame(poly_sp, data.frame(Division=divisions, row.names=divisions))
points.df = SpatialPointsDataFrame(new_points_xy, data.frame(Division=divisions, row.names=divisions))

# writeOGR's behaviour when writing GeoJSON files appears to be very dependent on
# the gdal version installed on the system.  The check_exists=FALSE is a workaround
# based on https://trac.osgeo.org/gdal/ticket/5908 when using the current (as at
# time of writing) version of rgdal with gdal 1.11.

writeOGR(poly.df, ".", layer=out_shp_file, driver="ESRI Shapefile", overwrite_layer=TRUE)
writeOGR(poly.df, out_json_file, layer=out_shp_file, driver="GeoJSON", check_exists=FALSE)
writeOGR(points.df, out_pts_json_file, layer=out_pts_json_layer, driver="GeoJSON", check_exists=FALSE)

divisions_count = c(divisions_count, length(divisions))
}
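The two projection helpers in the R script are meant to be exact inverses of each other. A standalone C++ sketch of the same formulas makes that easy to check; the `Pixel` and `LonLat` structs are hypothetical names introduced only for this example.

```cpp
#include <cmath>

const double TAU = 2.0 * 3.14159265358979323846;

struct Pixel  { double x, y; };
struct LonLat { double lon, lat; };

// Degrees to pixel co-ordinates at the given zoom. As in the R helper,
// y is not flipped the way it is in the Google tile scheme.
Pixel lon_lat_to_web_mercator(double lon_deg, double lat_deg, int zoom) {
    double lon = lon_deg * TAU / 360.0;
    double lat = lat_deg * TAU / 360.0;
    double scale = std::pow(2.0, zoom) * 256.0 / TAU;  // pixels per radian
    Pixel p;
    p.x = (lon + TAU / 2.0) * scale;
    p.y = (std::log(std::tan(TAU / 8.0 + lat / 2.0)) - TAU / 2.0) * scale;
    return p;
}

// Inverse of the function above: pixel co-ordinates back to degrees.
LonLat web_mercator_to_lon_lat(double x, double y, int zoom) {
    double scale = std::pow(2.0, zoom) * 256.0 / TAU;
    LonLat q;
    q.lon = (x / scale - TAU / 2.0) * 360.0 / TAU;
    q.lat = (2.0 * std::atan(std::exp(TAU / 2.0 + y / scale)) - TAU / 4.0)
            * 360.0 / TAU;
    return q;
}
```

At zoom 4 the whole world is 4096 pixels wide, so longitude 0, latitude 0 maps to x = 2048, y = -2048 in this convention, and converting any point there and back should return the original longitude and latitude up to rounding.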

And now the Rcpp:

/* Input to uncluster is a vector of x's and a vector of y's,
along with some parameters of the dynamics and integration.
The program then treats these as point particles which mutually
repel one another according to a potential function, with
motion damped.  The idea is that the final set of points will
be separated enough so that they can be used as locations for
equal-area shapes on a Google Map. */

#include <Rcpp.h>
using namespace Rcpp;

double dist(double x1, double y1, double x2, double y2) {
return sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2));
}

double potential(double d, double r) {
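// d = dist, r = range of potential function, measured in pixels.
// r should be ~4 maybe.
// Despite the name, calc_der uses the returned value directly as the
// magnitude of the repulsive force; it is zero once d >= r.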

double V;
double k = 5.0;

if (d < r) {
V = k*(d - 1.05*r)*(d - 1.05*r);
} else {
V = 0;
}
return V;
}

NumericVector calc_der(NumericVector r, double mu, double range) {
// Calculates the components of the dr/dt vector.
// First two entries are co-ordinates, next two are velocity components.

long i, j, idx_i2, idx_i3, idx_j0, idx_j1, idx_j2, idx_j3;
long n = r.size();
long num_pts = n/4;
NumericVector drdt(n);

// Redundant but harmless: Rcpp's NumericVector(n) is already zero-filled.
for (j = 0; j < n; j++) {
drdt[j] = 0.0;
}

double r_ix, r_iy, r_jx, r_jy, d, fx, fy;

for (j = 0; j < num_pts; j++) {
r_jx = r[4*j+0];
r_jy = r[4*j+1];

idx_j0 = 4*j + 0;
idx_j1 = 4*j + 1;
idx_j2 = 4*j + 2;
idx_j3 = 4*j + 3;

// Potential terms:
for (i = j+1; i < num_pts; i++) {
r_ix = r[4*i+0];
r_iy = r[4*i+1];

idx_i2 = 4*i + 2;
idx_i3 = 4*i + 3;

d = dist(r_ix, r_iy, r_jx, r_jy);

fx = (r_jx - r_ix) * potential(d, range) / d;
fy = (r_jy - r_iy) * potential(d, range) / d;

drdt[idx_j2] += fx;
drdt[idx_j3] += fy;

drdt[idx_i2] -= fx;
drdt[idx_i3] -= fy;
}
}

for (j = 0; j < num_pts; j++) {
idx_j0 = 4*j + 0;
idx_j1 = 4*j + 1;
idx_j2 = 4*j + 2;
idx_j3 = 4*j + 3;

if ((drdt[idx_j2] == 0) && (drdt[idx_j3] == 0)) {
// No potential terms: keep the point stationary.
drdt[idx_j0] = 0.0;
drdt[idx_j1] = 0.0;
} else {
// Definition r-dot = v:
drdt[idx_j0] = r[idx_j2];
drdt[idx_j1] = r[idx_j3];

// Damping proportional to v:
drdt[idx_j2] += -1.0 * mu * r[idx_j2];
drdt[idx_j3] += -1.0 * mu * r[idx_j3];
}
}

return drdt;
}

// [[Rcpp::export]]
NumericMatrix uncluster(NumericVector x, NumericVector y, double mu, double range, double h, double t_final) {
// x and y are vectors containing the pixel locations to be unclustered.
// h is time-step

long num_pts = x.size();
long n = 4*num_pts;

long i, j, k;
double r_jx, r_jy, r_kx, r_ky;
int cleared;
double d, min_dist;

double t = 0;
long t_steps = floor(t_final / h);

NumericMatrix xy(num_pts+1, 2);

// Vectors to hold everything:
NumericVector r(n);
NumericVector k1(n);
NumericVector k2(n);
NumericVector k3(n);
NumericVector k4(n);
NumericVector vec_aux(n);

for (j = 0; j < num_pts; j++) {
// Position:
r[j*4+0] = x[j];
r[j*4+1] = y[j];
// Velocity:
r[j*4+2] = 0.0;
r[j*4+3] = 0.0;
}

// Not strictly needed (Rcpp zero-fills NumericVector(n)), but harmless:
for (j = 0; j < n; j++) {
k1[j] = 0.0;
k2[j] = 0.0;
k3[j] = 0.0;
k4[j] = 0.0;
vec_aux[j] = 0.0;
}

for (i = 0; i < t_steps; i++) {
// RK4:
k1 = calc_der(r, mu, range);

for(j = 0; j < n; j++) {
vec_aux[j] = r[j] + 0.5*h*k1[j];
}
k2 = calc_der(vec_aux, mu, range);

for(j = 0; j < n; j++) {
vec_aux[j] = r[j] + 0.5*h*k2[j];
}
k3 = calc_der(vec_aux, mu, range);

for(j = 0; j < n; j++) {
vec_aux[j] = r[j] + h*k3[j];
}
k4 = calc_der(vec_aux, mu, range);

for(j = 0; j < n; j++) {
r[j] = r[j] + h*(k1[j] + 2.0*k2[j] + 2.0*k3[j] + k4[j]) / 6.0;
}

t += h;

// See if we've unclustered:
cleared = 1;
for (j = 0; j < num_pts; j++) {
for (k = j+1; k < num_pts; k++) {
r_jx = r[4*j+0];
r_jy = r[4*j+1];
r_kx = r[4*k+0];
r_ky = r[4*k+1];

if (dist(r_jx, r_jy, r_kx, r_ky) < range) {
cleared = 0;
break;
}
}

if (cleared == 0) {
break;
}
}

if (cleared == 1) {
break;
}

if (i % 1000 == 0) {
Rprintf("t = %.1f\n", t);
}
}

for (i = 0; i < num_pts; i++) {
xy(i, 0) = r[4*i+0];
xy(i, 1) = r[4*i+1];
}

// Find the shortest distance between two centroids,
// for sending back to R.
min_dist = 100.0;
for (j = 0; j < num_pts; j++) {
for (k = j+1; k < num_pts; k++) {
r_jx = r[4*j+0];
r_jy = r[4*j+1];
r_kx = r[4*k+0];
r_ky = r[4*k+1];

d = dist(r_jx, r_jy, r_kx, r_ky);

if (d < min_dist) {
min_dist = d;
}
}
}

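// The extra final row carries diagnostics rather than a point:
// column 0 is the smallest pairwise distance, column 1 the elapsed time.
xy(num_pts, 1) = t;
xy(num_pts, 0) = min_dist;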

return xy;
}
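The time-stepping inside uncluster is a classical fourth-order Runge-Kutta (RK4) update applied to the whole state vector. The sketch below pulls that update out into a standalone C++ function and tries it on a test problem with a known answer: a single free particle with damping only, for which v(t) = v(0) exp(-mu t). The names `State`, `rk4_step` and `damped_free_particle` are hypothetical, and the test problem is an illustration rather than part of the original code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> State;
typedef State (*Derivative)(const State&);

// One classical RK4 step of size h for dr/dt = f(r).
State rk4_step(const State& r, Derivative f, double h) {
    const std::size_t n = r.size();
    State aux(n), out(n);

    State k1 = f(r);
    for (std::size_t j = 0; j < n; j++) aux[j] = r[j] + 0.5 * h * k1[j];
    State k2 = f(aux);
    for (std::size_t j = 0; j < n; j++) aux[j] = r[j] + 0.5 * h * k2[j];
    State k3 = f(aux);
    for (std::size_t j = 0; j < n; j++) aux[j] = r[j] + h * k3[j];
    State k4 = f(aux);

    for (std::size_t j = 0; j < n; j++) {
        out[j] = r[j] + h * (k1[j] + 2.0 * k2[j] + 2.0 * k3[j] + k4[j]) / 6.0;
    }
    return out;
}

// Test problem: state = (x, v), with dx/dt = v and dv/dt = -mu * v.
State damped_free_particle(const State& r) {
    const double mu = 15.0;  // same damping coefficient as the R call
    State d(2);
    d[0] = r[1];
    d[1] = -mu * r[1];
    return d;
}
```

With h = 1e-2 and mu = 15, as in the R call, each step multiplies the velocity by a factor that agrees with exp(-0.15) to about six decimal places, so the time-step is comfortably small for the damping term on its own.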


Posted 2015-09-03,
updated 2015-11-27,
updated 2016-10-06.
