Home > Pseph > Senate preference explorer

Custom query documentation

Introduction

This page describes the custom query mini-language of the Senate preference explorer.

A custom query consists of four components, all of which are optional. Here is a slight simplification that is often true (the only exception is N-party-preferred calculations):

Let's consider an example that uses all of these components. Assume that we're working above the line, that we've chosen to show 'Percentages', and that we calculate to a popup table.

Filter:ALP = 1
Rows:groups
Columns:groups
Cell:row = 2 and col = 3

The above query will only consider votes whose first preference was for the Labor Party. The output table will have two axes, with the above-the-line groups on both axes. A vote contributes to a table cell's tally if the second preference is for the group in that row and the third preference is for the group in that column. Here are the results for ACT in 2025:

This table is almost identical to a cross table made from the built-in 'Step forward' table type; in practice, you would probably not use a custom query for this purpose. But it illustrates many aspects of the custom queries and their results:

All keywords and functions in the custom query language are in lower-case, so that there is no conflict with the group or candidate identifiers, which have capital letters. For the latter, you must use the abbreviations as they are shown (case-sensitive) when you click on the 'Abbreviations' button in the main interface. In particular, the appropriate abbreviation for the Coalition varies by state, since sometimes the Liberals and Nationals run a joint ticket LPNP, sometimes the Liberal Party LP is on its own, and the Queensland LNP is a single merged party, like the CLP in the Northern Territory.

In addition to having all groups along an axis (or candidates when below the line), it is possible to define a set of integers (such as preference numbers), or an N-party-preferred axis.

Custom tables can be sent to either a popup window or to the main table and map. My intention, which you do not have to follow, is that tables with many columns will be sent to a popup, while tables with few columns would be sent to the main table, from where you can click on cells and see the results broken down by division and in the map.

A word of warning about large tables

In the background, the main table stores the individual polling place figures for each table cell. For a large table, such as a below-the-line candidates versus candidates for NSW in 2016, when there were 153 candidates, this can lead to a lot of data being stored in memory, especially during the multithreaded calculation. On my computer, the program does quite well when it's starting to run out of RAM – it doesn't crash or freeze the computer, but instead slows the calculation to a crawl, taking maybe a minute instead of a few seconds. But I don't guarantee any behaviour; popup tables are safer in this regard.

Values you have access to

You should think of each expression in the language as applying to a single ballot paper. A group's abreviation (ALP, LP, GRN, etc.) gives the preference number given for that group on the ballot paper being processed. If the group was not preferenced, then the value will be 999. Group abbreviations (or candidate abbreviations below the line) must be identical to how they are written in the 'Abbreviations' table.

The n_prefs identifier gives the number of preferences on the ballot paper. While 'Exh' for Exhaust will appear in tables with a groups axis, you do not refer directly to 'Exh', and instead work with n_prefs.

The constant n_max is the number of groups (or number of candidates when working below the line).

If Rows are defined, then row gives the value for the current entity defined by the row. If Columns are defined, then col gives the value of the entity defined by the column.

When above the line, prefixing a group identifier by id_ gives the position of the group on the ballot paper. For example, if LPNP is group C, then id_LPNP is 3. When below the line, the same applies to id_ followed by the identifier for a candidate; continuing the previous example, if groups A and B each have two candidates, then id_LP_1 would be 5.

Both id_row and id_col are available when the Rows and Columns respectively are groups (above the line) or candidates (below the line). I do not know if id_ is useful.

When working below the line, prefixing a group identifier by nc_ gives the number of candidates in a group. If the Greens are running six candidates, then nc_GRN is equal to 6. When Rows or Columns are set to groups, you have access to nc_row or nc_col.

Expression types

All expressions in the language's syntax either resolve to an integer (such as a preference number) or a boolean (true or false). The Filter and Cell definitions must ultimately be booleans; any non-trivial definition will have integer sub-expressions on which you do comparisons to get true or false. In the earlier example, ALP and 1 are both integers; the equality test means that ALP = 1 returns a boolean on each vote.

The logical operators and and or combine two boolean expressions into one; the not operator flips a boolean from true to false or vice versa.

There is no concept of "truthy" or "falsy" – if a Cell definition resolves to 0 or 1, then there will be an error, because those two quantities are integers, not booleans.

Comparison operators

Two integer expressions x and y may be compared using the following operators (though note that "x" and "y" could never be valid expressions, since you cannot define new variables):

x > yx is greater than y
x >= yx is greater than or equal to y
x < yx is less than y
x <= yx is less than or equal to y
x = yx is equal y
x != yx is not equal to y

You may use <> instead of !=, and consecutive = symbols will be condensed into a single =.

There is also the in operator to test if an integer is within a given range. A range is defined like 2..6, two dots separating the lower and upper bounds (inclusive). An expression could be row in 2..6, for example. The lower and upper bounds must either be integer literals (i.e. typed digits), or the constant n_max – you cannot write row in 1..ALP to test if the row was preferenced ahead of Labor, or is equal to Labor.

Arithmetic operators

Two integer expressions may be added with + or subtracted with -. A Filter definition of ALP + LP = 3 would collect all votes in which the first two preferences were for Labor and Liberal, in either order.

There are no other arithmetic operators available: I do not think it useful to multiply or divide preference numbers.

You cannot define new variables

If you want to work with the sum ALP + LP, then you will have to always write out that sum in full; you cannot store it in a variable and refer to that variable in a later expression.

Functions

The min() function takes one or more integer arguments, and returns the smallest integer of them. The max() function similarly returns the maximum value of its arguments. A Filter of min(ALP, LP) = 1 would collect all votes for the two major parties; a filter of max(ALP, LP) = 999 would collect votes in which at least one major party was not preferenced.

When working below the line, group identifiers and candidate identifiers may be mixed in min() and max(), so that you can write min(ALP_4, GRN, LP_1) and it will return the smallest preference number given to the fourth Labor candidate, any Green candidate, or the first Liberal candidate.

The abs() function returns the absolute value of its integer argument; filtering on abs(ALP - LP) = 1 would collect votes in which Labor was preferenced either immediately after the Liberal Party, or immediately before.

The if() function takes three arguments: a boolean condition, an integer value-if-true, and an integer value-if-false. This function can be useful when combined with an axis definition that takes two (or more) integer values. For example, if the Rows are set to groups and the Columns are set to 0, 1, then a Cell definition of row = 1 and if(ALP = LP, 1, 0) would separate the votes for each group into two categories: those that preferenced at least one major party, and those that didn't. (The only way in which ALP = LP is possible is if both are 999.)

The pi() function takes one integer argument, and returns the "preference index" of that argument, i.e. the position of the group that received that preference number, or 999 if the preference number wasn't given. e.g. if the second preference was given to group D, then pi(2) would be 4. A possible use case (the only one?) is testing for donkey or partial-donkey votes, in which the numbers 1 to 6 are written into consecutive squares along the ballot paper. The following monstrosity of a query separates each group's votes into those that did not have 1-6 consecutive (forwards or backwards), and those that did.

Filter:
Rows:groups
Columns:0, 1
Cell:row = 1 and col = if(abs(pi(2) - pi(1)) = 1 and abs(pi(3) - pi(2)) = 1 and abs(pi(4) - pi(3)) = 1 and abs(pi(5) - pi(4)) = 1 and abs(pi(6) - pi(5)) = 1, 1, 0)

Here are the results for NSW 2025. The ALP drew first position on the ballot paper and 1% of their above-the-line votes were straight donkey. (In Victoria 2025, the Coalition drew first position, and 1.33% of their ATL votes were straight donkey.)


Below-the-line functions

When working below the line, you can still define groups axes, and refer to group identifiers. However, the meaning of such an identifier is ambiguous, because a group will contain more than one candidate, and therefore a group does not have a unique preference number with which you can perform a comparison. (A possible exception to this is the ungrouped candidates, which when working BTL are treated as a group UG if any exist; there may be only one ungrouped candidate. Even here, you will not be allowed to use UG on its own.)

The functions any() and all() can be used to resolve the ambiguity (min() and max() may also be useful; as mentioned above, you can include a group identifier in these functions when working BTL).

Both any() and all() take a single boolean argument, in which exactly one term is an "aggregated identifier" – either a group abbreviation, or row or col if the Rows or Columns are groups. The condition is evaluated for each candidate in the group, with any() returning true if at least one candidate satisfies the condition, and all() returning true if all candidates in the group satisfy the condition.

A cross table of first and second BTL preferences, aggregated to groups, could be defined as follows.

Filter:
Rows:groups
Columns:groups
Cell:any(row = 1) and any(col = 2)

Here are the results for the top five most popular parties BTL in Tasmania in 2016, when Lisa Singh (ALP_6) and Richard Colbeck (LP_5) both ran serious BTL campaigns. Colbeck's voters were much more likely to stay in the Liberal column than Singh's; meanwhile, many Jacqui Lambie voters supported Lambie in particular and not her running mates.


Axis definitions

We have seen axes defined by groups, and axes defined by candidates (below the line) work similarly. The other possibilities are null, numbers, and an N-party-preferred.

Null

You can leave one or both axis definitions empty. Note that if you send a calculation to a popup window, and there are no Columns defined, then a dummy column will be added; this is because I re-used my old cross table code for the popup tables, and I couldn't be bothered handling special cases.

Here is a simple single-column table of ATL primary vote percentages for SA 2025, sent to the main table.

Filter:
Rows:groups
Columns:
Cell:row = 1

Numbers

A numbers axis may be defined by a comma-separated list of integers and/or ranges. One example earlier used numbers as "dummy" placeholders for categories ("preferenced a major party" and "didn't preference a major party"). When non-dummy numbers are used, I expect that common choices for an axis would be 1..n_max to cover all preference numbers, or 1..n_max, 999 to include 999 for not-preferenced as well. The following table shows a distribution of the preferences given to each group in NT 2025.

Filter:
Rows:groups
Columns:1..n_max, 999
Cell:row = col

It is worth studying the row = col Cell definition in case the logic is not immediately obvious. The row is the preference number given to the row group; the col is a number between 1 and the number of groups, which covers all possibilities for a preference that may be written in that group's square, or 999 for not preferenced.

Since all possibilities are covered, all the row Bases are 100%, and we can read, e.g., that 23.21% of above-the-line votes preferenced the Greens at 2, while 18.35% did not preference them at all.

Numbers with an every() definition

If an axis definition is every(), with an expression resolving to an integer as its argument, then the axis will consist of all integers taken by the expression, state-wide. This seemed like a useful idea when I wrote it, but I have not found much need for it. The intention was that, for example, you might be interested in the difference between the Labor and Liberal preference numbers, ALP - LP. The set of possible values taken by this expression varies with the number of groups on the ballot paper, so you would need an axis definition that updates with it, if you want to compare across different states or elections.

In this particular example, many of the numbers will be near -999 or 999, when one or the other major party was not preferenced.

Filter:
Rows:every(ALP - LP)
Columns:
Cell:row = ALP - LP

N-party preferred

N-party-preferred axes work differently from all other types, in that you do not subsequently refer to row or col – the axis definition itself will take care of sending each vote to the appropriate row or column.

The syntax is npp(), with a list of group or candidate (if below the line) identifiers inside the parentheses. Here is an example of showing a 4-party preferred between Labor, LNP, Greens, and One Nation in Queensland 2025, broken down by the number of preferences given on the ballot paper.

Filter:
Rows:1..n_max
Columns:npp(ALP, LNP, GRN, ON)
Cell:row = n_prefs

Note that in the Cell definition, only row is mentioned, not col.

We can see that both the Greens and One Nation do better on votes where more than 6 preferences are given – this is presumably a subset of the electorate that is disproportionately likely to have opinions about various minor and micro parties. The little dip in the Greens' 4PP on 12-preference votes is, I suspect, a real phenomenon, with 12-preference voters including quite a few people who get the above-the-line instructions confused with the below-the-line instructions.

It is possible to have npp() on both axes.

Below the line, you can mix candidates and groups in an npp(), but you can't have a candidate and also the group of that candidate. So you could not run a BTL npp() on Lisa Singh versus rest-of-Labor; you would have to do a 6-candidate-preferred on the candidates individually, and then add up the numbers for the non-Singh candidates yourself.

While the built-in N-party-preferred table type includes a row for the totals as well as the preference flow by group, the custom npp() will only give the totals if the other axis is null.

Conceptual weirdness

Occasionally an output table may be confusing. Here are two cases that surprised me.

Two-party-preferred matrix

I thought that a quick way to get a two-party preferred between every pair of parties would be to run the following query.

Filter:
Rows:groups
Columns:groups
Cell:row < col

The idea is that if the preference given to the row group is less than the preference given to the col group, then the row group is preferred to the col group. However, when I ran this calculation (Vic 2025), I instead saw ridiculously large percentages: the ALP beating the Coalition 81.51-68.19.



The problem was that I was using the default 'Percentages' option. In a custom table, the denominator for a percentage calculation is the row's Base, which consists of every vote that contributed to the row. This base includes every vote in which the party is preferenced! The 81.51% percentage comes from the number of votes in which Labor was preferenced ahead of the Coalition, divided by the number of votes that preferenced Labor at all.

This is a pretty useless calculation. But my idea wasn't completely wrong: I just needed to use the 'Total percentages' option, so that all percentages would be calculated from the total number of formal votes.


This is much better: Labor versus Coalition is 50.12-38.02, Coalition versus Greens is 45.24-39.36, and so on. These numbers could be divided by the Filtered base from the earlier table of 95.05% to get them as percentages of all above-the-line votes.

Mixing pi() and preference numbers

Possible preference numbers written on a ballot paper are 1 to n_max, and the positions of groups on the ballot paper are 1 to n_max. These are both integers, and the query language allows you to mix preferences with preference indices returned by pi(). I am not sure if there is a use case for this, apart from it being a ready source of difficult and uninteresting interpretational puzzles.

I stumbled into the following query, not really knowing what it would do, but figuring that the table (partially shown below in ballot-paper order, WA 2025) might show something.

Filter:
Rows:groups
Columns:
Cell:pi(1) = row

I needed a pen and paper to figure out what my query had asked for. The row is the preference number given to that group, while pi(1) is the position of the group given the first preference. A vote will contribute to the GAP cell if it goes

A complicated calculation, and I didn't even define any columns!


Home > Pseph > Senate preference explorer