Custom query documentation

Introduction

This page describes the custom query mini-language of the Senate preference explorer.

A custom query consists of four components, all of which are optional. Here is a slight simplification that is often true (the only exception is N-party-preferred calculations):

The Filter defines the subset of votes to include in the table.
The Rows defines what the rows mean.
The Columns defines what the columns mean.
The Cell defines a test that is applied for each table cell; a vote counts towards a table cell if it passes the test.

Let's consider an example that uses all of these components. Assume that we're working above the line, that we've chosen to show 'Percentages', and that we calculate to a popup table.

The above query will only consider votes whose first preference was for the Labor Party. The output table will have two axes, with the above-the-line groups on both axes. A vote contributes to a table cell's tally if the second preference is for the group in that row and the third preference is for the group in that column. Here are the results for ACT in 2025:

This table is almost identical to a cross table made from the built-in 'Step forward' table type; in practice, you would probably not use a custom query for this purpose. But it illustrates many aspects of the custom queries and their results:

When you use a group abbreviation, like ALP, its meaning is the preference number given to that group when a vote is being processed.
When you have Rows set to groups, the row identifier (lower-case!) gives the preference number for the group in that row.
When you have Columns set to groups, the col identifier gives the preference number for the group in that column.
A single equals sign = is used to test for equality (but you can use two or more consecutive equals signs if you like).
The and logical operator is used when you require two conditions to be true (or and not are also available; use parentheses around expressions as needed.)
The Filtered base is the percentage of formal votes that pass the filter; here, 27.54% of votes were above-the-line for Labor.
Rows and columns in the output percentages table are not treated symmetrically. All votes that contribute to at least one table cell in a row will be included in the row's Base. The denominator for the Base percentage is the Filtered base: the 59.58% of votes for David Pocock in this table are of the 27.54% of votes that were 1 ALP; multiplying the two precentages together, 16.41% of formal votes went 1 ALP 2 DAVI above the line.
The percentages in the main body of the table are relative to the row Bases: of the votes that went 1 ALP 2 DAVI, 66.83% gave GRN the third preference.
Exhaust ('Exh') is treated as having a preference number equal to one more than the number of preferences given on the ballot paper, or 999 if all squares are numbered. The table shows no votes exhausting at preference 2, even though a small minority (0.37%) of Labor votes stopped at 1. This is because the Cell definition also requires col = 3, and a vote that has a "preference number" of 2 for Exh does not also have a "preference number" of 3 for Exh.

The built-in Step-forward cross table has more intuitive results in this regard: it will include the Exhaust total in the Base column even when there are no non-zero cells in that row. I knew when I wrote the built-in cross table that the exhausting preferences were of potential interest, and so actively included them; with the custom queries, you get exactly what you define, according to the conventions that I have set.

(In the custom queries, you do not directly access Exh; instead, you use n_prefs to get the number of preferences given.)

All keywords and functions in the custom query language are in lower-case, so that there is no conflict with the group or candidate identifiers, which have capital letters. For the latter, you must use the abbreviations as they are shown (case-sensitive) when you click on the 'Abbreviations' button in the main interface. In particular, the appropriate abbreviation for the Coalition varies by state, since sometimes the Liberals and Nationals run a joint ticket LPNP, sometimes the Liberal Party LP is on its own, and the Queensland LNP is a single merged party, like the CLP in the Northern Territory.

In addition to having all groups along an axis (or candidates when below the line), it is possible to define a set of integers (such as preference numbers), or an N-party-preferred axis.

Custom tables can be sent to either a popup window or to the main table and map. My intention, which you do not have to follow, is that tables with many columns will be sent to a popup, while tables with few columns would be sent to the main table, from where you can click on cells and see the results broken down by division and in the map.

A word of warning about large tables

In the background, the main table stores the individual polling place figures for each table cell. For a large table, such as a below-the-line candidates versus candidates for NSW in 2016, when there were 153 candidates, this can lead to a lot of data being stored in memory, especially during the multithreaded calculation. On my computer, the program does quite well when it's starting to run out of RAM – it doesn't crash or freeze the computer, but instead slows the calculation to a crawl, taking maybe a minute instead of a few seconds. But I don't guarantee any behaviour; popup tables are safer in this regard.

Values you have access to

You should think of each expression in the language as applying to a single ballot paper. A group's abreviation (ALP, LP, GRN, etc.) gives the preference number given for that group on the ballot paper being processed. If the group was not preferenced, then the value will be 999. Group abbreviations (or candidate abbreviations below the line) must be identical to how they are written in the 'Abbreviations' table.

The n_prefs identifier gives the number of preferences on the ballot paper. While 'Exh' for Exhaust will appear in tables with a groups axis, you do not refer directly to 'Exh', and instead work with n_prefs.

The constant n_max is the number of groups (or number of candidates when working below the line).

If Rows are defined, then row gives the value for the current entity defined by the row. If Columns are defined, then col gives the value of the entity defined by the column.

When above the line, prefixing a group identifier by id_ gives the position of the group on the ballot paper. For example, if LPNP is group C, then id_LPNP is 3. When below the line, the same applies to id_ followed by the identifier for a candidate; continuing the previous example, if groups A and B each have two candidates, then id_LP_1 would be 5.

Both id_row and id_col are available when the Rows and Columns respectively are groups (above the line) or candidates (below the line). I do not know if id_ is useful.

When working below the line, prefixing a group identifier by nc_ gives the number of candidates in a group. If the Greens are running six candidates, then nc_GRN is equal to 6. When Rows or Columns are set to groups, you have access to nc_row or nc_col.

Expression types

All expressions in the language's syntax either resolve to an integer (such as a preference number) or a boolean (true or false). The Filter and Cell definitions must ultimately be booleans; any non-trivial definition will have integer sub-expressions on which you do comparisons to get true or false. In the earlier example, ALP and 1 are both integers; the equality test means that ALP = 1 returns a boolean on each vote.

The logical operators and and or combine two boolean expressions into one; the not operator flips a boolean from true to false or vice versa.

There is no concept of "truthy" or "falsy" – if a Cell definition resolves to 0 or 1, then there will be an error, because those two quantities are integers, not booleans.

Comparison operators

Two integer expressions x and y may be compared using the following operators (though note that "x" and "y" could never be valid expressions, since you cannot define new variables):

`x > y`	`x` is greater than `y`
`x >= y`	`x` is greater than or equal to `y`
`x < y`	`x` is less than `y`
`x <= y`	`x` is less than or equal to `y`
`x = y`	`x` is equal `y`
`x != y`	`x` is not equal to `y`

You may use <> instead of !=, and consecutive = symbols will be condensed into a single =.

There is also the in operator to test if an integer is within a given range. A range is defined like 2..6, two dots separating the lower and upper bounds (inclusive). An expression could be row in 2..6, for example. The lower and upper bounds must either be integer literals (i.e. typed digits), or the constant n_max – you cannot write row in 1..ALP to test if the row was preferenced ahead of Labor, or is equal to Labor.

Arithmetic operators

Two integer expressions may be added with + or subtracted with -. A Filter definition of ALP + LP = 3 would collect all votes in which the first two preferences were for Labor and Liberal, in either order.

There are no other arithmetic operators available: I do not think it useful to multiply or divide preference numbers.

You cannot define new variables

If you want to work with the sum ALP + LP, then you will have to always write out that sum in full; you cannot store it in a variable and refer to that variable in a later expression.

Functions

The min() function takes one or more integer arguments, and returns the smallest integer of them. The max() function similarly returns the maximum value of its arguments. A Filter of min(ALP, LP) = 1 would collect all votes for the two major parties; a filter of max(ALP, LP) = 999 would collect votes in which at least one major party was not preferenced.

When working below the line, group identifiers and candidate identifiers may be mixed in min() and max(), so that you can write min(ALP_4, GRN, LP_1) and it will return the smallest preference number given to the fourth Labor candidate, any Green candidate, or the first Liberal candidate.

The abs() function returns the absolute value of its integer argument; filtering on abs(ALP - LP) = 1 would collect votes in which Labor was preferenced either immediately after the Liberal Party, or immediately before.

The if() function takes three arguments: a boolean condition, an integer value-if-true, and an integer value-if-false. This function can be useful when combined with an axis definition that takes two (or more) integer values. For example, if the Rows are set to groups and the Columns are set to 0, 1, then a Cell definition of row = 1 and if(ALP = LP, 1, 0) would separate the votes for each group into two categories: those that preferenced at least one major party, and those that didn't. (The only way in which ALP = LP is possible is if both are 999.)

The pi() function takes one integer argument, and returns the "preference index" of that argument, i.e. the position of the group that received that preference number, or 999 if the preference number wasn't given. e.g. if the second preference was given to group D, then pi(2) would be 4. A possible use case (the only one?) is testing for donkey or partial-donkey votes, in which the numbers 1 to 6 are written into consecutive squares along the ballot paper. The following monstrosity of a query separates each group's votes into those that did not have 1-6 consecutive (forwards or backwards), and those that did.

Here are the results for NSW 2025. The ALP drew first position on the ballot paper and 1% of their above-the-line votes were straight donkey. (In Victoria 2025, the Coalition drew first position, and 1.33% of their ATL votes were straight donkey.)

Below-the-line functions

When working below the line, you can still define groups axes, and refer to group identifiers. However, the meaning of such an identifier is ambiguous, because a group will contain more than one candidate, and therefore a group does not have a unique preference number with which you can perform a comparison. (A possible exception to this is the ungrouped candidates, which when working BTL are treated as a group UG if any exist; there may be only one ungrouped candidate. Even here, you will not be allowed to use UG on its own.)

The functions any() and all() can be used to resolve the ambiguity (min() and max() may also be useful; as mentioned above, you can include a group identifier in these functions when working BTL).

Both any() and all() take a single boolean argument, in which exactly one term is an "aggregated identifier" – either a group abbreviation, or row or col if the Rows or Columns are groups. The condition is evaluated for each candidate in the group, with any() returning true if at least one candidate satisfies the condition, and all() returning true if all candidates in the group satisfy the condition.

A cross table of first and second BTL preferences, aggregated to groups, could be defined as follows.

Here are the results for the top five most popular parties BTL in Tasmania in 2016, when Lisa Singh (ALP_6) and Richard Colbeck (LP_5) both ran serious BTL campaigns. Colbeck's voters were much more likely to stay in the Liberal column than Singh's; meanwhile, many Jacqui Lambie voters supported Lambie in particular and not her running mates.

Axis definitions

We have seen axes defined by groups, and axes defined by candidates (below the line) work similarly. The other possibilities are null, numbers, and an N-party-preferred.

Null

You can leave one or both axis definitions empty. Note that if you send a calculation to a popup window, and there are no Columns defined, then a dummy column will be added; this is because I re-used my old cross table code for the popup tables, and I couldn't be bothered handling special cases.

Here is a simple single-column table of ATL primary vote percentages for SA 2025, sent to the main table.

Numbers

A numbers axis may be defined by a comma-separated list of integers and/or ranges. One example earlier used numbers as "dummy" placeholders for categories ("preferenced a major party" and "didn't preference a major party"). When non-dummy numbers are used, I expect that common choices for an axis would be 1..n_max to cover all preference numbers, or 1..n_max, 999 to include 999 for not-preferenced as well. The following table shows a distribution of the preferences given to each group in NT 2025.

It is worth studying the row = col Cell definition in case the logic is not immediately obvious. The row is the preference number given to the row group; the col is a number between 1 and the number of groups, which covers all possibilities for a preference that may be written in that group's square, or 999 for not preferenced.

Since all possibilities are covered, all the row Bases are 100%, and we can read, e.g., that 23.21% of above-the-line votes preferenced the Greens at 2, while 18.35% did not preference them at all.

Numbers with an `every()` definition

If an axis definition is every(), with an expression resolving to an integer as its argument, then the axis will consist of all integers taken by the expression, state-wide. This seemed like a useful idea when I wrote it, but I have not found much need for it. The intention was that, for example, you might be interested in the difference between the Labor and Liberal preference numbers, ALP - LP. The set of possible values taken by this expression varies with the number of groups on the ballot paper, so you would need an axis definition that updates with it, if you want to compare across different states or elections.

In this particular example, many of the numbers will be near -999 or 999, when one or the other major party was not preferenced.

N-party preferred

N-party-preferred axes work differently from all other types, in that you do not subsequently refer to row or col – the axis definition itself will take care of sending each vote to the appropriate row or column.

The syntax is npp(), with a list of group or candidate (if below the line) identifiers inside the parentheses. Here is an example of showing a 4-party preferred between Labor, LNP, Greens, and One Nation in Queensland 2025, broken down by the number of preferences given on the ballot paper.

Note that in the Cell definition, only row is mentioned, not col.

We can see that both the Greens and One Nation do better on votes where more than 6 preferences are given – this is presumably a subset of the electorate that is disproportionately likely to have opinions about various minor and micro parties. The little dip in the Greens' 4PP on 12-preference votes is, I suspect, a real phenomenon, with 12-preference voters including quite a few people who get the above-the-line instructions confused with the below-the-line instructions.

It is possible to have npp() on both axes.

Below the line, you can mix candidates and groups in an npp(), but you can't have a candidate and also the group of that candidate. So you could not run a BTL npp() on Lisa Singh versus rest-of-Labor; you would have to do a 6-candidate-preferred on the candidates individually, and then add up the numbers for the non-Singh candidates yourself.

While the built-in N-party-preferred table type includes a row for the totals as well as the preference flow by group, the custom npp() will only give the totals if the other axis is null.

Conceptual weirdness

Occasionally an output table may be confusing. Here are two cases that surprised me.

Two-party-preferred matrix

I thought that a quick way to get a two-party preferred between every pair of parties would be to run the following query.

The idea is that if the preference given to the row group is less than the preference given to the col group, then the row group is preferred to the col group. However, when I ran this calculation (Vic 2025), I instead saw ridiculously large percentages: the ALP beating the Coalition 81.51-68.19.

The problem was that I was using the default 'Percentages' option. In a custom table, the denominator for a percentage calculation is the row's Base, which consists of every vote that contributed to the row. This base includes every vote in which the party is preferenced! The 81.51% percentage comes from the number of votes in which Labor was preferenced ahead of the Coalition, divided by the number of votes that preferenced Labor at all.

This is a pretty useless calculation. But my idea wasn't completely wrong: I just needed to use the 'Total percentages' option, so that all percentages would be calculated from the total number of formal votes.

This is much better: Labor versus Coalition is 50.12-38.02, Coalition versus Greens is 45.24-39.36, and so on. These numbers could be divided by the Filtered base from the earlier table of 95.05% to get them as percentages of all above-the-line votes.

Mixing `pi()` and preference numbers

Possible preference numbers written on a ballot paper are 1 to n_max, and the positions of groups on the ballot paper are 1 to n_max. These are both integers, and the query language allows you to mix preferences with preference indices returned by pi(). I am not sure if there is a use case for this, apart from it being a ready source of difficult and uninteresting interpretational puzzles.

I stumbled into the following query, not really knowing what it would do, but figuring that the table (partially shown below in ballot-paper order, WA 2025) might show something.

I needed a pen and paper to figure out what my query had asked for. The row is the preference number given to that group, while pi(1) is the position of the group given the first preference. A vote will contribute to the GAP cell if it goes

1 GAP (first preference to first group on the ballot paper, which is GAP), or
1 AUC, 2 GAP (first pref to second group, second pref to GAP), or
1 CEC, 3 GAP (first pref to third group, third pref to GAP), or
1 CYA, 4 GAP (first pref to fourth group, fourth pref to GAP), or
...

A complicated calculation, and I didn't even define any columns!

Filter:	`ALP = 1`
Rows:	`groups`
Columns:	`groups`
Cell:	`row = 2 and col = 3`

Filter:
Rows:	`groups`
Columns:	`0, 1`
Cell:	`row = 1 and col = if(abs(pi(2) - pi(1)) = 1 and abs(pi(3) - pi(2)) = 1 and abs(pi(4) - pi(3)) = 1 and abs(pi(5) - pi(4)) = 1 and abs(pi(6) - pi(5)) = 1, 1, 0)`

Filter:
Rows:	`groups`
Columns:	`groups`
Cell:	`any(row = 1) and any(col = 2)`

Filter:
Rows:	`groups`
Columns:
Cell:	`row = 1`

Filter:
Rows:	`groups`
Columns:	`1..n_max, 999`
Cell:	`row = col`

Filter:
Rows:	`1..n_max`
Columns:	`npp(ALP, LNP, GRN, ON)`
Cell:	`row = n_prefs`