This table is almost identical to a cross table made from the built-in 'Step forward' table type; in practice, you would probably
not use a custom query for this purpose. But it illustrates many aspects of the custom queries and their results:
- When you use a group abbreviation, like
ALP
, its meaning is the preference number given to that group when
a vote is being processed.
- When you have Rows set to
groups
, the row
identifier (lower-case!) gives the preference number for
the group in that row.
- When you have Columns set to
groups
, the col
identifier gives the preference number for the group
in that column.
- A single equals sign
=
is used to test for equality (but you can use two or more consecutive equals signs
if you like).
- The
and
logical operator is used when you require two conditions to be true (or
and
not
are also available; use parentheses around expressions as needed.)
- The Filtered base is the percentage of formal votes that pass the filter; here, 27.54% of votes were above-the-line for
Labor.
- Rows and columns in the output percentages table are not treated symmetrically. All votes that contribute to at least one table
cell in a row will be included in the row's Base. The denominator for the Base percentage is the Filtered base: the
59.58% of votes for David Pocock in this table are of the 27.54% of votes that were 1
ALP
; multiplying the two
precentages together, 16.41% of formal votes went 1 ALP
2 DAVI
above the line.
- The percentages in the main body of the table are relative to the row Bases: of the votes that went 1
ALP
2 DAVI
, 66.83% gave GRN
the third preference.
- Exhaust ('Exh') is treated as having a preference number equal to one more than the number of preferences given on the
ballot paper, or 999 if all squares are numbered. The table shows no votes exhausting at preference 2, even though a small minority
(0.37%) of Labor votes stopped at 1. This is because the Cell definition also requires
col = 3
, and a vote that has a
"preference number" of 2 for Exh does not also have a "preference number" of 3 for Exh.
The built-in Step-forward cross table has more intuitive results in this regard: it will include the Exhaust total in the
Base column even when there are no non-zero cells in that row. I knew when I wrote the built-in cross table that the exhausting
preferences were of potential interest, and so actively included them; with the custom queries, you get exactly what you define,
according to the conventions that I have set.
(In the custom queries, you do not directly access Exh; instead, you use n_prefs
to get the number of preferences
given.)
All keywords and functions in the custom query language are in lower-case, so that there is no conflict with the group
or candidate identifiers, which have capital letters. For the latter, you must use the abbreviations as they are shown
(case-sensitive) when you click on the 'Abbreviations' button in the main interface. In particular, the appropriate abbreviation
for the Coalition varies by state, since sometimes the Liberals and Nationals run a joint ticket LPNP
, sometimes the
Liberal Party LP
is on its own, and the Queensland LNP
is a single merged party, like the CLP
in the Northern Territory.
In addition to having all groups
along an axis (or candidates
when below the line), it is possible
to define a set of integers (such as preference numbers), or an N-party-preferred axis.
Custom tables can be sent to either a popup window or to the main table and map. My intention, which you do not have to follow,
is that tables with many columns will be sent to a popup, while tables with few columns would be sent to the main table, from where
you can click on cells and see the results broken down by division and in the map.
A word of warning about large tables
In the background, the main table stores the individual polling place figures for each table cell. For a large table, such
as a below-the-line candidates
versus candidates
for NSW in 2016, when there were 153 candidates, this
can lead to a lot of data being stored in memory, especially during the multithreaded calculation. On my computer, the program does
quite well when it's starting to run out of RAM – it doesn't crash or freeze the computer, but instead slows the
calculation to a crawl, taking maybe a minute instead of a few seconds. But I don't guarantee any behaviour; popup tables are
safer in this regard.
Values you have access to
You should think of each expression in the language as applying to a single ballot paper. A group's abreviation (ALP
,
LP
, GRN
, etc.) gives the preference number given for that group on the ballot paper being processed. If
the group was not preferenced, then the value will be 999. Group abbreviations (or candidate abbreviations below the line) must
be identical to how they are written in the 'Abbreviations' table.
The n_prefs
identifier gives the number of preferences on the ballot paper. While 'Exh' for Exhaust will appear
in tables with a groups
axis, you do not refer directly to 'Exh', and instead work with n_prefs
.
The constant n_max
is the number of groups (or number of candidates when working below the line).
If Rows are defined, then row
gives the value for the current entity defined by the row. If Columns are defined,
then col
gives the value of the entity defined by the column.
When above the line, prefixing a group identifier by id_
gives the position of the group on the ballot paper. For
example, if LPNP
is group C, then id_LPNP
is 3. When below the line, the same applies to
id_
followed by the identifier for a candidate; continuing the previous example, if groups A and B each have two
candidates, then id_LP_1
would be 5.
Both
id_row
and
id_col
are available when the Rows
and Columns respectively are
groups
(above the line) or
candidates
(below the line). I do not know if
id_
is useful.
When working below the line, prefixing a group identifier by nc_
gives the number of candidates in a group. If
the Greens are running six candidates, then nc_GRN
is equal to 6. When Rows or Columns are set to groups
,
you have access to nc_row
or nc_col
.
Expression types
All expressions in the language's syntax either resolve to an integer (such as a preference number) or a boolean
(true or false). The Filter and Cell definitions must ultimately be booleans; any non-trivial definition will have integer
sub-expressions on which you do comparisons to get true or false. In the earlier example, ALP
and 1
are
both integers; the equality test means that ALP = 1
returns a boolean on each vote.
The logical operators and
and or
combine two boolean expressions into one; the not
operator flips a boolean from true to false or vice versa.
There is no concept of "truthy" or "falsy" – if a Cell definition resolves to 0
or 1
, then there
will be an error, because those two quantities are integers, not booleans.
Comparison operators
Two integer expressions x
and y
may be compared using the following operators (though note that
"x" and "y" could never be valid expressions, since you cannot define new variables):
x > y | x is greater than y |
x >= y | x is greater than or equal to y |
x < y | x is less than y |
x <= y | x is less than or equal to y |
x = y | x is equal y |
x != y | x is not equal to y |
You may use <>
instead of !=
, and consecutive =
symbols will be condensed into
a single =
.
There is also the in
operator to test if an integer is within a given range. A range is defined like
2..6
, two dots separating the lower and upper bounds (inclusive). An expression could be row in 2..6
,
for example. The lower and upper bounds must either be integer literals (i.e. typed digits), or the constant n_max
– you cannot write
row in 1..ALP
to test if the row
was preferenced ahead of Labor, or is equal to Labor.
Arithmetic operators
Two integer expressions may be added with +
or subtracted with -
. A Filter definition of
ALP + LP = 3
would collect all votes in which the first two preferences were for Labor and Liberal, in either order.
There are no other arithmetic operators available: I do not think it useful to multiply or divide preference numbers.
You cannot define new variables
If you want to work with the sum ALP + LP
, then you will have to always write out that sum in full; you cannot
store it in a variable and refer to that variable in a later expression.
Functions
The min()
function takes one or more integer arguments, and returns the smallest integer of them. The
max()
function similarly returns the maximum value of its arguments. A Filter of min(ALP, LP) = 1
would
collect all votes for the two major parties; a filter of max(ALP, LP) = 999
would collect votes in which at least one
major party was not preferenced.
When working below the line, group identifiers and candidate identifiers may be mixed in min()
and
max()
, so that you can write min(ALP_4, GRN, LP_1)
and it will return the smallest preference number
given to the fourth Labor candidate, any Green candidate, or the first Liberal candidate.
The abs()
function returns the absolute value of its integer argument; filtering on abs(ALP - LP) = 1
would
collect votes in which Labor was preferenced either immediately after the Liberal Party, or immediately before.
The if()
function takes three arguments: a boolean condition, an integer value-if-true, and an integer
value-if-false. This function can be useful when combined with an axis definition that takes two (or more) integer values. For
example, if the Rows are set to groups
and the Columns are set to 0, 1
, then a Cell definition of
row = 1 and if(ALP = LP, 1, 0)
would separate the votes for each group into two categories: those that preferenced
at least one major party, and those that didn't. (The only way in which ALP = LP
is possible is if both
are 999.)
The pi()
function takes one integer argument, and returns the "preference index" of that argument, i.e. the position
of the group that received that preference number, or 999 if the preference number wasn't given. e.g. if the second preference was
given to group D, then pi(2)
would be 4. A possible use case (the only one?) is testing for donkey or partial-donkey
votes, in which the numbers 1 to 6 are written into consecutive squares along the ballot paper. The following monstrosity of a query
separates each group's votes into those that did not have 1-6 consecutive (forwards or backwards), and those that did.
Below-the-line functions
When working below the line, you can still define groups
axes, and refer to group identifiers. However, the meaning
of such an identifier is ambiguous, because a group will contain more than one candidate, and therefore a group does not have a
unique preference number with which you can perform a comparison. (A possible exception to this is the ungrouped candidates,
which when working BTL are treated as a group UG
if any exist; there may be only one ungrouped candidate. Even here,
you will not be allowed to use UG
on its own.)
The functions any()
and all()
can be used to resolve the ambiguity (min()
and
max()
may also be useful; as mentioned above, you can include a group identifier in these functions when working BTL).
Both any()
and all()
take a single boolean argument, in which exactly one term is an "aggregated
identifier" – either a group abbreviation, or row
or col
if the Rows or Columns are
groups
. The condition is evaluated for each candidate in the group, with any()
returning true
if at least one candidate satisfies the condition, and all()
returning true if all candidates in the group
satisfy the condition.
A cross table of first and second BTL preferences, aggregated to groups, could be defined as follows.
Axis definitions
We have seen axes defined by groups
, and axes defined by candidates
(below the line) work
similarly. The other possibilities are null, numbers, and an N-party-preferred.
Null
You can leave one or both axis definitions empty. Note that if you send a calculation to a popup window, and there are no Columns
defined, then a dummy column will be added; this is because I re-used my old cross table code for the popup tables, and I couldn't be
bothered handling special cases.
Here is a simple single-column table of ATL primary vote percentages for SA 2025, sent to the main table.
It is worth studying the row = col
Cell definition in case the logic is not immediately obvious. The
row
is the preference number given to the row group; the col
is a number between 1 and the number of
groups, which covers all possibilities for a preference that may be written in that group's square, or 999 for not preferenced.
Since all possibilities are covered, all the row Bases are 100%, and we can read, e.g., that 23.21% of above-the-line votes
preferenced the Greens at 2, while 18.35% did not preference them at all.
Numbers with an every()
definition
If an axis definition is every()
, with an expression resolving to an integer as its argument, then the axis will consist of all integers
taken by the expression, state-wide. This seemed like a useful idea when I wrote it, but I have not found much need for it. The intention was that, for example,
you might be interested in the difference between the Labor and Liberal preference numbers, ALP - LP
. The set of possible values taken by this
expression varies with the number of groups on the ballot paper, so you would need an axis definition that updates with it, if you want to compare across
different states or elections.
In this particular example, many of the numbers will be near -999 or 999, when one or the other major party was not preferenced.
N-party preferred
N-party-preferred axes work differently from all other types, in that you do not subsequently refer to row
or
col
– the axis definition itself will take care of sending each vote to the appropriate row or column.
The syntax is npp()
, with a list of group or candidate (if below the line) identifiers inside the parentheses. Here
is an example of showing a 4-party preferred between Labor, LNP, Greens, and One Nation in Queensland 2025, broken down by the
number of preferences given on the ballot paper.
Note that in the Cell definition, only row
is mentioned, not col
.
We can see that both the Greens and One Nation do better on votes where more than 6 preferences are given – this is
presumably a subset of the electorate that is disproportionately likely to have opinions about various minor and micro parties. The little dip in the Greens'
4PP on 12-preference votes is, I suspect, a real phenomenon, with 12-preference voters including quite a few people who get the above-the-line instructions confused
with the below-the-line instructions.
It is possible to have npp()
on both axes.
Below the line, you can mix candidates and groups in an npp()
, but you can't have a candidate and also the group of that candidate. So you could not run
a BTL npp()
on Lisa Singh versus rest-of-Labor; you would have to do a 6-candidate-preferred on the candidates individually, and then add up the numbers
for the non-Singh candidates yourself.
While the built-in N-party-preferred table type includes a row for the totals as well as the preference flow by group, the custom npp()
will only
give the totals if the other axis is null.
Conceptual weirdness
Occasionally an output table may be confusing. Here are two cases that surprised me.
Two-party-preferred matrix
I thought that a quick way to get a two-party preferred between every pair of parties would be to run the following query.
This is much better: Labor versus Coalition is 50.12-38.02, Coalition versus Greens is 45.24-39.36, and so on. These numbers could be divided by
the Filtered base from the earlier table of 95.05% to get them as percentages of all above-the-line votes.
Mixing pi()
and preference numbers
Possible preference numbers written on a ballot paper are 1 to n_max
, and the positions of groups on the ballot paper are
1 to n_max
. These are both integers, and the query language allows you to mix preferences with preference indices returned by
pi()
. I am not sure if there is a use case for this, apart from it being a ready source of difficult and uninteresting interpretational
puzzles.
I stumbled into the following query, not really knowing what it would do, but figuring that the table (partially shown below in ballot-paper
order, WA 2025) might show something.